[Bug testsuite/78529] gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2

2018-08-24 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529

Joey Ye  changed:

   What|Removed |Added

 CC||joey.ye at arm dot com

--- Comment #36 from Joey Ye  ---
Simply applying __attribute__((noipa)) to memset (and all other C lib
implementations) in chk.c prevents IPA analysis in the local implementation of
memset, thus avoids the issue when it is later replaced by a library copy.

The workaround does pass this case in my experiment, and it can be turned into a
patch after additional work and testing. Is it an acceptable workaround to
upstream?
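
For readers unfamiliar with the attribute, here is a minimal sketch of the idea, not the actual chk.c change. The name local_memset and the TEST_NOIPA macro are made up for illustration; noipa itself is a GCC function attribute (GCC 8 and later), and the noinline fallback is only there so the sketch builds with other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* noipa is available in GCC 8 and later; fall back to noinline elsewhere.
   The fallback is weaker and is only here so the sketch builds everywhere. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#define TEST_NOIPA __attribute__((noipa))
#else
#define TEST_NOIPA __attribute__((noinline))
#endif

/* Hypothetical stand-in for the local memset in chk.c.  With noipa the
   compiler must not use anything it learns from this body when compiling
   callers, so swapping in a library copy later cannot invalidate what the
   callers were compiled to assume. */
TEST_NOIPA void *
local_memset (void *dst, int c, size_t n)
{
  unsigned char *p = dst;
  assert (dst != NULL || n == 0);
  while (n--)
    *p++ = (unsigned char) c;
  return dst;
}
```

A caller uses it like the library function, e.g. local_memset (buf, 0, sizeof buf). The attribute goes on the definition, not on the call sites.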

[Bug rtl-optimization/64082] virtual register elimination doing bad for local array

2016-08-19 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64082

Joey Ye  changed:

   What|Removed |Added

 CC||joey.ye at arm dot com

--- Comment #1 from Joey Ye  ---
Code sequence from trunk 20160705 is as follows:
bar:
        stp     x29, x30, [sp, -48]!
        add     x29, sp, 0
        stp     x19, x20, [sp, 16]
        add     x19, x29, 32
        mov     w20, w0
        mov     x0, x19
        bl      g
        ldrb    w0, [x19, w20, sxtw]
        bl      f
        ldp     x19, x20, [sp, 16]
        ldp     x29, x30, [sp], 48

Apparently the issue described in this ticket has been resolved, probably by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173.

Resolved.

[Bug tree-optimization/60172] [4.9/5 Regression] ARM performance regression from trunk@207239

2015-03-13 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #26 from Joey Ye joey.ye at arm dot com ---
Regression disappeared from 4.9 branch since Aug 2014, though the problem
discussed here is not yet confirmed solved.


[Bug rtl-optimization/63718] [5 Regression] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-12-02 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from Joey Ye joey.ye at arm dot com ---
Resolved in 218271, which is a workaround. A new PR is expected to be opened
for a complete solution.


[Bug rtl-optimization/63718] [5 Regression] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-19 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

--- Comment #14 from Joey Ye joey.ye at arm dot com ---
Hmm. Probably a more favorable solution is to fix expand_epilogue so that it
precisely describes the side effect?


[Bug rtl-optimization/63718] [5 Regression] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-09 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

--- Comment #9 from Joey Ye joey.ye at arm dot com ---
 
> Indeed, the patch is conservative, but that's not such a bad idea for a
> correctness fix. We can always follow up with a more optimal patch.
Tom, are you going to submit this patch for review, or are you working on a
more optimal one? Better to have this conservative patch to recover the
bootstrap first.

- Joey


[Bug tree-optimization/63747] New: [5 regression] icf mis-compares switch gimple

2014-11-05 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63747

Bug ID: 63747
   Summary: [5 regression] icf mis-compares switch gimple
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

ARM -Os bootstrap breaks. Root cause lies in ipa-icf-gimple.c where
compare_gimple_switch doesn't compare case numbers correctly. Will upload a
reduced test case soon.
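
To make the failure mode concrete, here is a simplified sketch of the comparison being described. It is not the code in ipa-icf-gimple.c: struct case_label and switch_labels_equal are hypothetical names, and a real switch statement carries more information than this model does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one case label: a [low, high] range
   and the index of the block it jumps to.  Not GCC's data structure. */
struct case_label
{
  long low;
  long high;
  int target;
};

/* Two switches are equivalent only if every label matches in both its
   value range and its target, not merely in label count and target. */
bool
switch_labels_equal (const struct case_label *a, size_t na,
                     const struct case_label *b, size_t nb)
{
  assert ((a != NULL || na == 0) && (b != NULL || nb == 0));
  if (na != nb)
    return false;
  for (size_t i = 0; i < na; i++)
    if (a[i].low != b[i].low || a[i].high != b[i].high
        || a[i].target != b[i].target)
      return false;
  return true;
}
```

A comparison that skipped the low/high check would treat a switch over cases 0..3 and a switch over cases 4..7 as identical whenever the label count and targets agree, which is the kind of mis-merge this report is about.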


[Bug tree-optimization/63747] [5 regression] icf mis-compares switch gimple

2014-11-05 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63747

--- Comment #2 from Joey Ye joey.ye at arm dot com ---
/* { dg-options "-O2" } */
/* { dg-do run } */

static int __attribute__((noinline))
foo(int i)
{
  switch (i)
  {
    case 0:
    case 1:
    case 2:
    case 3:
      return 0;
    default:
      return 1;
  }
}

static int __attribute__((noinline))
bar(int i)
{
  switch (i)
  {
    case 4:
    case 5:
    case 6:
    case 7:
      return 0;
    default:
      return 1;
  }
}

int main()
{
  return foo(0) + bar(4);
}


[Bug tree-optimization/63747] [5 regression] icf mis-compares switch gimple

2014-11-05 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63747

--- Comment #3 from Joey Ye joey.ye at arm dot com ---
Created attachment 33906
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33906&action=edit
/home/joeye01/patches/icf-switch-testcase-141105.patch

Test case patch


[Bug tree-optimization/63747] [5 regression] icf mis-compares switch gimple

2014-11-05 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63747

--- Comment #4 from Joey Ye joey.ye at arm dot com ---
It actually fails on all targets.


[Bug rtl-optimization/63718] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-03 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

--- Comment #1 from Joey Ye joey.ye at arm dot com ---
It is challenging to reduce this to a small test case, as inlining affects
optimization behavior. I will try to describe the problem as clearly as
possible.

Problematic generated code:
        mov     r0, r10
        mov     r1, r3
        mov     r2, r8
        str     r3, [sp, #8]
        bl      _ZL17bmp_iter_set_init
        mov     r3, r9
        ldr     r4, [r0, #12]   <--- r0 is used after the call, but it has
                                     been clobbered implicitly

_ZL17bmp_iter_set_init:
        ...
        pop     {r4}
        pop     {r0}            <--- clobbers r0, which is implicit from the RTL view
        bx      r0

Prototype of bmp_iter_set_init:
static inline void  <--- returns void
bmp_iter_set_init (bitmap_iterator *bi, const_bitmap map,
   unsigned start_bit, unsigned *bit_no)

1. thumb1 (arch=armv4t) sometimes clobbers r0-r3 on return, with logic in
arm.c:thumb_exit

2. Behavior in 1 is implicit to RTL. A typical thumb1 return RTL will be 
(jump_insn (unspec_volatile [
(return)
] VUNSPEC_EPILOGUE))

3. Other RTL insns in the caller do not modify any of r0-r3

4. After 216365 copyprop_hardreg_forward_1 believes r0-r3 are not clobbered
during call. Bang!

Attached reduced RTL dump of cprop_hardreg and the previous pass.
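
The reasoning in points 1 to 4 can be modelled with a toy register-set calculation. This is only an illustration under stated assumptions: regset, REG, abi_call_clobbered and live_copies_after_call are hypothetical names, not anything from regcprop.c.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per hard register, bit N for rN.  All names here
   are hypothetical and only illustrate the reasoning. */
typedef uint32_t regset;

#define REG(n) ((regset) 1u << (n))

/* Registers a call may clobber according to the ABI (r0-r3, r12, lr). */
static const regset abi_call_clobbered
  = REG (0) | REG (1) | REG (2) | REG (3) | REG (12) | REG (14);

/* Registers still known to hold a valid copy after a call, given the set
   the analysis believes the callee clobbers. */
regset
live_copies_after_call (regset copies_before, regset believed_clobbered)
{
  return copies_before & ~believed_clobbered;
}
```

With the ABI-wide set, a copy held in r0 is dropped at the call. If the per-callee set is derived only from what the callee's RTL states explicitly, the implicit pop {r0} is missed, the believed set does not contain r0, and the copy in r0 wrongly survives the call.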


[Bug rtl-optimization/63718] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-03 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

--- Comment #2 from Joey Ye joey.ye at arm dot com ---
Created attachment 33871
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33871&action=edit
Reduced rtl dump


[Bug rtl-optimization/63718] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-03 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

--- Comment #3 from Joey Ye joey.ye at arm dot com ---
Created attachment 33872
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33872&action=edit
Reduced rtl dump previous pass


[Bug rtl-optimization/63718] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-03 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

--- Comment #4 from Joey Ye joey.ye at arm dot com ---
Created attachment 33873
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33873&action=edit
Preprocessed testcase

Options to reproduce:
-march=armv4t -mthumb -O2


[Bug rtl-optimization/63718] [5 Regression] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-03 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

--- Comment #6 from Joey Ye joey.ye at arm dot com ---
(In reply to vries from comment #5)
> Could you try out the patch and see if it fixes things for you?

Tom, thanks for the quick action. Apparently this patch should recover the
bootstrap. I will test it and come back to you (bootstrapping thumb1 with qemu
takes hours!)

However, I think the fix is too conservative. There are plenty of cases where
r0-r3 will not be clobbered on return. For example, armv6-m will pretty much
never use r0-r3 implicitly.


[Bug rtl-optimization/63718] [5 Regression] ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-03 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

--- Comment #8 from Joey Ye joey.ye at arm dot com ---
(In reply to vries from comment #5)
> Created attachment 33874 [details]
> tentative patch, adds missing clobbers

This patch does recover the thumb1 bootstrap.

- Joey


[Bug rtl-optimization/63718] New: ARM Thumb1 bootstrap fail after fuse-caller-save info in cprop-hardreg

2014-11-02 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63718

Bug ID: 63718
   Summary: ARM Thumb1 bootstrap fail after fuse-caller-save info
in cprop-hardreg
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

architecture option: --with-arch=armv4t --with-mode=thumb
BOOT_CFLAGS=-O2 -g

(Stage2) Error message:
src/gcc/trunk/libgcc/libgcc2.c: In function ‘__mulvdi3’:
src/gcc/trunk/libgcc/libgcc2.h:209:20: internal compiler error: Segmentation
fault

First bad version: trunk@216365
Date:   Fri Oct 17 06:36:45 2014 +

Use fuse-caller-save info in cprop-hardreg

2014-10-17  Tom de Vries  t...@codesourcery.com

PR rtl-optimization/61605
* regcprop.c (copyprop_hardreg_forward_1): Use
regs_invalidated_by_this_call instead of regs_invalidated_by_call.

* gcc.target/i386/fuse-caller-save.c: Update addition check.  Add movl
absence check.

[Bug plugins/59335] Plugin doesn't build on trunk

2014-09-05 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59335

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #30 from Joey Ye joey.ye at arm dot com ---
Fixed in 214938


[Bug plugins/59335] Plugin doesn't build on trunk

2014-08-24 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59335

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #28 from Joey Ye joey.ye at arm dot com ---
Reopened as a new missing header is reported


[Bug libgcc/56846] _Unwind_Backtrace on ARM and noexcept

2014-08-22 Thread joey.ye at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846

--- Comment #5 from Joey Ye joey.ye at arm dot com ---
This issue was predicted back when _Unwind_Backtrace was enabled on ARM:
http://gcc.gnu.org/ml/gcc/2007-08/msg00235.html

This will keep going if the personality routine returns _URC_FAILURE.
Do you need anything besides _URC_CONTINUE_UNWIND?  The personality
routine in libsupc++ for ARM will return _URC_HANDLER_FOUND even
during forced unwinding but that seems like a bug now that you've
given a meaning to _US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND.


[Bug libgcc/56846] _Unwind_Backtrace on ARM and noexcept

2014-05-09 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 CC||joey.ye at arm dot com

--- Comment #1 from Joey Ye joey.ye at arm dot com ---
I cannot reproduce the issue you described with my arm-none-eabi GCC 4.7.4.
Here is my case:

#include <stdio.h>
void foo() noexcept
{
    printf("This is in foo\n");
}

void bar() noexcept
{
    printf("This is in bar\n");
    foo();
}

int main()
{
    bar();
    return 0;
}

When I break and stop at foo, here is the successful backtrace:
(gdb) back
#0  foo () at n.cpp:4
#1  0x81e0 in bar () at n.cpp:10
#2  0x81ec in main () at n.cpp:15

Can you please share a small case to reproduce the issue?


[Bug target/60169] [4.8/4.9 Regression] ICE ARM thumb1 handles far jump

2014-03-02 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60169

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Joey Ye joey.ye at arm dot com ---
Fixed in http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=208217


[Bug libgcc/60166] ARM default NAN encoding violates EABI

2014-02-28 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60166

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Joey Ye joey.ye at arm dot com ---
Fixed in trunk 208229


[Bug target/60169] [4.8/4.9 Regression] ICE ARM thumb1 handles far jump

2014-02-27 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60169

--- Comment #2 from Joey Ye joey.ye at arm dot com ---
A fix is available here:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01306.html


[Bug tree-optimization/60172] ARM performance regression from trunk@207239

2014-02-19 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #10 from Joey Ye joey.ye at arm dot com ---
(In reply to rguent...@suse.de from comment #9)
> On Mon, 17 Feb 2014, joey.ye at arm dot com wrote:
>
> But that doesn't make sense - it means that -fdisable-tree-forwprop4
> should get numbers back to good speed, no?  Because that's the
> only change forwprop4 does.
-fdisable-tree-forwprop4 blocks other transformations and results in slightly
worse code than before, so the number isn't back to the best. I think forwprop4
does some good stuff here and disabling it isn't the solution.
>
> For completeness please base checks on r207316 (it contains a fix
> for the blamed revision, but as far as I can see it shouldn't make
> a difference for the testcase).
I'm playing with r207686 and it is the same for this case.
>
> Did you check whether my hackish patch fixes things?
I did, with trunk 20140208, but it didn't make any difference to Proc_8.


[Bug tree-optimization/60172] ARM performance regression from trunk@207239

2014-02-19 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #11 from Joey Ye joey.ye at arm dot com ---
Repost from another record. It is annoying that after commenting one record it
automatically jumps to the next.

Here is good expansion:
;; _41 = _42 * 4;

(insn 20 19 0 (set (reg:SI 126 [ D.5038 ])
(ashift:SI (reg/v:SI 131 [ Int_1_Par_Val ])
(const_int 2 [0x2]))) -1
 (nil))

;; _40 = _2 + _41;

(insn 21 20 22 (set (reg:SI 136 [ D.5035 ])
(plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ])
(reg:SI 119 [ D.5036 ]))) -1
 (nil))

(insn 22 21 0 (set (reg/f:SI 125 [ D.5035 ])
(plus:SI (reg:SI 136 [ D.5035 ])
(reg:SI 126 [ D.5038 ]))) -1
 (nil))


;; MEM[(int[25] *)_51 + 20B] = _34;

(insn 29 28 30 (set (reg:SI 139)
(plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ])
(reg:SI 119 [ D.5036 ]))) Proc_8.c:23 -1
 (nil))

(insn 30 29 31 (set (reg:SI 140)
(plus:SI (reg:SI 139)
(reg:SI 126 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 31 30 32 (set (reg/f:SI 141)
(plus:SI (reg:SI 140)
(const_int 1000 [0x3e8]))) Proc_8.c:23 -1
 (nil))

(insn 32 31 0 (set (mem:SI (plus:SI (reg/f:SI 141)
(const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32])
(reg:SI 124 [ D.5039 ])) Proc_8.c:23 -1
 (nil))

After cse1, reg 140 can be replaced by reg 125, which leads to a series of
transformations that make the code much more efficient.

Here is bad expansion:
;; _40 = Arr_2_Par_Ref_22(D) + _12;

(insn 22 21 23 (set (reg:SI 138 [ D.5038 ])
(plus:SI (reg:SI 128 [ D.5038 ])
(reg:SI 121 [ D.5036 ]))) -1
 (nil))

(insn 23 22 0 (set (reg/f:SI 127 [ D.5035 ])
(plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ])
(reg:SI 138 [ D.5038 ]))) -1
 (nil))

;; _32 = _20 + 1000;

(insn 29 28 0 (set (reg:SI 124 [ D.5038 ])
(plus:SI (reg:SI 121 [ D.5036 ])
(const_int 1000 [0x3e8]))) Proc_8.c:23 -1
 (nil))

;; MEM[(int[25] *)_51 + 20B] = _34;

(insn 32 31 33 (set (reg:SI 141)
(plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ])
(reg:SI 124 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 33 32 34 (set (reg/f:SI 142)
(plus:SI (reg:SI 141)
(reg:SI 128 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 34 33 0 (set (mem:SI (plus:SI (reg/f:SI 142)
(const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32])
(reg:SI 126 [ D.5039 ])) Proc_8.c:23 -1
 (nil))

Here CSE doesn't happen, resulting in less optimal insns. The reason why CSE
doesn't happen is not yet clear.
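
The difference between the two expansions above is only in how the same sum is associated. The sketch below restates that in C; it is an illustration, not compiler code, and addr_good, addr_bad and their parameter names are made up. base stands for Arr_2_Par_Ref, row for the row offset, col for the column offset, and extra for the constant 1000 (0 for the first access).

```c
#include <assert.h>
#include <stdint.h>

/* Shape of the "good" expansion: ((base + row) + col) + extra.  With
   extra == 0 this is the first access, so the intermediate value
   (base + row) + col is common to both accesses and CSE can reuse it. */
uintptr_t
addr_good (uintptr_t base, uintptr_t row, uintptr_t col, uintptr_t extra)
{
  uintptr_t shared = (base + row) + col;
  return shared + extra;
}

/* Shape of the second access in the "bad" expansion:
   (base + (row + extra)) + col.  No intermediate value equals
   base + (col + row), which is how the first access was computed. */
uintptr_t
addr_bad (uintptr_t base, uintptr_t row, uintptr_t col, uintptr_t extra)
{
  uintptr_t shifted = row + extra;
  return (base + shifted) + col;
}
```

Both functions return the same address for the same inputs. What changes is which partial sums exist for a later pass to recognise as common.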


[Bug tree-optimization/54742] Switch elimination in FSM loop

2014-02-19 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742

--- Comment #36 from Joey Ye joey.ye at arm dot com ---
Please ignore previous comment as it shouldn't be here.


[Bug tree-optimization/60172] ARM performance regression from trunk@207239

2014-02-17 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #7 from Joey Ye joey.ye at arm dot com ---
(In reply to Richard Biener from comment #5)
> (In reply to Joey Ye from comment #4)
> > -fdisable-tree-forwprop4 doesn't help. -fno-tree-ter makes it even worse.
>
> The former is strange because it's the only pass that does sth that is
> changed by the patch?  As said, make sure to include the fix for PR59993
> in your testing.
>
> Does -fno-tree-forwprop fix the regression?

I'm sorry, what I meant was: -fdisable-tree-forwprop4 didn't make the benchmark
faster. Actually, with -fdisable-tree-forwprop4 both revisions, before and
after 207239, get the same lower score.

207239 O2: low
207238 O2: high
207239 O2 -fdisable-tree-forwprop4: low
207238 O2 -fdisable-tree-forwprop4: low


[Bug tree-optimization/60172] ARM performance regression from trunk@207239

2014-02-17 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #8 from Joey Ye joey.ye at arm dot com ---
Here is tree dump and diff of 133t.forwprop4
  <bb 2>:
  Int_Index_4 = Int_1_Par_Val_3(D) + 5;
  Int_Loc.0_5 = (unsigned int) Int_Index_4;
  _6 = Int_Loc.0_5 * 4;
  _8 = Arr_1_Par_Ref_7(D) + _6;
  *_8 = Int_2_Par_Val_10(D);
  _13 = _6 + 4;
  _14 = Arr_1_Par_Ref_7(D) + _13;
  *_14 = Int_2_Par_Val_10(D);
  _17 = _6 + 60;
  _18 = Arr_1_Par_Ref_7(D) + _17;
  *_18 = Int_Index_4;
  pretmp_20 = Int_Loc.0_5 * 100;
  pretmp_2 = Arr_2_Par_Ref_22(D) + pretmp_20;
  _42 = (sizetype) Int_1_Par_Val_3(D);
  _41 = _42 * 4;
-  _40 = pretmp_2 + _41; // good
+  _12 = _41 + pretmp_20; // bad
+  _40 = Arr_2_Par_Ref_22(D) + _12;  // bad
  MEM[(int[25] *)_40 + 20B] = Int_Index_4;
  MEM[(int[25] *)_40 + 24B] = Int_Index_4;
  _29 = MEM[(int[25] *)_40 + 16B];
  _30 = _29 + 1;
  MEM[(int[25] *)_40 + 16B] = _30;
  _32 = pretmp_20 + 1000;
  _33 = Arr_2_Par_Ref_22(D) + _32;
  _34 = *_8;
-  _51 = _33 + _41;  // good
+  _16 = _41 + _32;  // bad
+  _51 = Arr_2_Par_Ref_22(D) + _16;  // bad

  MEM[(int[25] *)_51 + 20B] = _34;
  Int_Glob = 5;
  return;


[Bug tree-optimization/54742] Switch elimination in FSM loop

2014-02-17 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742

--- Comment #35 from Joey Ye joey.ye at arm dot com ---
Here is good expansion:
;; _41 = _42 * 4;

(insn 20 19 0 (set (reg:SI 126 [ D.5038 ])
(ashift:SI (reg/v:SI 131 [ Int_1_Par_Val ])
(const_int 2 [0x2]))) -1
 (nil))

;; _40 = _2 + _41;

(insn 21 20 22 (set (reg:SI 136 [ D.5035 ])
(plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ])
(reg:SI 119 [ D.5036 ]))) -1
 (nil))

(insn 22 21 0 (set (reg/f:SI 125 [ D.5035 ])
(plus:SI (reg:SI 136 [ D.5035 ])
(reg:SI 126 [ D.5038 ]))) -1
 (nil))


;; MEM[(int[25] *)_51 + 20B] = _34;

(insn 29 28 30 (set (reg:SI 139)
(plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ])
(reg:SI 119 [ D.5036 ]))) Proc_8.c:23 -1
 (nil))

(insn 30 29 31 (set (reg:SI 140)
(plus:SI (reg:SI 139)
(reg:SI 126 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 31 30 32 (set (reg/f:SI 141)
(plus:SI (reg:SI 140)
(const_int 1000 [0x3e8]))) Proc_8.c:23 -1
 (nil))

(insn 32 31 0 (set (mem:SI (plus:SI (reg/f:SI 141)
(const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32])
(reg:SI 124 [ D.5039 ])) Proc_8.c:23 -1
 (nil))

After cse1, reg 140 can be replaced by reg 125, which leads to a series of
transformations that make the code much more efficient.

Here is bad expansion:
;; _40 = Arr_2_Par_Ref_22(D) + _12;

(insn 22 21 23 (set (reg:SI 138 [ D.5038 ])
(plus:SI (reg:SI 128 [ D.5038 ])
(reg:SI 121 [ D.5036 ]))) -1
 (nil))

(insn 23 22 0 (set (reg/f:SI 127 [ D.5035 ])
(plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ])
(reg:SI 138 [ D.5038 ]))) -1
 (nil))

;; _32 = _20 + 1000;

(insn 29 28 0 (set (reg:SI 124 [ D.5038 ])
(plus:SI (reg:SI 121 [ D.5036 ])
(const_int 1000 [0x3e8]))) Proc_8.c:23 -1
 (nil))

;; MEM[(int[25] *)_51 + 20B] = _34;

(insn 32 31 33 (set (reg:SI 141)
(plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ])
(reg:SI 124 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 33 32 34 (set (reg/f:SI 142)
(plus:SI (reg:SI 141)
(reg:SI 128 [ D.5038 ]))) Proc_8.c:23 -1
 (nil))

(insn 34 33 0 (set (mem:SI (plus:SI (reg/f:SI 142)
(const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32])
(reg:SI 126 [ D.5039 ])) Proc_8.c:23 -1
 (nil))

Here CSE doesn't happen, resulting in less optimal insns. The reason why CSE
doesn't happen is not yet clear.


[Bug tree-optimization/60172] ARM performance regression from trunk@207239

2014-02-14 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #2 from Joey Ye joey.ye at arm dot com ---
Created attachment 32131
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32131&action=edit
The function that causes the regression

Attached Proc_8 from dhrystone, header file and good/bad.s

It is the only function whose generated code differs with and without the
commit.


[Bug tree-optimization/60172] ARM performance regression from trunk@207239

2014-02-14 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

--- Comment #4 from Joey Ye joey.ye at arm dot com ---
-fdisable-tree-forwprop4 doesn't help. -fno-tree-ter makes it even worse.


[Bug libgcc/60166] ARM default NAN encoding violates EABI

2014-02-13 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60166

--- Comment #2 from Joey Ye joey.ye at arm dot com ---
(In reply to Ramana Radhakrishnan from comment #1)
> Isn't this a dup of PR59833.

It isn't. This one only impacts QNAN.


[Bug tree-optimization/60172] New: ARM performance regression from trunk@207239

2014-02-13 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172

Bug ID: 60172
   Summary: ARM performance regression from trunk@207239
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

Dhrystone on Cortex-M4 drops by 1.5% with this patch:

2014-01-29  Richard Biener  rguent...@suse.de

PR tree-optimization/58742
* tree-ssa-forwprop.c (associate_pointerplus): Rename to
associate_pointerplus_align.
(associate_pointerplus_diff): New function.
(associate_pointerplus): Likewise.  Call associate_pointerplus_align
and associate_pointerplus_diff.

* gcc.dg/pr58742-1.c: New testcase.
* gcc.dg/pr58742-2.c: Likewise.
* gcc.dg/pr58742-3.c: Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@207239

Options used: -O2 -fno-inline -fno-common


[Bug plugins/59335] Plugin doesn't build on trunk

2014-02-13 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59335

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #11 from Joey Ye joey.ye at arm dot com ---
Reopened as requested.


[Bug libgcc/60166] New: ARM default NAN encoding violates EABI

2014-02-12 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60166

Bug ID: 60166
   Summary: ARM default NAN encoding violates EABI
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

#include <stdio.h>
#include <string.h>
#include <math.h>
int g;
float i = 0.0, j = 0.0;

int main()
{
    float f = i / j;
    memcpy(&g, &f, sizeof(g));
    printf("f=%f, hex=%x\n", f, g);
    return 0;
}

When built for ARM thumb1, result is:
f=nan, hex=7fff

While according to the RTABI
(http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf)
section 4.1.1.1:

When not otherwise specified by IEEE 754, the result on an invalid operation
should be the quiet NaN bit pattern with only the most significant bit of the
significand set, and all other significand bits zero.

So current libgcc is violating ARM EABI.

I have a patch under testing.
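
As a concrete reading of the quoted RTABI sentence for single precision, here is a small sketch that checks a bit pattern against it. The helper names are made up, and ignoring the sign bit is an assumption of this sketch, since the quoted text does not mention the sign.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 binary32: sign bit 31, exponent bits 30..23, significand
   bits 22..0.  The quoted text asks for a NaN (all-ones exponent) with
   only significand bit 22 set, i.e. 0x7fc00000 once the sign is masked. */
bool
is_default_qnan_bits (uint32_t bits)
{
  return (bits & 0x7fffffffu) == 0x7fc00000u;
}

/* Copy the object representation of a float into an integer, which is
   what the memcpy in the test case above does. */
uint32_t
float_bits (float f)
{
  uint32_t bits;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

Calling is_default_qnan_bits (float_bits (i / j)) on the values from the test case would then tell whether the library returned the pattern the RTABI asks for.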


[Bug target/60169] New: ICE ARM thumb1 handles far jump

2014-02-12 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60169

Bug ID: 60169
   Summary: ICE ARM thumb1 handles far jump
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

Created attachment 32119
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32119&action=edit
testcase

Trunk gcc 20140210:

arm-none-eabi-gcc -mthumb -fomit-frame-pointer -mthumb -fPIC -mcpu=cortex-m0
-mno-lra png.c -c

png.c: In function 'png_do_read_swap_alpha':
png.c:104:1: internal compiler error: in reload, at reload1.c:1058
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.


[Bug target/60169] ICE ARM thumb1 handles far jump

2014-02-12 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60169

--- Comment #1 from Joey Ye joey.ye at arm dot com ---
Caused by http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html, reason is
that stack layout shouldn't change during and after reload.

I have a patch fixing it under testing.


[Bug plugins/59335] Plugin doesn't build on trunk

2014-02-12 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59335

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Joey Ye joey.ye at arm dot com ---
Resolved in trunk


[Bug tree-optimization/59757] Unexpected VN_TOP in SSCVN

2014-01-20 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59757

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Joey Ye joey.ye at arm dot com ---
This is most likely a bug in mingw32 build for GCC 4.2.1, which is the compiler
that I used to build GCC running on Windows. If GCC is built with -O0 it passes
without this ICE.

Switching to a later version of the mingw32 tools solves it.


[Bug c/59884] New: Unexpected warning pragma GCC target

2014-01-19 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59884

Bug ID: 59884
   Summary: Unexpected warning pragma GCC target
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

Affected target: arm. (x86/x86_64 passes)
Affected version: trunk 20140109, 4.8, 4.7

~/cases/pragma $ cat p.c
#pragma GCC push_options
#pragma GCC optimize("O2")
int foo(int a){
return a+1;
}
#pragma GCC pop_options

~/cases/pragma $ arm-none-eabi-gcc p.c -c
p.c:6:9: warning: #pragma GCC target is not supported for this machine
[-Wpragmas]
 #pragma GCC pop_options


[Bug target/59884] Unexpected warning pragma GCC target

2014-01-19 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59884

--- Comment #2 from Joey Ye joey.ye at arm dot com ---
(In reply to Andrew Pinski from comment #1)
> Comes from:
>   if (p->target_binary != target_option_current_node)
>     {
>       (void) targetm.target_option.pragma_parse (NULL_TREE,
>                                                  p->target_binary);
>       target_option_current_node = p->target_binary;
>     }
>
> The front-end expects the target always to implement these target hooks it
> seems rather than the default.
>
> Really I think the arm back-end should implement them so that thumb2 code
> can be in the same source file as arm32 code and would help out LTO when
> people mix and match them.

It is a useful feature on ARM. I don't know why it isn't supported now.

But this warning still needs to be fixed, as there will always be some targets
that do not support this pragma.


[Bug tree-optimization/59757] Unexpected VN_TOP in SSCVN

2014-01-13 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59757

--- Comment #5 from Joey Ye joey.ye at arm dot com ---
Here is some debug output and log that might help the investigation.

The following looks suspicious to me: .MEM_18 is said to be defined by a stmt
that doesn't look like it should define it.
(gdb) call debug_tree((*cfun->gimple_df->ssa_names).m_vecdata[18] )
 ssa_name 0x8fcacf8
type void_type 0x89c08a0 void VOID
align 8 symtab 0 alias set -1 canonical type 0x89c08a0
pointer_to_this pointer_type 0x89c0900
visited var var_decl 0x8fcc720 .MEMdef_stmt _11 = _10 (258);

version 18

dump.pre:
SCC consists of: .MEM_18
SCC consists of: _12
Value numbering _12 stmt = _12 = d_2(D)-core.get_parameter;
then crash

Last good dump:
univision_ug2828gfeff01_init (struct CTL_GFX_SEP_DRIVER_t * d, int depth)
{
  struct CTL_GFX_DRIVER_t * _7;
  long int _9;
  long int (*Td17) (int) _10;
  long int _11;
  long int (*Td17) (int) _12;
  long int _13;

  bb 2:
  d_2(D)-core.get_parameter = univision_ug2828gfeff01_get_parameter;
  _7 = d_2(D)-core;
  ctl_gfx_driver = _7;
  MEM[(struct CTL_GFX_DRIVER_t *)d_2(D)].draw_pixel = 0B;
  _9 = univision_ug2828gfeff01_get_parameter (258);
  if (_9 == 8)
goto bb 3;
  else
goto bb 4;

  bb 3:
  d_2(D)-set_bounding = sep_set_bounding_8b;
  goto bb 9;

  bb 4:
  _10 = d_2(D)-core.get_parameter;
  _11 = _10 (258);
  if (_11 == 16)
goto bb 5;
  else
goto bb 6;

  bb 5:
  d_2(D)-set_bounding = sep_set_bounding_16b;
  goto bb 9;

  bb 6:
  _12 = d_2(D)-core.get_parameter;
  _13 = _12 (512);
  if (_13  255)
goto bb 7;
  else
goto bb 8;

  bb 7:
  d_2(D)-set_bounding = sep_set_bounding_16b;
  goto bb 9;

  bb 8:
  d_2(D)-set_bounding = sep_set_bounding_8b;

  bb 9:
  return;

}

Any hint to continue investigating?


[Bug tree-optimization/59757] New: Unexpected VN_TOP in SSCVN

2014-01-10 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59757

Bug ID: 59757
   Summary: Unexpected VN_TOP in SSCVN
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

Created attachment 31796
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31796&action=edit
Reduced test case

target: arm-none-eabi
host: Only Windows (crossbuild with i586-mingw32msvc). The same revision
doesn't fail on Linux
GCC revision: trunk 20141009 and 4.8
Option: -O3

 arm-none-eabi-gcc -O3 foo.c
foo.c: In function 'univision_ug2828gfeff01_init':
foo.c:119:1: internal compiler error: tree check: expected ssa_name, have
var_de
cl in vn_reference_lookup_2, at tree-ssa-sccvn.c:1497
 univision_ug2828gfeff01_init(CTL_GFX_SEP_DRIVER_t *d, int depth)
 ^

foo.c:119:1: internal compiler error: Segmentation fault


[Bug tree-optimization/59757] Unexpected VN_TOP in SSCVN

2014-01-10 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59757

--- Comment #1 from Joey Ye joey.ye at arm dot com ---
foo.c: In function 'univision_ug2828gfeff01_init':
foo.c:119:1: internal compiler error: tree check: expected ssa_name, have
var_de
cl in vn_reference_compute_hash, at tree-ssa-sccvn.c:631
 univision_ug2828gfeff01_init(CTL_GFX_SEP_DRIVER_t *d, int depth)

Traced to SCCVN:
Breakpoint 1, vn_reference_compute_hash (vr1=0x798fa94)
at /xxx/src/gcc/gcc/tree-ssa-sccvn.c:598
598 in /xxx/src/gcc/gcc/tree-ssa-sccvn.c

Backtrace is 
vn_reference_lookup

(gdb) call debug_tree(vr1->vuse)
 var_decl 0x8fe4ba0 vn_top.15
type void_type 0x89c08a0 void VOID
align 8 symtab 0 alias set -1 canonical type 0x89c08a0
pointer_to_this pointer_type 0x89c0900
used ignored VOID file foo.c line 119 col 1
align 8

(gdb) call debug_tree(VN_TOP)
 var_decl 0x8fe4ba0 vn_top.15
type void_type 0x89c08a0 void VOID
align 8 symtab 0 alias set -1 canonical type 0x89c08a0
pointer_to_this pointer_type 0x89c0900
used ignored VOID file foo.c line 119 col 1
align 8

For some reason vr1->vuse is VN_TOP, which is unexpected. But I couldn't go
further without understanding SCCVN.

Can anyone help?


[Bug middle-end/59734] New: Simple strict-volatile-bitfields case not working

2014-01-08 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59734

Bug ID: 59734
   Summary: Simple strict-volatile-bitfields case not working
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

$ cat v.c
struct str {
    volatile unsigned f1: 8;
};

int foo(struct str *a)
{
    a->f1 = sizeof(struct str);
}
$ arm-none-eabi-gcc -mthumb -mcpu=cortex-m3 -Os -fstrict-volatile-bitfields v.c
-S

4.6:
        ldr     r3, [r0, #0]    <= Correct. Load word
        movs    r2, #4
        bfi     r3, r2, #0, #8
        str     r3, [r0, #0]
        bx      lr

4.7:
        ldrb    r3, [r0, #0]    @ zero_extendqisi2 <= Incorrect. Load byte
        movs    r3, #4
        strb    r3, [r0, #0]
        bx      lr

4.8 and latest trunk (date 20140108, after Sandra and Bernd's fixes):
        ldrb    r3, [r0]        @ zero_extendqisi2 <= Incorrect. Load byte
        movs    r2, #4
        bfi     r3, r2, #0, #8
        strb    r3, [r0]
        bx      lr
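
For reference, the word-sized access that the 4.6 output performs can be
written out by hand. This is only a minimal sketch, assuming f1 occupies the
low 8 bits of a 32-bit container (as the bfi with #0, #8 suggests); the
function name store_f1_wordwise is hypothetical and not part of the test case.

```c
#include <stdint.h>

/* Hypothetical helper: emulate the 4.6 sequence (ldr / bfi / str).
 * The container is read and written as one 32-bit word, and only the
 * low 8 bits (assumed to hold f1) are replaced. */
void store_f1_wordwise(volatile uint32_t *container, uint32_t value)
{
    uint32_t word = *container;              /* ldr: one word-sized load  */
    word = (word & ~UINT32_C(0xFF))          /* bfi: clear bits 0..7 ...  */
         | (value & UINT32_C(0xFF));         /* ... and insert the value  */
    *container = word;                       /* str: one word-sized store */
}
```

The 4.7 and 4.8 outputs instead touch only a single byte, which is what the
option is meant to prevent for a field declared with a word-sized type.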


[Bug target/56997] Incorrect write to packed field when strict-volatile-bitfields enabled on aarch32

2014-01-07 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56997

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Joey Ye joey.ye at arm dot com ---
Resolved in trunk


[Bug lto/59582] LTO discards symbol that defined as weak elsewhere

2014-01-02 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59582

--- Comment #6 from Joey Ye joey.ye at arm dot com ---
Duplicate of https://sourceware.org/bugzilla/show_bug.cgi?id=15323


[Bug lto/59582] LTO discards symbol that defined as weak elsewhere

2013-12-25 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59582

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Joey Ye joey.ye at arm dot com ---
Thanks HJ. Binutils 2.24 does solve it.


[Bug lto/59582] LTO discards symbol that defined as weak elsewhere

2013-12-25 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59582

--- Comment #5 from Joey Ye joey.ye at arm dot com ---
HJ, do you know which patch fixed this issue? I might need to backport it into
local 2.23 branch.


[Bug lto/59582] LTO discards symbol that defined as weak elsewhere

2013-12-23 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59582

--- Comment #2 from Joey Ye joey.ye at arm dot com ---
Latest binutils trunk still has this issue. I assume 2.24 behaves the same.


[Bug lto/59582] New: LTO discards symbol that defined as weak elsewhere

2013-12-22 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59582

Bug ID: 59582
   Summary: LTO discards symbol that defined as weak elsewhere
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

~/work/lto_startup_s/3 $ cat Makefile 
CC=gcc
CFLAGS=-flto
EXT_CFLAGS=
e : ext.o main.o
	$(CC) $(CFLAGS) $(LDFLAGS) $^ -o $@
ext.o : ext.c
	$(CC) $(EXT_CFLAGS) -c -o $@ $^

~/work/lto_startup_s/3 $ cat main.c
int callback() { return 0; }
int main() { return s_func(); }

~/work/lto_startup_s/3 $ cat ext.c
__attribute__((weak)) int callback() { return 1; }
int s_func() { return callback(); }
~/work/lto_startup_s/3 $ make -B
gcc  -c -o ext.o ext.c
gcc -flto   -c -o main.o main.c
gcc -flto  ext.o main.o -o e
`callback' referenced in section `.text' of ext.o: defined in discarded section
`.text' of main.o (symbol from plugin)
collect2: error: ld returned 1 exit status
make: *** [e] Error 1

~/work/lto_startup_s/3 $ gcc -v
gcc version 4.9.0 20131223 (experimental)

Hints to reproduce:
1. ext.c is compiled without lto
2. main.o shows before ext.o in command line


[Bug plugins/59335] New: Plugin doesn't build on trunk

2013-11-28 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59335

Bug ID: 59335
   Summary: Plugin doesn't build on trunk
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: plugins
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joey.ye at arm dot com

Trunk 205454 breaks plugin builds on x86_64 and arm. Once gcc is built and
installed, using it to build any plugin with

g++ -fPIC -g -O2 -shared -I `g++ -print-file-name=plugin`/include

results in:

install/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/plugin/include/config/i386/i386-opts.h:37:24:
fatal error: stringop.def: No such file or directory
 #include "stringop.def"

install/lib/gcc/arm-none-eabi/4.9.0/plugin/include/builtins.def:881:29: fatal
error: chkp-builtins.def: No such file or directory

It looks like some header files needed for plugins are not installed correctly.
Plugins in the testsuite pass because they use headers from the GCC source tree.


[Bug target/56997] New: Incorrect write to packed field when strict-volatile-bitfields enabled on aarch32

2013-04-18 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56997



 Bug #: 56997
   Summary: Incorrect write to packed field when
strict-volatile-bitfields enabled on aarch32
Classification: Unclassified
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: joey...@arm.com
Target: arm/aarch32





Attached case fails when
Target: all arm aarch32
Optimization level: all optimization levels.
Version: Trunk 197955, 4.7, 4.6

But it passes:
- if -fno-strict-volatile-bitfields is specified, or
- on x86 even if -fstrict-volatile-bitfields is specified

For example:

arm-none-eabi-gcc -mthumb -mcpu=cortex-m0 -Os  a.c -o a.s -S

foo:
        ldr     r3, .L2
        lsl     r2, r0, #8      ; First byte of input r0 is truncated
        ldr     r0, [r3]
        uxtb    r0, r0
        orr     r0, r2
        str     r0, [r3]
        bx      lr

Runtime output:
Write FAIL 0x20304



-

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#ifdef SHORT
#define test_type unsigned short
#define MAGIC 0x102u
#else
#define test_type unsigned int
#define MAGIC 0x1020304u
#endif

typedef struct s{
 unsigned char Prefix;
 test_type Type;
}__attribute((__packed__)) ss;

volatile ss v;
ss g;

void __attribute__((noinline))
foo (test_type u)
{
  v.Type = u;
}

test_type __attribute__((noinline))
bar (void)
{
  return v.Type;
}

int main()
{
  test_type temp;
  int err = 0;
  foo(MAGIC);
  memcpy(&g, (void *)&v, sizeof(g));
  if (g.Type != MAGIC)
    {
      printf("Write FAIL 0x%x\n", g.Type);
      err ++;
    }

  g.Type = MAGIC;
  memcpy((void *)&v, &g, sizeof(v));
  temp = bar();
  if (temp != MAGIC)
    {
      printf("Read FAIL 0x%x\n", temp);
      err ++;
    }
  return err;
}


[Bug target/56997] Incorrect write to packed field when strict-volatile-bitfields enabled on aarch32

2013-04-18 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56997



--- Comment #1 from Joey Ye joey.ye at arm dot com 2013-04-18 08:12:50 UTC ---

Quoted from
http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/Code-Gen-Options.html#Code-Gen-Options:

-fstrict-volatile-bitfields
If the target requires strict alignment, and honoring the field type would
require violating this alignment, a warning is issued. If the field has packed
attribute, the access is done without honoring the field type. If the field
doesn't have packed attribute, the access is done honoring the field type. In
both cases, GCC assumes that the user knows something about the target hardware
that it is unaware of.

There are two issues in current behavior:
1. There is no warning when writing to packed fields requiring strict
alignment, although there is a warning when reading them.
2. The write access to packed field still honors the field type.
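
To make point 2 concrete, here is a minimal sketch of what a write "without
honoring the field type" could look like for a packed 32-bit field. It
assumes a little-endian target; the helper name store_packed_u32 is
hypothetical, and this is not what GCC actually expands the store to.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value into a packed, possibly
 * unaligned field one byte at a time (little-endian byte order assumed),
 * so that no byte of the value is dropped and no neighbor is touched. */
void store_packed_u32(volatile unsigned char *field, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        field[i] = (unsigned char)(value >> (8 * i));
}
```

With the struct from the test case, the Type field starts at byte offset 1,
so the four bytes of the value land at offsets 1 to 4 and Prefix at offset 0
is left alone.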


[Bug target/56997] Incorrect write to packed field when strict-volatile-bitfields enabled on aarch32

2013-04-18 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56997



--- Comment #3 from Joey Ye joey.ye at arm dot com 2013-04-18 08:46:36 UTC ---

(In reply to comment #2)
> -fstrict-volatile-bitfields implementation is bogus, as I repeatedly said
> it should now piggy-back on DECL_BIT_FIELD_REPRESENTATIVE.  Note that
> in your testcase no bitfields are involved so how does it relate to
> -f[no-]strict-volatile-bitfields?  Isn't this simply a wrong-code bug
> (eventually caused by the bogus implementation of
> -fstrict-volatile-bitfields)?

It also looks to me like a wrong-code bug caused by
-fstrict-volatile-bitfields.


[Bug tree-optimization/54742] Switch elimination in FSM loop

2013-03-07 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742



--- Comment #8 from Joey Ye joey.ye at arm dot com 2013-03-08 03:56:38 UTC ---

// A non-loop case showing how minor changes affect current jump threading
// behavior

int foo(int state, int check)
{
    switch (state) {
    case 0:
        state = 1;
        zoo_0();
        break;
    case 1:
    default:
        state = 0;
        zoo_1();
        break;
    }

    if (!check) return 0;
    //check++;  // Uncommenting this line disables jump threading
    //check=foo();  // Uncommenting this line disables jump threading

    switch (state) {
    case 0:
        bar_0();
        break;
    case 1:
    default:
        bar_1();
        break;
    }
    return check;
}
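
For comparison, the result one would hope for after threading can be written
by hand. This is only a sketch: the zoo_ and bar_ functions are not defined
in the test case, so the stubs below (and the trace helpers) are assumptions
added to make call order observable.

```c
#include <string.h>

/* Assumed stubs for the undeclared zoo_ and bar_ functions; each appends one
 * character to a small trace buffer so that call order can be checked. */
static char trace[8];
static size_t trace_len;

static void record(char c)
{
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}
static void zoo_0(void) { record('z'); }
static void zoo_1(void) { record('Z'); }
static void bar_0(void) { record('b'); }
static void bar_1(void) { record('B'); }

/* Trace helpers (hypothetical, for illustration only). */
void reset_trace(void) { trace_len = 0; trace[0] = '\0'; }
const char *get_trace(void) { return trace; }

/* Hand-threaded equivalent of foo(): each arm of the first switch knows the
 * new value of state, so the second switch collapses to a direct call. */
int foo_threaded(int state, int check)
{
    if (state == 0) {
        zoo_0();                /* first switch, case 0: state becomes 1 */
        if (!check) return 0;
        bar_1();                /* second switch with state == 1 */
    } else {
        zoo_1();                /* first switch, case 1/default: state becomes 0 */
        if (!check) return 0;
        bar_0();                /* second switch with state == 0 */
    }
    return check;
}
```

Calling foo_threaded(0, 1) runs zoo_0 and then bar_1 without ever testing
state a second time, which is the effect the threading pass is expected to
achieve.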


[Bug target/54051] [4.7 Regression] Invalid alignment specifier generated for vld3_lane_* and vld3_dup_* intrinsics.

2013-02-04 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54051



Joey Ye joey.ye at arm dot com changed:



   What|Removed |Added



 CC||joey.ye at arm dot com



--- Comment #5 from Joey Ye joey.ye at arm dot com 2013-02-05 07:45:28 UTC ---

This issue also impacts ldrexh/ldrexb, as the assembler doesn't accept
ldrexh r1, [r0, #0]. Better to backport to 4.7.


[Bug target/54051] [4.7 Regression] Invalid alignment specifier generated for vld3_lane_* and vld3_dup_* intrinsics.

2013-02-04 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54051



--- Comment #6 from Joey Ye joey.ye at arm dot com 2013-02-05 07:48:48 UTC ---

(In reply to comment #5)
> This issue also impacts ldrexh/ldrexb as assembler doesn't accept ldrexh r1,
> [r0, #0]. Better to backport to 4.7.

and 4.6


[Bug lto/54933] 'builtin symbol' referenced in section ... defined in discarded section

2013-01-17 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54933



Joey Ye joey.ye at arm dot com changed:



   What|Removed |Added



 CC||joey.ye at arm dot com



--- Comment #1 from Joey Ye joey.ye at arm dot com 2013-01-18 06:33:20 UTC ---

This issue especially impacts retargeting, where library functions are
retargeted to user implementations. Retargeting is a common practice in
bare-metal development, but it runs into the "defined in discarded section"
error with LTO enabled.



Here is the retarget case:



# gcc version 4.8.0 20130115 (experimental) [trunk revision 195189] (GCC)
# GNU ld (GNU Binutils) 2.23.51.20130111
$ cat main.c
int main()
{
    return puts("Hello");
}

// it works if following line is enabled
// __attribute__ ((used))
int _write(int c)
{
    /* Do something */
    return c;
}
$ cat lib-a.c
int puts(const char * str)
{
    return _write(*str);
}
$ gcc -flto   -c -o lib-a.o lib-a.c
$ ar rv liba.a lib-a.o
r - lib-a.o
$ gcc main.c liba.a -flto --entry=main -nostdlib -o l
`_write' referenced in section `.text' of liba.a(lib-a.o): defined in discarded
section `.text' of /tmp/ccwUUKiA.o (symbol from plugin)
collect2: error: ld returned 1 exit status


[Bug rtl-optimization/55757] Suboptimal interrupt prologue/epilogue for ARMv7-M (Cortex-M3)

2012-12-20 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55757



Joey Ye joey.ye at arm dot com changed:



   What|Removed |Added



 CC||joey.ye at arm dot com



--- Comment #4 from Joey Ye joey.ye at arm dot com 2012-12-21 03:23:07 UTC ---

> An interrupt handler function (void something(void)), but without attribute,
> doing something inside (posts a FreeRTOS semaphore, calls vPortYieldFromISR()
> if it's needed) actually saves a lot of registers on entry:
> 23b4:   b507    push    {r0, r1, r2, lr}

Pushing of scratch registers can be used to
1. align the stack, which Richard has explained
2. allocate the stack frame, as a code size optimization of sub sp, #x

To explain with the following example:
extern void bar(int *, int *);
void foo()
{
    int a, b;
    bar(&a, &b);
}
Built with -Os -mcpu=cortex-m3:
        push    {r0, r1, r2, lr}

Here, pushing r0 and r1 allocates an 8-byte frame for local variables.
Pushing r2 keeps sp aligned to 8 bytes together with pushing lr. The values
of r0-r2 pushed to the stack don't really matter.

But built with -O2:
        push    {lr}
        sub     sp, sp, #12

The former is better on code size, the latter wins on performance. Hopefully
this explains everything.


[Bug rtl-optimization/55757] Suboptimal interrupt prologue/epilogue for ARMv7-M (Cortex-M3)

2012-12-20 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55757



--- Comment #5 from Joey Ye joey.ye at arm dot com 2012-12-21 03:32:21 UTC ---

However, there is room to improve both performance and stack consumption in
the -Os case:

extern void bar(int *);

void foo()
{
    int a;
    bar(&a);
}

Built with -mcpu=cortex-m3 -Os:
        push    {r0, r1, r2, lr}
        add     r0, sp, #4
        bl      bar
        pop     {r1, r2, r3, pc}

Apparently it should be optimized to save 8 bytes of stack consumption and two
stores:
        push    {r0, lr}
        mov     r0, sp
        bl      bar
        pop     {r1, pc}
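
The stack arithmetic behind this can be checked with a small sketch. It
assumes 4-byte registers, that lr is the only register that really needs
saving, and that sp must stay 8-byte aligned; the function name frame_bytes
is hypothetical.

```c
/* Bytes of stack used by a prologue that saves `saved_regs` 4-byte
 * registers and reserves `local_bytes` for locals, rounded up so that
 * sp stays 8-byte aligned. */
unsigned frame_bytes(unsigned saved_regs, unsigned local_bytes)
{
    unsigned raw = 4u * saved_regs + local_bytes;
    return (raw + 7u) & ~7u;   /* round up to the next multiple of 8 */
}
```

Under these assumptions frame_bytes(1, 4) is 8, which is what a two-register
push such as push {r0, lr} allocates, while the four-register push in the
current -Os output allocates 16.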


[Bug tree-optimization/54742] Switch elimination in FSM loop

2012-10-10 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742



--- Comment #3 from Joey Ye joey.ye at arm dot com 2012-10-10 07:37:15 UTC ---

Current jump threading is too conservative to thread this case. The following
limits are what I observed by reading the code:
1. It only threads around blocks that have
  * Single entry
  * Multiple exits
  * No PHI nodes
  Even a case as simple as a forwarding block is excluded.

2. It only threads a block once. A block with more than one entry could
be threaded more than once too.


[Bug tree-optimization/54733] New: Missing opportunity to optimize endian independent load/store

2012-09-28 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54733



 Bug #: 54733
   Summary: Missing opportunity to optimize endian independent
load/store
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: joey...@arm.com





This case tries to load from memory in a little-endian way, no matter what
the endianness of the current target is.

On trunk 4.8 with a little-endian target, the two byte loads should be able to
be combined into a single 16-bit load.

int read_aux(void *, int);

int read_le()
{
    unsigned char data[2];
    read_aux(data, 2);
    return data[0] | (data[1] << 8);
}

Current Tree (optimized):
  unsigned char data[2];

  read_aux (&data, 2);
  D.4064_1 = data[0];
  D.4065_2 = (int) D.4064_1;
  D.4066_3 = data[1];
  D.4067_4 = (int) D.4066_3;
  D.4068_5 = D.4067_4 << 8;
  D.4063_6 = D.4068_5 | D.4065_2;
  data ={v} {CLOBBER};
  return D.4063_6;

Expected Tree (optimized):
  unsigned char data[2];
  unsigned short *temp;
  unsigned int D.4064;

  read_aux (&data, 2);
  temp=data;
  D.4064_1 = *temp;
  return D.4064_1;
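
The equivalence being asked for can be demonstrated in portable C. This is a
minimal sketch; the function names are hypothetical, and memcpy is used for
the single 16-bit load instead of the pointer cast in the expected tree so
that the example avoids alignment and aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Endian-independent little-endian read, same pattern as read_le(). */
uint16_t load_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* One 16-bit load in host byte order. */
uint16_t load_native16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host the two loaders return the same value for any input,
which is why the two byte loads plus shift and or can be folded into one
halfword load there.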


[Bug tree-optimization/54742] New: Switch elimination in FSM loop

2012-09-28 Thread joey.ye at arm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742



 Bug #: 54742
   Summary: Switch elimination in FSM loop
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: joey...@arm.com





The following interesting case is reduced from a benchmark. It implements an
FSM with a loop and a switch. There is an opportunity to eliminate the switch,
since every state transition is known at compile time.





Source program:
---
int sum0, sum1, sum2, sum3;
int foo(const char * str)
{
    int state=0;
    const char *s=str;

    for (; *s; s++)
    {
        char c=*s;
        switch (state) {
        case 0:
            if (c == '+') state = 1;
            else if (c != '-') sum0+=c;
            break;
        case 1:
            if (c == '+') state = 2;
            else if (c == '-') state = 0;
            else sum1+=c;
            break;
        case 2:
            if (c == '+') state = 3;
            else if (c == '-') state = 1;
            else sum2+=c;
            break;
        case 3:
            if (c == '-') state = 2;
            else if (c != '+') sum3+=c;
            break;
        default:
            break;
        }
    }
    return state;
}
---

Say, instead of setting state=1 and looping back to the switch head, the code
can be optimized to set state=1, check the loop end condition and jump directly
to the label of case 1. Thus the overhead of the switch (either if-then-else or
jump table) is eliminated. On machines without sophisticated branch prediction,
such an optimization is even more appealing.

GCC trunk 4.8 doesn't have such an optimization.



Expected tree output in source form:
---
int sum0, sum1, sum2, sum3;
int foo(const char * str)
{
    int state=0;
    const char *s=str;
    char c=*s;
    if (!c) goto end;
state_0:
    if (c == '+') {
        state = 1;
        // Check loop end condition and go directly to next state
        if ((c=* (++s))!=0) goto state_1;
        else goto end;
    }
    else if (c != '-') sum0+=c;
    if ((c=* (++s))!=0) goto state_0;
    goto end;

state_1:
    if (c == '+') {
        state = 2;
        if ((c=* (++s))!=0) goto state_2;
        else goto end;
    }
    else if (c == '-') {
        state = 0;
        if ((c=* (++s))!=0) goto state_0;
        else goto end;
    }
    else sum1+=c;
    if ((c=* (++s))!=0) goto state_1;
    goto end;

state_2:
    if (c == '+') {
        state = 3;
        if ((c=* (++s))!=0) goto state_3;
        else goto end;
    }
    else if (c == '-') {
        state = 1;
        if ((c=* (++s))!=0) goto state_1;
        else goto end;
    }
    else sum2+=c;
    if ((c=* (++s))!=0) goto state_2;
    goto end;

state_3:
    if (c == '-') {
        state = 2;
        if ((c=* (++s))!=0) goto state_2;
        else goto end;
    }
    else if (c != '+') sum3+=c;
    if ((c=* (++s))!=0) goto state_3;
end:
    return state;
}
---


[Bug middle-end/51200] Wrong code sequence to store restrict volatile bitfield

2011-12-20 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51200

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 Status|RESOLVED|VERIFIED

--- Comment #4 from Joey Ye joey.ye at arm dot com 2011-12-21 04:34:57 UTC ---
Fixed in trunk 182545


[Bug middle-end/51200] Wrong code sequence to store restrict volatile bitfield

2011-11-21 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51200

--- Comment #2 from Joey Ye joey.ye at arm dot com 2011-11-22 03:58:29 UTC ---
Here is a test case fix.

With this patch, the backend part of Bernd's original patch can be skipped,
which addresses DJ's concern about unnecessary changes.

Also, this test case intends to warn about a situation that is incompatible
with ABI version 1; -fstrict-volatile-bitfields happens to hide the
incompatibility. IMHO it is debatable to claim that strict volatile bitfields
violate version 1, so fixing the test case to make it work as intended seems
more reasonable to me.


--- a/gcc/testsuite/g++.dg/abi/bitfield12.C
+++ b/gcc/testsuite/g++.dg/abi/bitfield12.C
@@ -1,4 +1,4 @@
-// { dg-options "-Wabi -fabi-version=1" }
+// { dg-options "-Wabi -fabi-version=1 -fno-strict-volatile-bitfields" }

 struct S { // { dg-warning "ABI" }
   char c : 1024; // { dg-warning "width" }



[Bug middle-end/51200] New: Wrong code sequence to store restrict volatile bitfield

2011-11-17 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51200

 Bug #: 51200
   Summary: Wrong code sequence to store restrict volatile
bitfield
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: joey...@arm.com


Trunk 179074 generates a wrong code sequence with -fstrict-volatile-bitfields
on ARM and x86.

The ARM AAPCS enables strict volatile bitfields by default, so this is
critical on ARM:

/* { dg-do run } */
/* { dg-options "-fstrict-volatile-bitfields" } */

extern void abort(void);
struct thing {
  volatile unsigned short a: 8;
  volatile unsigned short b: 8;
} t = {1,2};

int main()
{
  t.a = 3;
  if (t.a !=3 || t.b !=2) abort();
  return 0;
}


[Bug middle-end/51200] Wrong code sequence to store restrict volatile bitfield

2011-11-17 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51200

--- Comment #1 from Joey Ye joey.ye at arm dot com 2011-11-18 02:23:17 UTC ---
A patch is available at http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00217.html
but has been pending for about a year.

Latest discussion is at http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01623.html


[Bug target/49437] interrupt return pop sometimes corrupts sp

2011-08-02 Thread joey.ye at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49437

Joey Ye joey.ye at arm dot com changed:

   What|Removed |Added

 CC||joey.ye at arm dot com

--- Comment #2 from Joey Ye joey.ye at arm dot com 2011-08-03 00:47:10 UTC ---
A patch and test case are available at
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00244.html