[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign

2009-03-12 Thread Joey dot ye at intel dot com


--- Comment #47 from Joey dot ye at intel dot com  2009-03-12 06:51 ---
(In reply to comment #46)
> Created an attachment (id=17444)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17444&action=view) [edit]
> gcc.target/i386/stackalign/longlong-2.c for -mnostackalign on darwin10
> /sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/xgcc
> -B/sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/
> /sw/src/fink.build/gcc44-4.3.999-20090311/gcc-4.4-20090311/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c
> -mstackrealign -O2 -mpreferred-stack-boundary=2 -S -m32 -o longlong-2.s
That's because Mac OS requires the stack to be 16-byte aligned when making a call
and ignores -mpreferred-stack-boundary=2. These cases should be skipped on Mac OS.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137



[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign

2009-03-03 Thread Joey dot ye at intel dot com


--- Comment #35 from Joey dot ye at intel dot com  2009-03-04 01:41 ---
(In reply to comment #32)
> I don't see the reason for optimize_function_for_size_p (cfun), care to back
> up with benchmarks that forcing dynamic realignment for long long variables
> with -mpreferred-stack-boundary=2 improves performance rather than slows things
> down (because of the dynamic realignment)?
The optimize_function_for_size_p check is there to avoid increasing
prologue/epilogue code size when -Os is used, which is what Jakub originally
complained about.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137



[Bug middle-end/39315] Unaligned move used on aligned stack variable

2009-02-26 Thread Joey dot ye at intel dot com


--- Comment #3 from Joey dot ye at intel dot com  2009-02-27 02:53 ---
(In reply to comment #2)
> Created an attachment (id=17368)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17368&action=view) [edit]
> A patch
> Does this patch make sense?
It works fine.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39315



[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign

2009-02-22 Thread Joey dot ye at intel dot com


--- Comment #31 from Joey dot ye at intel dot com  2009-02-23 03:15 ---
How about this patch?
1. Only reduce DImode alignment at -Os
2. Ignore TYPE_USER_ALIGN, so that stack realignment happens for the case in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137#c28, which IMHO is
acceptable.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 5221)
+++ config/i386/i386.c  (working copy)
@@ -19607,6 +19607,13 @@
 ix86_local_alignment (tree type, enum machine_mode mode,
  unsigned int align)
 {
+  /* We don't want to align DImode to 64bit for compilation with
+ -mpreferred-stack-boundary=2 to not enforce dynamic stack alignment
+ prologue.  */
+  if (mode == DImode && !TARGET_64BIT && ix86_preferred_stack_boundary < 64
+      && optimize_function_for_size_p (cfun))
+    align = 32;
+
   /* If TYPE is NULL, we are allocating a stack slot for caller-save
  register in MODE.  We will return the largest alignment of XF
  and DF.  */
@@ -19616,6 +19623,12 @@
align = GET_MODE_ALIGNMENT (DFmode);
   return align;
 }
+  if (!TARGET_64BIT
+      && optimize_function_for_size_p (cfun)
+      && align == 64
+      && ix86_preferred_stack_boundary < 64
+      && (mode == DImode || (type && TYPE_MODE (type) == DImode)))
+    align = 32;

   /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
  to 16byte boundary.  */
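
As a rough illustration of the rule this patch encodes, here is a minimal
standalone sketch. It is not GCC code: dimode_local_alignment and its
parameters are hypothetical stand-ins for TARGET_64BIT,
optimize_function_for_size_p (cfun) and ix86_preferred_stack_boundary, and all
alignments are in bits.

```c
#include <stdbool.h>

/* Hypothetical model of the DImode case only; not the real
   ix86_local_alignment.  All alignments are expressed in bits.  */
unsigned
dimode_local_alignment (bool target_64bit,        /* models TARGET_64BIT */
                        bool optimize_for_size,   /* models -Os on this function */
                        unsigned preferred_boundary, /* models ix86_preferred_stack_boundary */
                        unsigned align)           /* alignment requested so far */
{
  /* Drop a 64-bit request to 32 bits only for 32-bit targets, when
     optimizing for size, and when the preferred boundary is below 64.  */
  if (!target_64bit
      && optimize_for_size
      && align == 64
      && preferred_boundary < 64)
    return 32;

  return align;
}
```

With -m32 -Os -mpreferred-stack-boundary=2 (a 32-bit boundary) the sketch
returns 32, so no dynamic realignment would be forced by a long long local; in
every other combination the requested alignment is returned unchanged.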


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137



[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign

2009-02-17 Thread Joey dot ye at intel dot com


--- Comment #20 from Joey dot ye at intel dot com  2009-02-17 09:18 ---
(In reply to comment #19)
> Just for the record, here is an unsuccessful attempt to avoid stack realignment
> just because of DImode for -m32 or because of DFmode at -m32 -Os.  This patch
> unfortunately caused a handful regressions, like 20020220-1.c.
Is it OK to enable this patch with a new option? Not aligning a mode (DImode)
to its natural boundary by default is confusing.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137



[Bug target/39146] Unnecessary stack alignment

2009-02-16 Thread Joey dot ye at intel dot com


--- Comment #12 from Joey dot ye at intel dot com  2009-02-16 08:49 ---
Created an attachment (id=17305)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17305&action=view)
New patch attached

Testing finished. No regressions with emx_avx_sim. Waiting to check in to 4.5.


-- 

Joey dot ye at intel dot com changed:

   What|Removed |Added

  Attachment #17283|0   |1
is obsolete||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146



[Bug target/39146] Unnecessary stack alignment

2009-02-12 Thread Joey dot ye at intel dot com


--- Comment #10 from Joey dot ye at intel dot com  2009-02-12 15:20 ---
(In reply to comment #8)
> We still have push and mov. I guess it may be the best we can do.
> But please run full 32 and 64bit testsuite with your patch as well
> as under emx-avx-sim.
The full 32- and 64-bit testsuite passes with no regressions for {-m32, -m32
-mstackrealign -mpreferred-stack-boundary=4, -m64}. I haven't run the
emx-avx-sim tests yet.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146



[Bug target/39146] Unnecessary stack alignment

2009-02-11 Thread Joey dot ye at intel dot com


--- Comment #5 from Joey dot ye at intel dot com  2009-02-12 01:45 ---
Stack realignment is finalized by
  stack_realign = (incoming_stack_boundary
                   < (current_function_is_leaf
                      ? crtl->max_used_stack_slot_alignment
                      : crtl->stack_alignment_needed));
Since bar is a leaf function, it checks max_used_stack_slot_alignment.

According to its definition, max_used_stack_slot_alignment is   /* The largest
alignment of slot allocated on the stack.  */. Parameter x isn't allocated on
the local stack, so max_used_stack_slot_alignment shouldn't be set to 256 bits.

In locate_and_pad_parm,
  if (crtl->max_used_stack_slot_alignment < crtl->stack_alignment_needed)
    crtl->max_used_stack_slot_alignment = crtl->stack_alignment_needed;
sets max_used_stack_slot_alignment to 256 bits, which it seems should not
happen unconditionally.
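
To make the decision concrete, the check can be modelled as a small standalone
function. This is only a sketch: needs_stack_realign and its parameters are
hypothetical names that mirror the fields quoted above, and the numbers in the
note below are illustrative, not taken from a compiler dump.

```c
#include <stdbool.h>

/* Hypothetical model of the realignment decision.  All values are in bits.
   A leaf function is judged by the largest stack slot it actually uses;
   a non-leaf function by the alignment it must provide to its callees.  */
bool
needs_stack_realign (unsigned incoming_stack_boundary,
                     bool is_leaf,
                     unsigned max_used_stack_slot_alignment,
                     unsigned stack_alignment_needed)
{
  unsigned required = is_leaf ? max_used_stack_slot_alignment
                              : stack_alignment_needed;
  return incoming_stack_boundary < required;
}
```

For a leaf function with a 128-bit incoming boundary, raising
max_used_stack_slot_alignment to 256 turns the result from false to true, even
if no 256-bit slot is ever allocated locally. That is the unnecessary
realignment described in this report.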


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146



[Bug target/39146] Unnecessary stack alignment

2009-02-11 Thread Joey dot ye at intel dot com


--- Comment #7 from Joey dot ye at intel dot com  2009-02-12 02:26 ---
Created an attachment (id=17283)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17283&action=view)
A patch to fix this problem

The impact on other tests is unknown; testing is under way.

HJ, can you also help verify and test this patch?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146



[Bug target/39148] -Os increase code size when stack is aligned

2009-02-11 Thread Joey dot ye at intel dot com


--- Comment #6 from Joey dot ye at intel dot com  2009-02-12 02:33 ---
(In reply to comment #5)
> If ACCUMULATE_OUTGOING_ARGS is off, ECX will be used
> for stack alignment and it may lead to code size
> increase due to register spill since ia32 has very
> few registers.
The code size increase from stack realignment comes mainly from the larger
prologue. ECX is only used as a hard register in the prologue/epilogue, so the
impact on the function body is low.

If ACCUMULATE_OUTGOING_ARGS does increase code size, then for big functions the
benefit of !ACCUMULATE_OUTGOING_ARGS will offset the prologue/epilogue increase.

So simply enabling ACCUMULATE_OUTGOING_ARGS whenever the stack is realigned
isn't the best option for all cases either.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39148



[Bug target/39146] Unnecessary stack alignment

2009-02-11 Thread Joey dot ye at intel dot com


--- Comment #9 from Joey dot ye at intel dot com  2009-02-12 02:40 ---
(In reply to comment #8)
> We still have push and mov. I guess it may be the best we can do.
I believe so too.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146



[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign

2009-02-10 Thread Joey dot ye at intel dot com


--- Comment #10 from Joey dot ye at intel dot com  2009-02-11 01:03 ---
(In reply to comment #9)
> Created an attachment (id=17279)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17279&action=view) [edit]
> A patch to add a new -malign-double= option
This patch looks OK to me.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137



[Bug target/39146] Unnecessary stack alignment

2009-02-09 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2009-02-10 05:35 ---
The argument needs 32-byte alignment, and there is no way to guarantee that it
won't be spilled. That's why the stack adjustment is there.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146



[Bug target/39082] union with long double doesn't follow x86-64 psABI

2009-02-03 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2009-02-04 02:17 ---
GCC doesn't follow x86-64 psABI on this case.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39082



[Bug target/38952] [4.4 Regression] EH does not work.

2009-01-26 Thread Joey dot ye at intel dot com


--- Comment #20 from Joey dot ye at intel dot com  2009-01-26 11:49 ---
(In reply to comment #10)
> This is caused by stack alignment change, revision 138335. Joey and
> Xuepeng will look into it after holiday, Feb. 1.
This must be the stack alignment change. It looks like we didn't handle stack
unwinding on Cygwin correctly.

Dave, compared with the EH mechanism on Linux, what is different about SjLj EH
on Cygwin? An answer to this question might help solve the problem sooner.

Thanks - Joey


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38952



[Bug target/38899] pessimizes function without SSE intrinsics

2009-01-20 Thread Joey dot ye at intel dot com


--- Comment #2 from Joey dot ye at intel dot com  2009-01-21 02:40 ---
The following case isn't vectorized with -O3 on x86_64 either, although the
arrays are aligned:
#include <stdio.h>

float __attribute__((aligned(16))) in1[] = {
  1.2, 3.5, 1.7, 2.8
};
float __attribute__((aligned(16))) in2[] = {
  -0.7, 2.6, 3.3, -4.0
};
float __attribute__((aligned(16))) out[4];
void __attribute__((noinline)) mul()
{
  int i;
  for (i = 0; i < 4; i++)
    out[i] = in1[i] * in2[i];
}

int main(void)
{
  mul();
  printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
  return 0;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38899



[Bug tree-optimization/38785] huge performance regression on EEMBC bitmnp01

2009-01-14 Thread Joey dot ye at intel dot com


--- Comment #7 from Joey dot ye at intel dot com  2009-01-14 10:08 ---
(In reply to comment #5)
> Joern, re. comment #4, Richi refers to my patch to enable PRE at -Os, see [1].
> An extension to this patch that we tested on x86 machines, is to disable PRE
> for scalar integer registers, via SMALL_REGISTER_CLASSES.  I changed
> SMALL_REGISTER_CLASSES into a target hook for this purpose, see [2]. You could
> play with this, see if you can use this to cure your problem...
> [1] http://gcc.gnu.org/ml/gcc-patches/2008-12/msg00199.html
> [2] http://gcc.gnu.org/ml/gcc-patches/2008-12/msg00590.html
Reproduced on x86. But the build fails with patch [2] on x86_64; is anything
wrong?
../../src/gcc/target-def.h:476:1: error: unterminated #ifndef
../../src/gcc/c-common.c:8197: error: 'TARGETCM_INITIALIZER' undeclared here
(not in a function)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785



[Bug target/38736] [4.4 Regression] -mavx can change the ABI via BIGGEST_ALIGNMENT

2009-01-06 Thread Joey dot ye at intel dot com


--- Comment #5 from Joey dot ye at intel dot com  2009-01-07 02:45 ---
More places with BIGGEST_ALIGN:
$ grep -r "(aligned)" .|grep attribute|grep -v testsuite|grep -v texi
./libstdc++-v3/libsupc++/eh_alloc.cc:typedef char
one_buffer[EMERGENCY_OBJ_SIZE] __attribute__((aligned));
./libjava/exception.cc:  char end[0] __attribute__((aligned));
./libjava/exception.cc:__attribute__((aligned));
./gcc/unwind-sjlj.c:  jmp_buf jbuf __attribute__((aligned));


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38736



[Bug target/33604] [4.3/4.4 Regression] Revision 119502 causes significantly slower results with 4.3/4.4 compared to 4.2

2008-12-29 Thread Joey dot ye at intel dot com


--- Comment #45 from Joey dot ye at intel dot com  2008-12-30 01:49 ---
(In reply to comment #44)
> Does anyone have new numbers?
Fixed on both i386/x86_64:
x86_64:
4.4 (trunk 142847): 5.4s
4.3.2 release:  5.4s
4.2.4 release:  5.4s

i386:
4.4 (trunk 142847): 2.7s
4.3.2 release:  2.8s
4.2.4 release:  2.7s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33604



[Bug rtl-optimization/37397] [4.4 Regression] IRA performance impact on SPEC CPU 2K/2006

2008-12-29 Thread Joey dot ye at intel dot com


--- Comment #6 from Joey dot ye at intel dot com  2008-12-30 02:50 ---
(In reply to comment #4)
> Revision 141860 caused 30% slowdown on 454.calculix in SPEC CPU 2006
> with -O2 -ffast-math on Linux/Intel64.
This regression has been fixed in some revision between 142187 and 142212.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37397



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code

2008-12-09 Thread Joey dot ye at intel dot com


--- Comment #12 from Joey dot ye at intel dot com  2008-12-10 03:01 ---
Fixed at trunk 142631


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/38280] [4.4 regression] Revision 142207 breaks 416.gamess/481.wrf in SPEC CPU 2006

2008-11-30 Thread Joey dot ye at intel dot com


--- Comment #8 from Joey dot ye at intel dot com  2008-12-01 02:18 ---
Yes. It fixes 416/481 on 32 bits and 481 on 64 bits.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280



[Bug rtl-optimization/38280] [4.4 regression] Revision 142207 breaks 416.gamess/481.wrf in SPEC CPU 2006

2008-11-28 Thread Joey dot ye at intel dot com


--- Comment #6 from Joey dot ye at intel dot com  2008-11-28 15:11 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01428.html fixed this
regression.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280



[Bug rtl-optimization/38280] [4.4 regression] Revision 142207 breaks 416.gamess/481.wrf in SPEC CPU 2006

2008-11-27 Thread Joey dot ye at intel dot com


--- Comment #4 from Joey dot ye at intel dot com  2008-11-28 03:39 ---
142250 doesn't fix this regression. 416.gamess and 481.wrf still fail.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280



[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together

2008-11-21 Thread Joey dot ye at intel dot com


--- Comment #8 from Joey dot ye at intel dot com  2008-11-21 12:00 ---
In short, set A={-mavx, -mfma}, set B={-m3dnow, -m3dnowa, -msse4a, -msse5}. Any
combination that takes an option from each set should be prohibited.

Please add more options to these sets in case I missed any.
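
One way to picture the proposed rule is a bitmask check. The sketch below is
purely illustrative: the isa_flag enumerators and isa_options_conflict are
hypothetical names, not the flags or the option handling that GCC actually
uses.

```c
#include <stdbool.h>

/* Hypothetical one-bit-per-option flags; these are not GCC's own masks.  */
enum isa_flag
{
  ISA_AVX    = 1 << 0,
  ISA_FMA    = 1 << 1,
  ISA_3DNOW  = 1 << 2,
  ISA_3DNOWA = 1 << 3,
  ISA_SSE4A  = 1 << 4,
  ISA_SSE5   = 1 << 5
};

#define ISA_SET_A (ISA_AVX | ISA_FMA)
#define ISA_SET_B (ISA_3DNOW | ISA_3DNOWA | ISA_SSE4A | ISA_SSE5)

/* Return true when at least one option from set A and at least one
   option from set B are enabled together.  */
bool
isa_options_conflict (unsigned enabled)
{
  return (enabled & ISA_SET_A) != 0 && (enabled & ISA_SET_B) != 0;
}
```

Options within the same set remain compatible; only a mix across the two sets
is flagged. Extending either set is a one-line change to the matching mask.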


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201



[Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass

2008-10-27 Thread Joey dot ye at intel dot com


--- Comment #23 from Joey dot ye at intel dot com  2008-10-28 01:19 ---
(In reply to comment #22)
> Created an attachment (id=16571)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view) [edit]
> A patch to re-enable regmove
> After applying this patch to re-enable regmove, I got
> [EMAIL PROTECTED] gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o noira -fno-ira -m32
HJ, is your foo.c the case attached in comment #18?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364



[Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass

2008-10-24 Thread Joey dot ye at intel dot com


--- Comment #18 from Joey dot ye at intel dot com  2008-10-24 08:36 ---
Created an attachment (id=16536)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16536&action=view)
Reduced performance case from cpu2006/454.calculix

50% regression with IRA and -mtune=core2 on trunk revisions 140514 and 141335

$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20081024 (experimental) [trunk revision 141335] (GCC) 
$ gcc -m32 -O2 -mssse3 -mfpmath=sse 36.c
$ time -p ./a.out
real 7.97
$ gcc -m32 -O2 -mssse3 -mfpmath=sse -mtune=core2 -o core2.exe 36.c
$ time -p ./core2.exe
real 12.27
$ gcc -m32 -O2 -mssse3 -mfpmath=sse -mtune=core2 -fno-ira -o no-ira.exe 36.c
$ time -p ./no-ira.exe
real 8.03


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364



[Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass

2008-10-24 Thread Joey dot ye at intel dot com


--- Comment #21 from Joey dot ye at intel dot com  2008-10-25 04:14 ---
To me the scheduler is irrelevant here. GCC has no core2 pipeline description,
so the instruction scheduling doesn't look optimized. But for an out-of-order
processor like core2, IMHO scheduling shouldn't make that much difference.
Also, core2 + no-ira doesn't hurt, which means core2 scheduling is not the root
cause.

Instead, the old code uses different registers for loading, but the IRA code
always uses xmm7 as the load target. Two questions need answering:
1. Why do the instructions from core2+ira run slower than those from ira?
2. Why does core2+ira generate such different code from non-core2?

Scheduler dump for core2:
;;  insn  code    bb   dep  prio  cost   reservation
;;  ----  ----    --   ---  ----  ----   -----------
;;   108    47     4     0     0     0   nothing : 70 109 43
;;    43   102     4     1     0     0   nothing : 70 51 117 114 67 109
;;   109    47     4     2     0     0   nothing : 70 44
;;    44   102     4     1     0     0   nothing : 70 57 55 59 67
;;    45   102     4     0     0     0   nothing : 70 65 67 112 110
;;    46   102     4     0     0     0   nothing : 70 55 49 67 61
;;   110   102     4     1     0     0   nothing : 70 65 61
;;    61   720     4     2     0     0   nothing : 70 55 62
;;    62   720     4     1     0     0   nothing : 70 47 111


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364



[Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass

2008-10-23 Thread Joey dot ye at intel dot com


--- Comment #17 from Joey dot ye at intel dot com  2008-10-23 08:42 ---
CPU2006/454.calculix has about 10% regression with IRA + core2 + fpmath=sse on
Core2 ix86:
              IRA    IRA_core2   NO_IRA_core2
454.calculix  1.00   0.90        1.01

Revision: trunk 140514

Options in detail:
IRA= -m32 -O2 -mssse3 -mfpmath=sse
IRA_core2= $IRA -mtune=core2
NO_IRA_core2= $IRA_core2 -fno-ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364



[Bug rtl-optimization/37571] New: Performance regression with -mtune=core2

2008-09-18 Thread Joey dot ye at intel dot com
On a Core2 ix86 machine, the following case (reduced from cpu2000.mcf) runs 50%
slower if compiled with trunk -mtune=core2 -O2

unsigned int g_i,g_j;
unsigned int g_a=1,g_b;
void __attribute__((noinline)) foo()
{
do {
if (g_a  g_i)
{g_i++;}
else
{g_j++;}
 } while (g_b--);
}

int main()
{
int i;
for (i=0; i<4; i++)
{
g_b=0x7fff;
foo();
}
return 0;
}


-- 
   Summary: Performance regression with -mtune=core2
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37571



[Bug rtl-optimization/37571] Performance regression with -mtune=core2

2008-09-18 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2008-09-18 16:01 ---
The root cause is that the instruction length of the fused jcc is set to 16,
which prevents the block from being merged and copied. For some reason Core2
runs poorly with an unmerged branch block under certain circumstances.

The following patch fixes it:

Index: i386.md
===
--- i386.md (revision 3923)
+++ i386.md (working copy)
@@ -421,6 +421,9 @@
 ]
 (const_int 1)))

+(define_attr length_jcc_fuse 
+  (const_int 0))
+
 ;; The (bounding maximum) length of an instruction in bytes.
 ;; ??? fistp and frndint are in fact fldcw/{fistp,frndint}/fldcw sequences.
 ;; Later we may want to split them and compute proper length as for
@@ -442,7 +445,8 @@
   (plus (attr prefix_rep)
 (plus (attr prefix_data16)
   (plus (attr length_immediate)
-(attr length_address)))
+(plus (attr length_address)
+   (attr length_jcc_fuse

 ;; The `memory' attribute is `none' if no memory is referenced, `load' or
 ;; `store' if there is a simple memory reference therein, or `unknown'
@@ -645,7 +649,7 @@
 (include k6.md)
 (include athlon.md)
 (include geode.md)
-;;(include core2.md)
+(include core2.md)


 ;; Operand and operator predicates and constraints
@@ -14033,7 +14037,8 @@
   return test{imodesuffix}\t%2, %2\n\t
 %+j%E1\t%l0\t ASM_COMMENT_START  fused;
 }
-  [(set_attr type multi)
+  [(set_attr type icmp)
+   (set_attr length_jcc_fuse 2)
(set_attr mode MODE)])

 (define_insn *jcc_fused_2
@@ -14048,7 +14053,8 @@
   return test{imodesuffix}\t%2, %2\n\t
 %+j%e1\t%l0\t ASM_COMMENT_START  fused;
 }
-  [(set_attr type multi)
+  [(set_attr type icmp)
+   (set_attr length_jcc_fuse 2)
(set_attr mode MODE)])

 (define_insn *jcc_fused_3
@@ -14066,7 +14072,8 @@
   return cmp{imodesuffix}\t{%3, %2|%2, %3}\n\t
 %+j%E1\t%l0\t ASM_COMMENT_START  fused;
 }
-  [(set_attr type multi)
+  [(set_attr type icmp)
+   (set_attr length_jcc_fuse 2)
(set_attr mode MODE)])

 (define_insn *jcc_fused_4
@@ -14084,7 +14091,8 @@
   return cmp{imodesuffix}\t{%3, %2|%2, %3}\n\t
 %+j%e1\t%l0\t ASM_COMMENT_START  fused;
 }
-  [(set_attr type multi)
+  [(set_attr type icmp)
+   (set_attr length_jcc_fuse 2)
(set_attr mode MODE)])

 ;; In general it is not safe to assume too much about CCmode registers,


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37571



[Bug middle-end/37243] [4.4 Regression] Revision 139590 caused many regressions

2008-08-28 Thread Joey dot ye at intel dot com


--- Comment #11 from Joey dot ye at intel dot com  2008-08-28 06:14 ---
(In reply to comment #4)
> We got
>   Running 416.gamess ref base lnx32-gcc default
> 416.gamess: copy #0 non-zero return code (rc=0, signal=11)
> 416.gamess: copy #0 non-zero return code (rc=0, signal=11)
> 416.gamess: copy #0 non-zero return code (rc=0, signal=11)
> We will try to find a small testcase.
Small case available:
$ cat case.f
  SUBROUTINE SCHMD(V,M,N,LDV)
  IMPLICIT DOUBLE PRECISION(A-H,O-Z)
  LOGICAL GOPARR,DSKWRK,MASWRK
  DIMENSION V(LDV,N)
  COMMON /IOFILE/ IR,IW,IP,IS,IPK,IDAF,NAV,IODA(400)
  COMMON /PAR   / ME,MASTER,NPROC,IBTYP,IPTIM,GOPARR,DSKWRK,MASWRK
  PARAMETER (ZERO=0.0D+00, ONE=1.0D+00, TOL=1.0D-10)
  IF (M .EQ. 0) GO TO 180
  DO 160 I = 1,M
  DUMI = ZERO
  DO 100 K = 1,N
  100 DUMI = DUMI+V(K,I)*V(K,I)
  DUMI = ONE/ SQRT(DUMI)
  DO 120 K = 1,N
  120 V(K,I) = V(K,I)*DUMI
  IF (I .EQ. M) GO TO 160
  I1 = I+1
  DO 140 J = I1,M
  DUM = -DDOT(N,V(1,J),1,V(1,I),1)
  CALL DAXPY(N,DUM,V(1,I),1,V(1,J),1)
  140 CONTINUE
  160 CONTINUE
  IF (M .EQ. N) RETURN
  180 CONTINUE
  I = M
  J = 0
  200 I0 = I
  I = I+1
  IF (I .GT. N) RETURN
  220 J = J+1
  IF (J .GT. N) GO TO 320
  DO 240 K = 1,N
  240 V(K,I) = ZERO
  CALL DAXPY(N,DUM,V(1,II),1,V(1,I),1)
  260 CONTINUE
  DUMI = ZERO
  DO 280 K = 1,N
  280 DUMI = DUMI+V(K,I)*V(K,I)
  IF ( ABS(DUMI) .LT. TOL) GO TO 220
  DO 300 K = 1,N
  300 V(K,I) = V(K,I)*DUMI
  GO TO 200
  320 END
  program main
  DOUBLE PRECISION V
  DIMENSION V(18, 18)
  common // v

  call schmd(V, 1, 18, 18)
  end

  subroutine DAXPY
  end

  FUNCTION DDOT ()
  DOUBLE PRECISION DDOT
  DDOT = 1
  end

$ gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080826 (experimental) [trunk revision 139590] (GCC) 
$ gfortran -O2 -o case.exe case.f -m32
$ ./case.exe
Segmentation fault

$ gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080826 (experimental) [trunk revision 139589] (GCC) 
$ gfortran -O2 -o case.exe case.f -m32
$ ./case.exe
$


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37243



[Bug middle-end/37243] [4.4 Regression] Revision 139590 caused many regressions

2008-08-27 Thread Joey dot ye at intel dot com


--- Comment #7 from Joey dot ye at intel dot com  2008-08-27 08:07 ---
Created an attachment (id=16155)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16155&action=view)
Test case from 2006.434.zeusmp

Although I failed to extract a smaller case, hopefully this one is helpful.

Compile with gfortran -c -O2 -DSPEC_CPU_LP64 tranx1.f -S -fdump-rtl-all -g.
It is miscompiled by revision 139590.

In the IRA dump file, I believe the following suspicious RTL is the cause of the
segfault:
(insn 886 885 893 35 tranx1.f:570 (set (reg:DI 0 ax [orig:123 D.3215 ] [123])
(mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int -104 [0xff98])) [68 D.3215+0 S8 A64]))
89 {*movdi_1_rex64} (nil))

(insn 893 886 896 35 tranx1.f:570 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int -104 [0xff98])) [68 ivtmp.160+0 S8 A64])
(reg/f:DI 3 bx [orig:159 ivtmp.160 ] [159])) 89 {*movdi_1_rex64} (nil))
D.3215 and ivtmp.160 share the spill slot (%rsp-104), whereas D.3215 and
ivtmp.160 have overlapping live ranges.
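
The invariant being violated is that two pseudos may share a spill slot only
when their live ranges do not overlap. A minimal sketch of that test follows,
assuming each live range is a single closed interval of program points. Both
function names are hypothetical, and IRA's real live-range representation is
more elaborate than this.

```c
#include <stdbool.h>

/* Hypothetical model: a live range is one closed interval [start, end]
   of program points.  Two ranges overlap when each starts no later than
   the other ends.  */
bool
live_ranges_overlap (int a_start, int a_end, int b_start, int b_end)
{
  return a_start <= b_end && b_start <= a_end;
}

/* Two spilled pseudos can safely share one stack slot only if they are
   never live at the same time.  */
bool
may_share_spill_slot (int a_start, int a_end, int b_start, int b_end)
{
  return !live_ranges_overlap (a_start, a_end, b_start, b_end);
}
```

Under this model, storing one value into the slot while the other is still
live overwrites it, so a later load of the slot returns the wrong value.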


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37243



[Bug middle-end/37243] [4.4 Regression] Revision 139590 caused many regressions

2008-08-27 Thread Joey dot ye at intel dot com


--- Comment #8 from Joey dot ye at intel dot com  2008-08-27 08:11 ---
GDB output:
(gdb)  b tranx1_
Breakpoint 1 at 0x43a670
(gdb)  r

Breakpoint 1, 0x0043a670 in tranx1_ ()
(gdb)  b *0x43accd
Breakpoint 2 at 0x43accd
(gdb)  b *0x43acf4
Breakpoint 3 at 0x43acf4
(gdb)  b *0x43ad2f
Breakpoint 4 at 0x43ad2f
(gdb)  c

Breakpoint 2, 0x0043accd in tranx1_ ()
(gdb)  x 0x43accd
0x43accd <tranx1_+1629>:     mov    0xff98(%rsp),%rcx
(gdb)  c

Breakpoint 3, 0x0043acf4 in tranx1_ ()
(gdb)  x 0x43acf4
0x43acf4 <tranx1_+1668>:     lea    0x160603e8(,%rcx,8),%rbx
(gdb)  i r rcx
rcx            0x5      5
(gdb)  c

Breakpoint 4, 0x0043ad2f in tranx1_ ()
(gdb)  x 0x43ad2f
0x43ad2f <tranx1_+1727>:     mov    %rbx,0xff98(%rsp)   // RTL #893 Suspicious
(gdb)  i r rbx
rbx            0x16060410       369493008
(gdb)  c

Breakpoint 2, 0x0043accd in tranx1_ ()
(gdb)  x 0x43accd
0x43accd <tranx1_+1629>:     mov    0xff98(%rsp),%rcx
(gdb)  c

Breakpoint 3, 0x0043acf4 in tranx1_ ()
(gdb)  x 0x43acf4
0x43acf4 <tranx1_+1668>:     lea    0x160603e8(,%rcx,8),%rbx
(gdb)  i r rcx
rcx            0x16060410       369493008
(gdb)  c

Breakpoint 4, 0x0043ad2f in tranx1_ ()
(gdb)  x 0x43ad2f
0x43ad2f <tranx1_+1727>:     mov    %rbx,0xff98(%rsp)
(gdb)  i r rbx
rbx            0xc6362468       3325437032
(gdb)  c

Program received signal SIGSEGV, Segmentation fault.
0x0043ad65 in tranx1_ ()
(gdb)  x 0x43ad65
0x43ad65 <tranx1_+1781>:     subsd  (%r14),%xmm0
(gdb)  i r r14
r14            0xc6362468       3325437032


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37243



[Bug target/37158] Wrong insn for _mm_comieq_sd

2008-08-19 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2008-08-19 08:19 ---
See the following code in i386.c:
/* Figure out whether to use ordered or unordered fp comparisons.
   Return the appropriate mode to use.  */

enum machine_mode
ix86_fp_compare_mode (enum rtx_code code ATTRIBUTE_UNUSED)
{
  /* ??? In order to make all comparisons reversible, we do all comparisons
 non-trapping when compiling for IEEE.  Once gcc is able to distinguish
 all forms trapping and nontrapping comparisons, we can make inequality
 comparisons trapping again, since it results in better code when using
 FCOM based compares.  */
  return TARGET_IEEE_FP ? CCFPUmode : CCFPmode;
}


-- 

Joey dot ye at intel dot com changed:

   What|Removed |Added

Summary| Wrong insn for |Wrong insn for _mm_comieq_sd
   |_mm_comieq_sd   |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37158



[Bug rtl-optimization/37124] New: ICE with attribute(option(no-mmx))

2008-08-14 Thread Joey dot ye at intel dot com
$ cat opt.c
extern void abort (void);
double
foo (int arg)
{
  if (arg != 116)
abort();
  return arg + 1;
}
inline double
#if HAS_ATTR
__attribute__ ((__option__ ("no-mmx")))
#endif
bar (int arg)
{
  foo (arg);
  __builtin_return (__builtin_apply ((void (*) ()) foo,
 __builtin_apply_args (), 16));
}
$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080810 (experimental) [trunk revision 138935] (GCC) 
$ gcc -c -m32 -O3 -mmmx opt.c -DHAS_ATTR=1
opt.c: In function 'bar':
opt.c:20: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
$ gcc -c -m32 -O3 -mmmx opt.c -DHAS_ATTR=0
$


-- 
   Summary: ICE with attribute(option(no-mmx))
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37124



[Bug middle-end/36983] Trunk 138207 miscompiles 172.mgrid on x86-64

2008-08-10 Thread Joey dot ye at intel dot com


--- Comment #6 from Joey dot ye at intel dot com  2008-08-11 05:52 ---
(In reply to comment #4)
> If you remove -ffast-math, does it miscompare?
Passes without -ffast-math.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983



[Bug middle-end/36983] Trunk 138207 miscompiles 172.mgrid on x86-64

2008-08-07 Thread Joey dot ye at intel dot com


--- Comment #3 from Joey dot ye at intel dot com  2008-08-07 07:55 ---
Although 138318 fixes the compiler ICE, 172.mgrid is miscompiled with -O3
-ffast-math on x86-64:
  Running 172.mgrid ref base o3 default
*** Miscompare of mgrid.out, see
/home/jye2/cpu2000/benchspec/CFP2000/172.mgrid/run/0003/mgrid.out.mis

No small case available yet


-- 

Joey dot ye at intel dot com changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|FIXED   |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983



[Bug middle-end/34921] Misalign stack variable referenced by nested function

2008-08-06 Thread Joey dot ye at intel dot com


--- Comment #9 from Joey dot ye at intel dot com  2008-08-06 08:05 ---
Fixed


-- 

Joey dot ye at intel dot com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34921



[Bug c++/37012] numerous stackalign related testsuite failures on i686-apple-darwin9

2008-08-04 Thread Joey dot ye at intel dot com


--- Comment #18 from Joey dot ye at intel dot com  2008-08-04 07:24 ---
(In reply to comment #9)
> Joey, I think the problem is the usage of STACK_BOUNDARY / BITS_PER_UNIT
> for stack alignment. On MacOS, STACK_BOUNDARY 128 on ia32. Shouldn't
> we use UNITS_PER_WORD in some cases? Please double check all usages of
> STACK_BOUNDARY / BITS_PER_UNIT in our stack alignment codes.
That's exactly what worried me about a 128-bit STACK_BOUNDARY. For example, the
following code won't work on Darwin:
  int param_ptr_offset = (call_used_regs[REGNO (crtl->drap_reg)]
                          ? 0 : STACK_BOUNDARY / BITS_PER_UNIT);
UNITS_PER_WORD should be used instead. I'm working on the patch.
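
The arithmetic behind this can be checked in isolation. In the sketch below,
stack_boundary_bytes is a hypothetical helper, BITS_PER_UNIT is redefined
locally as 8, and the 32-bit STACK_BOUNDARY used for non-Darwin ia32 is an
assumption; the 128-bit Darwin value comes from the discussion above.

```c
/* Local stand-in for the GCC macro of the same name: bits per byte.  */
#define BITS_PER_UNIT 8

/* Word size in bytes on ia32; stands in for UNITS_PER_WORD there.  */
enum { UNITS_PER_WORD_IA32 = 4 };

/* Hypothetical helper: the value of STACK_BOUNDARY / BITS_PER_UNIT
   for a given STACK_BOUNDARY expressed in bits.  */
unsigned
stack_boundary_bytes (unsigned stack_boundary_bits)
{
  return stack_boundary_bits / BITS_PER_UNIT;
}
```

With a 32-bit boundary the expression yields 4, which happens to equal the
ia32 word size, so the two spellings are interchangeable. With Darwin's
128-bit boundary it yields 16, four words instead of the single word that the
offset is meant to skip.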


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37012



[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment

2008-08-04 Thread Joey dot ye at intel dot com


--- Comment #6 from Joey dot ye at intel dot com  2008-08-04 08:28 ---
(In reply to comment #3)
> Joey, when we compute frame layout, we don't count the duplicated
> return address pushed onto stack when DRAP is used. Also when we
> push return address, shouldn't we use -UNITS_PER_WORD, instead of
> -(STACK_BOUNDARY / BITS_PER_UNIT))? On MacOS, STACK_BOUNDARY is
> 128 on ia32. Does this patch make sense?
> --- ./i386.c.drap   2008-08-03 09:50:05.0 -0700
> +++ ./i386.c        2008-08-03 11:36:40.0 -0700
> @@ -7291,6 +7291,10 @@ ix86_compute_frame_layout (struct ix86_f
>    if (stack_realign_fp)
>      offset = (offset + stack_alignment_needed -1) & -stack_alignment_needed;
> +  /* Duplicated return address when DRAP is used.  */
> +  if (crtl->drap_reg && crtl->stack_realign_needed)
> +    offset += UNITS_PER_WORD;
> +
>    /* Register save area */
>    offset += frame->nregs * UNITS_PER_WORD;
> @@ -7692,8 +7696,7 @@ ix86_expand_prologue (void)
>       expand_builtin_return_addr etc.  */
>        x = crtl->drap_reg;
>        x = gen_frame_mem (Pmode,
> -                        plus_constant (x,
> -                                       -(STACK_BOUNDARY / BITS_PER_UNIT)));
> +                        plus_constant (x, -UNITS_PER_WORD));
>        insn = emit_insn (gen_push (x));
>        RTX_FRAME_RELATED_P (insn) = 1;
>      }
I suspect this patch is incorrect. 
  /* Skip return address and saved base pointer.  */
  offset = frame_pointer_needed ? UNITS_PER_WORD * 2 : UNITS_PER_WORD;
already count the duplicated address in. I'm analyzing what makes this case
fail.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010



[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment

2008-08-04 Thread Joey dot ye at intel dot com


--- Comment #7 from Joey dot ye at intel dot com  2008-08-04 09:03 ---
This problem is associated with -mpreferred-stack-boundary=2, rather than with
stack alignment. The following case fails on trunk from before the merge with
the stack branch:
$ cat y1.c
/* PR middle-end/37010 */
/* { dg-do run { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
/* { dg-options "-msse2" } */

typedef __PTRDIFF_TYPE__ ptrdiff_t;
extern void abort (void);

int
__attribute__ ((noinline))
check (void *i, int align)
{
  if ((((ptrdiff_t) i) & (align - 1)) != 0)
    {
      abort ();
    }
  return 0;
}
typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

void
__attribute__ ((noinline))
foo (__m128 x, __m128 y ,__m128 z ,__m128 a, int size)
{
  check (&a, __alignof__(a));
}

int
main (void)
{
  __m128 x = { 1.0 };
  foo (x, x, x, x, 5);
  return 0;
}

$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080707 (experimental) [trunk revision 137572] (GCC) 
$ gcc  -o y1.exe y1.c -m32 -Os -msse2 -mpreferred-stack-boundary=2
$ ./y1.exe
Aborted


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010



[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment

2008-08-04 Thread Joey dot ye at intel dot com


--- Comment #8 from Joey dot ye at intel dot com  2008-08-04 09:11 ---
The root cause is that the outgoing parameter frame is aligned relative to the
stack pointer. Namely, address_of_stack_param = SP + offset + fixed_padding.

With -mpreferred-stack-boundary=2, SP is only guaranteed to be aligned to 4
bytes. The outgoing frame cannot be aligned to 16 bytes without an additional
'and $-16, sp'.
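
A minimal sketch of that arithmetic in C, as a toy model rather than GCC code;
the helper names is_aligned, stack_param_address and realign_sp_16 are
hypothetical, and the addresses used to exercise them are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of align; align must be a power of two.  */
int
is_aligned (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr & (align - 1)) == 0;
}

/* address_of_stack_param = SP + offset + fixed_padding, as stated above.  */
uintptr_t
stack_param_address (uintptr_t sp, uintptr_t offset, uintptr_t fixed_padding)
{
  return sp + offset + fixed_padding;
}

/* Model of 'and $-16, sp': round the stack pointer down to 16 bytes.  */
uintptr_t
realign_sp_16 (uintptr_t sp)
{
  return sp & ~(uintptr_t) 15;
}
```

With a padding fixed at compile time (say 12), a stack pointer of 0x1004 gives
a 16-byte aligned parameter address, but the equally valid 4-byte aligned
stack pointer 0x1008 does not. Only after the stack pointer itself has been
rounded down does a fixed layout give a predictable alignment.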


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010



[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment

2008-08-04 Thread Joey dot ye at intel dot com


--- Comment #11 from Joey dot ye at intel dot com  2008-08-04 14:11 ---
(In reply to comment #10)
> Did you mean we needed 2 additional 'and $-16, sp' insns to align the
> stack? I don't think so.
Definitely not. 
Solution 1: Just ignore it. A __m128 parameter shouldn't be passed with
-mpreferred-stack-boundary=2, or
Solution 2: Record the max alignment of all outgoing parameters, and set
crtl->preferred_stack_boundary = max_parameter_alignment
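
Solution 2 could be sketched roughly as follows. This is a standalone toy in
C, not a patch against GCC: the function name, the array of per-parameter
alignments and the bit units are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the largest alignment (in bits) among the
   outgoing parameters, never lower than the current preferred boundary.  */
unsigned int
max_parameter_alignment (const unsigned int *param_align_bits, size_t n,
                         unsigned int preferred_boundary_bits)
{
  unsigned int max_align = preferred_boundary_bits;

  assert (param_align_bits != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (param_align_bits[i] > max_align)
      max_align = param_align_bits[i];
  return max_align;
}
```

For a call passing an int, a __m128 and another int with a preferred boundary
of 32 bits, the helper would return 128, which is the value the caller would
then need as its stack boundary.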


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010



[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment

2008-08-04 Thread Joey dot ye at intel dot com


--- Comment #15 from Joey dot ye at intel dot com  2008-08-05 01:01 ---
(In reply to comment #12)
> I think the problem is in
>   /* Set offset to aligned because the realigned frame tarts from here.  */
>   if (stack_realign_fp)
>     offset = (offset + stack_alignment_needed -1) & -stack_alignment_needed;
> This code assumes that offset 0 is properly aligned to any alignment,
> which isn't true. It happens to work with -maccumulate-outgoing-args.
I still believe #8 is the right reason.
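
For reference, the rounding idiom quoted above can be sketched in isolation.
This is a minimal C illustration, assuming a power-of-two alignment; round_up
is a hypothetical name and the base addresses used to exercise it are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Round offset up to the next multiple of align (a power of two), using
   the same (offset + align - 1) & -align idiom as the quoted code.  */
uintptr_t
round_up (uintptr_t offset, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & -align;
}
```

The idiom aligns the offset only. round_up (20, 16) is 32, so base + 32 is
16-byte aligned when the base is 0x1000, but not when the base is 0x1004. The
final address is aligned only if the base it is added to is aligned as well.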


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010



[Bug middle-end/36983] New: Trunk 138207 miscompiles 172.mgrid on x86-64

2008-07-31 Thread Joey dot ye at intel dot com
$ cat mgrid.f
      SUBROUTINE RESID(U,V,R,N,A)
      INTEGER N
      REAL*8 U(N,N,N),V(N,N,N),R(N,N,N),A(0:3)
      INTEGER I3, I2, I1
      DO 600 I3=2,N-1
      DO 600 I2=2,N-1
      DO 600 I1=2,N-1
 600  R(I1,I2,I3)=V(I1,I2,I3)
     >   -A(0)*( U(I1,  I2,  I3  ) )
     >   -A(1)*( U(I1-1,I2,  I3  ) + U(I1+1,I2,  I3  )
     >        +  U(I1,  I2-1,I3  ) + U(I1,  I2+1,I3  )
     >        +  U(I1,  I2,  I3-1) + U(I1,  I2,  I3+1) )
     >   -A(2)*( U(I1-1,I2-1,I3  ) + U(I1+1,I2-1,I3  )
     >        +  U(I1-1,I2+1,I3  ) + U(I1+1,I2+1,I3  )
     >        +  U(I1,  I2-1,I3-1) + U(I1,  I2+1,I3-1)
     >        +  U(I1,  I2-1,I3+1) + U(I1,  I2+1,I3+1)
     >        +  U(I1-1,I2,  I3-1) + U(I1-1,I2,  I3+1)
     >        +  U(I1+1,I2,  I3-1) + U(I1+1,I2,  I3+1) )
     >   -A(3)*( U(I1-1,I2-1,I3-1) + U(I1+1,I2-1,I3-1)
     >        +  U(I1-1,I2+1,I3-1) + U(I1+1,I2+1,I3-1)
     >        +  U(I1-1,I2-1,I3+1) + U(I1+1,I2-1,I3+1)
     >        +  U(I1-1,I2+1,I3+1) + U(I1+1,I2+1,I3+1) )
      RETURN
      END
$ gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --prefix=/home/jye2/rrs/138207/usr
--enable-languages=c,c++,fortran --disable-bootstrap
Thread model: posix
gcc version 4.4.0 20080728 (experimental) [trunk revision 138207] (GCC) 
$ gfortran   -O3 -ffast-math mgrid.f -c
mgrid.f: In function 'resid':
mgrid.f:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

$ gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --prefix=/home/jye2/rrs/138206/usr
--enable-languages=c,c++,fortran --disable-bootstrap
Thread model: posix
gcc version 4.4.0 20080728 (experimental) [trunk revision 138206] (GCC) 
$ gfortran   -O3 -ffast-math mgrid.f -c
$


-- 
   Summary: Trunk 138207 miscompiles 172.mgrid on x86-64
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983



[Bug middle-end/36983] Trunk 138207 miscompiles 172.mgrid on x86-64

2008-07-31 Thread Joey dot ye at intel dot com


--- Comment #2 from Joey dot ye at intel dot com  2008-07-31 10:50 ---
Yes. I just noticed that the latest trunk passes.


-- 

Joey dot ye at intel dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983



[Bug middle-end/36986] New: Trunk 138207 miscompiles 447.dealII

2008-07-31 Thread Joey dot ye at intel dot com
This bug is also caused by 138207, and latest trunk still fails (138353)
$ g++ -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080731 (experimental) [trunk revision 138353] (GCC) 
$ g++ -c -O3 -ffast-math   dof_tools.i.cc
dof_tools.cc: In static member function 'static void
DoFTools::make_flux_sparsity_pattern(const DoFHandler<dim>&, SparsityPattern&,
const FullMatrix<double>&, const FullMatrix<double>&) [with int dim = 3,
SparsityPattern = CompressedBlockSparsityPattern]':
dof_tools.cc:485: internal compiler error: in gimple_cond_get_ops_from_tree, at
gimple.c:493
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.


-- 
   Summary: Trunk 138207 miscompiles 447.dealII
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36986



[Bug middle-end/36986] Trunk 138207 miscompiles 447.dealII

2008-07-31 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2008-07-31 11:33 ---
Created an attachment (id=15982)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15982&action=view)
Preprocessed test case


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36986



[Bug tree-optimization/36835] Trunk 137774 miscompile cpu2006.473.astar

2008-07-16 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2008-07-16 13:14 ---
Fixed by revision 137859


-- 

Joey dot ye at intel dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36835



[Bug tree-optimization/36835] New: Trunk 137774 miscompile cpu2006.473.astar

2008-07-15 Thread Joey dot ye at intel dot com
gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --enable-languages=c,c++ --disable-bootstrap
Thread model: posix
gcc version 4.4.0 20080714 (experimental) [trunk revision 137774] (GCC)

  Running 473.astar test base lnx32e default
Error with '/home/jye2/cpu2006/bin/specinvoke -E -d
/home/jye2/cpu2006/benchspec/CPU2006/473.astar/run/run_base_test_lnx32e. -c
1 -e compare.err -o compare.stdout -f compare.cmd': check file
'/home/jye2/cpu2006/benchspec/CPU2006/473.astar/run/run_base_test_lnx32e./.err'
*** Miscompare of lake.out, see
/home/jye2/cpu2006/benchspec/CPU2006/473.astar/run/run_base_test_lnx32e./lake.out.mis
Invalid run; unable to continue.  If you wish to ignore errors please use '-I'
or ignore_errors


-- 
   Summary: Trunk 137774 miscompile cpu2006.473.astar
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36835



[Bug tree-optimization/36765] [4.4 Regression] Revision 137573 miscompiles 464.h264ref in SPEC CPU 2006

2008-07-10 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2008-07-11 05:46 ---
Created an attachment (id=15897)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15897&action=view)
Small test case reduced from cpu2006.464.h264ref

/home/jye2/work/bug-37665> gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080707 (experimental) [trunk revision 137573] (GCC) 
/home/jye2/work/bug-37665> make -B && ./m.exe && echo PASS
gcc -c main.c -g
gcc -O2 -ffast-math -g   -c -o l5.o l5.c
gcc -o m.exe main.o l5.o
Bmin[0]=21
Aborted

/home/jye2/work/bug-37665> gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080707 (experimental) [trunk revision 137572] (GCC) 
/home/jye2/work/bug-37665> make -B && ./m.exe && echo PASS
gcc -c main.c -g
gcc -O2 -ffast-math -g   -c -o l5.o l5.c
gcc -o m.exe main.o l5.o
PASS


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36765



[Bug tree-optimization/36765] [4.4 Regression] Revision 137573 miscompiles 464.h264ref in SPEC CPU 2006

2008-07-10 Thread Joey dot ye at intel dot com


--- Comment #2 from Joey dot ye at intel dot com  2008-07-11 05:49 ---
The effect of line 76,
buffer_frame[0] = InitFullness;
is eliminated by the optimizer due to a bug in GCC.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36765



[Bug tree-optimization/36054] bad code generation with -ftree-vectorize

2008-05-05 Thread Joey dot ye at intel dot com


--- Comment #13 from Joey dot ye at intel dot com  2008-05-05 07:22 ---
It is helpful. The root cause is that memory allocated by new is only aligned
to 8 bytes on i386. In your case, the Environment object is allocated by new and
its constructor tries to use movdqa to initialize its members. The following
small case shows the problem:
/* Compile with options -m32 -msse2.
   Current behavior: segmentation fault at run time.
 */
#include <stdio.h>
#include <emmintrin.h>

struct A {
public:
__m128i m;
void init() { m = _mm_setzero_si128(); }
};

int main()
{
A * a = new A;
printf("Address of A: %p\n", a);
a->init();
delete a;
return 0;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054



[Bug tree-optimization/36054] bad code generation with -ftree-vectorize

2008-05-05 Thread Joey dot ye at intel dot com


--- Comment #14 from Joey dot ye at intel dot com  2008-05-05 07:29 ---
HJ, 

AVX will have a similar problem on x86_64, where new only returns objects
aligned to 16 bytes. A dynamically allocated __m256 is not guaranteed to be on
a 32-byte boundary.
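
As a rough illustration of what a 32-byte guarantee would involve at the
allocation level, here is a minimal C sketch. It assumes a C11 library that
provides aligned_alloc; it is not what GCC or libstdc++ does for new, and the
helper names alloc_aligned_32 and is_aligned_32 are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate at least n bytes on a 32-byte boundary.  aligned_alloc (C11)
   expects the size to be a multiple of the alignment, so n is rounded up
   first.  Returns NULL on failure; release the block with free ().  */
void *
alloc_aligned_32 (size_t n)
{
  size_t rounded;

  if (n > SIZE_MAX - 31)
    return NULL;                      /* rounding up would overflow */
  rounded = (n + 31) & ~(size_t) 31;
  if (rounded == 0)
    rounded = 32;                     /* treat a zero-byte request as 32 */
  return aligned_alloc (32, rounded);
}

/* Returns 1 if p lies on a 32-byte boundary.  */
int
is_aligned_32 (const void *p)
{
  return ((uintptr_t) p & 31) == 0;
}
```

A plain malloc or new makes no such promise, which is why an aligned vector
move on the returned address can fault.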


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054



[Bug tree-optimization/36054] bad code generation with -ftree-vectorize

2008-04-30 Thread Joey dot ye at intel dot com


--- Comment #8 from Joey dot ye at intel dot com  2008-04-30 10:53 ---
(In reply to comment #6)
> (In reply to comment #4)
> > > have you tried to compile with -march=core2 -mfpmath=sse -msse?
> > Yes, I've compiled it as following:
> > % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096
> > -Wnon-virtual-dtor -fPIC kernel_build.ii
> -m32 ? 

-m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline
changes to the ia32 intrinsics conflict with the 4.3.0 header files.

I'm using 4.3.0. Compilation passes, but I still get link errors like:
/tmp/ccfJXXcV.o:(.rodata._ZTVN9portaudio20MemFunCallbackStreamIN4nova16PortAudioBackendEEE[vtable
for portaudio::MemFunCallbackStream<nova::PortAudioBackend>]+0x10): undefined
reference to `portaudio::Stream::close()'
/tmp/ccfJXXcV.o:(.rodata._ZTIN9portaudio20MemFunCallbackStreamIN4nova16PortAudioBackendEEE[typeinfo
for portaudio::MemFunCallbackStream


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054



[Bug tree-optimization/36054] bad code generation with -ftree-vectorize

2008-04-30 Thread Joey dot ye at intel dot com


--- Comment #9 from Joey dot ye at intel dot com  2008-04-30 10:56 ---
(In reply to comment #8)
> -m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline 
> change
Correction: -m32 is required, but it doesn't fix everything. The options I'm
using:
 g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096
-Wnon-virtual-dtor  -m32 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054



[Bug tree-optimization/36054] bad code generation with -ftree-vectorize

2008-04-30 Thread Joey dot ye at intel dot com


--- Comment #11 from Joey dot ye at intel dot com  2008-05-01 04:31 ---
Tim,

Since it doesn't link, I can only check the .s file. There are a couple of
constructors called Environment. Which one is the problematic function?

grep Environment kernel_build.s|grep glob
...
.globl _ZN4nova11EnvironmentD1Ev
.globl _ZN4nova11EnvironmentD2Ev
.globl _ZN4nova11EnvironmentC1Ev


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054



[Bug middle-end/36078] New: gfortran fails to build cpu2006/465.tonto

2008-04-29 Thread Joey dot ye at intel dot com
Fails starting from trunk 134730, and still fails as of 134775:

$ cat f2.f90
   subroutine foo(func,p,eval)
  real(kind=kind(1.0d0)), dimension(3,0:4,0:4,0:4) :: p
  logical(kind=kind(.true.)), dimension(5,5,5) :: eval
  interface
 subroutine func(values,pt)
real(kind=kind(1.0d0)), dimension(:), intent(out) :: values
real(kind=kind(1.0d0)), dimension(:,:), intent(in) :: pt
 end subroutine
  end interface
  real(kind=kind(1.0d0)), dimension(125,3) :: pt
  integer(kind=kind(1)) :: n_pt

  n_pt = 1
  pt(1:n_pt,:) = &
 reshape( &
pack( &
   transpose(reshape(p,(/3,125/))), &
   spread(reshape(eval,(/125/)),dim=2,ncopies=3)), &
(/n_pt,3/))

   end subroutine
   end 

$ gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,fortran
Thread model: posix
gcc version 4.4.0 20080427 (experimental) [trunk revision 134730] (GCC) 

$ gfortran -c -O2 f2.f90
f2.f90: In function 'foo':
f2.f90:1: internal compiler error: in execute_todo, at passes.c:991
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.


-- 
   Summary: gfortran fails to build cpu2006/465.tonto
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36078



[Bug middle-end/36074] [4.4 Regression]: 447.dealII in SPEC CPU 2006 failed to compile

2008-04-29 Thread Joey dot ye at intel dot com


--- Comment #5 from Joey dot ye at intel dot com  2008-04-29 10:41 ---
This may be related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36078, for
which I do have a small test case.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36074



[Bug middle-end/34921] Misalign stack variable referenced by nested function

2008-01-22 Thread Joey dot ye at intel dot com


--- Comment #5 from Joey dot ye at intel dot com  2008-01-23 01:45 ---
(In reply to comment #2)
> I bet if you put jj in struct and don't have a nested function, this will be
> the same issue.

Not the same. In fact it passes if jj is not referenced by a nested function.
The root cause is in tree-nested.c.

$ cat nested-3.c
#include <stdio.h>
#include <stdlib.h>
typedef int aligned __attribute__((aligned(16)));
int global;

void
check (int *i)
{
  *i = 20;
  if ((((int) i) & (__alignof__(aligned) - 1)) != 0)
    {
      printf("\nUnalign address (%d): %p!\n",
             __alignof__(aligned), i);
      abort ();
    }
}

void
foo (void)
{
  aligned jj;
  int j2;
  void bar ()
{
  j2 = -20;
}
  jj = 0;
  bar ();
  check (&jj);
}

int
main()
{
  foo ();
  return 0;
}
$ diff -p nested-2.c nested-3.c
*** nested-2.c  2008-01-22 14:24:39.0 +0800
--- nested-3.c  2008-01-23 09:38:47.0 +0800
*************** void
*** 19,27 ****
  foo (void)
  {
    aligned jj;
    void bar ()
      {
!       jj = -20;
      }
    jj = 0;
    bar ();
--- 19,28 ----
  foo (void)
  {
    aligned jj;
+   int j2;
    void bar ()
      {
!       j2 = -20;
      }
    jj = 0;
    bar ();
$ gcc -m32 -o nested-3.exe nested-3.c
$ ./nested-3.exe
$


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34921



[Bug c/34921] Misalign stack variable referenced by nested function

2008-01-21 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2008-01-22 06:38 ---
This patch should fix it:
Index: gcc/tree-nested.c
===
--- gcc/tree-nested.c   (revision 131342)
+++ gcc/tree-nested.c   (working copy)
@@ -183,6 +183,10 @@

   TREE_CHAIN (field) = *p;
   *p = field;
+
+  /* Set correct alignment for frame struct type */
+  if (TYPE_ALIGN (type) < DECL_ALIGN (field))
+    TYPE_ALIGN (type) = DECL_ALIGN (field);
 }

 /* Build or return the RECORD_TYPE that describes the frame state that is
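
The idea behind the patch can be shown with a toy model in C. The struct and
function below are hypothetical stand-ins for the frame RECORD_TYPE and the
field insertion code; they are not GCC's data structures, and alignments are
given in bits purely as an assumption.

```c
/* Hypothetical stand-in for the frame struct type; align models
   TYPE_ALIGN, in bits.  */
struct frame_type
{
  unsigned int align;
};

/* Add a field with the given alignment (models DECL_ALIGN) and raise the
   type's alignment when the field needs more than the type provides.  */
void
add_field (struct frame_type *type, unsigned int field_align)
{
  if (type->align < field_align)
    type->align = field_align;
}
```

Adding a 128-bit aligned field to a frame type with 32-bit alignment raises
the type to 128 bits, and later fields with smaller alignment leave it there,
so the frame is at least as aligned as its most demanding field.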


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34921



[Bug c/34921] New: Misalign stack variable referenced by nested function

2008-01-21 Thread Joey dot ye at intel dot com
 cat nested-2.c
#include <stdio.h>
#include <stdlib.h>
typedef int aligned __attribute__((aligned(16)));
int global;

void
check (int *i)
{
  *i = 20;
  if ((((int) i) & (__alignof__(aligned) - 1)) != 0)
    {
      printf("\nUnalign address (%d): %p!\n",
             __alignof__(aligned), i);
      abort ();
    }
}

void
foo (void)
{
  aligned jj;
  void bar ()
{
  jj = -20;
}
  jj = 0;
  bar ();
  check (&jj);
}

int
main()
{
  foo ();
  return 0;
}
 gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /home/wlin5/gcc/src-daily/configure
--enable-languages=c,c++,fortran --disable-bootstrap
Thread model: posix
gcc version 4.3.0 20080106 (experimental) [trunk revision 131347] (GCC) 
 gcc -m32 -o nested-2.exe nested-2.c
 ./nested-2.exe

Unalign address (16): 0xffa137dc!
Aborted


-- 
   Summary: Misalign stack variable referenced by nested function
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34921



[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown

2007-10-22 Thread Joey dot ye at intel dot com


--- Comment #28 from Joey dot ye at intel dot com  2007-10-23 02:23 ---
Got similar result on x86_64, Core 2 improves 24% from 129469 to 129504. That's
great.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921



[Bug libmudflap/33119] New: Missing mf-runtime.h after make -j2 install

2007-08-20 Thread Joey dot ye at intel dot com
mf-runtime.h is not installed by make -j2 install for an x86_64 target.

From the log file, it is apparently installed at first and then removed when
the rest of the gcc header files are installed. This may be caused by an
incorrect dependency between mudflap and another target.


-- 
   Summary: Missing mf-runtime.h after make -j2 install
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: libmudflap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33119



[Bug libmudflap/33119] Missing mf-runtime.h after make -j2 install

2007-08-20 Thread Joey dot ye at intel dot com


--- Comment #2 from Joey dot ye at intel dot com  2007-08-20 08:53 ---
(In reply to comment #1)
> Nobody does make install with -j.
I guess so, which is why I set the severity to minor. But does that mean errors
are expected with -j? My script had -j by accident, and it cost me hours to
identify the root cause. I doubt I'm the only lucky guy.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33119



[Bug rtl-optimization/32755] Seg fault when compile CPU2000 with -fsee

2007-07-13 Thread Joey dot ye at intel dot com


--- Comment #1 from Joey dot ye at intel dot com  2007-07-13 09:21 ---
Created an attachment (id=13909)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13909&action=view)
Reduced testcase

GCC crashes with gcc -O2 -fsee case-see.c -c

It fails on all recent 4.3 trunk revisions.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32755



[Bug rtl-optimization/32755] New: Seg fault when compile CPU2000 with -fsee

2007-07-13 Thread Joey dot ye at intel dot com
4.3 trunk fails to build any 2006 with -fsee on x86_64:
gcc -c -o av.o -DSPEC_CPU -DNDEBUG -DPERL_CORE   -O2 -fsee  
-DSPEC_CPU_LP64  -DSPEC_CPU_LINUX_X64   av.c
av.c: In function 'Perl_av_reify':
av.c:50: internal compiler error: Segmentation fault


-- 
   Summary: Seg fault when compile CPU2000 with -fsee
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32755



[Bug rtl-optimization/32755] Seg fault when compile CPU2000 with -fsee

2007-07-13 Thread Joey dot ye at intel dot com


--- Comment #2 from Joey dot ye at intel dot com  2007-07-13 09:27 ---
The root cause appears to be at see.c line 1643:
  emit_insn_after (merged_ref, ref);
  delete_insn (ref);
where merged_ref and ref have the same INSN_UID. delete_insn clears the df
information for that UID, which leaves merged_ref with no df information.

I tried inserting the following line and it works:
+ INSN_UID(merged_ref)=cfun->emit->x_cur_insn_uid++;

But it is obviously ugly. Can anyone share the right approach to replacing
an insn with another one that has the same UID?
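
The failure mode can be modelled outside GCC with a side table keyed by UID.
This is only a toy in C: the insn struct, the df_info table and all function
names are hypothetical and do not reflect the real df or emit-rtl interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UID 16

/* Toy instruction: only the UID matters for this model.  */
struct insn
{
  int uid;
};

/* Per-UID side table standing in for the df information.  */
static const char *df_info[MAX_UID];

void
record_df (const struct insn *i, const char *info)
{
  assert (i->uid >= 0 && i->uid < MAX_UID);
  df_info[i->uid] = info;
}

/* Deleting an insn clears whatever is stored under its UID.  */
void
delete_insn_model (const struct insn *i)
{
  assert (i->uid >= 0 && i->uid < MAX_UID);
  df_info[i->uid] = NULL;
}

const char *
lookup_df (const struct insn *i)
{
  assert (i->uid >= 0 && i->uid < MAX_UID);
  return df_info[i->uid];
}
```

If the replacement shares the UID of the insn being deleted, the delete wipes
the replacement's entry too. Giving the replacement a fresh UID, as the
one-line workaround does, keeps its entry intact in this model.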


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32755



[Bug middle-end/32598] [4.3 Regression]: 27_io/basic_stringbuf/setbuf/wchar_t/4.cc needs more than 6GB memory to compile

2007-07-03 Thread Joey dot ye at intel dot com


--- Comment #4 from Joey dot ye at intel dot com  2007-07-04 01:17 ---
Revision 126198 introduced the regression.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32598