[Bug c/38891] using ms_abi function attribute with -mno-sse generates an internal compiler error

2009-02-12 xuepeng dot guo at intel dot com


--- Comment #2 from xuepeng dot guo at intel dot com  2009-02-12 08:35 
---
Confirmed at revision 144120. This is caused by the macro
CONDITIONAL_REGISTER_USAGE in i386.h. First, the code

if (! TARGET_SSE)                                                     \
  {                                                                   \
    int i;                                                            \
    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)                       \
      if (TEST_HARD_REG_BIT (reg_class_contents[(int)SSE_REGS], i))   \
        fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";     \
  }

sets fixed_regs[27], fixed_regs[28], call_used_regs[27] and call_used_regs[28]
to 1.

Then the following code in the same macro

if (TARGET_64BIT                                                      \
    && ((cfun && cfun->machine->call_abi == MS_ABI)                   \
        || (!cfun && DEFAULT_ABI == MS_ABI)))                         \
  {                                                                   \
    int i;                                                            \
    call_used_regs[4 /*RSI*/] = 0;                                    \
    call_used_regs[5 /*RDI*/] = 0;                                    \
    for (i = 0; i < 8; i++)                                           \
      call_used_regs[45+i] = 0;                                       \
    call_used_regs[27] = call_used_regs[28] = 0;                      \
  }

sets call_used_regs[27] and call_used_regs[28] back to 0.

As a result, gcc_assert (!fixed_regs[i] || call_used_regs[i]) fails for these
registers, which produces the internal compiler error.
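The conflict can be reproduced outside GCC with a small model of the two steps. This is a hypothetical sketch, not GCC code. It tracks only hard registers 0..28 and assumes the i386.h numbering in which 4 is RSI, 5 is RDI and 21..28 are xmm0..xmm7. The helper names reset_regs, disable_sse, apply_ms_abi and first_violation are made up for illustration.

```c
#include <string.h>

/* Hypothetical model: only hard registers 0..28 are tracked.
   Assumed i386.h numbering: 4 = RSI, 5 = RDI, 21..28 = xmm0..xmm7. */
#define NREGS 29

static char fixed_regs[NREGS];
static char call_used_regs[NREGS];

static void reset_regs(void)
{
    memset(fixed_regs, 0, sizeof fixed_regs);
    memset(call_used_regs, 0, sizeof call_used_regs);
}

/* First step (! TARGET_SSE): every SSE register becomes fixed and call-used. */
static void disable_sse(void)
{
    for (int i = 21; i <= 28; i++)
        fixed_regs[i] = call_used_regs[i] = 1;
}

/* Second step (64-bit MS ABI): RSI, RDI, xmm6 and xmm7 are callee-saved,
   so they are unconditionally marked as not call-used. */
static void apply_ms_abi(void)
{
    call_used_regs[4] = 0;
    call_used_regs[5] = 0;
    call_used_regs[27] = call_used_regs[28] = 0;
}

/* Mirrors gcc_assert (!fixed_regs[i] || call_used_regs[i]):
   returns the first register that breaks it, or -1 if none does. */
static int first_violation(void)
{
    for (int i = 0; i < NREGS; i++)
        if (fixed_regs[i] && !call_used_regs[i])
            return i;
    return -1;
}
```

Running reset_regs(), then disable_sse(), then apply_ms_abi() leaves register 27 fixed but not call-used. That is the state the assertion rejects.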


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38891



[Bug c/38891] using ms_abi function attribute with -mno-sse generates an internal compiler error

2009-02-12 xuepeng dot guo at intel dot com


--- Comment #3 from xuepeng dot guo at intel dot com  2009-02-12 08:50 
---
The numbers 27 and 28 refer to the SSE registers xmm6 and xmm7, which the MS ABI
treats as callee-saved. Because we turned on -mno-sse, I think setting
fixed_regs[27] and fixed_regs[28] to 1 is reasonable, according to the
explanation of FIXED_REGISTERS in i386.h. The explanation of CALL_USED_REGISTERS
reads "1 for registers not available across function calls. These must include
the FIXED_REGISTERS and also any registers that can be used without being
saved." So when call_abi equals MS_ABI, I think it is better to leave
call_used_regs[27] and call_used_regs[28] as 1.
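One way to express this suggestion is to skip fixed registers when the MS ABI branch clears call_used_regs. The following is a hypothetical sketch, not necessarily how GCC fixed the bug. It tracks only hard registers 0..28 and assumes the i386.h numbering 4 = RSI, 5 = RDI and 21..28 = xmm0..xmm7. The helper names disable_sse, apply_ms_abi_guarded and invariant_holds are made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: only hard registers 0..28 are tracked.
   Assumed numbering: 4 = RSI, 5 = RDI, 21..28 = xmm0..xmm7. */
#define NREGS 29

static char fixed_regs[NREGS];
static char call_used_regs[NREGS];

/* -mno-sse: SSE registers become fixed (and therefore call-used). */
static void disable_sse(void)
{
    memset(fixed_regs, 0, sizeof fixed_regs);
    memset(call_used_regs, 0, sizeof call_used_regs);
    for (int i = 21; i <= 28; i++)
        fixed_regs[i] = call_used_regs[i] = 1;
}

/* MS ABI step with a guard: a register is made callee-saved only if it
   is not fixed, so fixed registers stay call-used. */
static void apply_ms_abi_guarded(void)
{
    static const int callee_saved[] = { 4 /* RSI */, 5 /* RDI */, 27, 28 };
    for (size_t k = 0; k < sizeof callee_saved / sizeof callee_saved[0]; k++) {
        int r = callee_saved[k];
        if (!fixed_regs[r])
            call_used_regs[r] = 0;
    }
}

/* Checks gcc_assert (!fixed_regs[i] || call_used_regs[i]) for every i. */
static int invariant_holds(void)
{
    for (int i = 0; i < NREGS; i++)
        if (fixed_regs[i] && !call_used_regs[i])
            return 0;
    return 1;
}
```

With SSE enabled, xmm6 and xmm7 are not fixed. The guarded loop therefore still makes them callee-saved, as the MS ABI requires.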


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38891



[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3

2009-02-09 xuepeng dot guo at intel dot com


--- Comment #17 from xuepeng dot guo at intel dot com  2009-02-09 09:16 
---
Below is the loop from the test case in its original form (compiled by GCC 4.4):

_Z7bench_1PfS_fj:
.LFB2309:
        shrl    $2, %edx
        shufps  $0, %xmm0, %xmm0
        subl    $1, %edx
        xorl    %eax, %eax
        addq    $1, %rdx
        salq    $4, %rdx
        .p2align 4,,10
        .p2align 3
.L11:
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm1
        movaps  %xmm1, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L11
        rep
        ret

The time is:

[xg...@shgcc-10 38824]$ g++ 44.s -o orig.out
[xg...@shgcc-10 38824]$ time ./orig.out

real    0m1.878s
user    0m1.877s
sys     0m0.000s
[xg...@shgcc-10 38824]$ time ./orig.out

real    0m1.879s
user    0m1.879s
sys     0m0.001s
[xg...@shgcc-10 38824]$ time ./orig.out

real    0m1.873s
user    0m1.872s
sys     0m0.001s

After adding two nop instructions:

.L11:
        movaps  %xmm0, %xmm1
        nop
        nop
        addps   (%rsi,%rax), %xmm1
        movaps  %xmm1, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L11
        rep
        ret

The time is:
[xg...@shgcc-10 38824]$ g++ 44.s -o 2nop.out
[xg...@shgcc-10 38824]$ time ./2nop.out

real    0m1.762s
user    0m1.762s
sys     0m0.000s
[xg...@shgcc-10 38824]$ time ./2nop.out

real    0m1.762s
user    0m1.762s
sys     0m0.000s
[xg...@shgcc-10 38824]$ time ./2nop.out

real    0m1.762s
user    0m1.761s
sys     0m0.000s

I suspect that the code layout may be hurting performance.
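The loop above appears to implement bench_1(float*, float*, float, unsigned) as "add a broadcast scalar to four floats at a time". The following is a hypothetical C reconstruction from the assembly only. It is not the original test case from the attachment. It assumes the first pointer (%rdi) is the destination and that n is the element count. It also assumes both arrays are 16-byte aligned, because movaps and addps with a memory operand require it. On targets without SSE it falls back to a scalar loop.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical reconstruction of bench_1(float*, float*, float, unsigned)
   from the GCC 4.4 assembly: dst[i] = src[i] + f, four floats per iteration.
   Assumes dst and src are 16-byte aligned. */
void bench_1(float *dst, float *src, float f, unsigned n)
{
    unsigned len = n & ~3u;              /* shrl $2 / salq $4: whole groups of 4 */
#if defined(__SSE__)
    __m128 splat = _mm_set1_ps(f);       /* shufps $0, %xmm0, %xmm0 */
    for (unsigned i = 0; i < len; i += 4) {
        __m128 v = _mm_add_ps(splat, _mm_load_ps(src + i)); /* addps (%rsi,%rax), %xmm1 */
        _mm_store_ps(dst + i, v);                           /* movaps %xmm1, (%rdi,%rax) */
    }
#else
    for (unsigned i = 0; i < len; i++)   /* scalar fallback for non-SSE targets */
        dst[i] = src[i] + f;
#endif
}
```

Unlike the compiled loop, this version does nothing when n < 4. The assembly effectively assumes at least one iteration.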


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824



[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3

2009-01-23 xuepeng dot guo at intel dot com


--- Comment #8 from xuepeng dot guo at intel dot com  2009-01-24 05:12 
---
Created an attachment (id=17173)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17173&action=view)
An extracted test case for this bug.

Hi Tim, I extracted this test case from your website, but I can't reproduce
this bug exactly on my machine, which has a Core 2 Quad processor. Could you
first help me check whether my test case is valid? Here is what I got on my
machine, for your reference:

[xg...@shgcc-10 38824]$ /home/xguo2/app/trunk/bin/g++ -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --enable-checking=assert --disable-bootstrap
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.4.0 20090121 (experimental) [trunk revision 143537] (GCC)
[xg...@shgcc-10 38824]$ /home/xguo2/app/trunk/bin/g++ -O3 -msse -mfpmath=sse simd_unroll_benchmarks.cpp -o 44.out
[xg...@shgcc-10 38824]$ time ./44.out

real    0m1.877s
user    0m1.876s
sys     0m0.001s
[xg...@shgcc-10 38824]$ time ./44.out

real    0m1.877s
user    0m1.877s
sys     0m0.000s
[xg...@shgcc-10 38824]$ time ./44.out

real    0m1.881s
user    0m1.882s
sys     0m0.000s
[xg...@shgcc-10 38824]$ /home/xguo2/app/usr/gcc-4.2/bin/g++ -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /net/gnu-13/export/gnu/src/gcc-4.2/gcc/configure
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --enable-shared
--enable-threads=posix --enable-haifa --enable-checking=assert
--prefix=/usr/gcc-4.2 --with-local-prefix=/usr/local
Thread model: posix
gcc version 4.2.0
[xg...@shgcc-10 38824]$ /home/xguo2/app/usr/gcc-4.2/bin/g++ -O3 -msse -mfpmath=sse simd_unroll_benchmarks.cpp -o 42.out
[xg...@shgcc-10 38824]$ time ./42.out

real    0m1.991s
user    0m1.991s
sys     0m0.000s
[xg...@shgcc-10 38824]$ time ./42.out

real    0m1.991s
user    0m1.989s
sys     0m0.001s
[xg...@shgcc-10 38824]$ time ./42.out

real    0m1.991s
user    0m1.990s
sys     0m0.000s
[xg...@shgcc-10 38824]$ g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-libgcj-multifile
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic
--host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)
[xg...@shgcc-10 38824]$ g++ -O3 -msse -mfpmath=sse simd_unroll_benchmarks.cpp -o 41.out
[xg...@shgcc-10 38824]$ time ./41.out

real    0m1.465s
user    0m1.464s
sys     0m0.002s
[xg...@shgcc-10 38824]$ time ./41.out

real    0m1.465s
user    0m1.465s
sys     0m0.000s
[xg...@shgcc-10 38824]$ time ./41.out

real    0m1.465s
user    0m1.464s
sys     0m0.002s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824



[Bug debug/37801] DWARF output for inlined functions doesn't always use DW_TAG_inlined_subroutine

2008-10-16 xuepeng dot guo at intel dot com


--- Comment #1 from xuepeng dot guo at intel dot com  2008-10-16 07:36 
---
Hello Jason, I have posted the whole .debug_info section of the binary built
from your example below. I guess you mean that DW_TAG_lexical_block tags like
those at 8c and 8d are unnecessary, and that we should avoid generating them if
they contain nothing. Am I right?

The section .debug_info contains:

  Compilation Unit @ offset 0x0:
   Length:        0x2a2 (32-bit)
   Version:       2
   Abbrev Offset: 0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x58): GNU C 4.4.0 20081005 (experimental) [trunk revision 4110]
    <10>  DW_AT_language    : 1  (ANSI C)
    <11>  DW_AT_name        : (indirect string, offset: 0x0): 37801.c
    <15>  DW_AT_comp_dir    : (indirect string, offset: 0xd): /home/xguo2/work
    <19>  DW_AT_low_pc      : 0x40045c
    <21>  DW_AT_high_pc     : 0x40048b
    <29>  DW_AT_stmt_list   : 0x0
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>  DW_AT_external    : 1
    <2f>  DW_AT_name        : (indirect string, offset: 0x3c): third
    <33>  DW_AT_decl_file   : 1
    <34>  DW_AT_decl_line   : 2
    <35>  DW_AT_prototyped  : 1
    <36>  DW_AT_inline      : 3  (declared as inline and inlined)
    <37>  DW_AT_sibling     : <0x5b>
 <2><3b>: Abbrev Number: 3 (DW_TAG_formal_parameter)
    <3c>  DW_AT_name        : (indirect string, offset: 0x53): arg3
    <40>  DW_AT_decl_file   : 1
    <41>  DW_AT_decl_line   : 2
    <42>  DW_AT_type        : <0x5b>
 <2><46>: Abbrev Number: 4 (DW_TAG_variable)
    <47>  DW_AT_name        : (indirect string, offset: 0x37): var3
    <4b>  DW_AT_decl_file   : 1
    <4c>  DW_AT_decl_line   : 4
    <4d>  DW_AT_type        : <0x5b>
 <2><51>: Abbrev Number: 5 (DW_TAG_variable)
    <52>  DW_AT_name        : a
    <54>  DW_AT_decl_file   : 1
    <55>  DW_AT_decl_line   : 5
    <56>  DW_AT_type        : <0x62>
 <1><5b>: Abbrev Number: 6 (DW_TAG_base_type)
    <5c>  DW_AT_byte_size   : 4
    <5d>  DW_AT_encoding    : 5  (signed)
    <5e>  DW_AT_name        : int
 <1><62>: Abbrev Number: 7 (DW_TAG_pointer_type)
    <63>  DW_AT_byte_size   : 8
    <64>  DW_AT_type        : <0x5b>
 <1><68>: Abbrev Number: 2 (DW_TAG_subprogram)
    <69>  DW_AT_external    : 1
    <6a>  DW_AT_name        : (indirect string, offset: 0x42): second
    <6e>  DW_AT_decl_file   : 1
    <6f>  DW_AT_decl_line   : 9
    <70>  DW_AT_prototyped  : 1
    <71>  DW_AT_inline      : 3  (declared as inline and inlined)
    <72>  DW_AT_sibling     : <0x9b>
 <2><76>: Abbrev Number: 3 (DW_TAG_formal_parameter)
    <77>  DW_AT_name        : (indirect string, offset: 0x4e): arg2
    <7b>  DW_AT_decl_file   : 1
    <7c>  DW_AT_decl_line   : 9
    <7d>  DW_AT_type        : <0x5b>
 <2><81>: Abbrev Number: 4 (DW_TAG_variable)
    <82>  DW_AT_name        : (indirect string, offset: 0x32): var2
    <86>  DW_AT_decl_file   : 1
    <87>  DW_AT_decl_line   : 10
    <88>  DW_AT_type        : <0x5b>
 <2><8c>: Abbrev Number: 8 (DW_TAG_lexical_block)
 <3><8d>: Abbrev Number: 8 (DW_TAG_lexical_block)
 <4><8e>: Abbrev Number: 9 (DW_TAG_variable)
    <8f>  DW_AT_abstract_origin: <0x46>
 <4><93>: Abbrev Number: 9 (DW_TAG_variable)
    <94>  DW_AT_abstract_origin: <0x51>
 <1><9b>: Abbrev Number: 2 (DW_TAG_subprogram)
    <9c>  DW_AT_external    : 1
    <9d>  DW_AT_name        : (indirect string, offset: 0x27): first
    <a1>  DW_AT_decl_file   : 1
    <a2>  DW_AT_decl_line   : 14
    <a3>  DW_AT_prototyped  : 1
    <a4>  DW_AT_inline      : 3  (declared as inline and inlined)
    <a5>  DW_AT_sibling     : <0xd7>
 <2><a9>: Abbrev Number: 3 (DW_TAG_formal_parameter)
    <aa>  DW_AT_name        : (indirect string, offset: 0x49): arg1
    <ae>  DW_AT_decl_file   : 1
    <af>  DW_AT_decl_line   : 14
    <b0>  DW_AT_type        : <0x5b>
 <2><b4>: Abbrev Number: 4 (DW_TAG_variable)
    <b5>  DW_AT_name        : (indirect string, offset: 0x2d): var1
    <b9>  DW_AT_decl_file   : 1
    <ba>  DW_AT_decl_line   : 15
    <bb>  DW_AT_type        : <0x5b>
 <2><bf>: Abbrev Number: 8 (DW_TAG_lexical_block)
 <3><c0>: Abbrev Number: 8 (DW_TAG_lexical_block)
 <4><c1>: Abbrev Number: 9 (DW_TAG_variable)
    <c2>  DW_AT_abstract_origin: <0x81>
 <4><c6>: Abbrev Number: 8 (DW_TAG_lexical_block)
 <5><c7>: Abbrev Number: 8 (DW_TAG_lexical_block)
 <6><c8>: Abbrev Number: 9 (DW_TAG_variable)
    <c9>  DW_AT_abstract_origin: <0x46>
 <6><cd>: Abbrev Number: 9 (DW_TAG_variable)
    <ce>  DW_AT_abstract_origin: <0x51>
 <1><d7>: Abbrev Number: 10 (DW_TAG_subprogram)
    <d8>  DW_AT_abstract_origin: <0x2d>
    <dc>  DW_AT_low_pc      : 0x40045c
    <e4>  DW_AT_high_pc     : 0x400464
    <ec>  DW_AT_frame_base  : 2 byte block: 77 8  (DW_OP_breg7: 8)
    <ef>  DW_AT_sibling     : <0x105>
 <2><f3>: Abbrev Number

[Bug debug/37801] DWARF output for inlined functions doesn't always use DW_TAG_inlined_subroutine

2008-10-16 xuepeng dot guo at intel dot com


--- Comment #3 from xuepeng dot guo at intel dot com  2008-10-17 04:58 
---
Yes, I agree with you. Could you please explain your idea in more detail?
Please use what I posted in comment #1 as an example to show what should and
should not be generated. I am willing to fix this bug under your guidance.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37801



[Bug debug/37022] [4.4 regression] internal compiler error: in compute_barrier_args_size

2008-08-11 xuepeng dot guo at intel dot com


--- Comment #10 from xuepeng dot guo at intel dot com  2008-08-12 02:07 
---
(In reply to comment #7)
> Sorry, I can't reproduce the first issue with a x86_64-linux -> i?86-darwin
> cross on the provided preprocessed testcase, tried many different -march=/-mtune=
> options as well as -f{,no-}asynchronous-unwind-tables.  What tuning do you use?
> What preferred stack size?
> The later C testcase I can reproduce, but here the testcase has a frame pointer
> (insn/f 38 37 39 pr37022.c:4 (set (reg/f:SI 6 bp)
>         (reg/f:SI 7 sp)) 47 {*movsi_1} (nil))
> so I must say I don't understand at all why we generate any DW_CFA_GNU_args_size
> directives.

I am not sure I follow you, but this RTL is necessary for our stack realignment
proposal. When we designed and implemented the stack realignment proposal, we
did not intend to change the existing way DW_CFA_GNU_args_size works.

> I believe the unwinder won't use them anyway, as in
> uw_install_context_1
>   if (!_Unwind_GetGRPtr (current, __builtin_dwarf_sp_column ()))
> the condition is false (sp is saved in bp).  For the -fno-a-u-t we check
> cfa.reg:
The unwinder restores sp via DW_CFA_def_cfa_expression, as shown below:
080483a8 <foo>:
 80483a8:   8d 4c 24 04             lea    0x4(%esp),%ecx
 80483ac:   83 e4 f0                and    $0xfffffff0,%esp
 80483af:   ff 71 fc                pushl  -0x4(%ecx)
 80483b2:   55                      push   %ebp
 80483b3:   89 e5                   mov    %esp,%ebp
 80483b5:   51                      push   %ecx
 80483b6:   83 ec 04                sub    $0x4,%esp

0018 0024 001c FDE cie= pc=080483a8..080483d8
  DW_CFA_advance_loc: 4 to 080483ac
  DW_CFA_def_cfa: r1 (ecx) ofs 0
  DW_CFA_advance_loc: 9 to 080483b5
  DW_CFA_expression: r5 (ebp) (DW_OP_breg5: 0)
  DW_CFA_advance_loc: 1 to 080483b6
  DW_CFA_def_cfa_expression (DW_OP_breg5: -4; DW_OP_deref)  restore sp
  DW_CFA_advance_loc: 12 to 080483c2
  DW_CFA_GNU_args_size: 32
  DW_CFA_advance_loc: 22 to 080483d8
  DW_CFA_GNU_args_size: 0
  DW_CFA_nop


>   if (!flag_asynchronous_unwind_tables && cfa.reg != STACK_POINTER_REGNUM)
> but not so for the -fa-u-t case.
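In the FDE above, DW_CFA_def_cfa_expression (DW_OP_breg5: -4; DW_OP_deref) computes the CFA as the word stored at ebp-4. That word is the value of ecx (the CFA, set by lea 0x4(%esp),%ecx) that push %ecx saved at 80483b5. The following minimal sketch shows how an unwinder might evaluate that expression against a toy machine state. struct machine, load32, store32 and eval_cfa_expr are hypothetical names, not libgcc's API. Only DW_OP_bregN (for i386 DWARF registers 0..8) and DW_OP_deref are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DW_OP_deref 0x06
#define DW_OP_breg0 0x70   /* DW_OP_breg0..DW_OP_breg31 are 0x70..0x8f */
#define NUM_REGS 9         /* i386 DWARF regs 0..8: eax ecx edx ebx esp ebp esi edi eip */

/* Hypothetical toy machine state: registers plus a small window of memory. */
struct machine {
    uint32_t regs[NUM_REGS];
    uint32_t mem_base;     /* address of mem[0] */
    uint8_t mem[64];
};

static uint32_t load32(const struct machine *m, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &m->mem[addr - m->mem_base], sizeof v);
    return v;
}

static void store32(struct machine *m, uint32_t addr, uint32_t v)
{
    memcpy(&m->mem[addr - m->mem_base], &v, sizeof v);
}

/* Decode a signed LEB128 operand and advance *p past it. */
static int32_t read_sleb128(const uint8_t **p)
{
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        if (shift < 32)
            result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 32 && (byte & 0x40))
        result |= ~(uint32_t)0 << shift;   /* sign-extend */
    return (int32_t)result;
}

/* Evaluate a CFA expression made of DW_OP_bregN and DW_OP_deref only.
   Returns 0 for anything this sketch does not support. */
static uint32_t eval_cfa_expr(const struct machine *m, const uint8_t *expr, size_t len)
{
    uint32_t stack[8];
    size_t depth = 0;
    const uint8_t *p = expr, *end = expr + len;
    while (p < end) {
        uint8_t op = *p++;
        if (op >= DW_OP_breg0 && op < DW_OP_breg0 + NUM_REGS) {
            int32_t off = read_sleb128(&p);
            if (depth == 8)
                return 0;
            stack[depth++] = m->regs[op - DW_OP_breg0] + (uint32_t)off;
        } else if (op == DW_OP_deref && depth > 0) {
            stack[depth - 1] = load32(m, stack[depth - 1]);
        } else {
            return 0;
        }
    }
    return depth ? stack[depth - 1] : 0;
}
```

The expression bytes for (DW_OP_breg5: -4; DW_OP_deref) are 0x75 0x7c 0x06. The byte 0x7c is -4 in SLEB128.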


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37022



[Bug debug/37022] [4.4 regression] internal compiler error: in compute_barrier_args_size

2008-08-11 xuepeng dot guo at intel dot com


--- Comment #11 from xuepeng dot guo at intel dot com  2008-08-12 02:11 
---
(In reply to comment #9)
> The darwin -m64 failures are then the same problem, cross-jumping of noreturn
> calls between different level of stack depths.
> I've been wrong about DW_CFA_GNU_args_size being useless for cfa.reg !=
> STACK_POINTER_REGNUM, while such directives won't ever be used by the libgcc
> unwinder, they might be used by debuggers to set correct value of stack pointer,
> and therefore such directives aren't useless and so we should avoid crossjumping
> in that case.  Not sure how to detect that in crossjumping code though.

You are right. IMHO this is exactly the reason.
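Here is the crossjumping problem in concrete terms. After two noreturn call sequences are merged, the shared block can be entered from paths with different amounts of pushed argument bytes (for example 32 and 0), and no single args_size is correct for it. The following is a hypothetical check, not GCC's crossjumping or dwarf2out code. struct edge, args_size_consistent and the per-block table are illustrative names. The sketch only models the consistency condition a pass could test before merging.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a jump into a block, annotated with
   the argument bytes still pushed on that path. */
struct edge {
    size_t target;     /* index of the destination block */
    long args_size;    /* bytes of outgoing arguments on the stack */
};

/* Returns true if every block is entered with a single args_size.
   args_at must have nblocks entries; they are reset to -1 (unknown). */
static bool args_size_consistent(const struct edge *edges, size_t nedges,
                                 long *args_at, size_t nblocks)
{
    for (size_t b = 0; b < nblocks; b++)
        args_at[b] = -1;
    for (size_t i = 0; i < nedges; i++) {
        size_t t = edges[i].target;
        if (t >= nblocks)
            return false;
        if (args_at[t] < 0)
            args_at[t] = edges[i].args_size;      /* first path sets it */
        else if (args_at[t] != edges[i].args_size)
            return false;                         /* paths disagree */
    }
    return true;
}
```

A crossjumping pass could, in principle, run such a check on the would-be merged block and refuse the merge when it fails.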


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37022



[Bug debug/37022] libffi test suite failures

2008-08-06 xuepeng dot guo at intel dot com


--- Comment #2 from xuepeng dot guo at intel dot com  2008-08-06 06:30 
---
Created an attachment (id=16030)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16030&action=view)
Testcase.

Hi, I got a similar failure on the linux/x86 platform.

[EMAIL PROTECTED] minbuild]$
/home/xguo2/internal-source-tree/stack-internal/minbuild/gcc/testsuite/g++/../../g++
-B/home/xguo2/internal-source-tree/stack-internal/minbuild/gcc/testsuite/g++/../../
/home/xguo2/internal-source-tree/stack-internal/src/gcc/testsuite/g++.dg/torture/stackalign/async-unwind-1.C
 -nostdinc++
-I/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu
-I/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libstdc++-v3/include
-I/home/xguo2/internal-source-tree/stack-internal/src/libstdc++-v3/libsupc++
-I/home/xguo2/internal-source-tree/stack-internal/src/libstdc++-v3/include/backward
-I/home/xguo2/internal-source-tree/stack-internal/src/libstdc++-v3/testsuite/util
-fmessage-length=0  -Os  -fasynchronous-unwind-tables
-mpreferred-stack-boundary=4   
-L/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs

-L/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-L/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libiberty
 -lm   -m32 -o ./async-unwind-1.exe
/home/xguo2/internal-source-tree/stack-internal/src/gcc/testsuite/g++.dg/torture/stackalign/async-unwind-1.C:
In function ‘void foo(int, ...)’:
/home/xguo2/internal-source-tree/stack-internal/src/gcc/testsuite/g++.dg/torture/stackalign/async-unwind-1.C:74:
internal compiler error: in compute_barrier_args_size, at dwarf2out.c:1289
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37022



[Bug debug/37022] libffi test suite failures

2008-08-06 xuepeng dot guo at intel dot com


--- Comment #3 from xuepeng dot guo at intel dot com  2008-08-06 06:38 
---
Created an attachment (id=16031)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16031&action=view)
A smaller case.

[EMAIL PROTECTED] stackalign]$ /home/xguo2/app/stack-internal/bin/g++ -m32 -Os -fasynchronous-unwind-tables -mpreferred-stack-boundary=4 a1.C
a1.C: In function ‘void foo(int, ...)’:
a1.C:27: internal compiler error: in compute_barrier_args_size, at dwarf2out.c:1289
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37022