[Bug c/38891] using ms_abi function attribute with -mno-sse generates an internal compiler error
--- Comment #2 from xuepeng dot guo at intel dot com 2009-02-12 08:35 ---
Confirmed at revision 144120. This is caused by the macro CONDITIONAL_REGISTER_USAGE in i386.h. First, the code

    if (! TARGET_SSE)                                                   \
      {                                                                 \
        int i;                                                          \
        for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)                     \
          if (TEST_HARD_REG_BIT (reg_class_contents[(int)SSE_REGS], i)) \
            fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";   \
      }

sets fixed_regs[27], fixed_regs[28], call_used_regs[27] and call_used_regs[28] to 1. Later in the same macro, the code

    if (TARGET_64BIT                                                    \
        && ((cfun && cfun->machine->call_abi == MS_ABI)                 \
            || (!cfun && DEFAULT_ABI == MS_ABI)))                       \
      {                                                                 \
        int i;                                                          \
        call_used_regs[4 /*RSI*/] = 0;                                  \
        call_used_regs[5 /*RDI*/] = 0;                                  \
        for (i = 0; i < 8; i++)                                         \
          call_used_regs[45+i] = 0;                                     \
        call_used_regs[27] = call_used_regs[28] = 0;                    \
      }

sets call_used_regs[27] and call_used_regs[28] back to 0. Those registers are now fixed but no longer call-used, so gcc_assert (!fixed_regs[i] || call_used_regs[i]) fails.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38891
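The conflict can be reproduced in a small toy model. This is not GCC code. It applies the two quoted blocks in the macro's order and then checks the same invariant as the gcc_assert. The model rests on three assumptions: 53 hard registers, SSE registers numbered 21-28 and 45-52 (GCC 4.4's i386 numbering), and the hypothetical names `reg_sets`, `model_conditional_usage` and `first_assert_failure`.

```c
#include <stdbool.h>

/* Assumption: GCC 4.4 i386 numbering, 53 hard registers,
   xmm0-xmm7 = 21..28 and xmm8-xmm15 = 45..52. */
#define NREGS 53

struct reg_sets {
    bool fixed[NREGS];      /* models fixed_regs[] */
    bool call_used[NREGS];  /* models call_used_regs[] */
};

static bool is_sse_reg(int r)
{
    return (r >= 21 && r <= 28) || (r >= 45 && r <= 52);
}

/* Applies the two quoted blocks in the order the macro runs them. */
static void model_conditional_usage(struct reg_sets *s, bool target_sse, bool ms_abi)
{
    if (!target_sse)                 /* the "! TARGET_SSE" block */
        for (int i = 0; i < NREGS; i++)
            if (is_sse_reg(i))
                s->fixed[i] = s->call_used[i] = true;

    if (ms_abi) {                    /* the TARGET_64BIT && MS_ABI block */
        s->call_used[4] = false;     /* RSI */
        s->call_used[5] = false;     /* RDI */
        for (int i = 0; i < 8; i++)
            s->call_used[45 + i] = false;
        s->call_used[27] = s->call_used[28] = false;
    }
}

/* Returns the first register violating (!fixed || call_used), or -1. */
static int first_assert_failure(const struct reg_sets *s)
{
    for (int i = 0; i < NREGS; i++)
        if (s->fixed[i] && !s->call_used[i])
            return i;
    return -1;
}
```

With -mno-sse and the MS ABI, the model reports register 27 as the first violation. In this model, 28 and 45-52 violate the invariant as well. With SSE enabled, or without the MS ABI, it reports -1.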
[Bug c/38891] using ms_abi function attribute with -mno-sse generates an internal compiler error
--- Comment #3 from xuepeng dot guo at intel dot com 2009-02-12 08:50 ---
Hard registers 27 and 28 are the SSE registers xmm6 and xmm7. Because we passed -mno-sse, I think setting fixed_regs[27] and fixed_regs[28] to 1 is reasonable, according to the description of FIXED_REGISTERS in i386.h. The description of CALL_USED_REGISTERS reads: "1 for registers not available across function calls. These must include the FIXED_REGISTERS and also any registers that can be used without being saved." So when call_abi equals MS_ABI, I think it is better to leave call_used_regs[27] and call_used_regs[28] at 1.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38891
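One way to express the suggestion above: in the MS ABI step, clear a register's call-used flag only if the register is not already fixed. The sketch below is a hedged toy model, not the patch that was actually committed. Its assumptions are the register count, the SSE numbering (21-28 and 45-52), and the hypothetical names `model_conditional_usage_fixed` and `invariant_holds`.

```c
#include <stdbool.h>
#include <stddef.h>

#define NREGS 53  /* assumption: GCC 4.4 i386 hard-register count */

struct reg_sets {
    bool fixed[NREGS];      /* models fixed_regs[] */
    bool call_used[NREGS];  /* models call_used_regs[] */
};

static bool is_sse_reg(int r)
{
    return (r >= 21 && r <= 28) || (r >= 45 && r <= 52);
}

/* Registers the MS ABI block marks as preserved across calls. */
static const int ms_abi_saved[] = { 4, 5, 27, 28, 45, 46, 47, 48, 49, 50, 51, 52 };

/* Same order as the macro, but a fixed register stays call-used,
   as the CALL_USED_REGISTERS description requires. */
static void model_conditional_usage_fixed(struct reg_sets *s, bool target_sse, bool ms_abi)
{
    if (!target_sse)
        for (int i = 0; i < NREGS; i++)
            if (is_sse_reg(i))
                s->fixed[i] = s->call_used[i] = true;

    if (ms_abi)
        for (size_t k = 0; k < sizeof ms_abi_saved / sizeof ms_abi_saved[0]; k++) {
            int r = ms_abi_saved[k];
            if (!s->fixed[r])        /* skip registers -mno-sse made fixed */
                s->call_used[r] = false;
        }
}

/* True when every fixed register is also call-used. */
static bool invariant_holds(const struct reg_sets *s)
{
    for (int i = 0; i < NREGS; i++)
        if (s->fixed[i] && !s->call_used[i])
            return false;
    return true;
}
```

Under -mno-sse with the MS ABI, registers 27 and 28 stay call-used, so the invariant holds. With SSE enabled, the MS ABI registers are still marked callee-saved as before.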
[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3
--- Comment #17 from xuepeng dot guo at intel dot com 2009-02-09 09:16 ---
Below is the loop from the test case in its original form (compiled by GCC 4.4):

    _Z7bench_1PfS_fj:
    .LFB2309:
            shrl    $2, %edx
            shufps  $0, %xmm0, %xmm0
            subl    $1, %edx
            xorl    %eax, %eax
            addq    $1, %rdx
            salq    $4, %rdx
            .p2align 4,,10
            .p2align 3
    .L11:
            movaps  %xmm0, %xmm1
            addps   (%rsi,%rax), %xmm1
            movaps  %xmm1, (%rdi,%rax)
            addq    $16, %rax
            cmpq    %rdx, %rax
            jne     .L11
            rep ret

The times are:

    [xg...@shgcc-10 38824]$ g++ 44.s -o orig.out
    [xg...@shgcc-10 38824]$ time ./orig.out
    real    0m1.878s
    user    0m1.877s
    sys     0m0.000s
    [xg...@shgcc-10 38824]$ time ./orig.out
    real    0m1.879s
    user    0m1.879s
    sys     0m0.001s
    [xg...@shgcc-10 38824]$ time ./orig.out
    real    0m1.873s
    user    0m1.872s
    sys     0m0.001s

After adding two nops:

    .L11:
            movaps  %xmm0, %xmm1
            nop
            nop
            addps   (%rsi,%rax), %xmm1
            movaps  %xmm1, (%rdi,%rax)
            addq    $16, %rax
            cmpq    %rdx, %rax
            jne     .L11
            rep ret

The times are:

    [xg...@shgcc-10 38824]$ g++ 44.s -o 2nop.out
    [xg...@shgcc-10 38824]$ time ./2nop.out
    real    0m1.762s
    user    0m1.762s
    sys     0m0.000s
    [xg...@shgcc-10 38824]$ time ./2nop.out
    real    0m1.762s
    user    0m1.762s
    sys     0m0.000s
    [xg...@shgcc-10 38824]$ time ./2nop.out
    real    0m1.762s
    user    0m1.761s
    sys     0m0.000s

I suspect that the code layout is what hurts performance here.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
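The sketch below relates the assembly to source. It is a hypothetical C reconstruction of a kernel that could compile to this loop, inferred only from the mangled name _Z7bench_1PfS_fj, i.e. bench_1(float*, float*, float, unsigned int), and from the instructions. The actual simd_unroll_benchmarks.cpp source may differ. The mapping of parameters (dst in %rdi, src in %rsi, c in %xmm0, n in %edx) is an assumption based on the x86-64 calling convention.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical reconstruction of bench_1; not the original test case.
   Like movaps, _mm_load_ps and _mm_store_ps need 16-byte-aligned pointers. */
void bench_1(float *dst, float *src, float c, unsigned int n)
{
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);

    __m128 k = _mm_set1_ps(c);                  /* shufps $0: broadcast c */
    for (unsigned int i = 0; i < n / 4; i++) {  /* shrl $2: n / 4 vectors */
        __m128 v = _mm_add_ps(k, _mm_load_ps(src + 4 * i)); /* addps (%rsi,%rax) */
        _mm_store_ps(dst + 4 * i, v);           /* movaps %xmm1, (%rdi,%rax) */
    }
}
```

Only whole groups of four floats are processed, which matches the `shrl $2` at the top of the loop preamble. Any remainder elements are left untouched.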
[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3
--- Comment #8 from xuepeng dot guo at intel dot com 2009-01-24 05:12 ---
Created an attachment (id=17173)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17173&action=view)
An extracted test case for this bug.

Hi Tim, I extracted this test case from your website, but I can't exactly reproduce this bug on my machine, which has a Core 2 Quad processor. Could you first help me check whether my test case is valid? Here is what I got on my machine, for your reference:

    [xg...@shgcc-10 38824]$ /home/xguo2/app/trunk/bin/g++ -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../src/configure --enable-checking=assert --disable-bootstrap --enable-languages=c,c++,fortran
    Thread model: posix
    gcc version 4.4.0 20090121 (experimental) [trunk revision 143537] (GCC)
    [xg...@shgcc-10 38824]$ /home/xguo2/app/trunk/bin/g++ -O3 -msse -mfpmath=sse simd_unroll_benchmarks.cpp -o 44.out
    [xg...@shgcc-10 38824]$ time ./44.out
    real    0m1.877s
    user    0m1.876s
    sys     0m0.001s
    [xg...@shgcc-10 38824]$ time ./44.out
    real    0m1.877s
    user    0m1.877s
    sys     0m0.000s
    [xg...@shgcc-10 38824]$ time ./44.out
    real    0m1.881s
    user    0m1.882s
    sys     0m0.000s
    [xg...@shgcc-10 38824]$ /home/xguo2/app/usr/gcc-4.2/bin/g++ -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: /net/gnu-13/export/gnu/src/gcc-4.2/gcc/configure --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --enable-shared --enable-threads=posix --enable-haifa --enable-checking=assert --prefix=/usr/gcc-4.2 --with-local-prefix=/usr/local
    Thread model: posix
    gcc version 4.2.0
    [xg...@shgcc-10 38824]$ /home/xguo2/app/usr/gcc-4.2/bin/g++ -O3 -msse -mfpmath=sse simd_unroll_benchmarks.cpp -o 42.out
    [xg...@shgcc-10 38824]$ time ./42.out
    real    0m1.991s
    user    0m1.991s
    sys     0m0.000s
    [xg...@shgcc-10 38824]$ time ./42.out
    real    0m1.991s
    user    0m1.989s
    sys     0m0.001s
    [xg...@shgcc-10 38824]$ time ./42.out
    real    0m1.991s
    user    0m1.990s
    sys     0m0.000s
    [xg...@shgcc-10 38824]$ g++ -v
    Using built-in specs.
    Target: x86_64-redhat-linux
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
    Thread model: posix
    gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)
    [xg...@shgcc-10 38824]$ g++ -O3 -msse -mfpmath=sse simd_unroll_benchmarks.cpp -o 41.out
    [xg...@shgcc-10 38824]$ time ./41.out
    real    0m1.465s
    user    0m1.464s
    sys     0m0.002s
    [xg...@shgcc-10 38824]$ time ./41.out
    real    0m1.465s
    user    0m1.465s
    sys     0m0.000s
    [xg...@shgcc-10 38824]$ time ./41.out
    real    0m1.465s
    user    0m1.464s
    sys     0m0.002s
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
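The arithmetic below uses the first `real` time from each set of runs above. It is computed only from the posted numbers and gives each build's run time relative to the GCC 4.1.2 build:

$$
\frac{t_{4.4}}{t_{4.1.2}} = \frac{1.877\ \mathrm{s}}{1.465\ \mathrm{s}} \approx 1.28,
\qquad
\frac{t_{4.2}}{t_{4.1.2}} = \frac{1.991\ \mathrm{s}}{1.465\ \mathrm{s}} \approx 1.36,
\qquad
\frac{t_{4.4}}{t_{4.2}} = \frac{1.877\ \mathrm{s}}{1.991\ \mathrm{s}} \approx 0.94
$$

So on this machine the trunk build takes about 28% longer than 4.1.2, but about 6% less time than 4.2.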
[Bug debug/37801] DWARF output for inlined functions doesn't always use DW_TAG_inlined_subroutine
--- Comment #1 from xuepeng dot guo at intel dot com 2008-10-16 07:36 ---
Hello Jason, I posted the whole debug_info section of the binary built from your example below. I guess you mean that DW_TAG_lexical_block tags like those at <8c> and <8d> are unnecessary, and that we should avoid generating them if they contain nothing. Am I right?

    The section .debug_info contains:

      Compilation Unit @ offset 0x0:
       Length:        0x2a2 (32-bit)
       Version:       2
       Abbrev Offset: 0
       Pointer Size:  8
     <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
        <c>   DW_AT_producer    : (indirect string, offset: 0x58): GNU C 4.4.0 20081005 (experimental) [trunk revision 4110]
        <10>   DW_AT_language    : 1 (ANSI C)
        <11>   DW_AT_name        : (indirect string, offset: 0x0): 37801.c
        <15>   DW_AT_comp_dir    : (indirect string, offset: 0xd): /home/xguo2/work
        <19>   DW_AT_low_pc      : 0x40045c
        <21>   DW_AT_high_pc     : 0x40048b
        <29>   DW_AT_stmt_list   : 0x0
     <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
        <2e>   DW_AT_external    : 1
        <2f>   DW_AT_name        : (indirect string, offset: 0x3c): third
        <33>   DW_AT_decl_file   : 1
        <34>   DW_AT_decl_line   : 2
        <35>   DW_AT_prototyped  : 1
        <36>   DW_AT_inline      : 3 (declared as inline and inlined)
        <37>   DW_AT_sibling     : 0x5b
     <2><3b>: Abbrev Number: 3 (DW_TAG_formal_parameter)
        <3c>   DW_AT_name        : (indirect string, offset: 0x53): arg3
        <40>   DW_AT_decl_file   : 1
        <41>   DW_AT_decl_line   : 2
        <42>   DW_AT_type        : 0x5b
     <2><46>: Abbrev Number: 4 (DW_TAG_variable)
        <47>   DW_AT_name        : (indirect string, offset: 0x37): var3
        <4b>   DW_AT_decl_file   : 1
        <4c>   DW_AT_decl_line   : 4
        <4d>   DW_AT_type        : 0x5b
     <2><51>: Abbrev Number: 5 (DW_TAG_variable)
        <52>   DW_AT_name        : a
        <54>   DW_AT_decl_file   : 1
        <55>   DW_AT_decl_line   : 5
        <56>   DW_AT_type        : 0x62
     <1><5b>: Abbrev Number: 6 (DW_TAG_base_type)
        <5c>   DW_AT_byte_size   : 4
        <5d>   DW_AT_encoding    : 5 (signed)
        <5e>   DW_AT_name        : int
     <1><62>: Abbrev Number: 7 (DW_TAG_pointer_type)
        <63>   DW_AT_byte_size   : 8
        <64>   DW_AT_type        : 0x5b
     <1><68>: Abbrev Number: 2 (DW_TAG_subprogram)
        <69>   DW_AT_external    : 1
        <6a>   DW_AT_name        : (indirect string, offset: 0x42): second
        <6e>   DW_AT_decl_file   : 1
        <6f>   DW_AT_decl_line   : 9
        <70>   DW_AT_prototyped  : 1
        <71>   DW_AT_inline      : 3 (declared as inline and inlined)
        <72>   DW_AT_sibling     : 0x9b
     <2><76>: Abbrev Number: 3 (DW_TAG_formal_parameter)
        <77>   DW_AT_name        : (indirect string, offset: 0x4e): arg2
        <7b>   DW_AT_decl_file   : 1
        <7c>   DW_AT_decl_line   : 9
        <7d>   DW_AT_type        : 0x5b
     <2><81>: Abbrev Number: 4 (DW_TAG_variable)
        <82>   DW_AT_name        : (indirect string, offset: 0x32): var2
        <86>   DW_AT_decl_file   : 1
        <87>   DW_AT_decl_line   : 10
        <88>   DW_AT_type        : 0x5b
     <2><8c>: Abbrev Number: 8 (DW_TAG_lexical_block)
     <3><8d>: Abbrev Number: 8 (DW_TAG_lexical_block)
     <4><8e>: Abbrev Number: 9 (DW_TAG_variable)
        <8f>   DW_AT_abstract_origin: 0x46
     <4><93>: Abbrev Number: 9 (DW_TAG_variable)
        <94>   DW_AT_abstract_origin: 0x51
     <1><9b>: Abbrev Number: 2 (DW_TAG_subprogram)
        <9c>   DW_AT_external    : 1
        <9d>   DW_AT_name        : (indirect string, offset: 0x27): first
        <a1>   DW_AT_decl_file   : 1
        <a2>   DW_AT_decl_line   : 14
        <a3>   DW_AT_prototyped  : 1
        <a4>   DW_AT_inline      : 3 (declared as inline and inlined)
        <a5>   DW_AT_sibling     : 0xd7
     <2><a9>: Abbrev Number: 3 (DW_TAG_formal_parameter)
        <aa>   DW_AT_name        : (indirect string, offset: 0x49): arg1
        <ae>   DW_AT_decl_file   : 1
        <af>   DW_AT_decl_line   : 14
        <b0>   DW_AT_type        : 0x5b
     <2><b4>: Abbrev Number: 4 (DW_TAG_variable)
        <b5>   DW_AT_name        : (indirect string, offset: 0x2d): var1
        <b9>   DW_AT_decl_file   : 1
        <ba>   DW_AT_decl_line   : 15
        <bb>   DW_AT_type        : 0x5b
     <2><bf>: Abbrev Number: 8 (DW_TAG_lexical_block)
     <3><c0>: Abbrev Number: 8 (DW_TAG_lexical_block)
     <4><c1>: Abbrev Number: 9 (DW_TAG_variable)
        <c2>   DW_AT_abstract_origin: 0x81
     <4><c6>: Abbrev Number: 8 (DW_TAG_lexical_block)
     <5><c7>: Abbrev Number: 8 (DW_TAG_lexical_block)
     <6><c8>: Abbrev Number: 9 (DW_TAG_variable)
        <c9>   DW_AT_abstract_origin: 0x46
     <6><cd>: Abbrev Number: 9 (DW_TAG_variable)
        <ce>   DW_AT_abstract_origin: 0x51
     <1><d7>: Abbrev Number: 10 (DW_TAG_subprogram)
        <d8>   DW_AT_abstract_origin: 0x2d
        <dc>   DW_AT_low_pc      : 0x40045c
        <e4>   DW_AT_high_pc     : 0x400464
        <ec>   DW_AT_frame_base  : 2 byte block: 77 8 (DW_OP_breg7: 8)
        <ef>   DW_AT_sibling     : 0x105
     <2><f3>: Abbrev Number
[Bug debug/37801] DWARF output for inlined functions doesn't always use DW_TAG_inlined_subroutine
--- Comment #3 from xuepeng dot guo at intel dot com 2008-10-17 04:58 ---
Yes, I agree with you. Could you explain your idea in more detail? Please use what I posted in comment #1 as an example to show what should and should not be generated. I am willing to fix this bug under your guidance.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37801
[Bug debug/37022] [4.4 regression] internal compiler error: in compute_barrier_args_size
--- Comment #10 from xuepeng dot guo at intel dot com 2008-08-12 02:07 ---
(In reply to comment #7)
> Sorry, I can't reproduce the first issue with a x86_64-linux -> i?86-darwin
> cross on the provided preprocessed testcase, tried many different
> -march=/-mtune= options as well as -f{,no-}asynchronous-unwind-tables.
> What tuning do you use? What preferred stack size?
> The later C testcase I can reproduce, but here the testcase has a frame pointer
> (insn/f 38 37 39 pr37022.c:4 (set (reg/f:SI 6 bp) (reg/f:SI 7 sp)) 47 {*movsi_1} (nil))
> so I must say I don't understand at all why we generate any
> DW_CFA_GNU_args_size directives.

I'm not sure I follow you, but this RTL is necessary for our stack realignment proposal. When we designed and implemented the stack realignment proposal, we did not intend to change the existing way DW_CFA_GNU_args_size works.

> I believe the unwinder won't use them anyway, as in uw_install_context_1
>   if (!_Unwind_GetGRPtr (current, __builtin_dwarf_sp_column ()))
> the condition is false (sp is saved in bp). For the -fno-a-u-t we check cfa.reg:

The unwinder will restore sp through DW_CFA_def_cfa_expression, as shown below:

    080483a8 <foo>:
     80483a8:   8d 4c 24 04      lea    0x4(%esp),%ecx
     80483ac:   83 e4 f0         and    $0xfff0,%esp
     80483af:   ff 71 fc         pushl  -0x4(%ecx)
     80483b2:   55               push   %ebp
     80483b3:   89 e5            mov    %esp,%ebp
     80483b5:   51               push   %ecx
     80483b6:   83 ec 04         sub    $0x4,%esp

    0018 0024 001c FDE cie= pc=080483a8..080483d8
      DW_CFA_advance_loc: 4 to 080483ac
      DW_CFA_def_cfa: r1 (ecx) ofs 0
      DW_CFA_advance_loc: 9 to 080483b5
      DW_CFA_expression: r5 (ebp) (DW_OP_breg5: 0)
      DW_CFA_advance_loc: 1 to 080483b6
      DW_CFA_def_cfa_expression (DW_OP_breg5: -4; DW_OP_deref)    restore sp
      DW_CFA_advance_loc: 12 to 080483c2
      DW_CFA_GNU_args_size: 32
      DW_CFA_advance_loc: 22 to 080483d8
      DW_CFA_GNU_args_size: 0
      DW_CFA_nop

>   if (!flag_asynchronous_unwind_tables && cfa.reg != STACK_POINTER_REGNUM)
> but not so for the -fa-u-t case.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37022
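The following C sketch shows how an unwinder could evaluate the CFA rule DW_CFA_def_cfa_expression (DW_OP_breg5: -4; DW_OP_deref). The expression takes ebp - 4 and loads the word stored there. In the prologue above, that word is the ecx saved by `push %ecx`, i.e. the value computed by `lea 0x4(%esp),%ecx`, which is the caller's sp. The sketch is simplified and rests on three assumptions. First, the type and function names (`dw_op`, `sim_mem`, `eval_dw_expr`) are hypothetical. Second, only DW_OP_bregN and DW_OP_deref are handled, as structured operations rather than the real byte encoding. Third, target memory is simulated by a small array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of DWARF expression operations used by the FDE above. */
enum dw_op_kind { OP_BREG, OP_DEREF };

struct dw_op {
    enum dw_op_kind kind;
    int reg;          /* DWARF register number for OP_BREG (5 = ebp on i386) */
    int32_t offset;   /* signed offset added to the register for OP_BREG */
};

/* Simulated 32-bit target memory: nwords words starting at address base. */
struct sim_mem {
    uint32_t base;
    const uint32_t *words;
    size_t nwords;
};

#define NUM_DWARF_REGS 9  /* i386 DWARF registers 0..8 (eax ... edi, eip) */

/* Loads one aligned 32-bit word; returns 0 if addr is outside the window. */
static int load_word(const struct sim_mem *m, uint32_t addr, uint32_t *out)
{
    uint32_t delta = addr - m->base;
    if (addr < m->base || delta % 4 != 0 || delta / 4 >= m->nwords)
        return 0;
    *out = m->words[delta / 4];
    return 1;
}

/* Evaluates the operations on a small stack machine.
   Returns 1 and stores the final top of stack in *result on success. */
static int eval_dw_expr(const struct dw_op *ops, size_t nops,
                        const uint32_t regs[NUM_DWARF_REGS],
                        const struct sim_mem *m, uint32_t *result)
{
    uint32_t stack[16];
    size_t depth = 0;

    for (size_t i = 0; i < nops; i++) {
        switch (ops[i].kind) {
        case OP_BREG:    /* push regs[reg] + offset, with 32-bit wraparound */
            if (depth == 16 || ops[i].reg < 0 || ops[i].reg >= NUM_DWARF_REGS)
                return 0;
            stack[depth++] = regs[ops[i].reg] + (uint32_t)ops[i].offset;
            break;
        case OP_DEREF:   /* replace the top address with the word stored there */
            if (depth == 0 || !load_word(m, stack[depth - 1], &stack[depth - 1]))
                return 0;
            break;
        default:
            return 0;
        }
    }
    if (depth == 0)
        return 0;
    *result = stack[depth - 1];
    return 1;
}
```

For example, suppose foo is entered with esp = 0x0FFC. Then ecx = 0x1000, and after `push %ebp; mov %esp,%ebp; push %ecx` we have ebp = 0x0FE8, with 0x1000 stored at 0x0FE4. Evaluating the rule with those values yields a CFA of 0x1000, which the unwinder then uses as the restored sp.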
[Bug debug/37022] [4.4 regression] internal compiler error: in compute_barrier_args_size
--- Comment #11 from xuepeng dot guo at intel dot com 2008-08-12 02:11 ---
(In reply to comment #9)
> The darwin -m64 failures are then the same problem, cross-jumping of noreturn
> calls between different level of stack depths.
> I've been wrong about DW_CFA_GNU_args_size being useless for cfa.reg !=
> STACK_POINTER_REGNUM, while such directives won't ever be used by the libgcc
> unwinder, they might be used by debuggers to set correct value of stack
> pointer, and therefore such directives aren't useless and so we should avoid
> crossjumping in that case. Not sure how to detect that in crossjumping code
> though.

You are right. IMHO this is exactly the reason.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37022
[Bug debug/37022] libffi test suite failures
--- Comment #2 from xuepeng dot guo at intel dot com 2008-08-06 06:30 ---
Created an attachment (id=16030)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16030&action=view)
Testcase.

Hi, I got a similar failure on the linux/x86 platform.

    [EMAIL PROTECTED] minbuild]$ /home/xguo2/internal-source-tree/stack-internal/minbuild/gcc/testsuite/g++/../../g++ -B/home/xguo2/internal-source-tree/stack-internal/minbuild/gcc/testsuite/g++/../../ /home/xguo2/internal-source-tree/stack-internal/src/gcc/testsuite/g++.dg/torture/stackalign/async-unwind-1.C -nostdinc++ -I/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libstdc++-v3/include -I/home/xguo2/internal-source-tree/stack-internal/src/libstdc++-v3/libsupc++ -I/home/xguo2/internal-source-tree/stack-internal/src/libstdc++-v3/include/backward -I/home/xguo2/internal-source-tree/stack-internal/src/libstdc++-v3/testsuite/util -fmessage-length=0 -Os -fasynchronous-unwind-tables -mpreferred-stack-boundary=4 -L/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -L/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -L/home/xguo2/internal-source-tree/stack-internal/minbuild/x86_64-unknown-linux-gnu/32/libiberty -lm -m32 -o ./async-unwind-1.exe
    /home/xguo2/internal-source-tree/stack-internal/src/gcc/testsuite/g++.dg/torture/stackalign/async-unwind-1.C: In function 'void foo(int, ...)':
    /home/xguo2/internal-source-tree/stack-internal/src/gcc/testsuite/g++.dg/torture/stackalign/async-unwind-1.C:74: internal compiler error: in compute_barrier_args_size, at dwarf2out.c:1289
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See http://gcc.gnu.org/bugs.html for instructions.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37022
[Bug debug/37022] libffi test suite failures
--- Comment #3 from xuepeng dot guo at intel dot com 2008-08-06 06:38 ---
Created an attachment (id=16031)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16031&action=view)
A smaller case.

    [EMAIL PROTECTED] stackalign]$ /home/xguo2/app/stack-internal/bin/g++ -m32 -Os -fasynchronous-unwind-tables -mpreferred-stack-boundary=4 a1.C
    a1.C: In function 'void foo(int, ...)':
    a1.C:27: internal compiler error: in compute_barrier_args_size, at dwarf2out.c:1289
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See http://gcc.gnu.org/bugs.html for instructions.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37022