[Bug target/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)
--- Comment #13 from siarhei dot siamashka at gmail dot com 2010-08-14 16:28 --- (In reply to comment #12) Any news? :) http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00894.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070
[Bug target/37734] Missing optimization: gcc fails to reuse flags from already calculated expression for condition check with zero
--- Comment #2 from siarhei dot siamashka at gmail dot com 2010-08-15 01:01 ---
Here is another test example, now with some performance numbers for gcc 4.5.1 on 64-bit Intel Atom:

$ cat fibbonachi.c
/***/
#include <stdlib.h>

int fib(int n)
{
    int sum, previous = -1, result = 1;
    n++;
    while (--n >= 0)
    {
        sum = result + previous;
        previous = result;
        result = sum;
    }
    return result;
}

int main(void)
{
    if (fib(10) != 1532868155)
        abort();
    return 0;
}
/***/

$ gcc -O2 -march=atom -o fibbonachi-O2 fibbonachi.c
$ gcc -Os -march=atom -o fibbonachi-Os fibbonachi.c

$ time ./fibbonachi-O2
real    0m3.722s
user    0m3.652s
sys     0m0.000s

$ time ./fibbonachi-Os
real    0m3.078s
user    0m3.044s
sys     0m0.000s

Loop code for -O2 optimizations on x86-64:
  18: 89 d1       mov    %edx,%ecx
  1a: 89 c2       mov    %eax,%edx
  1c: 8d 7f ff    lea    -0x1(%rdi),%edi
  1f: 8d 04 0a    lea    (%rdx,%rcx,1),%eax
  22: 83 ff ff    cmp    $0x,%edi
  25: 75 f1       jne    18 <fib+0x18>

Loop code for -Os optimizations on x86-64:
   c: 8d 0c 10    lea    (%rax,%rdx,1),%ecx
   f: 89 c2       mov    %eax,%edx
  11: 89 c8       mov    %ecx,%eax
  13: ff cf       dec    %edi
  15: 79 f5       jns    c <fib+0xc>

Also on ARM, the loop code is suboptimal in all cases (just subs + bge could be used without any need for cmn/cmp):

-O2 on ARM:
  10: e2433001    sub    r3, r3, #1
  14: e0820001    add    r0, r2, r1
  18: e3730001    cmn    r3, #1
  1c: e1a01002    mov    r1, r2
  20: e1a02000    mov    r2, r0
  24: 1af9        bne    10 <fib+0x10>

-Os on ARM:
   c: e0831002    add    r1, r3, r2
  10: e241        sub    r0, r0, #1
  14: e1a02003    mov    r2, r3
  18: e1a03001    mov    r3, r1
  1c: e350        cmp    r0, #0
  20: aaf9        bge    c <fib+0xc>

-Os -mthumb on ARM:
   8: 1899        adds   r1, r3, r2
   a: 3801        subs   r0, #1
   c: 461a        mov    r2, r3
   e: 460b        mov    r3, r1
  10: 2800        cmp    r0, #0
  12: daf9        bge.n  8 <fib+0x8>

There are still similarities between x86 and ARM here. When using -O2 optimizations, the redundant comparison is performed against the constant -1 in both cases.

--
siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |4.5.1

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37734
[Bug c/45207] The -Os flag generates wrong code for ARM966e-s
--- Comment #7 from siarhei dot siamashka at gmail dot com 2010-08-06 19:36 --- Do you have any packed structs? I wonder if the problem could be somehow related to PR45070. But it's hard to say anything until you narrow down the problem to a smaller testcase. -- siarhei dot siamashka at gmail dot com changed: What|Removed |Added CC||siarhei dot siamashka at ||gmail dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45207
[Bug c/45176] restrict qualifier is not used in a manually unrolled loop
--- Comment #4 from siarhei dot siamashka at gmail dot com 2010-08-05 13:40 ---
Looks like this missed-optimization regression was introduced in gcc 4.5. Are any similar fixes possible in the 4.5 branch?

--
siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei dot siamashka at
                   |                            |gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45176
[Bug c++/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)
--- Comment #4 from siarhei dot siamashka at gmail dot com 2010-07-28 07:16 ---
Could not reproduce the problem with gcc 4.3.5.

Disassembly of pr45070.o:

000c <next>:
   c: e92d401f    push   {r0, r1, r2, r3, r4, lr}
  10: e89c        ldm    r0, {r2, r3}
  14: e1a04000    mov    r4, r0
  18: e1520003    cmp    r2, r3
  1c: b3a03000    movlt  r3, #0
  20: ba14        blt    78 <next+0x6c>
  24: e5903008    ldr    r3, [r0, #8]
  28: e353        cmp    r3, #0
  2c: 0a0e        beq    6c <next+0x60>
  30: e3a03000    mov    r3, #0
  34: e5803008    str    r3, [r0, #8]
  38: e284        add    r0, r0, #4
  3c: ebef        bl     0 <fetch>
  40: e1a4        mov    r0, r4
  44: ebf0        bl     c <next>
  48: e1a00800    lsl    r0, r0, #16
  4c: e1a00840    asr    r0, r0, #16
  50: e5cd        strb   r0, [sp]
  54: e1a00420    lsr    r0, r0, #8
  58: e5cd0001    strb   r0, [sp, #1]
  5c: e1dd30b0    ldrh   r3, [sp]
  60: e1cd30bc    strh   r3, [sp, #12]
  64: e1dd30bc    ldrh   r3, [sp, #12]
  68: ea02        b      78 <next+0x6c>
  6c: e3a03001    mov    r3, #1
  70: e5803008    str    r3, [r0, #8]
  74: e59f3010    ldr    r3, [pc, #16] ; 8c <next+0x80>
  78: e1cd30bc    strh   r3, [sp, #12]
  7c: e5dd300c    ldrb   r3, [sp, #12]
  80: e5dd000d    ldrb   r0, [sp, #13]
  84: e1830400    orr    r0, r3, r0, lsl #8
  88: e8bd801f    pop    {r0, r1, r2, r3, r4, pc}
      ^^^ POP instruction just overwrites return value in r0 register here
  8c: .word 0x

Looks like the function gets treated as if it were returning 'void'.

--
siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
      Known to fail|                            |4.5.0
      Known to work|                            |4.3.5

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070
[Bug c++/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)
--- Comment #5 from siarhei dot siamashka at gmail dot com 2010-07-28 07:18 ---
The disassembly chunk from the comment above was from gcc 4.5.0, using the '-Os -march=armv5te' options.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070
[Bug target/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)
--- Comment #6 from siarhei dot siamashka at gmail dot com 2010-07-28 08:37 ---
'arm_size_return_regs()' returns 2 when generating the epilogue for the 'next' function here. As a result, the register holding the return value is not recorded in the mask, which causes it to be clobbered.

Would the following patch be the right fix?

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c        (revision 162411)
+++ gcc/config/arm/arm.c        (working copy)
@@ -13705,7 +13705,7 @@
          && !crtl->tail_call_emit)
        {
          unsigned long mask;
-         mask = (1 << (arm_size_return_regs() / 4)) - 1;
+         mask = (1 << ((arm_size_return_regs() + 3) / 4)) - 1;
          mask ^= 0xf;
          mask &= ~saved_regs_mask;
          reg = 0;

--
siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c++                         |target

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070
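To make the rounding problem behind the proposed patch concrete, here is a minimal sketch in plain C. It assumes that the value returned by 'arm_size_return_regs()' is a size in bytes and that each core register holds 4 bytes. The helper names are hypothetical and do not exist in gcc/config/arm/arm.c.

```c
#include <stdio.h>

/* Hypothetical helpers, not part of gcc/config/arm/arm.c.
   'size' stands for the byte size reported by arm_size_return_regs(). */

/* Original expression: the division truncates, so a 2-byte return
   value yields an empty mask and r0 is not treated as holding it. */
static unsigned long return_mask_truncating(unsigned size)
{
    return (1UL << (size / 4)) - 1;
}

/* Patched expression: round the size up to whole 4-byte registers. */
static unsigned long return_mask_rounding_up(unsigned size)
{
    return (1UL << ((size + 3) / 4)) - 1;
}

/* Print both masks for a few return value sizes. */
static void print_return_masks(void)
{
    unsigned size;
    for (size = 0; size <= 8; size += 2)
        printf("size=%u truncating=0x%lx rounding_up=0x%lx\n", size,
               return_mask_truncating(size), return_mask_rounding_up(size));
}
```

For a 2-byte return value the truncating form produces a mask of 0, while the rounded form sets bit 0 (r0), which is the register that the POP in the disassembly above overwrites.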
[Bug target/45094] [arm] wrong instructions for dword move in some cases
--- Comment #2 from siarhei dot siamashka at gmail dot com 2010-07-27 20:07 ---
Created an attachment (id=21327)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21327&action=view)
simplified testcase

Confirmed with gcc 4.5.0 here. Also tried but could not reproduce the problem with gcc 4.4 (it just does not seem to be able to emit ldrd/strd instructions with pre/post increment).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45094
[Bug c++/45070] New: Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)
Compilation:
arm-unknown-linux-gnueabi-g++ -Os -mcpu=cortex-a8 -o test test.cpp

Expected results:
./test
65534 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Real results (some garbage data):
./test
544 544 544 544 544 544 544 544 544 544 544 544 544 544 544 544

Note: This is not a big practical issue because Qt 4.7 does not use the packed attribute for QChar anymore (a good idea, because using this packed attribute results in horribly slow code):
http://qt.gitorious.org/qt/qt/commit/1ec8acd77b6c048f5a68887ac7750b0764ade598

--
           Summary: Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: arm-unknown-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070
[Bug c++/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)
--- Comment #1 from siarhei dot siamashka at gmail dot com 2010-07-25 23:25 ---
Created an attachment (id=21308)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21308&action=view)
packed-testcase.cpp

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070
[Bug target/43698] [4.5/4.6 Regression] Wrong use of ARMv6 REV instruction for endian bytewapping with -Os or -O2 optimizations
--- Comment #14 from siarhei dot siamashka at gmail dot com 2010-07-22 20:54 ---
Thanks, this final variant of the fix seems to work fine.

Can this patch be backported to the 4.5 branch and released with gcc 4.5.1 too? As I see it, the risk should be minimal: the current gcc 4.5 branch is so broken on armv6/armv7 because of this bug that it simply can't become any worse.

As recently discovered in MeeGo [1], this bug has a high chance of breaking just about any program which does endian byteswapping. The list of broken packages includes 'dbus' and 'util-linux-ng' to name a few, but surely there are more.

1. http://bugs.meego.com/show_bug.cgi?id=3936

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698
[Bug target/43698] [4.5/4.6 Regression] Wrong use of ARMv6 REV instruction for endian bytewapping with -Os or -O2 optimizations
--- Comment #12 from siarhei dot siamashka at gmail dot com 2010-07-19 13:54 ---
Updated the summary to better describe the problem (which is distro independent). The fact that this bug breaks the pax-utils tool, which is a vital part of the gentoo packaging system, thus rendering the system unusable, is probably not so interesting in the gcc bugzilla context :)

--
siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.5/4.6 Regression] Invalid|[4.5/4.6 Regression] Wrong
                   |code when building gentoo   |use of ARMv6 REV instruction
                   |pax-utils-0.1.19 with -Os   |for endian bytewapping with
                   |optimizations               |-Os or -O2 optimizations

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698
[Bug target/43703] Unexpected floating point precision loss due to ARM NEON autovectorization
--- Comment #4 from siarhei dot siamashka at gmail dot com 2010-06-15 10:34 ---
Created an attachment (id=20913)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20913&action=view)
a fixed testcase

A fixed testcase is attached. The main problem here is that denormals are not handled in a 'civilized' way by gcc at the moment. They are just silently and unconditionally treated in a relaxed way, and that might be neither wanted nor expected by the user.

And 'readelf -A' shows the following EABI tags for the generated object file, which is not even marked in any special way with regard to denormals handling:

  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703
[Bug target/43364] Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32
--- Comment #3 from siarhei dot siamashka at gmail dot com 2010-06-15 20:14 ---
The whole point of submitting this PR was to find an efficient way to use NEON instructions to operate on arbitrary scalar floating point values in order to overcome the inherent slowness of Cortex-A8 VFP Lite (maybe make it transparent by wrapping it into a C++ class and using operator overloading).

Using 'vdup_n_f32' to load a single floating point value seems to be better than 'vset_lane_f32' here because we don't have to deal with the uninitialized part of the register. But 'vdup_n_f32' suffers from similar performance issues (the VLD1 instruction is not used directly) and results in redundant instructions being emitted when the value is loaded from memory.

Optimistically, something like this should have been used instead of 'vdup_n_f32' in this case:

static inline float32x2_t vdup_n_f32_mem(float *p)
{
    float32x2_t result;
    asm ("vld1.f32 {%P0[]}, [%1, :32]" : "=w" (result) : "r" (p) : "memory");
    return result;
}

I wonder if it is possible to check at compile time whether the operand comes from memory or from a register? Something similar to the '__builtin_constant_p' builtin function? Or use the multiple alternatives feature for inline assembly constraints to emit either VMOV or VLD1? Anything else?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43364
[Bug target/43364] Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32
--- Comment #4 from siarhei dot siamashka at gmail dot com 2010-06-15 20:34 ---
(In reply to comment #3)
> Or use multiple alternatives feature for inline assembly constraints to emit
> either VMOV or VLD1?

Well, this kind of works :) But it is very ugly and fragile:

/***/
#include <arm_neon.h>

/* Override a slow 'vdup_n_f32' intrinsic with something better */
static inline float32x2_t vdup_n_f32_fast(float x)
{
    float32x2_t result;
    asm (
        ".set vdup_n_f32_fast_CODE_EMITTED,0\n"
        ".irp regname,r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14\n"
        ".ifeqs \"\\regname\", \"%1\"\n"
        "vdup.32 %P0, %1\n"
        ".set vdup_n_f32_fast_CODE_EMITTED,1\n"
        ".endif\n"
        ".ifeqs \"[\\regname, #0]\", \"%1\"\n"
        "vld1.f32 {%P0[]}, [\\regname, :32]\n"
        ".set vdup_n_f32_fast_CODE_EMITTED,1\n"
        ".endif\n"
        ".endr\n"
        ".if vdup_n_f32_fast_CODE_EMITTED == 0\n"
        ".error \"Fixme: icky macros from 'vdup_n_f32_fast' failed\"\n"
        ".endif\n"
        : "=w,w" (result)
        : "r,Q" (x)
        : "memory");
    return result;
}

#define vdup_n_f32(x) vdup_n_f32_fast(x)

/* Now let's test it for accessing data in registers */
float neon_add_regs(float a, float b)
{
    float32x2_t tmp1, tmp2;
    tmp1 = vdup_n_f32(a);
    tmp2 = vdup_n_f32(b);
    tmp1 = vadd_f32(tmp1, tmp2);
    return vget_lane_f32(tmp1, 0);
}

/* ... and in memory */
void neon_add_mem(float * __restrict out, float * __restrict a, float * __restrict b)
{
    float32x2_t tmp1, tmp2;
    tmp1 = vdup_n_f32(*a);
    tmp2 = vdup_n_f32(*b);
    tmp1 = vadd_f32(tmp1, tmp2);
    *out = vget_lane_f32(tmp1, 0);
}
/***/

$ objdump -d test.o

<neon_add_mem>:
   0: f4e10c9f    vld1.32   {d16[]}, [r1, :32]
   4: f4e21c9f    vld1.32   {d17[]}, [r2, :32]
   8: f2400da1    vadd.f32  d16, d16, d17
   c: f4c0080f    vst1.32   {d16[0]}, [r0]
  10: e12fff1e    bx        lr

0014 <neon_add_regs>:
  14: ee800b90    vdup.32   d16, r0
  18: ee811b90    vdup.32   d17, r1
  1c: f2400da1    vadd.f32  d16, d16, d17
  20: ee100b90    vmov.32   r0, d16[0]
  24: e12fff1e    bx        lr

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43364
[Bug bootstrap/44469] New: [4.5/4.6 Regression] internal compiler error: in fixup_reorder_chain, at cfglayout.c:797
Target: armv7l-unknown-linux-gnueabi
Configured with: ../gcc-4_5-branch/configure --prefix=/home/ssvb/gcc-test/bin --target=armv7l-unknown-linux-gnueabi --enable-languages=c --without-headers
Thread model: posix
gcc version 4.5.1 20100607 (prerelease) (GCC)

$ armv7l-unknown-linux-gnueabi -O2 testcase.i
testcase.i: In function 'a':
testcase.i:15:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:797
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

This bug prevents bootstrap on ARM when configured with the '--disable-checking' option. Also see PR42347 comment 28.

--
           Summary: [4.5/4.6 Regression] internal compiler error: in fixup_reorder_chain, at cfglayout.c:797
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: bootstrap
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: armv7l-unknown-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44469
[Bug bootstrap/44469] [4.5/4.6 Regression] internal compiler error: in fixup_reorder_chain, at cfglayout.c:797
--- Comment #1 from siarhei dot siamashka at gmail dot com 2010-06-08 14:45 ---
Created an attachment (id=20868)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20868&action=view)
testcase.i

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44469
[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #30 from siarhei dot siamashka at gmail dot com 2010-06-08 14:49 ---
(In reply to comment #29)
> Please file a new PR for that, with preprocessed source and all other relevant
> info for reproduction.

Thanks, filed PR44469.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #28 from siarhei dot siamashka at gmail dot com 2010-05-18 10:09 ---
Thanks, this patch fixes bootstrap for powerpc/powerpc64. But bootstrap still fails for arm on the very same gcc_assert(), just in another place. Should a new bug be filed about this?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #18 from siarhei dot siamashka at gmail dot com 2010-05-17 07:53 ---
Created an attachment (id=20676)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20676&action=view)
powerpc64-broken-unreachable.i

With the attached file (and '-O2 -c' options):
1. powerpc64 crosscompiler running on x86 box - always works fine
2. powerpc64 crosscompiler built with gcc 4.3.4 and running on powerpc64 box - works fine
3. powerpc64 crosscompiler built with gcc 4.5.0 and running on powerpc64 box - ICE

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #19 from siarhei dot siamashka at gmail dot com 2010-05-17 09:06 --- Can anybody knowledgeable verify whether it was commit r151790 ( http://repo.or.cz/w/official-gcc.git/commit/9dbb96fec5e08762f97dda771522283f1fe9710f ) that is causing troubles when __builtin_unreachable() is used in the default switch case? Unfortunately I could not add Andreas Krebbel to CC for this bug. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #21 from siarhei dot siamashka at gmail dot com 2010-05-17 10:07 ---
(In reply to comment #18)
> Created an attachment (id=20676)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20676&action=view) [edit]
> powerpc64-broken-unreachable.i
>
> With the attached file (and '-O2 -c' options):
> 1. powerpc64 crosscompiler running on x86 box - always works fine
> 2. powerpc64 crosscompiler built with gcc 4.3.4 and running on powerpc64 box -
> works fine

Hmm, that was happening because I compiled it with --disable-checking. When built with --enable-checking=release, the ICE reproduces just fine on an x86 box with the powerpc64-unknown-linux-gnu crosscompiler.

Well, getting ssh access to a fast powerpc64 box really did miracles :) Even though the problem does not seem to be that complex after all, painfully long compile times discouraged running more tests earlier, so even a small mistake could easily (and apparently did) lead down the wrong track.

I'm going to check the current 4.5 SVN branch now.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #22 from siarhei dot siamashka at gmail dot com 2010-05-17 11:31 ---
(In reply to comment #20)
> Perhaps dup of PR44071 that got fixed recently?

The problem is still reproducible with SVN rev 159480 in 'branches/gcc-4_5-branch', so the fix from PR44071 does not seem to help here.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug target/43698] [4.5/4.6 Regression] Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations
--- Comment #10 from siarhei dot siamashka at gmail dot com 2010-05-17 18:48 --- Maybe I'm too impatient, but is there anything that prevents this patch from getting committed to SVN? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698
[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #16 from siarhei dot siamashka at gmail dot com 2010-05-04 07:04 --- So basically what we have is that gcc miscompiles itself somewhere in the code where one of those ~7000 gcc_assert is used. The next step is to identify which one of them triggers this bad behaviour (bisecting not in the svn revisions, but in gcc source files by flipping the use of __builtin_unreachable-based vs. ordinary gcc_assert implementations) and extract a reduced testcase showing __builtin_unreachable failure. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #15 from siarhei dot siamashka at gmail dot com 2010-05-03 23:45 --- As found by Raúl, indeed this regression was introduced in r150091. Reverting this change in gcc 4.5.0 release resolves the problem. Apparently the use of __builtin_unreachable() in gcc_assert macro (activated by !ENABLE_ASSERT_CHECKING) is triggering some kind of wrong-code bug on non x86/x86-64 platforms (at least arm and powerpc) and causes this bootstrap failure. There are some other __builtin_unreachable bugs in gcc bugzilla which are possibly related. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
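As a rough illustration of the two assertion flavours discussed in this comment, here is a minimal sketch in C. The macro name 'my_assert' and the 'CHECKED_BUILD' switch are made up for this example; they only approximate the idea of gcc_assert with and without ENABLE_ASSERT_CHECKING and are not copied from the gcc sources.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the two flavours of gcc_assert discussed
   above; the names and the CHECKED_BUILD switch are made up here. */
#ifdef CHECKED_BUILD
/* Checking build: a failed assertion aborts at run time. */
#define my_assert(expr) ((void)(!(expr) ? (abort(), 0) : 0))
#else
/* Non-checking build: the failing path is declared unreachable, so the
   optimizer may assume 'expr' always holds and drop the check. */
#define my_assert(expr) ((void)(!(expr) ? (__builtin_unreachable(), 0) : 0))
#endif

/* Example user: the default case is promised to never happen. */
static int weight_of(int kind)
{
    switch (kind) {
    case 0:
        return 1;
    case 1:
        return 10;
    default:
        my_assert(0);
        return -1;
    }
}
```

With the unreachable-based variant the compiler is free to remove the default branch entirely, so any wrong-code bug in how unreachable blocks are handled can silently change the behaviour of the remaining cases.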
[Bug c++/41201] #pragma GCC target (sse2) doesn't alter __SSE2__ in C++ (as it does in C)
--- Comment #1 from siarhei dot siamashka at gmail dot com 2010-04-27 22:44 ---
'#pragma GCC target' and '#pragma GCC optimize' just do not seem to work with C++. I stumbled on this while trying to narrow down something that looks like a wrong-code generation bug in gcc 4.5.0 when compiling qt4. Prepending __attribute__((optimize("-O0"))) to each function still works, so there is no real need to go through the trouble of splitting source files into parts to bisect the issue.

--
siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei dot siamashka at
                   |                            |gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41201
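The per-function workaround mentioned above can be sketched as follows. The function names are hypothetical; the only point is where the GCC-specific 'optimize' attribute goes. Compilers that do not know this attribute may ignore it with a warning.

```c
/* Hypothetical example functions; only the attribute placement matters.
   The 'optimize' attribute is GCC-specific. */

/* Compiled as if -O0 were given, regardless of the command line options. */
__attribute__((optimize("-O0")))
int sum_unoptimized(const int *values, int count)
{
    int i, sum = 0;
    for (i = 0; i < count; i++)
        sum += values[i];
    return sum;
}

/* Compiled with whatever optimization level the command line requests. */
int sum_default(const int *values, int count)
{
    int i, sum = 0;
    for (i = 0; i < count; i++)
        sum += values[i];
    return sum;
}
```

Moving the attribute from function to function while keeping the rest of the file at the failing optimization level is one way to bisect a suspected miscompilation down to a single function.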
[Bug target/43724] GCC produces suboptimal ARM NEON code for zero vector assignment
--- Comment #1 from siarhei dot siamashka at gmail dot com 2010-04-12 06:17 --- Or just vmov.i32 q8, #0 would be better to avoid any potential data dependency. -- siarhei dot siamashka at gmail dot com changed: What|Removed |Added CC||siarhei dot siamashka at ||gmail dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43724
[Bug target/43725] New: Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics
vstr d3, [r0, #200] ; 0xc8
 150: e28dd020    add    sp, sp, #32
 154: ecbd8b10    vpop   {d8-d15}
 158: e12fff1e    bx     lr

This shows multiple performance problems:
1. The use of inherently slower VLDR/VSTR instructions instead of VLD1/VST1
2. Failure to make proper use of ARM Cortex-A8 NEON LS/ALU dual issue
3. Unnecessary spills to stack

This is a general issue with NEON intrinsics, causing serious performance problems for practically any nontrivial code. I guess this itself can be a meta-bug, with each individual performance issue tracked separately.

--
           Summary: Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: armv7l-unknown-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725
[Bug target/43698] [4.5/4.6 Regression] Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations
--- Comment #8 from siarhei dot siamashka at gmail dot com 2010-04-12 09:34 ---
(In reply to comment #7)
> Patch submitted here.
>
> http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00401.html

Thank you. I have been testing it for two days already. It really helps (in the sense that it is apparently better to have this fix than not to have it). I have bootstrapped the hard vfp system successfully and have not noticed any other problems so far. Btw, a miscompilation (of the very same package) also happens with -O2 optimization settings in some other place, but I did not try to investigate where exactly it fails.

But I understand that it is just a workaround for a problem which happens somewhere in the upper layer? If the REV instruction did not actually support conditional execution, then the fix would require actually finding the real cause.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698
[Bug target/43364] Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32
--- Comment #2 from siarhei dot siamashka at gmail dot com 2010-04-12 05:26 ---
(In reply to comment #1)
>         mov     r3, #0
>         vdup.32 d16, r3

Also maybe veor.32 d16, d16, d16 here?

Or drop this NEON register initialization completely, because it is a redundant operation and was not explicitly requested in the original C code?

After all, from IHI0042D_aapcs.pdf:

"The FPSCR is the only status register that may be accessed by conforming code. It is a global register with the following properties:
* The condition code bits (28-31), the cumulative saturation (QC) bit (27) and the cumulative exception-status bits (0-4) are not preserved across a public interface."

and from ARM ARM:

"Advanced SIMD arithmetic always uses untrapped exception handling"

Tracking the cumulative exception-status bits may be tricky in general (using an uninitialized value for NEON arithmetic can set them arbitrarily), but as long as they are not used in any way in the function itself, they are irrelevant.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43364
[Bug target/43698] Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations
--- Comment #6 from siarhei dot siamashka at gmail dot com 2010-04-09 08:04 ---
(In reply to comment #1)
> 2. Does gcc-4.4.3 work?

Yes, gcc-4.4.3 works (it just does not use the 'rev' instruction). So it is a regression in 4.5.

Thanks for a very fast response and analysis of the issue.

--
siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.4.3

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698
[Bug target/43703] New: Unexpected floating point precision loss due to ARM NEON autovectorization
Using gcc-4.5.0-RC-20100406.tar.bz2

//
#include <stdio.h>

void __attribute__((noinline)) f(float * __restrict c, float * __restrict a, float * __restrict b)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] * b[i];
    }
}

int main()
{
    float a[4], b[4], c[4];
    a[0] = 1e-40;
    b[0] = 1e+38;
    f(c, a, b);
    printf("c[0]=%f\n", (double)c[0]);
    if (c[0] < 0.001)
        printf("precision problem: c[0] was flushed to zero\n");
    return 0;
}
//

# gcc -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O2 -fno-fast-math test.c
# ./a.out
c[0]=0.01

# gcc -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O3 -fno-fast-math test.c
# ./a.out
c[0]=0.00
precision problem: c[0] was flushed to zero

Using the -O3 option turns on autovectorization, and the results of operations involving denormals get flushed to zero. This happens even if neither -ffast-math nor any other precision-sacrificing option is enabled.

--
           Summary: Unexpected floating point precision loss due to ARM NEON autovectorization
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: armv7l-unknown-linux-gnueabi
  GCC host triplet: armv7l-unknown-linux-gnueabi
GCC target triplet: armv7l-unknown-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703
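The difference between the two results above can be modelled on any IEEE 754 machine without NEON hardware. The following is only a software sketch under the assumption that flush-to-zero replaces denormal inputs and denormal results with zero; the helper names are hypothetical and the code does not touch any real FPU control register.

```c
#include <float.h>

/* Hypothetical helpers that emulate flush-to-zero in software. */

/* Replace any denormal (subnormal) value by zero, keep everything else. */
static float flush_denormal(float x)
{
    float magnitude = x < 0.0f ? -x : x;
    return (magnitude != 0.0f && magnitude < FLT_MIN) ? 0.0f : x;
}

/* IEEE 754 style multiply: denormal inputs are used as they are. */
static float mul_ieee(float a, float b)
{
    volatile float result = a * b;  /* volatile: force rounding to float */
    return result;
}

/* Flush-to-zero style multiply, a rough model of the vectorized code:
   denormal inputs and denormal results are replaced by zero. */
static float mul_flush_to_zero(float a, float b)
{
    return flush_denormal(mul_ieee(flush_denormal(a), flush_denormal(b)));
}
```

With a = 1e-40f (a denormal in single precision) and b = 1e+38f, 'mul_ieee' gives a value close to 0.01, while 'mul_flush_to_zero' gives exactly zero because the denormal input is discarded before the multiplication.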
[Bug target/43703] Unexpected floating point precision loss due to ARM NEON autovectorization
--- Comment #2 from siarhei dot siamashka at gmail dot com 2010-04-09 20:34 ---
(In reply to comment #1)
> This is exacted really. Denormals are a weird case in general.

Well, denormals may be weird. But what about NaNs, infinities and the other IEEE stuff which is not supported by the NEON unit? The compiler here takes the liberty of using NEON whenever it likes, and NEON does not fully support IEEE for sure. After reading 'man gcc', I had the impression that this should have been controlled by -ffast-math and the related options.

Floating point performance of the VFP Lite unit is a disaster, and using NEON where appropriate is definitely needed. But IMHO this should be controlled somehow, for example by selectively using pragma optimize to set the -ffast-math option in the critical parts of the code. Also I don't know how fantastic it is, but having a special data type, something like 'fast_float', with relaxed precision requirements and suitable for use with NEON would be really nice.

> Plus your testcase depends on uninitialized values.

Yes, the testcase is not quite clean, but it is easily fixable. Though this should not cause any problems unless floating point exceptions are enabled; those extra values are just irrelevant. Should I post an updated testcase?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703
[Bug c/43698] New: Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations
Tested with gcc-4.5.0-RC-20100406.tar.bz2

Reduced testcase:

/***/
#include <stdio.h>
#include <stdint.h>

char do_reverse_endian = 0;

# define bswap_32(x) \
    ((((x) & 0xff000000) >> 24) | \
     (((x) & 0x00ff0000) >>  8) | \
     (((x) & 0x0000ff00) <<  8) | \
     (((x) & 0x000000ff) << 24))

#define EGET(X) \
    (__extension__ ({ \
        uint64_t __res; \
        if (!do_reverse_endian) { __res = (X); \
        } else if (sizeof(X) == 4) { __res = bswap_32((X)); \
        } \
        __res; \
    }))

void __attribute__((noinline)) X(char **phdr, char **data, int *phoff)
{
    *phdr = *data + EGET(*phoff);
}

int main()
{
    char *phdr;
    char *data = (char *)0x40164000;
    int phoff = 0x34;
    X(&phdr, &data, &phoff);
    printf("got %p (expecting 0x40164034)\n", phdr);
    return 0;
}
/***/

# gcc -Os -o test test.c
# ./test
got 0x74164000 (expecting 0x40164034)

--
           Summary: Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: armv7l-unknown-linux-gnueabi
  GCC host triplet: armv7l-unknown-linux-gnueabi
GCC target triplet: armv7l-unknown-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698
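The wrong output in this report can be explained with a few lines of plain arithmetic. The sketch below uses hypothetical helper names ('swap32', 'get32') and a function instead of the macros from the testcase; it only illustrates what the byte swap is expected to compute and is not the code that gcc miscompiles.

```c
#include <stdint.h>

/* Portable 32-bit byte swap written with shifts and masks. */
static uint32_t swap32(uint32_t x)
{
    return ((x & 0xff000000u) >> 24) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x000000ffu) << 24);
}

/* Hypothetical helper mirroring the idea of the EGET macro: swap the
   bytes only when the data has the opposite endianness. */
static uint32_t get32(uint32_t value, int reverse_endian)
{
    return reverse_endian ? swap32(value) : value;
}
```

With 'reverse_endian' equal to 0 the offset 0x34 stays unchanged and 0x40164000 + 0x34 gives the expected 0x40164034. The observed 0x74164000 equals 0x40164000 + swap32(0x34), which suggests that the swapped value is used even though no swapping was requested.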
[Bug bootstrap/42347] [4.5 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #7 from siarhei dot siamashka at gmail dot com 2010-04-06 11:01 --- Long story short. This bootstrap failure seems to be related to --disable-checking configure option. Reproduced on powerpc-unknown-linux-gnu and armv7l-unknown-linux-gnueabi. I'm re-running the tests now to be completely sure. Maybe it is caused by some bad assert with a side effect? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #10 from siarhei dot siamashka at gmail dot com 2010-04-06 14:44 ---
(In reply to comment #8)
> It would be really helpful if someone can explain how to reproduce this with a
> cross-compiler. I will analyze/fix this problem when this is reproducible with
> a cross.

I'm afraid this is not (easily) reproducible with a cross-compiler.

Now I have double checked everything, and the --disable-checking option really does break bootstrap on ppc and arm. Replacing it with --enable-checking=assert results in a successful build. It's also interesting that this bug does not affect x86 or x86-64.

I think a simple script can be used for bisecting and may help to find a problematic gcc_assert (if it's really a problem). But this will probably take at least a few days to run to completion, as neither the arm nor the ppc hardware that I have is particularly fast...

Avoiding the use of the --disable-checking option can be used as a workaround for now.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #5 from siarhei dot siamashka at gmail dot com 2010-04-03 17:39 ---
Got exactly the same ICE on ARM, bootstrapping gcc:

/var/tmp/portage/sys-devel/gcc-4.5.0_alpha20100401/work/gcc-4.5-20100401/gcc/sched-deps.c: In function 'get_dep_weak_1':
/var/tmp/portage/sys-devel/gcc-4.5.0_alpha20100401/work/gcc-4.5-20100401/gcc/sched-deps.c:3841:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugs.gentoo.org/> for instructions.

But preprocessed source fed to the gcc-4.5-20100401 crosscompiler does not result in an ICE. I'm going to try bootstrapping again with the patch from PR42509 and report back.

--
siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei dot siamashka at
                   |                            |gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug bootstrap/42347] [4.5 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
--- Comment #6 from siarhei dot siamashka at gmail dot com 2010-04-03 21:53 ---
(In reply to comment #5)
> But the preprocessed source fed to a gcc-4.5-20100401 cross-compiler does not result in an ICE. I'm going to try bootstrapping again with the patch from PR42509 and report back.

This patch alone did not help. I will try to bootstrap SVN head now and do a few more tests. It can take many hours because native compilation on ARM is relatively slow.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347
[Bug target/43469] [4.5 Regression] ICE trying to compile glibc for ARM thumb2
--- Comment #6 from siarhei dot siamashka at gmail dot com 2010-03-31 22:50 ---
(In reply to comment #4)
> Not exactly a primary or secondary target. CCing maintainer.

I have been trying to find a complete list of gcc primary and secondary targets, with no luck so far. But at least this post refers to 'arm-eabi' as a primary target: http://gcc.gnu.org/ml/gcc/2010-03/msg00175.html

This bug is also reproducible with the 'arm-eabi' target triplet. Sorry for stating the obvious, but ARM Thumb-2 support is getting pretty interesting nowadays; for example, Ubuntu is switching to it for the whole distro.
-- siarhei dot siamashka at gmail dot com changed:

What|Removed|Added
GCC target triplet|armv7a-unknown-linux-gnueabi|armv7a-unknown-linux-gnueabi, arm-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43469
[Bug target/43440] Overwriting neon quad register does not clobber all included single registers
--- Comment #8 from siarhei dot siamashka at gmail dot com 2010-03-21 10:05 --- What about simply forbidding the use of q registers in the inline assembly clobber list? Is that difficult to do? As a nice bonus, existing potentially unsafe inline assembly will fail to compile, so it will be spotted and will have to be fixed (forcing the application developer to manually convert the clobber list to use d or s registers). It will also solve compatibility problems with older versions of gcc which still have this bug and might remain in use for a very long time. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440
[Bug c/43469] New: ICE trying to compile glibc for ARM thumb2
= the exact version of GCC
Freshly checked out SVN trunk for gcc 4.5.0 (r157602)

= the options given when GCC was configured/built;
--target=armv7a-unknown-linux-gnueabi --enable-languages=c --without-headers

= the complete command line that triggers the bug;
armv7a-unknown-linux-gnueabi-gcc -mcpu=cortex-a8 -mthumb -O1 -c localealias.i

= the compiler output (error messages, warnings, etc.);
localealias.c: In function read_alias_file:
localealias.c:362:1: error: unrecognizable insn:
(insn 863 209 212 7 ../include/ctype.h:30 (set (const:SI (unspec:SI [
                    (symbol_ref:SI (__libc_tsd_CTYPE_B) [flags 0xe0] var_decl 0xfff95204b40 __libc_tsd_CTYPE_B)
                    (const_int 3 [0x3])
                    (const (unspec:SI [
                                (const_int 2 [0x2])
                            ] 21))
                    (const_int 4 [0x4])
                ] 20))
        (reg:SI 10 sl)) -1 (nil))
localealias.c:362:1: internal compiler error: in extract_insn, at recog.c:2097
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

--
Summary: ICE trying to compile glibc for ARM thumb2
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: armv7a-unknown-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43469
[Bug c/43469] ICE trying to compile glibc for ARM thumb2
--- Comment #1 from siarhei dot siamashka at gmail dot com 2010-03-21 19:05 --- Created an attachment (id=20152) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20152&action=view) localealias.i -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43469
[Bug c/43469] ICE trying to compile glibc for ARM thumb2
--- Comment #2 from siarhei dot siamashka at gmail dot com 2010-03-21 19:07 --- works fine with gcc 4.4.3 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43469
[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
--- Comment #5 from siarhei dot siamashka at gmail dot com 2010-03-20 08:45 ---
(In reply to comment #4)
> Also, what's the configuration in this case i.e what architecture, mode / cpu / fpu ?

Tested on ARM Cortex-A8 hardware. The problematic package was built either natively or cross-compiled, using gcc 4.4.1 without any vendor patches; 'armv4tl-softfloat-linux-gnueabi' was just the build triplet. No other options were passed to gcc configure.

> Is there a smaller testcase which can be looked at ?

As I mentioned in comment 3, the code crashed around the place where it accesses local variables on the stack and could not address them directly (due to immediate offset encoding restrictions), so #4096 deltas were additionally applied. As such, I'm afraid that reducing this problem to a smaller testcase may be extremely difficult. I have failed to do this so far (I tried to construct small functions with a huge stack frame exceeding 4K).

> Otherwise this will end up being a WONTFIX bug because we don't have a clear understanding of what / where the failure is.

I had hoped that the description of the symptoms might ring a bell even without a small testcase from me, or that somebody more knowledgeable about gcc internals could give a hint about what else could be tried to construct such a testcase. I can also try to verify whether the crash still happens with gcc 4.4.3 and maybe SVN trunk. That's about all I can do at the moment.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074
[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
--- Comment #6 from siarhei dot siamashka at gmail dot com 2010-03-20 13:55 --- The crash disappeared when recompiling libXft-2.1.13 library with gcc 4.4.3. Either it was fixed, or something else changed and it is not getting triggered anymore. I guess this bug can be closed. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074
[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
--- Comment #7 from siarhei dot siamashka at gmail dot com 2010-03-20 13:58 --- Resolved, as now it WORKSFORME. -- siarhei dot siamashka at gmail dot com changed: What|Removed |Added Status|WAITING |RESOLVED Resolution||WORKSFORME http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074
[Bug target/43440] Overwriting neon quad register does not clobber all included single registers
--- Comment #5 from siarhei dot siamashka at gmail dot com 2010-03-21 03:33 ---
I don't quite understand what the problem is:

> This patch has the unhappy side effect of clobbering s0, s1 and s2 if s3 is used because that's the only way we can indicate that q0 is clobbered by the write to s0.

The proper solution seems extremely simple to me: it should do exactly what an application programmer would do to work around the bug. When initially parsing the clobber list, just do a simple text substitution q0 -> d0, d1, and the same for all the other q registers.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440
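The substitution proposed in this comment can be sketched in a few lines of C. This is only an illustration of the idea, not GCC code: the helper name expand_clobber and its buffer layout are hypothetical. It relies on the architectural fact that NEON register qN overlaps d(2N) and d(2N+1).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of GCC): expand one clobber name.
 * "qN" with N in 0..15 becomes the two D registers it overlaps,
 * d(2N) and d(2N+1); any other name is copied through unchanged.
 * Returns the number of names written to out (1 or 2).
 */
static int expand_clobber(const char *name, char out[2][8])
{
    char *end;
    long n;

    assert(name != NULL);
    if (name[0] == 'q' && name[1] >= '0' && name[1] <= '9') {
        n = strtol(name + 1, &end, 10);
        if (*end == '\0' && n >= 0 && n <= 15) {
            snprintf(out[0], 8, "d%ld", 2 * n);
            snprintf(out[1], 8, "d%ld", 2 * n + 1);
            return 2;
        }
    }
    /* Not a q register: keep the name as written (truncated to 7 chars). */
    snprintf(out[0], 8, "%s", name);
    return 1;
}
```

With this sketch a clobber list such as q4, q5, q7 would expand to d8, d9, d10, d11, d14, d15, which is what a programmer would write by hand to avoid the bug.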
[Bug target/43440] Overwriting neon quad register does not clobber all included single registers
--- Comment #6 from siarhei dot siamashka at gmail dot com 2010-03-21 03:56 ---
(In reply to comment #4)
> IMO the reasons as described in my email is another motivation for Neon programmers to be using intrinsics rather than inline assembler and to improve in general Neon intrinsics.

The problem is that today NEON intrinsics have a lot more issues in practice. The resulting code is way too slow to be usable, especially when gcc thinks that it is running out of registers and starts spilling variables to memory. Bug 43118 and bug 43364 are just some very basic examples of performance issues, without looking any deeper. Not having many bugs in bugzilla for NEON intrinsics means that either they work well enough or nobody seriously uses them. At least for me it is the latter case. Autovectorization is even worse than intrinsics. Inline assembly has a few bugs, but they can be easily worked around.

Sorry for this rant/offtopic. I just thought that you might be somewhat interested in the opinion of someone from the other side of the fence :-)
-- siarhei dot siamashka at gmail dot com changed:

What|Removed|Added
CC||siarhei dot siamashka at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440
[Bug inline-asm/41538] Mixing ARM/NEON intrinsic variables and inline assembly
--- Comment #5 from siarhei dot siamashka at gmail dot com 2010-03-14 12:23 ---
Do you want to force data into specific NEON registers because of the restriction on which NEON registers can be used as a scalar operand for multiplication? It works for me.

/**/
#include <stdint.h>
#include <arm_neon.h>

void f(int16_t *ptr)
{
    /* Pin the multiplier constants to d0 so that d0[0] is a valid scalar operand. */
    register int16x4_t mul_consts asm ("d0");
    int16x4_t data;
    int32x4_t tmp;
    mul_consts = vset_lane_s16(0x1234, mul_consts, 0);
    asm volatile (
        "vld1.16 {%P1}, [%2]\n"
        "vmull.s16 %q0, %P1, %P3[0]\n"
        "vshrn.s32 %P1, %q0, #15\n"
        "vst1.16 {%P1}, [%2]\n"
        : "=w" (tmp), "=w" (data)
        : "r" (ptr), "w" (mul_consts)
        : "memory"
    );
}
/**/

Not forcing the 'mul_consts' variable into the 'd0' register, on the other hand, fails as expected:

/tmp/ccvzAXVb.s: Assembler messages:
/tmp/ccvzAXVb.s:27: Error: scalar out of range for multiply instruction -- `vmull.s16 q9,d17,d16[0]'

So I don't see any problem here. Tested with gcc 4.3.4 and 4.4.3.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41538
[Bug inline-asm/37188] Missing documentation about the use of ARM NEON quad registers in inline asm arguments
--- Comment #3 from siarhei dot siamashka at gmail dot com 2010-03-14 12:44 ---
As of today, gcc seems to be clever enough to deduce whether to use a single precision or double precision VFP register when given the 'w' constraint (so the 'P' modifier is not strictly needed). This behavior seems to have been introduced in gcc 4.3.2. However, trying to force double precision variables into specific VFP registers breaks it:

//
#include <stdio.h>
#include <stdint.h>

inline int32_t double_to_fixed_16_16(double dbl)
{
    int32_t fix;
    register double tmp asm ("d0") = dbl;
    asm volatile (
        "vcvt.s32.f64 %1, %1, #16\n"
        "vmov.f32 %0, %1[0]\n"
        : "=r" (fix), "+w" (tmp)
    );
    return fix;
}

int main()
{
    int32_t i = double_to_fixed_16_16(1.5);
    printf("%08X\n", i);
}
//

/tmp/ccYfabov.s: Assembler messages:
/tmp/ccYfabov.s:24: Error: operand size must match register width -- `vcvt.s32.f64 s0,s0,#16'
/tmp/ccYfabov.s:25: Error: only D registers may be indexed -- `vmov.f32 r0,s0[0]'
/tmp/ccYfabov.s:45: Error: operand size must match register width -- `vcvt.s32.f64 s0,s0,#16'
/tmp/ccYfabov.s:46: Error: only D registers may be indexed -- `vmov.f32 r2,s0[0]'

Also, NEON quad registers still need an explicit 'q' modifier in inline assembly. I am updating the issue summary because NEON quad registers are now more problematic than VFP doubles.

Thanks for your work on gcc. VFP/NEON support is slowly getting better over time.
-- siarhei dot siamashka at gmail dot com changed:

What|Removed|Added
Summary|Missing documentation about the use of double precision floating point registers in inline asm arguments (VFP)|Missing documentation about the use of ARM NEON quad registers in inline asm arguments

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37188
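For readers without ARM hardware, the conversion that the inline assembly in this comment is meant to perform can be approximated in portable C. This is a sketch under the assumption that the intended result is the input scaled by 2^16 and truncated toward zero; the function name is hypothetical and does not appear in the report.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical portable counterpart of double_to_fixed_16_16():
 * scale by 2^16 and truncate toward zero to get a signed 16.16 value.
 */
static int32_t double_to_fixed_16_16_portable(double dbl)
{
    /* The scaled value must fit in int32_t, otherwise the cast is undefined. */
    assert(dbl >= -32768.0 && dbl < 32768.0);
    return (int32_t)(dbl * 65536.0);
}
```

For the input 1.5 used in the test program, this yields 0x00018000.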
[Bug c/43364] New: Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32
/***/
#include <arm_neon.h>

void neon_add(float * __restrict out, float * __restrict a, float * __restrict b)
{
    float32x2_t tmp1, tmp2;
    tmp1 = vset_lane_f32(*a, tmp1, 0);
    tmp2 = vset_lane_f32(*b, tmp2, 0);
    tmp1 = vadd_f32(tmp1, tmp2);
    *out = vget_lane_f32(tmp1, 0);
}
/***/

neon_add:
   0: e5913000  ldr r3, [r1]
   4: eddf0b07  vldr d16, [pc, #28] ; 28 <neon_add+0x28>
   8: e5922000  ldr r2, [r2]
   c: eddf1b05  vldr d17, [pc, #20] ; 28 <neon_add+0x28>
  10: ee003b90  vmov.32 d16[0], r3
  14: ee012b90  vmov.32 d17[0], r2
  18: f2400da1  vadd.f32 d16, d16, d17
  1c: f4c0080f  vst1.32 {d16[0]}, [r0]
  20: e12fff1e  bx lr
  24: e1a0  nop (mov r0,r0)

gcc fails to use the single instruction

  vld1.32 {d16[0]}, [r1]

instead of

   0: e5913000  ldr r3, [r1]
   4: eddf0b07  vldr d16, [pc, #28] ; 28 <neon_add+0x28>
  10: ee003b90  vmov.32 d16[0], r3

--
Summary: Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32
Product: gcc
Version: 4.4.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC build triplet: arm-unknown-linux-gnueabi
GCC host triplet: arm-unknown-linux-gnueabi
GCC target triplet: arm-unknown-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43364
[Bug inline-asm/41538] Mixing ARM/NEON intrinsic variables and inline assembly
--- Comment #2 from siarhei dot siamashka at gmail dot com 2010-03-11 20:29 ---
When the documentation is missing the needed bits of information, they can typically be extracted from the source code. The only problem is that these constraints can change at any time without notice unless they are properly documented and exposed to the outside world. There is bug 37188 about it.
-- siarhei dot siamashka at gmail dot com changed:

What|Removed|Added
CC||siarhei dot siamashka at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41538
[Bug middle-end/40887] GCC generates suboptimal code for indirect function calls on ARM
--- Comment #6 from siarhei dot siamashka at gmail dot com 2009-12-21 08:27 ---
Created an attachment (id=19356) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19356&action=view) return-address-prediction-bench.c

This looks like a really serious performance issue. Not only is the indirect call itself penalized, but the whole return address prediction stack is busted, causing return address mispredictions for all the nested calls. The attached test program demonstrates it.

$ time ./return-address-prediction-bench 1
Indirect call for the topmost function
real 0m0.793s
user 0m0.789s
sys 0m0.000s

$ time ./return-address-prediction-bench
Indirect call for the leaf function
real 0m1.797s
user 0m1.789s
sys 0m0.008s

gcc 4.4.2, -O2 -mcpu=cortex-a8

Changing the function pointer type void (*f)() -> void (* volatile f)() can also be used to work around the problem. In this case execution times for both variants of the test are approximately the same.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40887
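The type change mentioned at the end of this comment looks as follows in source form. This is a minimal sketch with hypothetical names (leaf, call_plain, call_volatile); it only shows the two declarations side by side and does not reproduce the attached benchmark. Whether the volatile qualifier changes the generated call sequence depends on the compiler version and target, so the timing effect reported above should not be assumed elsewhere.

```c
#include <assert.h>

static int call_count;

/* Hypothetical callee used only for this illustration. */
static void leaf(void)
{
    call_count++;
}

/* Plain function pointer: the form reported as slow in this bug. */
static void call_plain(void (*f)(void), int n)
{
    int i;
    assert(f != 0);
    for (i = 0; i < n; i++)
        f();
}

/* Workaround form: the pointer object itself is volatile-qualified,
 * so it has to be re-read from memory before every call. */
static void call_volatile(void (*f)(void), int n)
{
    void (* volatile vf)(void) = f;
    int i;
    assert(f != 0);
    for (i = 0; i < n; i++)
        vf();
}
```

Both functions behave identically at the C level; only the generated code may differ.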
[Bug middle-end/40887] GCC generates suboptimal code for indirect function calls on ARM
--- Comment #7 from siarhei dot siamashka at gmail dot com 2009-12-21 08:53 ---
(In reply to comment #4)
> I would rather split the load out as a separate insn and allow it to be scheduled separately.

A question just to clarify the status of this issue. Are you waiting for David (or anybody else) to provide an updated patch with such a split load? Are there no other options available besides either a perfect fix or no fix at all?
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40887
[Bug inline-asm/42321] New: NEON/VFP registers from inline assembly clobber list are saved/restored incorrectly
Test program:

//
void f()
{
    asm volatile("veor d8, d8, d8" : : : "d8", "d9", "d10", "d11", "d14", "d15");
}
//

$ gcc -c -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -O2 test.c
$ objdump -d test.o

f:
   0: ed2d8b08  vpush {d8-d11}
   4: ed2deb04  vpush {d14-d15}
   8: f3088118  veor d8, d8, d8
   c: ecbd8b08  vpop {d8-d11}
  10: ecbdeb04  vpop {d14-d15}
  14: e12fff1e  bx lr

The order of the last two vpop instructions is wrong: registers have to be popped in the reverse of the order in which they were pushed.

--
Summary: NEON/VFP registers from inline assembly clobber list are saved/restored incorrectly
Product: gcc
Version: 4.4.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC build triplet: armv4tl-softfloat-linux-gnueabi
GCC host triplet: armv4tl-softfloat-linux-gnueabi
GCC target triplet: armv4tl-softfloat-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42321
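To see what the generated sequence actually does to the saved registers, the push/pop behaviour can be modelled in plain C. This is a simplified sketch, not an emulator: the struct and function names are hypothetical, each D register is represented by a double that initially holds its own index, and the stack is a small array that grows downward.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_DREGS = 32, STACK_SLOTS = 16 };

/* Hypothetical model of the D register file and a full-descending stack. */
struct vfp_model {
    double d[NUM_DREGS];
    double stack[STACK_SLOTS];
    size_t sp;                  /* index of the lowest slot in use */
};

static void model_init(struct vfp_model *m)
{
    int i;
    for (i = 0; i < NUM_DREGS; i++)
        m->d[i] = (double)i;    /* dN starts out holding the value N */
    m->sp = STACK_SLOTS;
}

/* vpush {d<first> .. d<first+count-1>} */
static void model_vpush(struct vfp_model *m, int first, int count)
{
    int i;
    assert(m->sp >= (size_t)count);
    m->sp -= (size_t)count;
    for (i = 0; i < count; i++)
        m->stack[m->sp + (size_t)i] = m->d[first + i];
}

/* vpop {d<first> .. d<first+count-1>} */
static void model_vpop(struct vfp_model *m, int first, int count)
{
    int i;
    assert(m->sp + (size_t)count <= STACK_SLOTS);
    for (i = 0; i < count; i++)
        m->d[first + i] = m->stack[m->sp + (size_t)i];
    m->sp += (size_t)count;
}

/* Replays the prologue/epilogue from the report (matched == 0)
 * or the same sequence with the two pops swapped (matched != 0). */
static void run_sequence(struct vfp_model *m, int matched)
{
    int r;
    model_vpush(m, 8, 4);       /* vpush {d8-d11}  */
    model_vpush(m, 14, 2);      /* vpush {d14-d15} */
    for (r = 8; r <= 15; r++)
        m->d[r] = -1.0;         /* stand-in for the asm body clobbering registers */
    if (matched) {
        model_vpop(m, 14, 2);
        model_vpop(m, 8, 4);
    } else {
        model_vpop(m, 8, 4);    /* order as emitted in the report */
        model_vpop(m, 14, 2);
    }
}
```

In this model the emitted order leaves d8 and d9 holding the old d14 and d15, d10 and d11 holding the old d8 and d9, and d14 and d15 holding the old d10 and d11; swapping the two pops restores every saved register.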
[Bug inline-asm/42321] NEON/VFP registers from inline assembly clobber list are saved/restored incorrectly
--- Comment #1 from siarhei dot siamashka at gmail dot com 2009-12-07 14:42 ---
Modifying the program to list q-registers in the clobber list provides even more interesting results:

//
void f()
{
    asm volatile("veor d8, d8, d8" : : : "q4", "q5", "q7");
}
//

$ gcc -c -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -O2 test.c
$ objdump -d test.o

f:
   0: ed2d8b02  vpush {d8}
   4: ed2dab02  vpush {d10}
   8: ed2deb02  vpush {d14}
   c: f3088118  veor d8, d8, d8
  10: ecbd8b02  vpop {d8}
  14: ecbdab02  vpop {d10}
  18: ecbdeb02  vpop {d14}
  1c: e12fff1e  bx lr

Now, in addition to the mismatched save/restore order, only the lower halves of the q-registers get saved.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42321
[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
--- Comment #7 from siarhei dot siamashka at gmail dot com 2009-11-03 20:09 --- Thanks a lot for checking this. And sorry about the confusion caused by attributing the slowness of the testcase to the microcoded instruction (which turned out not to be the case) without properly checking this first. So should this bug be split into two? One about the incorrect warning, and another one about generating nonoptimal code at the -O2 level (extra load and store operations, which are probably penalized by something like a RAW hazard in such a short loop)? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868
[Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly
/***/
void __attribute__((noinline)) y()
{
    asm volatile ("# nop\n");
}

void __attribute__((noinline)) x(long c)
{
    while (c--)
        y();
}

int main()
{
    /* Run total 3.2G iterations */
    x(16);
    x(16);
    return 0;
}
/***/

$ gcc -O2 -mcpu=cell -mtune=cell -mwarn-cell-microcode -o test-O2 test.c
test.c: In function x:
test.c:9: warning: emitting microcode insn {ai.|addic.} %0,%1,%2 [*adddi3_internal3] #38
$ time ./test-O2
real 0m56.385s
user 0m56.232s
sys 0m0.138s

$ gcc -Os -mcpu=cell -mtune=cell -mwarn-cell-microcode -o test-Os test.c
$ time ./test-Os
real 0m24.149s
user 0m24.086s
sys 0m0.060s

--
Summary: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly
Product: gcc
Version: 4.4.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC build triplet: powerpc64-unknown-linux-gnu
GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868
[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
--- Comment #1 from siarhei dot siamashka at gmail dot com 2009-10-29 15:21 --- -O2: 0010 .x: 10: 2c 23 00 00 cmpdi r3,0 14: 7c 08 02 a6 mflrr0 18: f8 01 00 10 std r0,16(r1) 1c: f8 21 ff 81 stdur1,-128(r1) 20: 41 82 00 1c beq-3c .x+0x2c 24: f8 61 00 70 std r3,112(r1) 28: 48 00 00 01 bl 28 .x+0x18 2c: e8 01 00 70 ld r0,112(r1) 30: 35 20 ff ff addic. r9,r0,-1 34: f9 21 00 70 std r9,112(r1) 38: 40 82 ff f0 bne+28 .x+0x18 3c: 38 21 00 80 addir1,r1,128 40: e8 01 00 10 ld r0,16(r1) 44: 7c 08 03 a6 mtlrr0 48: 4e 80 00 20 blr 4c: 00 00 00 00 .long 0x0 50: 00 00 00 01 .long 0x1 54: 80 00 00 00 lwz r0,0(0) -Os: 0010 .x: 10: fb e1 ff f8 std r31,-8(r1) 14: 7c 08 02 a6 mflrr0 18: f8 01 00 10 std r0,16(r1) 1c: 7c 7f 1b 78 mr r31,r3 20: f8 21 ff 81 stdur1,-128(r1) 24: 48 00 00 08 b 2c .x+0x1c 28: 48 00 00 01 bl 28 .x+0x18 2c: 2f bf 00 00 cmpdi cr7,r31,0 30: 3b ff ff ff addir31,r31,-1 34: 40 9e ff f4 bne+cr7,28 .x+0x18 38: 38 21 00 80 addir1,r1,128 3c: e8 01 00 10 ld r0,16(r1) 40: eb e1 ff f8 ld r31,-8(r1) 44: 7c 08 03 a6 mtlrr0 48: 4e 80 00 20 blr 4c: 00 00 00 00 .long 0x0 50: 00 00 00 01 .long 0x1 54: 80 01 00 00 lwz r0,0(r1) -- siarhei dot siamashka at gmail dot com changed: What|Removed |Added CC||siarhei dot siamashka at ||gmail dot com Keywords||missed-optimization Summary|cell microcode instruction |cell microcode instruction |is generated for a trivial |(addic.) is generated for a |loop with -O2 optimizations,|trivial loop with -O2 |hurting performance badly |optimizations, hurting ||performance badly http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868
[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
--- Comment #3 from siarhei dot siamashka at gmail dot com 2009-09-01 15:08 --- It works fine if '-fno-omit-frame-pointer' is removed. I agree that this is quite a large and convoluted function. Unfortunately I did not manage to reduce it to something smaller that would still result in broken behaviour. My only guess is that a stack frame bigger than 4K may make some difference. I have a full Linux system compiled with -fno-omit-frame-pointer (to get stack backtraces and generate callgraphs in oprofile). If anything simpler happens to be broken too, I'll try to investigate it and provide additional details. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074
[Bug c/41196] New: The use of ARM NEON vshll_n_u8 intrinsic results in compile error on valid code
When using the vshll_n_u8 intrinsic, gcc 4.4.1 incorrectly rejects a shift operand having value = 8, claiming that it is out of range. Consider the following test code:

/*/
#include <arm_neon.h>

uint16x8_t test_vshll_n_u8 (uint8x8_t a)
{
    return vshll_n_u8(a, 8);
}
/*/

Test with gcc 4.4.1:

# gcc -c -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer test.c
test.c: In function test_vshll_n_u8:
test.c:6: error: constant out of range

It used to work fine with cs2007q3:

# gcc -c -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer test.c
# objdump -d test.o

test.o: file format elf32-littlearm

Disassembly of section .text:

test_vshll_n_u8:
   0: ec410b17  vmov d7, r0, r1
   4: f3b26307  vshll.i8 q3, d7, #8
   8: ec510b16  vmov r0, r1, d6
   c: ec532b17  vmov r2, r3, d7
  10: e12fff1e  bx lr

--
Summary: The use of ARM NEON vshll_n_u8 intrinsic results in compile error on valid code
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC build triplet: armv4tl-softfloat-linux-gnueabi
GCC host triplet: armv4tl-softfloat-linux-gnueabi
GCC target triplet: armv4tl-softfloat-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41196
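A scalar model of the widening shift helps show why a shift count of 8 is meaningful for 8-bit lanes: every input is first widened to 16 bits, so even 255 << 8 still fits in the result. This is a sketch only; the function name and the accepted range of 0..8 are assumptions of the model and are not taken from arm_neon.h.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 8 };

/*
 * Hypothetical scalar model of a widening left shift on eight u8 lanes:
 * each lane is widened to 16 bits and then shifted left by n.
 */
static void vshll_n_u8_model(uint16_t out[LANES], const uint8_t in[LANES], int n)
{
    int i;
    assert(n >= 0 && n <= 8);   /* range accepted by this model only */
    for (i = 0; i < LANES; i++)
        out[i] = (uint16_t)((unsigned)in[i] << n);
}
```

With n = 8 the largest possible lane value, 255, maps to 65280 (0xFF00), which is within the range of uint16_t.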
[Bug c/41074] New: Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
Terminal emulator from xfce4 segfaults if libXft-2.1.13 is compiled with vanilla gcc 4.4.1 and '-fno-strict-aliasing -g -O2 -fno-omit-frame-pointer' options.

Program received signal SIGSEGV, Segmentation fault.
0x408599cc in XftGlyphSpecRender (dpy=<value optimized out>, op=<value optimized out>, src=<value optimized out>, pub=0x1615f0, dst=31457359, srcx=0, srcy=0, glyphs=0xbed2b824, nglyphs=12) at xftrender.c:299
299 elts[nelt].glyphset = font->glyphset;

(gdb) info registers
r0   0x123ae8    1194728
r1   0x0         0
r2   0x0         0
r3   0xbed2b824  3201480740
r4   0x0         0
r5   0xbed2a964  3201476964
r6   0x1615f0    1447408
r7   0x0         0
r8   0x1e0002b   31457323
r9   0x0         0
r10  0xbed2a964  3201476964
r11  0xbed2b78c  3201480588
r12  0x74        116
sp   0xbed29900  0xbed29900
lr   0x40859790  1082496912
pc   0x408599cc  0x408599cc <XftGlyphSpecRender+732>
fps  0x0         0
cpsr 0x6010      1610612752

(gdb) disassemble
0x408599a8 <XftGlyphSpecRender+696>: mla r10, r5, r9, r10
0x408599ac <XftGlyphSpecRender+700>: sub r5, r11, #4096 ; 0x1000
0x408599b0 <XftGlyphSpecRender+704>: str r10, [r5, #-3692]
0x408599b4 <XftGlyphSpecRender+708>: ldr r10, [r5, #-3632]
0x408599b8 <XftGlyphSpecRender+712>: str r7, [r5, #-3688]
0x408599bc <XftGlyphSpecRender+716>: add r5, r10, r7, lsl #2
0x408599c0 <XftGlyphSpecRender+720>: sub r7, r11, #4096 ; 0x1000
0x408599c4 <XftGlyphSpecRender+724>: ldr r7, [r7, #-3688]
0x408599c8 <XftGlyphSpecRender+728>: ldr r8, [r6, #124]
0x408599cc <XftGlyphSpecRender+732>: ldr r10, [r7, #-3632]
0x408599d0 <XftGlyphSpecRender+736>: str r8, [r10, r7, lsl #2]

--
Summary: Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: armv4tl-softfloat-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074
[Bug c/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option
--- Comment #1 from siarhei dot siamashka at gmail dot com 2009-08-14 22:48 --- Created an attachment (id=18370) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18370&action=view) xftrender.i Preprocessed source. I did not manage to reduce it to a smaller testcase yet. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074
[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm
--- Comment #7 from siarhei dot siamashka at gmail dot com 2009-02-10 15:11 ---
(In reply to comment #6)
> This is not a bug, but a problem with your source code. In order to understand why, you need to pre-process the code and look at the output:
>
> ...
> void *memset_arm9(void *a, int b, int c) { return ({ uint8_t *dst = ((uint8_t *)a); uint8_t c = (b); int count = (c); uint32_t dummy0, dummy1, dummy2; __asm__ __volatile__ (
>
> Notice that first there is a declaration of a variable c (uint8_t), then in the next statement there is a use of c. This use (which is intended to be of the formal parameter passed to memset_arm9) is instead interpreted as the newly declared variable c (the uint8 one). Compiling your testcase with -Wshadow gives:
>
> inl.c: In function 'memset_arm9':
> inl.c:66: warning: declaration of 'c' shadows a parameter
> inl.c:64: warning: shadowed declaration is here

Thanks for having a look at this. Indeed, macros are quite dangerous. Nevertheless, would it make sense to add the -Wshadow option to the set provided by -Wextra, or even to introduce something like a -Wreally-all option specifically for debugging such cases? Even better (but understandably not realistic) would be an option to show this warning only for code which was expanded by the C preprocessor, in order to reduce the number of false positives.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693
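The pitfall described in the quoted explanation can be reproduced without any inline assembly. The following sketch uses hypothetical macro and function names; it only mirrors the shape of the original macro (a local named c declared before the count argument is read) and contrasts it with a variant whose locals use a prefix that callers are unlikely to use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: its local `c` hides any caller variable named `c`,
 * so passing `c` as n_ makes count read the freshly declared local. */
#define FILL_SHADOWING(a_, b_, n_, filled_)                 \
    do {                                                    \
        uint8_t *dst = (uint8_t *)(a_);                     \
        uint8_t c = (uint8_t)(b_);                          \
        int count = (n_);                                   \
        int i;                                              \
        for (i = 0; i < count; i++)                         \
            dst[i] = c;                                     \
        (filled_) = count;                                  \
    } while (0)

/* Same logic, but the locals carry a prefix to avoid name clashes. */
#define FILL_SAFE(a_, b_, n_, filled_)                      \
    do {                                                    \
        uint8_t *fill_dst = (uint8_t *)(a_);                \
        uint8_t fill_val = (uint8_t)(b_);                   \
        int fill_count = (n_);                              \
        int fill_i;                                         \
        for (fill_i = 0; fill_i < fill_count; fill_i++)     \
            fill_dst[fill_i] = fill_val;                    \
        (filled_) = fill_count;                             \
    } while (0)

/* Returns the number of bytes written; the parameter c is silently ignored. */
static int fill_shadowing(void *a, int b, int c)
{
    int filled;
    assert(a != NULL);
    FILL_SHADOWING(a, b, c, filled);
    return filled;
}

/* Returns the number of bytes written; honours the parameter c. */
static int fill_safe(void *a, int b, int c)
{
    int filled;
    assert(a != NULL);
    FILL_SAFE(a, b, c, filled);
    return filled;
}
```

Called with b = 3 and c = 5, the first function fills 3 bytes (the value of b) and the second fills 5. Compiling the first with -Wshadow reports the hidden parameter, as in the quoted output.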
[Bug target/37734] New: Missing optimization: gcc fails to reuse flags from already calculated expression for condition check with zero
For the following source:

//
extern void a();

int unrolled_loop_fn(int count)
{
    while ((count -= 2) >= 0) {
        a();
        a();
    }
    if (count & 1) {
        a();
    }
}
//

'gcc -O2 -c test.c' produces the following quite suboptimal code:

unrolled_loop_fn:
   0: 55              push %ebp
   1: 89 e5           mov %esp,%ebp
   3: 56              push %esi
   4: 8b 75 08        mov 0x8(%ebp),%esi
   7: 53              push %ebx
   8: 83 ee 02        sub $0x2,%esi
   b: 85 f6           test %esi,%esi
   d: 89 f0           mov %esi,%eax
   f: 78 1c           js 2d <unrolled_loop_fn+0x2d>
  11: 89 f3           mov %esi,%ebx
  13: 90              nop
  14: 8d 74 26 00     lea 0x0(%esi),%esi
  18: e8 fc ff ff ff  call 19 <unrolled_loop_fn+0x19>
  1d: e8 fc ff ff ff  call 1e <unrolled_loop_fn+0x1e>
  22: 83 eb 02        sub $0x2,%ebx
  25: 79 f1           jns 18 <unrolled_loop_fn+0x18>
  27: 83 e6 01        and $0x1,%esi
  2a: 8d 46 fe        lea -0x2(%esi),%eax
  2d: a8 01           test $0x1,%al
  2f: 74 05           je 36 <unrolled_loop_fn+0x36>
  31: e8 fc ff ff ff  call 32 <unrolled_loop_fn+0x32>
  36: 5b              pop %ebx
  37: 5e              pop %esi
  38: 5d              pop %ebp
  39: c3              ret

--
Summary: Missing optimization: gcc fails to reuse flags from already calculated expression for condition check with zero
Product: gcc
Version: 4.3.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37734
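The loop in this report is a manual two-way unrolling: the body runs twice per iteration while at least two calls remain, and the low bit of the final (negative) counter tells whether one call is left over. A self-contained sketch makes this easy to check; the counting stub and the wrapper count_calls are hypothetical additions, and the function is declared void here because the original never returns a value.

```c
#include <assert.h>

static int calls;

/* Hypothetical stand-in for the external a(): just counts invocations. */
static void a(void)
{
    calls++;
}

/* Same control flow as the testcase in the report. */
static void unrolled_loop_fn(int count)
{
    /* The subtraction already sets the sign flag, so the >= 0 test
     * needs no separate comparison instruction. */
    while ((count -= 2) >= 0) {
        a();
        a();
    }
    /* count is now -1 or -2; an odd value means one call remains. */
    if (count & 1) {
        a();
    }
}

/* Returns how many times a() runs for a given count. */
static int count_calls(int count)
{
    assert(count >= 0);
    calls = 0;
    unrolled_loop_fn(count);
    return calls;
}
```

For any non-negative count the stub is called exactly count times, which is the behaviour the optimized code has to preserve.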
[Bug target/37734] Missing optimization: gcc fails to reuse flags from already calculated expression for condition check with zero
--- Comment #1 from siarhei dot siamashka at gmail dot com 2008-10-04 02:48 ---
For -Os optimization, the generated code is much better:

unrolled_loop_fn:
   0: 55              push %ebp
   1: 89 e5           mov %esp,%ebp
   3: 53              push %ebx
   4: 83 ec 04        sub $0x4,%esp
   7: 8b 5d 08        mov 0x8(%ebp),%ebx
   a: eb 0a           jmp 16 <unrolled_loop_fn+0x16>
   c: e8 fc ff ff ff  call d <unrolled_loop_fn+0xd>
  11: e8 fc ff ff ff  call 12 <unrolled_loop_fn+0x12>
  16: 83 eb 02        sub $0x2,%ebx
  19: 79 f1           jns c <unrolled_loop_fn+0xc>
  1b: 80 e3 01        and $0x1,%bl
  1e: 74 05           je 25 <unrolled_loop_fn+0x25>
  20: e8 fc ff ff ff  call 21 <unrolled_loop_fn+0x21>
  25: 5a              pop %edx
  26: 5b              pop %ebx
  27: 5d              pop %ebp
  28: c3              ret

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37734
[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm
--- Comment #5 from siarhei dot siamashka at gmail dot com 2008-09-03 09:52 --- I'm sorry, is anybody investigating this quite serious bug? If nobody has time/motivation to do this work, would it make sense for me to try fixing it myself and submit a patch here? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693
[Bug inline-asm/37188] There is no way to specify double precision floating point registers in inline asm arguments (VFP)
--- Comment #1 from siarhei dot siamashka at gmail dot com 2008-09-02 15:50 ---
Well, it looks like this is not a missing feature, but just incomplete documentation :) It is possible to use double precision floating point registers and NEON 128-bit registers in the following way:

--
#include <arm_neon.h>

int16x8_t test_neon(int16x8_t b, int16x8_t c)
{
    int16x8_t a;
    /* %q prints the operand as a NEON quad (q) register */
    asm (
        "vadd.i32 %q0, %q1, %q2 \n\t"
        : "=w" (a)
        : "w" (b), "w" (c)
    );
    return a;
}

double test_double(double b, double c)
{
    double a;
    /* %P prints the operand as a double precision (d) register */
    asm (
        "faddd %P0, %P1, %P2 \n\t"
        : "=w" (a)
        : "w" (b), "w" (c)
    );
    return a;
}
--

Disassembly of section .text:

test_quad:
   0: e52db004  push {fp} ; (str fp, [sp, #-4]!)
   4: e28db000  add fp, sp, #0 ; 0x0
   8: ec410b12  vmov d2, r0, r1
   c: ec432b13  vmov d3, r2, r3
  10: ed9b6b01  vldr d6, [fp, #4]
  14: ed9b7b03  vldr d7, [fp, #12]
  18: f2224846  vadd.i32 q2, q1, q3
  1c: ec510b14  vmov r0, r1, d4
  20: ec532b15  vmov r2, r3, d5
  24: e28bd000  add sp, fp, #0 ; 0x0
  28: e8bd0800  pop {fp}
  2c: e12fff1e  bx lr

0030 test_double:
  30: ec410b15  vmov d5, r0, r1
  34: e52db004  push {fp} ; (str fp, [sp, #-4]!)
  38: ec432b16  vmov d6, r2, r3
  3c: e28db000  add fp, sp, #0 ; 0x0
  40: ee357b06  faddd d7, d5, d6
  44: ec510b17  vmov r0, r1, d7
  48: e28bd000  add sp, fp, #0 ; 0x0
  4c: e8bd0800  pop {fp}
  50: e12fff1e  bx lr

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37188
[Bug inline-asm/37188] New: There is no way to specify double precision floating point registers in inline asm arguments (VFP)
Gcc manual, 5.38.4 Constraints for Particular Machines section:

ARM family (config/arm/arm.h)
  f  Floating-point register
  w  VFP floating-point register
  F  One of the floating-point constants 0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0 or 10.0
  ...

The 'w' constraint allows the use of single precision VFP floating point registers, but this does not work for double precision.

--
Summary: There is no way to specify double precision floating point registers in inline asm arguments (VFP)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC host triplet: i486-linux-gnu
GCC target triplet: arm-softfloat-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37188
[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm
--- Comment #4 from siarhei dot siamashka at gmail dot com 2008-05-13 12:32 --- This bug is still present in gcc 4.3 -- siarhei dot siamashka at gmail dot com changed: What|Removed |Added Known to fail||3.3.6 4.0.4 4.1.2 4.2.0 ||4.3.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693
[Bug c++/32687] Invalid code generation for reading signed negative bitfield value (g++ optimization)
--- Comment #2 from siarhei dot siamashka at gmail dot com 2007-07-11 07:06 --- Tried this test with gcc 4.2.0, it also works correctly. So looks like the problem only shows up in gcc 4.1.x -- siarhei dot siamashka at gmail dot com changed: What|Removed |Added Known to work|3.4.6 4.3.0 |3.4.6 4.2.0 4.3.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32687
[Bug regression/32687] New: Invalid code generation for reading signed negative bitfield value (g++ optimization)
Reading a signed bitfield value that needs to be extended to a larger type (for example, assigning a 24-bit value to an int) results in zero extension instead of sign extension when compiled with g++ using optimizations (-O1 or higher). Compiling the same code with gcc, or disabling optimizations, makes the problem disappear.

The following code reproduces the problem:

#include <stdio.h>

struct TEST_STRUCT
{
    int f_8 : 8;
    int f_24 : 24;
};

int main ()
{
    struct TEST_STRUCT x;
    int a = -123;
    x.f_24 = a;
    printf("a=%d (%08X)\n", (int)a, (int)a);
    printf("x.f_24=%d (%08X)\n", (int)x.f_24, (int)x.f_24);
    if ((int)x.f_24 != (int)a)
        printf("test failed\n");
    else
        printf("test ok\n");
    return 0;
}

Expected correct result:
a=-123 (FFFFFF85)
x.f_24=-123 (FFFFFF85)
test ok

Faulty result:
a=-123 (FFFFFF85)
x.f_24=16777093 (00FFFF85)
test failed

It is a regression, as gcc 3.4.6 did not have this bug. This problem may also be related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32346 and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30332

--
           Summary: Invalid code generation for reading signed negative bitfield value (g++ optimization)
           Product: gcc
           Version: 4.1.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32687
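To make the difference between the two results in this report concrete, here is a small sketch of what sign extension and zero extension of a 24-bit field amount to when done by hand. The helper names `sign_extend24` and `zero_extend24` are hypothetical and are not taken from the report or from gcc; the code only illustrates the arithmetic and does not reproduce the compiler bug itself.

```c
#include <stdint.h>

/* Zero extension: keep the low 24 bits and leave bits 24..31 clear.
 * This corresponds to the faulty value described in the report. */
int32_t zero_extend24(uint32_t raw)
{
    return (int32_t)(raw & 0x00FFFFFFu);
}

/* Sign extension: replicate bit 23 into bits 24..31.
 * Flipping the sign bit and then subtracting its weight avoids
 * relying on implementation-defined shifts of negative values. */
int32_t sign_extend24(uint32_t raw)
{
    uint32_t low = raw & 0x00FFFFFFu;
    return (int32_t)(low ^ 0x00800000u) - (int32_t)0x00800000;
}
```

For the value stored in the test case, `sign_extend24((uint32_t)-123)` yields -123, while `zero_extend24((uint32_t)-123)` yields 16777093, which is 2^24 - 123 and matches the faulty output.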
[Bug inline-asm/31693] New: Incorrectly assigned registers to operands for ARM inline asm
In the attached testcase, gcc assigns the same register to several named inline asm operands, which results in incorrect generated code. The names of the operands seem to matter: 'c' and 'count' are assigned the same register, but renaming the 'c' operand to 'xxc', for example, makes the bug disappear.

--
           Summary: Incorrectly assigned registers to operands for ARM inline asm
           Product: gcc
           Version: 4.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: inline-asm
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: arm-softfloat-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693
[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm
--- Comment #1 from siarhei dot siamashka at gmail dot com 2007-04-25 07:26 ---
Created an attachment (id=13436)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13436&action=view)
testcase for this bug

Testcase attached

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693
[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm
--- Comment #2 from siarhei dot siamashka at gmail dot com 2007-04-25 07:28 ---
This may be related to #31386

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693