[Bug target/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)

2010-08-14 Thread siarhei dot siamashka at gmail dot com


--- Comment #13 from siarhei dot siamashka at gmail dot com  2010-08-14 
16:28 ---
(In reply to comment #12)
> Any news? :)

http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00894.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070



[Bug target/37734] Missing optimization: gcc fails to reuse flags from already calculated expression for condition check with zero

2010-08-14 Thread siarhei dot siamashka at gmail dot com


--- Comment #2 from siarhei dot siamashka at gmail dot com  2010-08-15 
01:01 ---
Here is another test example, now with some performance numbers for gcc 4.5.1
on 64-bit Intel Atom:

$ cat fibbonachi.c
/***/
#include <stdlib.h>

int fib(int n)
{
    int sum, previous = -1, result = 1;

    n++;
    while (--n >= 0)
    {
        sum = result + previous;
        previous = result;
        result = sum;
    }

    return result;
}

int main(void)
{
    if (fib(1000000000) != 1532868155)
        abort();
    return 0;
}
/***/

$ gcc -O2 -march=atom -o fibbonachi-O2 fibbonachi.c
$ gcc -Os -march=atom -o fibbonachi-Os fibbonachi.c

$ time ./fibbonachi-O2

real    0m3.722s
user    0m3.652s
sys     0m0.000s

$ time ./fibbonachi-Os

real    0m3.078s
user    0m3.044s
sys     0m0.000s


Loop code for -O2 optimizations on x86-64:

  18:   89 d1                   mov    %edx,%ecx
  1a:   89 c2                   mov    %eax,%edx
  1c:   8d 7f ff                lea    -0x1(%rdi),%edi
  1f:   8d 04 0a                lea    (%rdx,%rcx,1),%eax
  22:   83 ff ff                cmp    $0xffffffff,%edi
  25:   75 f1                   jne    18 <fib+0x18>

Loop code for -Os optimizations on x86-64:

   c:   8d 0c 10                lea    (%rax,%rdx,1),%ecx
   f:   89 c2                   mov    %eax,%edx
  11:   89 c8                   mov    %ecx,%eax
  13:   ff cf                   dec    %edi
  15:   79 f5                   jns    c <fib+0xc>



Also on ARM, loop code is suboptimal in all cases (just subs + bge could be
used without any need for cmn/cmp):

-O2 on ARM:
  10:   e2433001        sub     r3, r3, #1
  14:   e0820001        add     r0, r2, r1
  18:   e3730001        cmn     r3, #1
  1c:   e1a01002        mov     r1, r2
  20:   e1a02000        mov     r2, r0
  24:   1afffff9        bne     10 <fib+0x10>

-Os on ARM:
   c:   e0831002        add     r1, r3, r2
  10:   e2400001        sub     r0, r0, #1
  14:   e1a02003        mov     r2, r3
  18:   e1a03001        mov     r3, r1
  1c:   e3500000        cmp     r0, #0
  20:   aafffff9        bge     c <fib+0xc>

-Os -mthumb on ARM:
   8:   1899            adds    r1, r3, r2
   a:   3801            subs    r0, #1
   c:   461a            mov     r2, r3
   e:   460b            mov     r3, r1
  10:   2800            cmp     r0, #0
  12:   daf9            bge.n   8 <fib+0x8>


There are similarities between x86 and ARM here: with -O2 optimizations, the
redundant comparison is made against the constant -1 in both cases.

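The redundancy is easier to see once the loop condition is written out. For any non-negative starting value, `--n >= 0` and `--n != -1` exit after the same number of iterations, so a single flag-setting decrement (`dec` + `jns`, as in the -Os x86 loop, or `subs` + `bge` on ARM) is enough and the separate compare against -1 in the -O2 code is wasted work. A small sketch of that equivalence; the helper names are hypothetical:

```c
/* Iterations of the loop shape used in fibbonachi.c:
   n++; while (--n >= 0) { ... }  (signed test on the decremented counter). */
long count_ge_zero(int n)
{
    long iterations = 0;
    n++;
    while (--n >= 0)
        iterations++;
    return iterations;
}

/* The same loop with the exit test written as "decremented counter != -1",
   which is the condition the -O2 code checks with cmp/cmn against -1. */
long count_ne_minus_one(int n)
{
    long iterations = 0;
    n++;
    while (--n != -1)
        iterations++;
    return iterations;
}
```

Both return n + 1 for every n >= 0 (and 0 for n == -1), so either branch condition is correct; the only difference is whether an extra compare instruction ends up in the loop.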

-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

  Known to fail||4.5.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37734



[Bug c/45207] The -Os flag generates wrong code for ARM966e-s

2010-08-06 Thread siarhei dot siamashka at gmail dot com


--- Comment #7 from siarhei dot siamashka at gmail dot com  2010-08-06 
19:36 ---
Do you have any packed structs? I wonder if the problem could be somehow
related to PR45070. But it's hard to say anything until you narrow down the
problem to a smaller testcase.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45207



[Bug c/45176] restrict qualifier is not used in a manually unrolled loop

2010-08-05 Thread siarhei dot siamashka at gmail dot com


--- Comment #4 from siarhei dot siamashka at gmail dot com  2010-08-05 
13:40 ---
Looks like this missed-optimization regression was introduced in gcc 4.5.

Are any similar fixes possible in the 4.5 branch?


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45176



[Bug c++/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)

2010-07-28 Thread siarhei dot siamashka at gmail dot com


--- Comment #4 from siarhei dot siamashka at gmail dot com  2010-07-28 
07:16 ---
Could not reproduce the problem with gcc 4.3.5

Disassembly of pr45070.o:

0000000c <next>:
   c:   e92d401f        push    {r0, r1, r2, r3, r4, lr}
  10:   e890000c        ldm     r0, {r2, r3}
  14:   e1a04000        mov     r4, r0
  18:   e1520003        cmp     r2, r3
  1c:   b3a03000        movlt   r3, #0
  20:   ba000014        blt     78 <next+0x6c>
  24:   e5903008        ldr     r3, [r0, #8]
  28:   e3530000        cmp     r3, #0
  2c:   0a00000e        beq     6c <next+0x60>
  30:   e3a03000        mov     r3, #0
  34:   e5803008        str     r3, [r0, #8]
  38:   e2800004        add     r0, r0, #4
  3c:   ebffffef        bl      0 <fetch>
  40:   e1a00004        mov     r0, r4
  44:   ebfffff0        bl      c <next>
  48:   e1a00800        lsl     r0, r0, #16
  4c:   e1a00840        asr     r0, r0, #16
  50:   e5cd0000        strb    r0, [sp]
  54:   e1a00420        lsr     r0, r0, #8
  58:   e5cd0001        strb    r0, [sp, #1]
  5c:   e1dd30b0        ldrh    r3, [sp]
  60:   e1cd30bc        strh    r3, [sp, #12]
  64:   e1dd30bc        ldrh    r3, [sp, #12]
  68:   ea000002        b       78 <next+0x6c>
  6c:   e3a03001        mov     r3, #1
  70:   e5803008        str     r3, [r0, #8]
  74:   e59f3010        ldr     r3, [pc, #16]   ; 8c <next+0x80>
  78:   e1cd30bc        strh    r3, [sp, #12]
  7c:   e5dd300c        ldrb    r3, [sp, #12]
  80:   e5dd000d        ldrb    r0, [sp, #13]
  84:   e1830400        orr     r0, r3, r0, lsl #8
  88:   e8bd801f        pop     {r0, r1, r2, r3, r4, pc}

^^^ POP instruction just overwrites return value in r0 register here

  8c:   .word   0x

Looks like the function gets treated as if it were returning 'void'.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

   Keywords||wrong-code
  Known to fail||4.5.0
  Known to work||4.3.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070



[Bug c++/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)

2010-07-28 Thread siarhei dot siamashka at gmail dot com


--- Comment #5 from siarhei dot siamashka at gmail dot com  2010-07-28 
07:18 ---
The disassembly chunk from the comment above was from gcc 4.5.0, using the '-Os
-march=armv5te' options.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070



[Bug target/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)

2010-07-28 Thread siarhei dot siamashka at gmail dot com


--- Comment #6 from siarhei dot siamashka at gmail dot com  2010-07-28 
08:37 ---
'arm_size_return_regs()' returns 2 when generating the epilogue for the 'next'
function here. As a result, r0 is not marked in the mask as holding the return
value, so it gets clobbered.

Would the following patch be the right fix?

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 162411)
+++ gcc/config/arm/arm.c	(working copy)
@@ -13705,7 +13705,7 @@
           && !crtl->tail_call_emit)
         {
           unsigned long mask;
-          mask = (1 << (arm_size_return_regs() / 4)) - 1;
+          mask = (1 << ((arm_size_return_regs() + 3) / 4)) - 1;
           mask ^= 0xf;
           mask &= ~saved_regs_mask;
           reg = 0;

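The effect of the one-line change can be checked with plain integer arithmetic. This is a sketch under the assumptions stated in the comment: bit i of the mask stands for register ri (r0..r3), and after `mask ^= 0xf` a set bit means the epilogue may pop into that register. The later `mask &= ~saved_regs_mask` step is left out, and the function names are hypothetical; only the two mask formulas come from the patch.

```c
/* Bit i set in the result => register ri (r0..r3) may be used as a scratch
   pop target, i.e. it is NOT holding part of the return value. */
unsigned long scratch_mask_old(int return_bytes)
{
    /* Before the patch: truncating division, so 2 bytes -> 0 words. */
    unsigned long mask = (1UL << (return_bytes / 4)) - 1;
    return mask ^ 0xf;
}

unsigned long scratch_mask_new(int return_bytes)
{
    /* After the patch: round up to whole 4-byte registers. */
    unsigned long mask = (1UL << ((return_bytes + 3) / 4)) - 1;
    return mask ^ 0xf;
}
```

For the 2-byte return value here, the old formula leaves bit 0 set (0xf), so r0 is treated as scratch, which matches the `pop {r0, r1, r2, r3, r4, pc}` in the listing; rounding up gives 0xe and keeps r0 reserved. For sizes that are already multiples of 4 the two formulas agree.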

-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

  Component|c++ |target


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070



[Bug target/45094] [arm] wrong instructions for dword move in some cases

2010-07-27 Thread siarhei dot siamashka at gmail dot com


--- Comment #2 from siarhei dot siamashka at gmail dot com  2010-07-27 
20:07 ---
Created an attachment (id=21327)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21327&action=view)
simplified testcase

Confirmed with gcc 4.5.0 here. Also tried but could not reproduce the problem
with gcc 4.4 (it just does not seem to be able to emit ldrd/strd instructions
with pre/post increment).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45094



[Bug c++/45070] New: Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)

2010-07-25 Thread siarhei dot siamashka at gmail dot com
Compilation:
   arm-unknown-linux-gnueabi-g++ -Os -mcpu=cortex-a8 -o test test.cpp

Expected results:
   ./test
   65534 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Real results (some garbage data):
   ./test
   544 544 544 544 544 544 544 544 544 544 544 544 544 544 544 544

Note: This is not a big practical issue because Qt 4.7 no longer uses the packed
attribute for QChar (a good idea, because using this packed attribute
results in horribly slow code):
http://qt.gitorious.org/qt/qt/commit/1ec8acd77b6c048f5a68887ac7750b0764ade598
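A minimal C sketch of a QChar-like 16-bit value type, assuming GCC's `__attribute__((packed))`; the type and function names are hypothetical and this is not the attached packed-testcase.cpp. Packing lowers the alignment of the type to 1 byte, so the compiler can no longer assume the 16-bit member is aligned and, on targets where it cannot rely on unaligned halfword access, falls back to byte-wise loads and stores (compare the `strb`/`ldrb` pairs in the listing from comment #4). Returning such a 2-byte object by value is also the case in which `arm_size_return_regs()` reports 2.

```c
#include <stdint.h>

/* Hypothetical stand-in for a packed QChar-like class. */
struct __attribute__((packed)) packed_char {
    uint16_t ucs;
};

/* The same layout without the packed attribute, for comparison. */
struct plain_char {
    uint16_t ucs;
};

/* Returns a 2-byte struct by value; on ARM EABI it comes back in r0. */
struct packed_char make_packed_char(uint16_t value)
{
    struct packed_char c;
    c.ucs = value;
    return c;
}
```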


-- 
   Summary: Miscompiled c++ class with packed attribute on ARM with
-Os optimizations (Qt 4.6.2)
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: arm-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070



[Bug c++/45070] Miscompiled c++ class with packed attribute on ARM with -Os optimizations (Qt 4.6.2)

2010-07-25 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2010-07-25 
23:25 ---
Created an attachment (id=21308)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21308&action=view)
packed-testcase.cpp


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45070



[Bug target/43698] [4.5/4.6 Regression] Wrong use of ARMv6 REV instruction for endian bytewapping with -Os or -O2 optimizations

2010-07-22 Thread siarhei dot siamashka at gmail dot com


--- Comment #14 from siarhei dot siamashka at gmail dot com  2010-07-22 
20:54 ---
Thanks, this final variant of the fix seems to work fine. Can this patch be
backported to the 4.5 branch and released with gcc 4.5.1 too?

As I see it, the risk should be minimal: the current gcc 4.5 branch is so
broken on armv6/armv7 because of this bug that it simply can't get any
worse.

As recently discovered in MeeGo [1], this bug has a high chance of breaking
just about any program that does endian byteswapping. The list of broken
packages includes 'dbus' and 'util-linux-ng', to name a few, but surely there
are more.

1. http://bugs.meego.com/show_bug.cgi?id=3936


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698



[Bug target/43698] [4.5/4.6 Regression] Wrong use of ARMv6 REV instruction for endian bytewapping with -Os or -O2 optimizations

2010-07-19 Thread siarhei dot siamashka at gmail dot com


--- Comment #12 from siarhei dot siamashka at gmail dot com  2010-07-19 
13:54 ---
Updated the summary to better describe the problem (which is distro
independent).

The fact that this bug breaks the pax-utils tool, which is a vital part of the
Gentoo packaging system, thus rendering the system unusable, is probably not so
interesting in the gcc bugzilla context :)


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

Summary|[4.5/4.6 Regression] Invalid|[4.5/4.6 Regression] Wrong
   |code when building gentoo   |use of ARMv6 REV instruction
   |pax-utils-0.1.19 with -Os   |for endian bytewapping with
   |optimizations   |-Os or -O2 optimizations


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698



[Bug target/43703] Unexpected floating point precision loss due to ARM NEON autovectorization

2010-06-15 Thread siarhei dot siamashka at gmail dot com


--- Comment #4 from siarhei dot siamashka at gmail dot com  2010-06-15 
10:34 ---
Created an attachment (id=20913)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20913&action=view)
a fixed testcase

A fixed testcase attached.

The main problem here is that denormals are not handled in a 'civilized' way by
gcc at the moment. They are just silently and unconditionally treated in a
relaxed way, which might be neither wanted nor expected by the user. And
'readelf -A' shows the following EABI tags for the generated object file,
without even marking it in any special way with regard to denormal handling:
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703



[Bug target/43364] Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32

2010-06-15 Thread siarhei dot siamashka at gmail dot com


--- Comment #3 from siarhei dot siamashka at gmail dot com  2010-06-15 
20:14 ---
The whole point of submitting this PR was to find an efficient way to use NEON
instructions to operate on arbitrary scalar floating point values, in order
to overcome the inherent slowness of the Cortex-A8 VFP Lite unit (perhaps making
it transparent by wrapping it in a C++ class with operator overloading).

Using 'vdup_n_f32' to load a single floating point value seems to be better
than 'vset_lane_f32' here because we don't have to deal with the uninitialized
part of the register. But 'vdup_n_f32' suffers from similar performance issues
(the VLD1 instruction is not used directly) and results in redundant
instructions being emitted when the value is loaded from memory. Ideally,
something like this should have been used instead of 'vdup_n_f32' in this case:

static inline float32x2_t vdup_n_f32_mem(float *p)
{
    float32x2_t result;
    asm ("vld1.f32 {%P0[]}, [%1, :32]" : "=w" (result) : "r" (p) : "memory");
    return result;
}

I wonder whether it is possible to check at compile time whether the operand
comes from memory or from a register? Something similar to the
'__builtin_constant_p' builtin function? Or use the multiple-alternatives
feature of inline assembly constraints to emit either VMOV or VLD1? Anything else?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43364



[Bug target/43364] Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32

2010-06-15 Thread siarhei dot siamashka at gmail dot com


--- Comment #4 from siarhei dot siamashka at gmail dot com  2010-06-15 
20:34 ---
(In reply to comment #3)
> Or use multiple alternatives feature for inline assembly constraints to emit
> either VMOV or VLD1?

Well, this kind of works :) But it is very ugly and fragile:

/***/
#include <arm_neon.h>

/* Override a slow 'vdup_n_f32' intrinsic with something better */

static inline float32x2_t vdup_n_f32_fast(float x)
{
    float32x2_t result;
    asm (
        ".set vdup_n_f32_fast_CODE_EMITTED,0\n"
        ".irp regname,r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14\n"
        ".ifeqs \"\\regname\", \"%1\"\n"
        "vdup.32 %P0, %1\n"
        ".set vdup_n_f32_fast_CODE_EMITTED,1\n"
        ".endif\n"
        ".ifeqs \"[\\regname, #0]\", \"%1\"\n"
        "vld1.f32 {%P0[]}, [\\regname, :32]\n"
        ".set vdup_n_f32_fast_CODE_EMITTED,1\n"
        ".endif\n"
        ".endr\n"
        ".if vdup_n_f32_fast_CODE_EMITTED == 0\n"
        ".error \"Fixme: icky macros from 'vdup_n_f32_fast' failed\"\n"
        ".endif\n"
        : "=w,w" (result) : "r,Q" (x) : "memory");
    return result;
}

#define vdup_n_f32(x) vdup_n_f32_fast(x)

/* Now let's test it for accessing data in registers */

float neon_add_regs(float a, float b)
{
    float32x2_t tmp1, tmp2;
    tmp1 = vdup_n_f32(a);
    tmp2 = vdup_n_f32(b);
    tmp1 = vadd_f32(tmp1, tmp2);
    return vget_lane_f32(tmp1, 0);
}

/* ... and in memory */

void neon_add_mem(float * __restrict out,
                  float * __restrict a,
                  float * __restrict b)
{
    float32x2_t tmp1, tmp2;
    tmp1 = vdup_n_f32(*a);
    tmp2 = vdup_n_f32(*b);
    tmp1 = vadd_f32(tmp1, tmp2);
    *out = vget_lane_f32(tmp1, 0);
}
/***/

$ objdump -d test.o

00000000 <neon_add_mem>:
   0:   f4e10c9f        vld1.32 {d16[]}, [r1, :32]
   4:   f4e21c9f        vld1.32 {d17[]}, [r2, :32]
   8:   f2400da1        vadd.f32        d16, d16, d17
   c:   f4c0080f        vst1.32 {d16[0]}, [r0]
  10:   e12fff1e        bx      lr

00000014 <neon_add_regs>:
  14:   ee800b90        vdup.32 d16, r0
  18:   ee811b90        vdup.32 d17, r1
  1c:   f2400da1        vadd.f32        d16, d16, d17
  20:   ee100b90        vmov.32 r0, d16[0]
  24:   e12fff1e        bx      lr


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43364



[Bug bootstrap/44469] New: [4.5/4.6 Regression] internal compiler error: in fixup_reorder_chain, at cfglayout.c:797

2010-06-08 Thread siarhei dot siamashka at gmail dot com
Target: armv7l-unknown-linux-gnueabi
Configured with: ../gcc-4_5-branch/configure --prefix=/home/ssvb/gcc-test/bin
--target=armv7l-unknown-linux-gnueabi --enable-languages=c --without-headers
Thread model: posix
gcc version 4.5.1 20100607 (prerelease) (GCC)

$ armv7l-unknown-linux-gnueabi-gcc -O2 testcase.i
testcase.i: In function ‘a’:
testcase.i:15:1: internal compiler error: in fixup_reorder_chain, at
cfglayout.c:797
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

This bug prevents bootstrap on ARM when configured with '--disable-checking'
option.
Also see PR42347 comment 28


-- 
   Summary: [4.5/4.6 Regression] internal compiler error: in
fixup_reorder_chain, at cfglayout.c:797
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: armv7l-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44469



[Bug bootstrap/44469] [4.5/4.6 Regression] internal compiler error: in fixup_reorder_chain, at cfglayout.c:797

2010-06-08 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2010-06-08 
14:45 ---
Created an attachment (id=20868)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20868&action=view)
testcase.i


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44469



[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-06-08 Thread siarhei dot siamashka at gmail dot com


--- Comment #30 from siarhei dot siamashka at gmail dot com  2010-06-08 
14:49 ---
(In reply to comment #29)
> Please file a new PR for that, with preprocessed source and all other relevant
> info for reproduction.

Thanks, filed PR44469


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-05-18 Thread siarhei dot siamashka at gmail dot com


--- Comment #28 from siarhei dot siamashka at gmail dot com  2010-05-18 
10:09 ---
Thanks, this patch fixes bootstrap for powerpc/powerpc64. But it still fails
for arm on the same gcc_assert() in another place. Should a new bug be filed
about this?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-05-17 Thread siarhei dot siamashka at gmail dot com


--- Comment #18 from siarhei dot siamashka at gmail dot com  2010-05-17 
07:53 ---
Created an attachment (id=20676)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20676&action=view)
powerpc64-broken-unreachable.i

With the attached file (and '-O2 -c' options):
1. powerpc64 crosscompiler running on x86 box - always works fine
2. powerpc64 crosscompiler built with gcc 4.3.4 and running on powerpc64 box -
works fine
3. powerpc64 crosscompiler built with gcc 4.5.0 and running on powerpc64 box -
ICE


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-05-17 Thread siarhei dot siamashka at gmail dot com


--- Comment #19 from siarhei dot siamashka at gmail dot com  2010-05-17 
09:06 ---
Can anybody knowledgeable verify whether it was commit r151790 (
http://repo.or.cz/w/official-gcc.git/commit/9dbb96fec5e08762f97dda771522283f1fe9710f
) that is causing trouble when __builtin_unreachable() is used in the default
switch case? Unfortunately I could not add Andreas Krebbel to CC for this bug.

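For reference, the construct in question looks like the following minimal sketch (the function is hypothetical). `__builtin_unreachable()` in the default case tells GCC that no other value can reach the switch, which can allow it to drop the range check in front of the jump table; passing any value other than 0, 1 or 2 would be undefined behaviour.

```c
/* The caller guarantees selector is 0, 1 or 2; the default case is declared
   unreachable instead of returning an error value. */
int pick_value(int selector)
{
    switch (selector) {
    case 0:
        return 10;
    case 1:
        return 20;
    case 2:
        return 30;
    default:
        __builtin_unreachable();
    }
}
```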

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-05-17 Thread siarhei dot siamashka at gmail dot com


--- Comment #21 from siarhei dot siamashka at gmail dot com  2010-05-17 
10:07 ---
(In reply to comment #18)
> Created an attachment (id=20676)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20676&action=view)
> powerpc64-broken-unreachable.i
>
> With the attached file (and '-O2 -c' options):
> 1. powerpc64 crosscompiler running on x86 box - always works fine
> 2. powerpc64 crosscompiler built with gcc 4.3.4 and running on powerpc64 box -
> works fine

Hmm, that was happening because I compiled it with --disable-checking. When
built with --enable-checking=release, the ICE reproduces just fine on an x86 box
with a powerpc64-unknown-linux-gnu crosscompiler.

Well, getting ssh access to a fast powerpc64 box really did miracles :) Even
though the problem does not seem to be that complex after all, painfully long
compile times discouraged running more tests earlier, so even a small mistake
could easily (and apparently did) lead me down the wrong track.

I'm going to check the current 4.5 SVN branch now.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-05-17 Thread siarhei dot siamashka at gmail dot com


--- Comment #22 from siarhei dot siamashka at gmail dot com  2010-05-17 
11:31 ---
(In reply to comment #20)
> Perhaps dup of PR44071 that got fixed recently?

The problem is still reproducible with SVN rev 159480 in
'branches/gcc-4_5-branch', so the fix from PR44071 does not seem to help here.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug target/43698] [4.5/4.6 Regression] Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations

2010-05-17 Thread siarhei dot siamashka at gmail dot com


--- Comment #10 from siarhei dot siamashka at gmail dot com  2010-05-17 
18:48 ---
Maybe I'm too impatient, but is there anything that prevents this patch from
getting committed to SVN?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698



[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-05-04 Thread siarhei dot siamashka at gmail dot com


--- Comment #16 from siarhei dot siamashka at gmail dot com  2010-05-04 
07:04 ---
So basically what we have is that gcc miscompiles itself somewhere in the code
where one of those ~7000 gcc_assert uses appears. The next step is to identify
which one of them triggers this bad behaviour (bisecting not by svn revision,
but by gcc source file, flipping between the __builtin_unreachable-based and
the ordinary gcc_assert implementations) and extract a reduced testcase showing
the __builtin_unreachable failure.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5/4.6 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-05-03 Thread siarhei dot siamashka at gmail dot com


--- Comment #15 from siarhei dot siamashka at gmail dot com  2010-05-03 
23:45 ---
As found by Raúl, indeed this regression was introduced in r150091. Reverting
this change in gcc 4.5.0 release resolves the problem.

Apparently the use of __builtin_unreachable() in the gcc_assert macro (activated
by !ENABLE_ASSERT_CHECKING) triggers some kind of wrong-code bug on
non-x86/x86-64 platforms (at least arm and powerpc) and causes this bootstrap
failure.

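The two assert flavours involved can be sketched as follows. The macros are modeled on the idea described here, not copied from GCC's own sources, and the names `CHECKED_ASSERT` and `UNREACHABLE_ASSERT` are hypothetical. With checking disabled, a false condition is no longer reported; instead the compiler is told that the path cannot happen and may optimize on that basis, so any mishandling of `__builtin_unreachable()` turns into wrong code in whatever function uses the assert.

```c
#include <stdio.h>
#include <stdlib.h>

/* Checking build: a failed assertion is reported and the program aborts. */
#define CHECKED_ASSERT(expr) \
    ((void)(!(expr) ? (fprintf(stderr, "assertion failed: %s\n", #expr), abort(), 0) : 0))

/* Non-checking build: the condition is still evaluated, but a false result is
   declared unreachable, so the optimizer may assume it never happens. */
#define UNREACHABLE_ASSERT(expr) \
    ((void)(__builtin_expect(!(expr), 0) ? (__builtin_unreachable(), 0) : 0))

int checked_div(int a, int b)
{
    CHECKED_ASSERT(b != 0);
    return a / b;
}

int unchecked_div(int a, int b)
{
    UNREACHABLE_ASSERT(b != 0);   /* b == 0 here would be undefined behaviour */
    return a / b;
}
```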
There are some other __builtin_unreachable bugs in gcc bugzilla which are
possibly related.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug c++/41201] #pragma GCC target (sse2) doesn't alter __SSE2__ in C++ (as it does in C)

2010-04-27 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2010-04-27 
22:44 ---
#pragma GCC target|optimize just does not seem to work with C++. I just
stumbled on it while trying to narrow down something that looks like a
wrong-code generation bug in gcc 4.5.0 when compiling qt4.

Prepending __attribute__((optimize("-O0"))) to each function still works, so
there is no real need to go through the trouble of splitting source files into
parts to bisect the issue.

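A minimal sketch of the per-function workaround, assuming GCC (other compilers may ignore the `optimize` attribute with a warning). The GCC manual documents the level spelling as `optimize("O0")`; the function names are hypothetical.

```c
/* Compiled without optimization regardless of the command-line -O level;
   useful for narrowing down which function a wrong-code bug lives in. */
__attribute__((optimize("O0")))
int sum_to_n_unoptimized(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Identical body, compiled at the command-line optimization level. */
int sum_to_n(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

If a program only misbehaves when a particular function is optimized, moving the attribute from function to function narrows the search without splitting any source files.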

-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41201



[Bug target/43724] GCC produces suboptimal ARM NEON code for zero vector assignment

2010-04-12 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2010-04-12 
06:17 ---
Or just vmov.i32 q8, #0 would be better to avoid any potential data
dependency.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43724



[Bug target/43725] New: Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics

2010-04-12 Thread siarhei dot siamashka at gmail dot com
vstr    d3, [r0, #200]  ; 0xc8
 150:   e28dd020        add     sp, sp, #32
 154:   ecbd8b10        vpop    {d8-d15}
 158:   e12fff1e        bx      lr

This shows multiple performance problems:
1. The use of inherently slower VLDR/VSTR instructions instead of VLD1/VST1
2. Failure to make proper use of ARM Cortex-A8 NEON LS/ALU dual issue
3. Unnecessary spills to stack

This is a general issue with NEON intrinsics, causing serious performance
problems for practically any nontrivial code. I guess this itself can be a
meta-bug, with each individual performance issue tracked separately.


-- 
   Summary: Poor instructions selection, scheduling and registers
allocation for ARM NEON intrinsics
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: armv7l-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725



[Bug target/43698] [4.5/4.6 Regression] Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations

2010-04-12 Thread siarhei dot siamashka at gmail dot com


--- Comment #8 from siarhei dot siamashka at gmail dot com  2010-04-12 
09:34 ---
(In reply to comment #7)
> Patch submitted here.
>
> http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00401.html

Thank you. I have been testing it for two days already.

It really helps (in the sense that it is apparently better to have this fix
than not to have it). I have bootstrapped the hard vfp system successfully and
have not noticed any other problems so far. Btw, miscompilation (of the same
package) also happens with -O2 optimization settings in some other place, but I
did not try to investigate where exactly it fails.

But I understand that it is just a workaround for a problem which happens
somewhere in an upper layer? If the REV instruction did not actually support
conditional execution, then the fix would require finding the real
cause.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698



[Bug target/43364] Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32

2010-04-11 Thread siarhei dot siamashka at gmail dot com


--- Comment #2 from siarhei dot siamashka at gmail dot com  2010-04-12 
05:26 ---
(In reply to comment #1)
> mov r3, #0
> vdup.32 d16, r3

Also maybe veor.32 d16, d16, d16 here?

Or drop this NEON register initialization completely because it is a redundant
operation and was not explicitly requested in the original C code?

After all, from IHI0042D_aapcs.pdf:
The FPSCR is the only status register that may be accessed by conforming code.
It is a global register with the following properties:
* The condition code bits (28-31), the cumulative saturation (QC) bit (27) and
the cumulative exception-status bits (0-4) are not preserved across a public
interface.

and from ARM ARM:
Advanced SIMD arithmetic always uses untrapped exception handling

Tracking the cumulative exception-status bits may be tricky in general (using
an uninitialized value in NEON arithmetic can set them arbitrarily), but as
long as they are not used in any way in the function itself, they are irrelevant.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43364



[Bug target/43698] Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations

2010-04-09 Thread siarhei dot siamashka at gmail dot com


--- Comment #6 from siarhei dot siamashka at gmail dot com  2010-04-09 
08:04 ---
(In reply to comment #1)
> 2. Does gcc-4.4.3 work?

Yes, gcc-4.4.3 works (it just does not use 'rev' instruction). So it is a
regression in 4.5. Thanks for a very fast response and analysis of the issue.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

  Known to work||4.4.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698



[Bug target/43703] New: Unexpected floating point precision loss due to ARM NEON autovectorization

2010-04-09 Thread siarhei dot siamashka at gmail dot com
Using gcc-4.5.0-RC-20100406.tar.bz2

//
#include <stdio.h>

void __attribute__((noinline)) f(float * __restrict c,
                                 float * __restrict a,
                                 float * __restrict b)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] * b[i];
    }
}

int main()
{
    float a[4], b[4], c[4];

    a[0] = 1e-40;
    b[0] = 1e+38;

    f(c, a, b);

    printf("c[0]=%f\n", (double)c[0]);
    if (c[0] < 0.001)
        printf("precision problem: c[0] was flushed to zero\n");

    return 0;
}
//

# gcc -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O2 -fno-fast-math test.c
# ./a.out
c[0]=0.010000

# gcc -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O3 -fno-fast-math test.c
# ./a.out
c[0]=0.000000
precision problem: c[0] was flushed to zero


Using -O3 option turns on autovectorization, and the results of operations
involving denormals get flushed to zero. This happens even if no -ffast-math
or any other precision sacrificing options are enabled.
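The testcase values are chosen so that one operand is subnormal: 1e-40 lies below FLT_MIN (about 1.18e-38) and can only be stored as a denormal, while the exact product 1e-40 * 1e+38 = 0.01 is an ordinary normal number. A small check of this, assuming a host whose scalar float arithmetic follows IEEE 754 without flush-to-zero (the default for SSE on x86-64); the helper names are hypothetical:

```c
#include <float.h>
#include <math.h>

/* True if x is a subnormal (denormal) single-precision value. */
int is_subnormal_float(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* IEEE 754 product of the testcase inputs. A flush-to-zero unit treats the
   subnormal operand as 0 and yields 0 instead of roughly 0.01.
   volatile keeps the compiler from folding the product at compile time. */
float testcase_product(void)
{
    volatile float a = 1e-40f;
    volatile float b = 1e+38f;
    return a * b;
}
```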


-- 
   Summary: Unexpected floating point precision loss due to ARM NEON
autovectorization
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: armv7l-unknown-linux-gnueabi
  GCC host triplet: armv7l-unknown-linux-gnueabi
GCC target triplet: armv7l-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703



[Bug target/43703] Unexpected floating point precision loss due to ARM NEON autovectorization

2010-04-09 Thread siarhei dot siamashka at gmail dot com


--- Comment #2 from siarhei dot siamashka at gmail dot com  2010-04-09 
20:34 ---
(In reply to comment #1)
> This is exacted really.  Denormals are a weird case in general.

Well, denormals may be weird. But what about NaNs, infinities and the other
IEEE features which are not supported by the NEON unit? The compiler here takes
the liberty of using NEON whenever it likes, and NEON certainly does not fully
support IEEE. After reading man gcc, I had the impression that this should have
been controlled by -ffast-math and the related options.

Floating point performance of the VFP Lite unit is a disaster, and using NEON
where appropriate is definitely needed. But IMHO this should be controlled
somehow, for example by selectively using pragma optimize to set the
-ffast-math option in the critical parts of the code.

Also, I don't know how far-fetched this is, but having a special data type,
something like 'fast_float', with relaxed precision requirements and
suitable for use with NEON would be really nice.

> Plus your testcase depends on uninitialized values.

Yes, the testcase is not quite clean, but it is easily fixable. That said, this
should not cause any problems unless floating point exceptions are enabled;
those extra values are simply irrelevant. Should I post an updated testcase?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703



[Bug c/43698] New: Invalid code when building gentoo pax-utils-0.1.19 with -Os optimizations

2010-04-08 Thread siarhei dot siamashka at gmail dot com
Tested with gcc-4.5.0-RC-20100406.tar.bz2

Reduced testcase:

/*****/
#include <stdio.h>
#include <stdint.h>

char do_reverse_endian = 0;

#  define bswap_32(x) \
    ((((x) & 0xff000000) >> 24) | \
     (((x) & 0x00ff0000) >>  8) | \
     (((x) & 0x0000ff00) <<  8) | \
     (((x) & 0x000000ff) << 24))

#define EGET(X) \
    (__extension__ ({ \
        uint64_t __res; \
        if (!do_reverse_endian) {__res = (X); \
        } else if (sizeof(X) == 4) { __res = bswap_32((X)); \
        } \
        __res; \
    }))

void __attribute__((noinline)) X(char **phdr, char **data, int *phoff)
{
    *phdr = *data + EGET(*phoff);
}

int main()
{
    char *phdr;
    char *data = (char *)0x40164000;
    int phoff = 0x34;
    X(&phdr, &data, &phoff);
    printf("got %p (expecting 0x40164034)\n", phdr);
    return 0;
}
/*****/

# gcc -Os -o test test.c
# ./test
got 0x74164000 (expecting 0x40164034)


-- 
   Summary: Invalid code when building gentoo pax-utils-0.1.19 with
-Os optimizations
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: armv7l-unknown-linux-gnueabi
  GCC host triplet: armv7l-unknown-linux-gnueabi
GCC target triplet: armv7l-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43698



[Bug bootstrap/42347] [4.5 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-04-06 Thread siarhei dot siamashka at gmail dot com


--- Comment #7 from siarhei dot siamashka at gmail dot com  2010-04-06 
11:01 ---
In short, this bootstrap failure seems to be related to the
--disable-checking configure option. Reproduced on powerpc-unknown-linux-gnu
and armv7l-unknown-linux-gnueabi. I'm re-running the tests now to be completely
sure.

Maybe it is caused by some bad assert with a side effect?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-04-06 Thread siarhei dot siamashka at gmail dot com


--- Comment #10 from siarhei dot siamashka at gmail dot com  2010-04-06 
14:44 ---
(In reply to comment #8)
> It would be really helpful if someone can explain how to reproduce this with a
> cross-compiler. I will analyze/fix this problem when this is reproducible with
> a cross.

I'm afraid this is not (easily) reproducible with a cross-compiler.

I have now double-checked everything, and the --disable-checking option really
does break bootstrap on ppc and arm. Replacing it with --enable-checking=assert
results in a successful build. Interestingly, this bug does not affect x86 or
x86-64.

I think a simple script could be used for bisecting, which may help to find a
problematic gcc_assert (if that is really the problem). But all of this will
probably take at least a few days to run to completion, as neither the ARM nor
the PPC hardware that I have is particularly fast...

Avoiding the --disable-checking option can be used as a workaround for
now.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-04-03 Thread siarhei dot siamashka at gmail dot com


--- Comment #5 from siarhei dot siamashka at gmail dot com  2010-04-03 
17:39 ---
Got exactly the same ICE on ARM while bootstrapping gcc:

/var/tmp/portage/sys-devel/gcc-4.5.0_alpha20100401/work/gcc-4.5-20100401/gcc/sched-deps.c:
In function ‘get_dep_weak_1’:
/var/tmp/portage/sys-devel/gcc-4.5.0_alpha20100401/work/gcc-4.5-20100401/gcc/sched-deps.c:3841:1:
internal compiler error: in fixup_reorder_chain, at cfglayout.c:796
Please submit a full bug report,
with preprocessed source if appropriate.
See http://bugs.gentoo.org/ for instructions.

But the preprocessed source fed to a gcc-4.5-20100401 cross-compiler does not
result in an ICE. I'm going to try bootstrapping again with the patch from
PR42509 and report back.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug bootstrap/42347] [4.5 Regression] sched-deps.c:3840:1: internal compiler error: in fixup_reorder_chain, at cfglayout.c:796

2010-04-03 Thread siarhei dot siamashka at gmail dot com


--- Comment #6 from siarhei dot siamashka at gmail dot com  2010-04-03 
21:53 ---
(In reply to comment #5)
> But preprocessed source feeded to gcc-4.5-20100401 crosscompiler does not
> result in ICE. I'm going to try bootstrapping again with the patch from PR42509
> and report back.

This patch alone did not help. I will try to bootstrap SVN head now and do a few
more tests. It can take many hours because native compilation on ARM is
relatively slow.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42347



[Bug target/43469] [4.5 Regression] ICE trying to compile glibc for ARM thumb2

2010-03-31 Thread siarhei dot siamashka at gmail dot com


--- Comment #6 from siarhei dot siamashka at gmail dot com  2010-03-31 
22:50 ---
(In reply to comment #4)
> Not exactly a primary or secondary target.  CCing maintainer.

I have been trying to find a complete list of gcc primary and secondary targets,
with no luck so far. But at least this post refers to 'arm-eabi' as a
primary target: http://gcc.gnu.org/ml/gcc/2010-03/msg00175.html

This bug is also reproducible with the 'arm-eabi' target triplet. Sorry for
stating the obvious, but ARM Thumb-2 support is getting pretty interesting
nowadays; for example, Ubuntu is switching to it for the whole distro.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 GCC target triplet|armv7a-unknown-linux-gnueabi|armv7a-unknown-linux-
   ||gnueabi, arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43469



[Bug target/43440] Overwriting neon quad register does not clobber all included single registers

2010-03-21 Thread siarhei dot siamashka at gmail dot com


--- Comment #8 from siarhei dot siamashka at gmail dot com  2010-03-21 
10:05 ---
What about simply forbidding the use of q registers in inline assembly clobber
lists? Would that be difficult to do?

As a nice bonus, existing potentially unsafe inline assembly would fail to
compile, get spotted and have to be fixed (forcing the application developer
to manually convert the clobber list to use d or s registers). It would also
solve compatibility problems with the older versions of gcc, which still have
this bug and may remain in use for a very long time.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440



[Bug c/43469] New: ICE trying to compile glibc for ARM thumb2

2010-03-21 Thread siarhei dot siamashka at gmail dot com
= the exact version of GCC

Freshly checked out SVN trunk for gcc 4.5.0 (r157602)

= the options given when GCC was configured/built;

--target=armv7a-unknown-linux-gnueabi --enable-languages=c --without-headers

= the complete command line that triggers the bug;

armv7a-unknown-linux-gnueabi-gcc -mcpu=cortex-a8 -mthumb -O1 -c localealias.i

= the compiler output (error messages, warnings, etc.);

localealias.c: In function ‘read_alias_file’:
localealias.c:362:1: error: unrecognizable insn:
(insn 863 209 212 7 ../include/ctype.h:30 (set (const:SI (unspec:SI [
(symbol_ref:SI ("__libc_tsd_CTYPE_B") [flags 0xe0]
<var_decl 0xfff95204b40 __libc_tsd_CTYPE_B>)
(const_int 3 [0x3])
(const (unspec:SI [
(const_int 2 [0x2])
] 21))
(const_int 4 [0x4])
] 20))
(reg:SI 10 sl)) -1 (nil))
localealias.c:362:1: internal compiler error: in extract_insn, at recog.c:2097
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.


-- 
   Summary: ICE trying to compile glibc for ARM thumb2
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: armv7a-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43469



[Bug c/43469] ICE trying to compile glibc for ARM thumb2

2010-03-21 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2010-03-21 
19:05 ---
Created an attachment (id=20152)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20152&action=view)
localealias.i


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43469



[Bug c/43469] ICE trying to compile glibc for ARM thumb2

2010-03-21 Thread siarhei dot siamashka at gmail dot com


--- Comment #2 from siarhei dot siamashka at gmail dot com  2010-03-21 
19:07 ---
Works fine with gcc 4.4.3.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43469



[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option

2010-03-20 Thread siarhei dot siamashka at gmail dot com


--- Comment #5 from siarhei dot siamashka at gmail dot com  2010-03-20 
08:45 ---
(In reply to comment #4)
> Also, what's the configuration in this case i.e what architecture,
> mode / cpu / fpu ?

Tested on ARM Cortex-A8 hardware, with the problematic package built either
natively or cross-compiled using gcc 4.4.1 without any vendor patches;
'armv4tl-softfloat-linux-gnueabi' was just the build triplet. No other options
were passed to gcc configure.

> Is there  a smaller testcase which can be looked at ?

As I mentioned in comment 3, the code crashed around the place where it
accesses local variables on the stack which it could not address directly
(due to immediate offset encoding restrictions), so additional #4096 offsets
were applied. As such, I'm afraid that reducing this problem to a smaller
testcase may be extremely difficult. I have failed to do this so far (I tried
to construct small functions with a huge stack frame exceeding 4K).

> Otherwise this will end up being a WONTFIX bug because we don't have a clear 
> understanding of what / where the failure is.

I had hoped that the description of the symptoms might ring a bell even
without a small testcase from me, or that somebody more knowledgeable about
gcc internals could give a hint about what else could be tried to construct
such a small testcase.

I can also try to verify whether the crash still happens with gcc 4.4.3 and
maybe SVN trunk. That's about all I can do at the moment.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074



[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option

2010-03-20 Thread siarhei dot siamashka at gmail dot com


--- Comment #6 from siarhei dot siamashka at gmail dot com  2010-03-20 
13:55 ---
The crash disappeared after recompiling the libXft-2.1.13 library with gcc
4.4.3. Either the bug was fixed, or something else changed and it is no longer
triggered. I guess this bug can be closed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074



[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option

2010-03-20 Thread siarhei dot siamashka at gmail dot com


--- Comment #7 from siarhei dot siamashka at gmail dot com  2010-03-20 
13:58 ---
Resolved, as it now WORKSFORME.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution||WORKSFORME


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074



[Bug target/43440] Overwriting neon quad register does not clobber all included single registers

2010-03-20 Thread siarhei dot siamashka at gmail dot com


--- Comment #5 from siarhei dot siamashka at gmail dot com  2010-03-21 
03:33 ---
I don't quite understand what the problem is: "This patch has the unhappy side
effect of clobbering s0, s1 and s2 if s3 is used because that's the only way we
can indicate that q0 is clobbered by the write to s0."

The proper solution seems extremely simple to me, and it should do exactly
what an application programmer would do to work around the bug: when initially
parsing the clobber list, just do a simple text substitution q0 -> d0, d1.
Same for all the other q registers.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440



[Bug target/43440] Overwriting neon quad register does not clobber all included single registers

2010-03-20 Thread siarhei dot siamashka at gmail dot com


--- Comment #6 from siarhei dot siamashka at gmail dot com  2010-03-21 
03:56 ---
(In reply to comment #4)
> IMO the reasons as described in my email  is another motivation for Neon
> programmers to be using intrinsics rather than inline assembler and to improve
> in general Neon intrinsics.

The problem is that today NEON intrinsics have a lot more issues in practice.
The resulting code is way too slow to be usable, especially when gcc thinks
that it is running out of registers and starts spilling variables to memory.
Bug 43118 and bug 43364 are just some very basic examples of performance issues,
found without looking any deeper. Not having many bugs in bugzilla for NEON
intrinsics means that either they work well enough or nobody seriously uses
them. At least for me it is the latter case.

Autovectorization is even worse than intrinsics.

Inline assembly has a few bugs, but they can be easily worked around.

Sorry for this rant/offtopic. I just thought that you might be somewhat
interested in the opinion of someone from the other side of the fence :-)


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440



[Bug inline-asm/41538] Mixing ARM/NEON intrinsic variables and inline assembly

2010-03-14 Thread siarhei dot siamashka at gmail dot com


--- Comment #5 from siarhei dot siamashka at gmail dot com  2010-03-14 
12:23 ---
Do you want to force data into specific NEON registers because of the
restriction on which NEON registers can be used as the scalar operand for
multiplication?

It works for me.

/**/
#include <stdint.h>
#include <arm_neon.h>

void f(int16_t *ptr)
{
    register int16x4_t mul_consts asm ("d0");
    int16x4_t data;
    int32x4_t tmp;
    mul_consts = vset_lane_s16(0x1234, mul_consts, 0);
    asm volatile (
        "vld1.16   {%P1}, [%2]\n"
        "vmull.s16  %q0, %P1, %P3[0]\n"
        "vshrn.s32 %P1, %q0, #15\n"
        "vst1.16   {%P1}, [%2]\n"
        : "=&w" (tmp), "=&w" (data)
        : "r" (ptr), "w" (mul_consts)
        : "memory"
    );
}
/**/

Not forcing the 'mul_consts' variable into the 'd0' register, on the other hand, fails as expected:
/tmp/ccvzAXVb.s: Assembler messages:
/tmp/ccvzAXVb.s:27: Error: scalar out of range for multiply instruction --
`vmull.s16 q9,d17,d16[0]'

So I don't see any problem here. Tested with gcc 4.3.4 and 4.4.3


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41538



[Bug inline-asm/37188] Missing documentation about the use of ARM NEON quad registers in inline asm arguments

2010-03-14 Thread siarhei dot siamashka at gmail dot com


--- Comment #3 from siarhei dot siamashka at gmail dot com  2010-03-14 
12:44 ---
As of today, gcc seems to be clever enough to deduce whether to use a single
precision or a double precision VFP register when given the "w" constraint (so
the "P" modifier is not strictly needed). This behavior seems to have been
introduced in gcc 4.3.2.

However, trying to force double precision variables into specific VFP registers
breaks it:

//
#include <stdio.h>
#include <stdint.h>

inline int32_t double_to_fixed_16_16(double dbl)
{
    int32_t fix;
    register double tmp asm ("d0") = dbl;
    asm volatile (
        "vcvt.s32.f64  %1, %1, #16\n"
        "vmov.f32  %0, %1[0]\n"
        : "=r" (fix), "+w" (tmp)
    );
    return fix;
}

int main()
{
    int32_t i = double_to_fixed_16_16(1.5);
    printf("%08X\n", i);
}
//

/tmp/ccYfabov.s: Assembler messages:
/tmp/ccYfabov.s:24: Error: operand size must match register width --
`vcvt.s32.f64 s0,s0,#16'
/tmp/ccYfabov.s:25: Error: only D registers may be indexed -- `vmov.f32
r0,s0[0]'
/tmp/ccYfabov.s:45: Error: operand size must match register width --
`vcvt.s32.f64 s0,s0,#16'
/tmp/ccYfabov.s:46: Error: only D registers may be indexed -- `vmov.f32
r2,s0[0]'

Also, NEON quad registers still need an explicit 'q' modifier in inline
assembly. I'm updating the issue summary because NEON quad registers are now
more problematic than VFP doubles.


Thanks for your work on gcc. VFP/NEON support is slowly getting better over
time.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

Summary|Missing documentation about |Missing documentation about
   |the use of double precision |the use of ARM NEON quad
   |floating point registers in |registers in inline asm
   |inline asm arguments (VFP)  |arguments


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37188



[Bug c/43364] New: Suboptimal code for the use of ARM NEON intrinsic vset_lane_f32

2010-03-14 Thread siarhei dot siamashka at gmail dot com
/***/
#include <arm_neon.h>

void neon_add(float * __restrict out, float * __restrict a, float * __restrict b)
{
    float32x2_t tmp1, tmp2;
    tmp1 = vset_lane_f32(*a, tmp1, 0);
    tmp2 = vset_lane_f32(*b, tmp2, 0);
    tmp1 = vadd_f32(tmp1, tmp2);
    *out = vget_lane_f32(tmp1, 0);
}
/***/

00000000 <neon_add>:
   0:   e5913000        ldr     r3, [r1]
   4:   eddf0b07        vldr    d16, [pc, #28]  ; 28 <neon_add+0x28>
   8:   e5922000        ldr     r2, [r2]
   c:   eddf1b05        vldr    d17, [pc, #20]  ; 28 <neon_add+0x28>
  10:   ee003b90        vmov.32 d16[0], r3
  14:   ee012b90        vmov.32 d17[0], r2
  18:   f2400da1        vadd.f32        d16, d16, d17
  1c:   f4c0080f        vst1.32 {d16[0]}, [r0]
  20:   e12fff1e        bx      lr
  24:   e1a00000        nop                     (mov r0,r0)


gcc fails to use a single instruction

   vld1.32 {d16[0]}, [r1]

instead of

   0:   e5913000        ldr     r3, [r1]
   4:   eddf0b07        vldr    d16, [pc, #28]  ; 28 <neon_add+0x28>
  10:   ee003b90        vmov.32 d16[0], r3


-- 
   Summary: Suboptimal code for the use of ARM NEON intrinsic
vset_lane_f32
   Product: gcc
   Version: 4.4.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: arm-unknown-linux-gnueabi
  GCC host triplet: arm-unknown-linux-gnueabi
GCC target triplet: arm-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43364



[Bug inline-asm/41538] Mixing ARM/NEON intrinsic variables and inline assembly

2010-03-11 Thread siarhei dot siamashka at gmail dot com


--- Comment #2 from siarhei dot siamashka at gmail dot com  2010-03-11 
20:29 ---
When the documentation is missing the needed bits of information, they can
typically be extracted from the source code.

The only problem is that these constraints can be changed at any time without
notice unless they are properly documented and exposed to the outside world.
There is bug 37188 about it.


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41538



[Bug middle-end/40887] GCC generates suboptimal code for indirect function calls on ARM

2009-12-21 Thread siarhei dot siamashka at gmail dot com


--- Comment #6 from siarhei dot siamashka at gmail dot com  2009-12-21 
08:27 ---
Created an attachment (id=19356)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19356&action=view)
return-address-prediction-bench.c

This looks like a really serious performance issue. Not only is the indirect
call itself penalized, but the whole return address prediction stack gets out
of sync, causing return address mispredictions for all the nested calls. The
attached test program demonstrates it.

$ time ./return-address-prediction-bench 1
Indirect call for the topmost function

real    0m0.793s
user    0m0.789s
sys     0m0.000s

$ time ./return-address-prediction-bench
Indirect call for the leaf function

real    0m1.797s
user    0m1.789s
sys     0m0.008s

gcc 4.4.2, -O2 -mcpu=cortex-a8

Changing the function pointer type from void (*f)() to void (* volatile f)()
can also be used to work around the problem. In this case the execution times
for both variants of the test are approximately the same.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40887



[Bug middle-end/40887] GCC generates suboptimal code for indirect function calls on ARM

2009-12-21 Thread siarhei dot siamashka at gmail dot com


--- Comment #7 from siarhei dot siamashka at gmail dot com  2009-12-21 
08:53 ---
(In reply to comment #4)
> I would rather split the load out as a separate insn and allow it to be 
> scheduled separately.

A question just to clarify the status of this issue: are you waiting for David
(or anybody else) to provide an updated patch with such a split load? Are there
no other options besides either a perfect fix or no fix at all?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40887



[Bug inline-asm/42321] New: NEON/VFP registers from inline assembly clobber list are saved/restored incorrectly

2009-12-07 Thread siarhei dot siamashka at gmail dot com
Test program:
//
void f()
{
    asm volatile("veor d8, d8, d8" : : : "d8", "d9", "d10", "d11", "d14", "d15");
}
//

$ gcc -c -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -O2 test.c
$ objdump -d test.o

00000000 <f>:
   0:   ed2d8b08        vpush   {d8-d11}
   4:   ed2deb04        vpush   {d14-d15}
   8:   f3088118        veor    d8, d8, d8
   c:   ecbd8b08        vpop    {d8-d11}
  10:   ecbdeb04        vpop    {d14-d15}
  14:   e12fff1e        bx      lr

The order of the last two vpop instructions is messed up.


-- 
   Summary: NEON/VFP registers from inline assembly clobber list are
saved/restored incorrectly
   Product: gcc
   Version: 4.4.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: armv4tl-softfloat-linux-gnueabi
  GCC host triplet: armv4tl-softfloat-linux-gnueabi
GCC target triplet: armv4tl-softfloat-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42321



[Bug inline-asm/42321] NEON/VFP registers from inline assembly clobber list are saved/restored incorrectly

2009-12-07 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2009-12-07 
14:42 ---
Modifying the program to list q-registers in the clobber list provides even
more interesting results:
//
void f()
{
    asm volatile("veor d8, d8, d8" : : : "q4", "q5", "q7");
}
//

$ gcc -c -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -O2 test.c
$ objdump -d test.o

00000000 <f>:
   0:   ed2d8b02        vpush   {d8}
   4:   ed2dab02        vpush   {d10}
   8:   ed2deb02        vpush   {d14}
   c:   f3088118        veor    d8, d8, d8
  10:   ecbd8b02        vpop    {d8}
  14:   ecbdab02        vpop    {d10}
  18:   ecbdeb02        vpop    {d14}
  1c:   e12fff1e        bx      lr

Now, in addition to the mismatched save/restore order, only the lower halves of
the q-registers get saved.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42321



[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-11-03 Thread siarhei dot siamashka at gmail dot com


--- Comment #7 from siarhei dot siamashka at gmail dot com  2009-11-03 
20:09 ---
Thanks a lot for checking this. And sorry about the confusion caused by
attributing the slowness of the testcase to the microcoded instruction (which
turned out not to be the case) without properly checking this first.

So should this bug be split into two? One about the incorrect warning, and
another one about generating suboptimal code at the -O2 level (extra load and
store operations, which are probably penalized by something like a RAW hazard
in such a short loop)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



[Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-10-29 Thread siarhei dot siamashka at gmail dot com
/***/
void __attribute__((noinline)) y()
{
    asm volatile ("# nop\n");
}

void __attribute__((noinline)) x(long c)
{
    while (c--)
        y();
}

int main()
{
    /* Run total 3.2G iterations */
    x(1600000000);
    x(1600000000);
    return 0;
}
/***/

$ gcc -O2 -mcpu=cell -mtune=cell -mwarn-cell-microcode -o test-O2 test.c
test.c: In function ‘x’:
test.c:9: warning: emitting microcode insn {ai.|addic.} %0,%1,%2   
[*adddi3_internal3] #38

$ time ./test-O2
real    0m56.385s
user    0m56.232s
sys     0m0.138s

$ gcc -Os -mcpu=cell -mtune=cell -mwarn-cell-microcode -o test-Os test.c
$ time ./test-Os

real    0m24.149s
user    0m24.086s
sys     0m0.060s


-- 
   Summary: cell microcode instruction is generated for a trivial
loop with -O2 optimizations, hurting performance badly
   Product: gcc
   Version: 4.4.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-10-29 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2009-10-29 
15:21 ---
-O2:

0000000000000010 <.x>:
  10:   2c 23 00 00     cmpdi   r3,0
  14:   7c 08 02 a6     mflr    r0
  18:   f8 01 00 10     std     r0,16(r1)
  1c:   f8 21 ff 81     stdu    r1,-128(r1)
  20:   41 82 00 1c     beq-    3c <.x+0x2c>
  24:   f8 61 00 70     std     r3,112(r1)
  28:   48 00 00 01     bl      28 <.x+0x18>
  2c:   e8 01 00 70     ld      r0,112(r1)
  30:   35 20 ff ff     addic.  r9,r0,-1
  34:   f9 21 00 70     std     r9,112(r1)
  38:   40 82 ff f0     bne+    28 <.x+0x18>
  3c:   38 21 00 80     addi    r1,r1,128
  40:   e8 01 00 10     ld      r0,16(r1)
  44:   7c 08 03 a6     mtlr    r0
  48:   4e 80 00 20     blr
  4c:   00 00 00 00     .long 0x0
  50:   00 00 00 01     .long 0x1
  54:   80 00 00 00     lwz     r0,0(0)


-Os:

0000000000000010 <.x>:
  10:   fb e1 ff f8     std     r31,-8(r1)
  14:   7c 08 02 a6     mflr    r0
  18:   f8 01 00 10     std     r0,16(r1)
  1c:   7c 7f 1b 78     mr      r31,r3
  20:   f8 21 ff 81     stdu    r1,-128(r1)
  24:   48 00 00 08     b       2c <.x+0x1c>
  28:   48 00 00 01     bl      28 <.x+0x18>
  2c:   2f bf 00 00     cmpdi   cr7,r31,0
  30:   3b ff ff ff     addi    r31,r31,-1
  34:   40 9e ff f4     bne+    cr7,28 <.x+0x18>
  38:   38 21 00 80     addi    r1,r1,128
  3c:   e8 01 00 10     ld      r0,16(r1)
  40:   eb e1 ff f8     ld      r31,-8(r1)
  44:   7c 08 03 a6     mtlr    r0
  48:   4e 80 00 20     blr
  4c:   00 00 00 00     .long 0x0
  50:   00 00 00 01     .long 0x1
  54:   80 01 00 00     lwz     r0,0(r1)


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com
   Keywords||missed-optimization
Summary|cell microcode instruction  |cell microcode instruction
   |is generated for a trivial  |(addic.) is generated for a
   |loop with -O2 optimizations,|trivial loop with -O2
   |hurting performance badly   |optimizations, hurting
   ||performance badly


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



[Bug target/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option

2009-09-01 Thread siarhei dot siamashka at gmail dot com


--- Comment #3 from siarhei dot siamashka at gmail dot com  2009-09-01 
15:08 ---
It works fine if '-fno-omit-frame-pointer' is removed. I agree that this is
quite a large and convoluted function. Unfortunately I did not manage to reduce
it to something smaller that would still show the broken behaviour. My only
guess is that a stack frame bigger than 4K may make some difference.

I have a full Linux system compiled with -fno-omit-frame-pointer (to get stack
backtraces and generate callgraphs in oprofile). If anything simpler happens to
be broken too, I'll try to investigate it and provide additional details.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074



[Bug c/41196] New: The use of ARM NEON vshll_n_u8 intrinsic results in compile error on valid code

2009-08-31 Thread siarhei dot siamashka at gmail dot com
When using the vshll_n_u8 intrinsic, gcc 4.4.1 incorrectly rejects a shift
operand with the value 8, claiming that it is out of range.

Test code:
/**/
#include <arm_neon.h>
uint16x8_t test_vshll_n_u8 (uint8x8_t a)
{
    return vshll_n_u8(a, 8);
}
/**/



Test with gcc 4.4.1:
# gcc -c -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer test.c
test.c: In function ‘test_vshll_n_u8’:
test.c:6: error: constant out of range



It used to work fine with cs2007q3:
# gcc -c -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer test.c
# objdump -d test.o

test.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <test_vshll_n_u8>:
   0:   ec410b17        vmov    d7, r0, r1
   4:   f3b26307        vshll.i8        q3, d7, #8
   8:   ec510b16        vmov    r0, r1, d6
   c:   ec532b17        vmov    r2, r3, d7
  10:   e12fff1e        bx      lr


-- 
   Summary: The use of ARM NEON vshll_n_u8 intrinsic results in
compile error on valid code
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: armv4tl-softfloat-linux-gnueabi
  GCC host triplet: armv4tl-softfloat-linux-gnueabi
GCC target triplet: armv4tl-softfloat-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41196



[Bug c/41074] New: Invalid code generation on ARM when using '-fno-omit-frame-pointer' option

2009-08-14 Thread siarhei dot siamashka at gmail dot com
The terminal emulator from xfce4 segfaults if libXft-2.1.13 is compiled with
vanilla gcc 4.4.1 and the '-fno-strict-aliasing -g -O2 -fno-omit-frame-pointer'
options.

Program received signal SIGSEGV, Segmentation fault.
0x408599cc in XftGlyphSpecRender (dpy=<value optimized out>,
    op=<value optimized out>, src=<value optimized out>, pub=0x1615f0,
    dst=31457359, srcx=0, srcy=0, glyphs=0xbed2b824, nglyphs=12)
    at xftrender.c:299
299             elts[nelt].glyphset = font->glyphset;

(gdb) info registers
r0             0x123ae8    1194728
r1             0x0         0
r2             0x0         0
r3             0xbed2b824  3201480740
r4             0x0         0
r5             0xbed2a964  3201476964
r6             0x1615f0    1447408
r7             0x0         0
r8             0x1e0002b   31457323
r9             0x0         0
r10            0xbed2a964  3201476964
r11            0xbed2b78c  3201480588
r12            0x74        116
sp             0xbed29900  0xbed29900
lr             0x40859790  1082496912
pc             0x408599cc  0x408599cc <XftGlyphSpecRender+732>
fps            0x0         0
cpsr           0x60000010  1610612752

(gdb) disassemble
0x408599a8 <XftGlyphSpecRender+696>:    mla     r10, r5, r9, r10
0x408599ac <XftGlyphSpecRender+700>:    sub     r5, r11, #4096  ; 0x1000
0x408599b0 <XftGlyphSpecRender+704>:    str     r10, [r5, #-3692]
0x408599b4 <XftGlyphSpecRender+708>:    ldr     r10, [r5, #-3632]
0x408599b8 <XftGlyphSpecRender+712>:    str     r7, [r5, #-3688]
0x408599bc <XftGlyphSpecRender+716>:    add     r5, r10, r7, lsl #2
0x408599c0 <XftGlyphSpecRender+720>:    sub     r7, r11, #4096  ; 0x1000
0x408599c4 <XftGlyphSpecRender+724>:    ldr     r7, [r7, #-3688]
0x408599c8 <XftGlyphSpecRender+728>:    ldr     r8, [r6, #124]
0x408599cc <XftGlyphSpecRender+732>:    ldr     r10, [r7, #-3632]
0x408599d0 <XftGlyphSpecRender+736>:    str     r8, [r10, r7, lsl #2]


-- 
   Summary: Invalid code generation on ARM when using '-fno-omit-
frame-pointer' option
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: armv4tl-softfloat-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074



[Bug c/41074] Invalid code generation on ARM when using '-fno-omit-frame-pointer' option

2009-08-14 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2009-08-14 
22:48 ---
Created an attachment (id=18370)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18370&action=view)
xftrender.i

Preprocessed source. I have not managed to reduce it to a smaller testcase yet.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41074



[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm

2009-02-10 Thread siarhei dot siamashka at gmail dot com


--- Comment #7 from siarhei dot siamashka at gmail dot com  2009-02-10 
15:11 ---
(In reply to comment #6)
> This is not a bug, but a problem with your source code.
> 
> In order to understand why, you need to pre-process the code and look at the
> output:
> 
> ...
> void *memset_arm9(void *a, int b, int c)
> {
>   return ({ uint8_t *dst = ((uint8_t *)a); uint8_t c = (b); int count = (c);
> uint32_t dummy0, dummy1, dummy2; __asm__ __volatile__ (
> 
> Notice that first there is a declaration of a variable c (uint8_t), then in the
> next statement there is a use of c.  This use (which is intended to be of the
> formal parameter passed to memset_arm9 is instead interpreted as the newly
> declared variable c (the uint8 one).
> 
> 
> Compiling your testcase with -Wshadow gives:
> 
> inl.c: In function 'memset_arm9':
> inl.c:66: warning: declaration of 'c' shadows a parameter
> inl.c:64: warning: shadowed declaration is here

Thanks for having a look at this. Indeed, macros are quite dangerous.

Nevertheless, would it make sense to add -Wshadow to the set of warnings
enabled by -Wextra, or even to introduce something like a -Wreally-all option
specifically for debugging such cases?

Even better (though understandably not realistic) would be an option to show
this warning only for code produced by C preprocessor macro expansion, in order
to reduce the number of false positives.
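To make the pitfall concrete, here is a minimal plain C sketch (no inline asm, and not the original memset_arm9 code). FILL_BAD, FILL_GOOD, fill_bad_demo and fill_good_demo are hypothetical names. FILL_BAD has the same flaw as the expansion quoted above: when 'int count = (n_);' is evaluated, the macro's own local 'c' is already in scope, so a length argument spelled 'c' binds to the fill byte. FILL_GOOD uses prefixed local names, which is one common way to avoid capturing caller variables.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro with the same flaw as the quoted expansion: the local
   'c' is in scope when 'int count = (n_);' is evaluated, so an argument
   spelled 'c' refers to the fill byte instead of the caller's length. */
#define FILL_BAD(dst_, val_, n_)          \
    do {                                  \
        uint8_t *dst = (uint8_t *)(dst_); \
        uint8_t c = (uint8_t)(val_);      \
        int count = (n_);                 \
        while (count-- > 0)               \
            *dst++ = c;                   \
    } while (0)

/* Same logic with prefixed local names that callers are unlikely to use,
   so every macro argument still refers to the caller's variables. */
#define FILL_GOOD(dst_, val_, n_)               \
    do {                                        \
        uint8_t *fill_dst_ = (uint8_t *)(dst_); \
        uint8_t fill_val_ = (uint8_t)(val_);    \
        int fill_count_ = (n_);                 \
        while (fill_count_-- > 0)               \
            *fill_dst_++ = fill_val_;           \
    } while (0)

/* Counts bytes equal to val, to see how many bytes a macro wrote. */
static size_t count_bytes(const uint8_t *buf, size_t size, uint8_t val)
{
    size_t i, n = 0;
    for (i = 0; i < size; i++)
        if (buf[i] == val)
            n++;
    return n;
}

/* Fills a zeroed 256-byte buffer through FILL_BAD with a length named 'c'.
   The parameter 'c' is never actually read: 'count' gets the fill value,
   which always fits in 256 bytes. Assumes val != 0. */
size_t fill_bad_demo(uint8_t val, int c)
{
    uint8_t buf[256];
    memset(buf, 0, sizeof buf);
    FILL_BAD(buf, val, c); /* expands to: uint8_t c = ...; int count = (c); */
    return count_bytes(buf, sizeof buf, val);
}

/* Same call through FILL_GOOD: writes exactly c bytes.
   Assumes val != 0 and 0 <= c <= 256. */
size_t fill_good_demo(uint8_t val, int c)
{
    uint8_t buf[256];
    memset(buf, 0, sizeof buf);
    FILL_GOOD(buf, val, c);
    return count_bytes(buf, sizeof buf, val);
}
```

With these helpers, fill_bad_demo(5, 3) writes 5 bytes (the fill value) rather than 3, while fill_good_demo(5, 3) writes 3. Compiling the file with -Wshadow should report the inner declaration of 'c' in fill_bad_demo, much like the warning quoted above.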


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693



[Bug target/37734] New: Missing optimization: gcc fails to reuse flags from already calculated expression for condition check with zero

2008-10-03 Thread siarhei dot siamashka at gmail dot com
For the following source:
//
extern void a();

int unrolled_loop_fn(int count)
{
    while ((count -= 2) >= 0) {
        a();
        a();
    }
    if (count & 1) {
        a();
    }
}
//

'gcc -O2 -c test.c' produces the following quite suboptimal code:

00000000 <unrolled_loop_fn>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   56                      push   %esi
   4:   8b 75 08                mov    0x8(%ebp),%esi
   7:   53                      push   %ebx
   8:   83 ee 02                sub    $0x2,%esi
   b:   85 f6                   test   %esi,%esi
   d:   89 f0                   mov    %esi,%eax
   f:   78 1c                   js     2d <unrolled_loop_fn+0x2d>
  11:   89 f3                   mov    %esi,%ebx
  13:   90                      nop
  14:   8d 74 26 00             lea    0x0(%esi),%esi
  18:   e8 fc ff ff ff          call   19 <unrolled_loop_fn+0x19>
  1d:   e8 fc ff ff ff          call   1e <unrolled_loop_fn+0x1e>
  22:   83 eb 02                sub    $0x2,%ebx
  25:   79 f1                   jns    18 <unrolled_loop_fn+0x18>
  27:   83 e6 01                and    $0x1,%esi
  2a:   8d 46 fe                lea    -0x2(%esi),%eax
  2d:   a8 01                   test   $0x1,%al
  2f:   74 05                   je     36 <unrolled_loop_fn+0x36>
  31:   e8 fc ff ff ff          call   32 <unrolled_loop_fn+0x32>
  36:   5b                      pop    %ebx
  37:   5e                      pop    %esi
  38:   5d                      pop    %ebp
  39:   c3                      ret
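For reference, the behaviour any correct translation has to preserve can be checked with a plain C model of the testcase. This is a sketch under stated assumptions: the hypothetical counter calls_made and the helper unrolled_loop_calls replace the external a() so that calls can be counted. The loop consumes count two at a time and the final 'count & 1' test handles an odd remainder, so for any non-negative count exactly count calls are made. It says nothing about which instructions gcc should emit; note only that in the listing above, 'sub $0x2,%esi' already sets the sign flag that 'js' reads, which is why the following 'test %esi,%esi' is redundant.

```c
/* Hypothetical stand-in for the external a(): counts invocations. */
static int calls_made;

static void a(void)
{
    calls_made++;
}

/* Same control flow as unrolled_loop_fn(); returns the number of a() calls.
   'count & 1' relies on two's complement for negative values, as on all
   targets gcc supports. */
int unrolled_loop_calls(int count)
{
    calls_made = 0;
    while ((count -= 2) >= 0) { /* two calls per iteration */
        a();
        a();
    }
    if (count & 1)              /* odd remainder: one more call */
        a();
    return calls_made;
}
```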


-- 
   Summary: Missing optimization: gcc fails to reuse flags from
already calculated expression for condition check with
zero
   Product: gcc
   Version: 4.3.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37734



[Bug target/37734] Missing optimization: gcc fails to reuse flags from already calculated expression for condition check with zero

2008-10-03 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2008-10-04 
02:48 ---
For -Os optimization, the generated code is much better:

00000000 <unrolled_loop_fn>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 04                sub    $0x4,%esp
   7:   8b 5d 08                mov    0x8(%ebp),%ebx
   a:   eb 0a                   jmp    16 <unrolled_loop_fn+0x16>
   c:   e8 fc ff ff ff          call   d <unrolled_loop_fn+0xd>
  11:   e8 fc ff ff ff          call   12 <unrolled_loop_fn+0x12>
  16:   83 eb 02                sub    $0x2,%ebx
  19:   79 f1                   jns    c <unrolled_loop_fn+0xc>
  1b:   80 e3 01                and    $0x1,%bl
  1e:   74 05                   je     25 <unrolled_loop_fn+0x25>
  20:   e8 fc ff ff ff          call   21 <unrolled_loop_fn+0x21>
  25:   5a                      pop    %edx
  26:   5b                      pop    %ebx
  27:   5d                      pop    %ebp
  28:   c3                      ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37734



[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm

2008-09-03 Thread siarhei dot siamashka at gmail dot com


--- Comment #5 from siarhei dot siamashka at gmail dot com  2008-09-03 
09:52 ---
Sorry, is anybody investigating this rather serious bug? If nobody has the time
or motivation to do this work, would it make sense for me to try fixing it
myself and submit a patch here?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693



[Bug inline-asm/37188] There is no way to specify double precision floating point registers in inline asm arguments (VFP)

2008-09-02 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2008-09-02 
15:50 ---
Well, it looks like this is not a missing feature, just incomplete
documentation :)

It is possible to use double precision floating point registers and NEON
128-bit registers in the following way:

--

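#include <arm_neon.h>

int16x8_t test_neon(int16x8_t b, int16x8_t c)
{
    int16x8_t a;
    /* %q prints the operand as a NEON quad (Q) register. Note that .i32 adds
       32-bit lanes; vadd.i16 would match the int16x8_t lane width. */
    asm (
        "vadd.i32 %q0, %q1, %q2 \n\t"
        : "=w" (a)
        : "w" (b), "w" (c)
    );
    return a;
}

double test_double(double b, double c)
{
    double a;
    /* %P prints the operand as a VFP double precision (D) register. */
    asm (
        "faddd %P0, %P1, %P2 \n\t"
        : "=w" (a)
        : "w" (b), "w" (c)
    );
    return a;
}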

--

Disassembly of section .text:

00000000 <test_quad>:
   0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000        add     fp, sp, #0      ; 0x0
   8:   ec410b12        vmov    d2, r0, r1
   c:   ec432b13        vmov    d3, r2, r3
  10:   ed9b6b01        vldr    d6, [fp, #4]
  14:   ed9b7b03        vldr    d7, [fp, #12]
  18:   f2224846        vadd.i32 q2, q1, q3
  1c:   ec510b14        vmov    r0, r1, d4
  20:   ec532b15        vmov    r2, r3, d5
  24:   e28bd000        add     sp, fp, #0      ; 0x0
  28:   e8bd0800        pop     {fp}
  2c:   e12fff1e        bx      lr

00000030 <test_double>:
  30:   ec410b15        vmov    d5, r0, r1
  34:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
  38:   ec432b16        vmov    d6, r2, r3
  3c:   e28db000        add     fp, sp, #0      ; 0x0
  40:   ee357b06        faddd   d7, d5, d6
  44:   ec510b17        vmov    r0, r1, d7
  48:   e28bd000        add     sp, fp, #0      ; 0x0
  4c:   e8bd0800        pop     {fp}
  50:   e12fff1e        bx      lr


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37188



[Bug inline-asm/37188] New: There is no way to specify double precision floating point registers in inline asm arguments (VFP)

2008-08-21 Thread siarhei dot siamashka at gmail dot com
GCC manual, section 5.38.4 "Constraints for Particular Machines":

ARM family—‘config/arm/arm.h’
  f    Floating-point register
  w    VFP floating-point register
  F    One of the floating-point constants 0.0, 0.5, 1.0, 2.0, 3.0, 4.0,
       5.0 or 10.0
...

The "w" constraint makes it possible to use single precision VFP floating point
registers, but this does not work for double precision.


-- 
   Summary: There is no way to specify double precision floating
point registers in inline asm arguments (VFP)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
  GCC host triplet: i486-linux-gnu
GCC target triplet: arm-softfloat-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37188



[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm

2008-05-13 Thread siarhei dot siamashka at gmail dot com


--- Comment #4 from siarhei dot siamashka at gmail dot com  2008-05-13 
12:32 ---
This bug is still present in gcc 4.3.


-- 

siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |3.3.6 4.0.4 4.1.2 4.2.0
                   |                            |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693



[Bug c++/32687] Invalid code generation for reading signed negative bitfield value (g++ optimization)

2007-07-11 Thread siarhei dot siamashka at gmail dot com


--- Comment #2 from siarhei dot siamashka at gmail dot com  2007-07-11 
07:06 ---
I tried this test with gcc 4.2.0, and it also works correctly. So it looks like
the problem only shows up in gcc 4.1.x.


-- 

siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|3.4.6 4.3.0                 |3.4.6 4.2.0 4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32687



[Bug regression/32687] New: Invalid code generation for reading signed negative bitfield value (g++ optimization)

2007-07-09 Thread siarhei dot siamashka at gmail dot com
Reading a signed bitfield value that needs to be extended to a larger type (for
example, assigning a 24-bit field to an int) results in zero extension instead
of sign extension when the code is compiled with g++ with optimizations enabled
(-O1 or higher). Compiling the same code with gcc, or disabling optimizations,
makes the problem disappear.

The following code reproduces the problem:

#include <stdio.h>

struct TEST_STRUCT
{
    int f_8  : 8;
    int f_24 : 24;
};

int main ()
{
    struct TEST_STRUCT x;
    int a = -123;
    x.f_24 = a;

    printf("a=%d (%08X)\n", (int)a, (int)a);
    printf("x.f_24=%d (%08X)\n", (int)x.f_24, (int)x.f_24);

    if ((int)x.f_24 != (int)a)
        printf("test failed\n");
    else
        printf("test ok\n");
    return 0;
}



Expected correct result:
a=-123 (FFFFFF85)
x.f_24=-123 (FFFFFF85)
test ok

Faulty result:
a=-123 (FFFFFF85)
x.f_24=16777093 (00FFFF85)
test failed

It is a regression, as gcc 3.4.6 did not have this bug. This problem may also
be related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32346 and
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30332
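As a sketch of the expected semantics (not of gcc internals), the helpers below contrast sign and zero extension of a 24-bit field value. The names zero_extend24 and sign_extend24 are hypothetical. sign_extend24 uses the xor-then-subtract idiom, which avoids relying on the implementation-defined result of right-shifting a negative int. Zero-extending the 24-bit pattern of -123 (0xFFFF85) yields 16777093, matching the faulty result above.

```c
#include <stdint.h>

/* Zero extension: the upper 8 bits are simply cleared, so the 24-bit
   pattern of -123 (0xFFFF85) becomes 16777093. */
int32_t zero_extend24(uint32_t raw)
{
    return (int32_t)(raw & 0xFFFFFFu);
}

/* Sign extension: bit 23 is the sign bit of a 24-bit field. Flipping it and
   subtracting 2^23 maps 0x000000..0xFFFFFF onto -2^23..2^23-1. */
int32_t sign_extend24(uint32_t raw)
{
    int32_t v = (int32_t)(raw & 0xFFFFFFu);
    return (v ^ 0x800000) - 0x800000;
}
```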


-- 
   Summary: Invalid code generation for reading signed negative
bitfield value (g++ optimization)
   Product: gcc
   Version: 4.1.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32687



[Bug inline-asm/31693] New: Incorrectly assigned registers to operands for ARM inline asm

2007-04-25 Thread siarhei dot siamashka at gmail dot com
In the attached testcase, gcc assigns the same register to several named inline
asm operands, resulting in incorrect generated code. The operand names seem to
matter: 'c' and 'count' are assigned the same register, but renaming the 'c'
operand to, for example, 'xxc' makes this bug disappear.


-- 
   Summary: Incorrectly assigned registers to operands for ARM
inline asm
   Product: gcc
   Version: 4.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: siarhei dot siamashka at gmail dot com
GCC target triplet: arm-softfloat-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693



[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm

2007-04-25 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2007-04-25 
07:26 ---
Created an attachment (id=13436)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13436&action=view)
testcase for this bug

Testcase attached


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693



[Bug inline-asm/31693] Incorrectly assigned registers to operands for ARM inline asm

2007-04-25 Thread siarhei dot siamashka at gmail dot com


--- Comment #2 from siarhei dot siamashka at gmail dot com  2007-04-25 
07:28 ---
This may be related to #31386


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693