[Bug lto/49237] error with -flto: 'f' causes a section type conflict

2011-12-16 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49237

--- Comment #2 from Wouter Vermaelen wouter.vermaelen at scarlet dot be 
2011-12-16 19:28:36 UTC ---
I also can't reproduce it anymore.


[Bug rtl-optimization/51014] [4.7 Regression] ICE: in apply_opt_in_copies, at loop-unroll.c:2283 with -O2 -g -funroll-loops

2011-11-08 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51014

Wouter Vermaelen wouter.vermaelen at scarlet dot be changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wouter.vermaelen at scarlet
                   |                            |dot be

--- Comment #1 from Wouter Vermaelen wouter.vermaelen at scarlet dot be 
2011-11-08 13:23:32 UTC ---
I hit the same ICE, with the same required compiler flags. Here's my reduced
testcase:

struct S {
    ~S() { delete p; }
    int* p;
};
void f(S* b, S* e) {
    for (/**/; b != e; ++b) {
        b->~S();
    }
}


[Bug tree-optimization/50417] New: regression: memcpy with known alignment

2011-09-15 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417

 Bug #: 50417
   Summary: regression: memcpy with known alignment
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


Consider these functions:

#include <cstring>

void copy1(char* d, const char* s) {
    memcpy(d, s, 256);
}
void copy2(short* d, const short* s) {
    memcpy(d, s, 256);
}
void copy3(int* d, const int* s) {
    memcpy(d, s, 256);
}
void copy4(long* d, const long* s) {
    memcpy(d, s, 256);
}

g++-4.5.2 is able to generate better code for the latter functions. But when I
test with a recent snapshot (SVN revision 178875 on Linux x86_64), it generates
the same code for all versions (the same as for copy1()).
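
The functions above convey the alignment only through the pointer types. GCC
4.7 and later also offer __builtin_assume_aligned, which states an alignment
explicitly. Whether it restores the 4.5 code for this case is not something
this report establishes. A minimal sketch, where copy_aligned is a made-up
name and the 8-byte alignment is an assumption chosen for illustration:

```cpp
#include <cstring>

// Hypothetical helper, not part of the report.
void copy_aligned(void* d, const void* s)
{
    // Promise the compiler that both pointers are 8-byte aligned.
    // Calling this with a pointer that is not so aligned is undefined behavior.
    void* da = __builtin_assume_aligned(d, 8);
    const void* sa = __builtin_assume_aligned(s, 8);
    std::memcpy(da, sa, 256);
}
```

The builtin only passes information to the optimizer; it performs no run-time
check.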


[Bug tree-optimization/50385] New: missed-optimization: jump to __builtin_unreachable() not removed

2011-09-13 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50385

 Bug #: 50385
   Summary: missed-optimization: jump to __builtin_unreachable()
not removed
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


I'm not sure, but this issue might be the same as bug 49054 (if so, feel free
to delete this one).


#include <vector>

struct S { int a, b; };
std::vector<S> v;

int search_1(int a) {
    for (auto it = v.begin(); /**/; ++it)
        if (it->a == a) return it->b;
}
int search_2(int a) {
    for (auto e : v)
        if (e.a == a) return e.b;
    __builtin_unreachable();
}


I expected to see the same generated code for both functions. Instead the 2nd
one still contains some useless comparisons and jumps past the end of the
function. Since taking such a (conditional) jump is undefined behavior anyway,
it might as well be removed (together with the instructions needed to calculate
the condition).

Tested with SVN revision 178775 (20110912) on Linux x86_64.


[Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t)

2011-09-09 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

 Bug #: 50339
   Summary: suboptimal register allocation for abs(__int128_t)
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


This function:

__int128_t abs128(__int128_t a)
{
    return (a >= 0) ? a : -a;
}

Currently generates the following code (with -O3):
(linux x86_64, g++-4.7.0, SVN revision 178692)

   49 89 f9             mov    %rdi,%r9
   48 89 f7             mov    %rsi,%rdi
   49 89 f2             mov    %rsi,%r10
   48 c1 ff 3f          sar    $0x3f,%rdi
   48 89 f8             mov    %rdi,%rax
   48 89 fa             mov    %rdi,%rdx
   4c 31 c8             xor    %r9,%rax
   4c 31 d2             xor    %r10,%rdx
   48 29 f8             sub    %rdi,%rax
   48 19 fa             sbb    %rdi,%rdx
   c3                   retq

But the following sequence has two fewer 'mov' instructions:

   48 89 f8             mov    %rdi,%rax
   48 89 f2             mov    %rsi,%rdx
   48 89 d1             mov    %rdx,%rcx
   48 c1 f9 3f          sar    $0x3f,%rcx
   48 31 c8             xor    %rcx,%rax
   48 31 ca             xor    %rcx,%rdx
   48 29 c8             sub    %rcx,%rax
   48 19 ca             sbb    %rcx,%rdx
   c3                   retq
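
Both sequences implement the usual sign-mask formulation of abs: shift the
sign bit across the whole value, xor with that mask, then subtract the mask.
A minimal C++ sketch of the same idea, where abs128_mask is a made-up name and
the code assumes that '>>' on a negative signed value is an arithmetic shift
(as GCC implements it on x86_64):

```cpp
// Branch-free abs for __int128_t (a GCC/Clang extension type).
// Assumption: right-shifting a negative value replicates the sign bit.
__int128_t abs128_mask(__int128_t a)
{
    __int128_t mask = a >> 127;   // 0 if a >= 0, all ones (-1) if a < 0
    return (a ^ mask) - mask;     // a if a >= 0, ~a + 1 == -a otherwise
}
```

As with the original abs128(), the result for the most negative value is not
representable.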


[Bug tree-optimization/49552] missed optimization: test for zero remainder after division by a constant.

2011-06-28 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49552

--- Comment #2 from Wouter Vermaelen wouter.vermaelen at scarlet dot be 
2011-06-28 12:22:11 UTC ---
> Confirmed.  Is this possible for all constant moduli?

It is. I recommend you get a copy of the book I mentioned before. The theory
behind the transformation is much better explained there than I could ever do
here. But I'll try to give a recipe to construct the routine for a specific
constant:
(all examples are for 32-bit, but it should be easy enough to generalize)

There are 3 different cases:

(x % C) == 0

* 'x' is unsigned, 'C' is odd:

    return (x * Cinv) <= (0xffffffff / C);

Where Cinv is the multiplicative inverse of C  (C * Cinv = 1 (modulo pow(2,
32))). Cinv is the same 'magic number' as is used to optimize exact-division
(division where it's known that the remainder will be zero).



* 'x' is unsigned, 'C' is even:

Split 'C' into an odd factor and a power of two.

C = Codd * Ceven
where Ceven = pow(2, k)

Now we test that 'x' is both divisible by 'Codd' and 'Ceven'.

    return !(x & (Ceven - 1)) && ((x * Codd_inv) <= (0xffffffff / Codd));

When a rotate-right instruction is available, the expression above can be
rewritten so that it only requires one test:

    return rotateRight(x * Codd_inv, k) <= (0xffffffff / C); // unsigned comparison



* 'x' is signed, (C can be odd or even)

(I admit, I don't fully understand the theory behind this transformation, so
I'll only give the final result).

    constexpr unsigned A = (0x7fffffff / Codd) & -(1 << k);
    constexpr unsigned B = k ? (A >> (k - 1)) : (A << 1);
    return rotateRight((x * Codd_inv) + A, k) <= B;   // unsigned comparison
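
The recipe above can be sketched as ordinary C++ for arbitrary constants. This
is only an illustration: the names mul_inverse, rotate_right, pow2_exponent,
is_divisible and is_divisible_signed are made up here, and the inverse is
computed with Newton's iteration, which is one common way to obtain it and not
necessarily what a compiler would do. The signed variant additionally assumes
that C is positive, fits in an int and is not a power of two (for a power of
two the most negative 'x' would be misclassified). The loops inside constexpr
functions need C++14 or later.

```cpp
#include <cstdint>

// Multiplicative inverse of an odd 'c' modulo 2^32 (Newton's iteration).
// 'inv = c' is already correct in the low 3 bits, because c * c == 1
// (modulo 8) for every odd c; each step doubles the number of correct bits.
constexpr std::uint32_t mul_inverse(std::uint32_t c)
{
    std::uint32_t inv = c;
    for (int i = 0; i < 4; ++i) inv *= 2u - c * inv;   // 3 -> 6 -> 12 -> 24 -> 48
    return inv;
}

// Rotate a 32-bit value right by k bits, 0 <= k <= 31.
constexpr std::uint32_t rotate_right(std::uint32_t v, unsigned k)
{
    return (v >> k) | (v << ((32u - k) & 31u));
}

// Number of trailing zero bits of c (c != 0): the 'k' of the recipe.
constexpr unsigned pow2_exponent(std::uint32_t c)
{
    unsigned k = 0;
    while ((c & 1u) == 0) { c >>= 1; ++k; }
    return k;
}

// (x % c) == 0 for unsigned x and any c != 0 (odd and even case combined).
constexpr bool is_divisible(std::uint32_t x, std::uint32_t c)
{
    const unsigned k = pow2_exponent(c);
    return rotate_right(x * mul_inverse(c >> k), k) <= 0xffffffffu / c;
}

// (x % c) == 0 for signed x.
// Assumption of this sketch: 0 < c <= 0x7fffffff and c is not a power of two.
constexpr bool is_divisible_signed(std::int32_t x, std::uint32_t c)
{
    const unsigned k = pow2_exponent(c);
    const std::uint32_t c_odd = c >> k;
    const std::uint32_t a = (0x7fffffffu / c_odd) & (0u - (1u << k));
    const std::uint32_t b = k ? (a >> (k - 1)) : (a << 1);
    return rotate_right(static_cast<std::uint32_t>(x) * mul_inverse(c_odd) + a, k) <= b;
}

// The computed inverses agree with the constants used for 3 and 7 (= 28 / 4).
static_assert(mul_inverse(3) == 0xaaaaaaabu, "inverse of 3 modulo 2^32");
static_assert(mul_inverse(7) == 0xb6db6db7u, "inverse of 7 modulo 2^32");
```

In a compiler all of these values would be computed at compile time, leaving
one multiplication, at most one addition, one rotate and one comparison.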


[Bug tree-optimization/49552] New: missed optimization: test for zero remainder after division by a constant.

2011-06-27 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49552

   Summary: missed optimization: test for zero remainder after
division by a constant.
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


Just like there are tricks to transform a division by a constant into a
multiplication and some shifts, there are also tricks to test if the remainder
of some division by a constant will be equal to zero.

Some examples:

bool is_mod3_zero(unsigned x)
{
    // equivalent to: return (x % 3) == 0;
    return (x * 0xaaaaaaab) <= (0xffffffff / 3);
}

bool is_mod28_zero(unsigned x)
{
    // equivalent to: return (x % 28) == 0;
    // return !(x & 3) && ((x * 0xb6db6db7) <= (0xffffffff / 7));
    return rotateRight(x * 0xb6db6db7, 2) <= (0xffffffff / 28);
}

bool is_signed_mod28_zero(int x)
{
    // equivalent to: return (x % 28) == 0;
    const unsigned c = (0x7fffffff / 7) & -(1 << 2);
    unsigned q = rotateRight((x * 0xb6db6db7) + c, 2);
    return q <= (c >> (2 - 1));
}


I found this trick in the book "Hacker's Delight", chapter 10-16 "Test for Zero
Remainder after Division by a Constant". The book also explains the theory
behind this transformation.

It would be nice if gcc could automatically perform this optimization.



Bonus:

bool is_mod3_one(unsigned x)
{
    // equivalent to: return (x % 3) == 1;
    // only correct if 'x + 2' does not overflow
    //   (sometimes this can be derived from VRP)
    return ((x + (3 - 1)) * 0xaaaaaaab) <= (0xffffffff / 3);
}


[Bug lto/49237] New: error with -flto: 'f' causes a section type conflict

2011-05-30 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49237

   Summary: error with -flto:'f' causes a section type
conflict
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


> cat bug.cc
struct Bar;
struct Base1 {
    virtual ~Base1();
};
template<typename T> struct Base2 {
    virtual void f(T&) = 0;
};
template<typename> struct Foo : Base1, Base2<Bar> {
    virtual void f(Bar&) {}
};
template struct Foo<Bar>;


> g++-snapshot --version
g++-snapshot (GCC) 4.7.0 20110530 (experimental)

> g++-snapshot bug.cc -c -flto
> g++-snapshot bug.o -flto
In file included from bug.cc:8:0,
 from :14:
bug.cc: In member function ‘f’:
bug.cc:9:15: error: f causes a section type conflict


Without the '-flto' option it works as expected.
This is on linux x86_64 (though I don't think this matters).


[Bug tree-optimization/49203] New: missed-optimization: useless expressions not moved out of loop

2011-05-27 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49203

   Summary: missed-optimization: useless expressions not moved out
of loop
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


Hi all,

Below is (a simplified version of) some real code I recently
encountered. The stores to the 'output' array are written in the inner
loop, but the intention was probably to have them in the outer loop.

Gcc is able to 'correct' this programming mistake, but only partly:
the stores themselves are moved to the outer loop, but the instructions that
calculate those values remain in the inner loop.

For this particular example, the best solution is of course to fix the
C code. But maybe this missed-optimization can also occur in other,
more valid, contexts.

Below I've included the generated x86_64 code for this example by
recent versions of both gcc and llvm.

///

unsigned char input[100];
unsigned char output[100];

void f() {
    for (int i = 0; i < 32; i += 4) {
        unsigned tmp = 0;
        for (int j = 0; j < 16; ++j) {
            tmp = (tmp << 2) | (input[i + j] & 0x03);
            output[i + 0] = (tmp >> 24) & 0xFF;
            output[i + 1] = (tmp >> 16) & 0xFF;
            output[i + 2] = (tmp >>  8) & 0xFF;
            output[i + 3] = (tmp >>  0) & 0xFF;
        }
    }
}

///

g++ (GCC) 4.7.0 20110527 (experimental)
g++ -O2 -S

        movl    $output, %r10d
        movq    %r10, %r9
        .p2align 4,,10
        .p2align 3
.L2:
        movl    %r9d, %esi
        xorl    %edx, %edx
        xorl    %eax, %eax
        subl    %r10d, %esi
        .p2align 4,,10
        .p2align 3
.L3:
        leal    0(,%rax,4), %ecx
        leal    (%rdx,%rsi), %eax
        addl    $1, %edx
        cltq
        movzbl  input(%rax), %eax
        andl    $3, %eax
        orl     %ecx, %eax
        movl    %eax, %r8d
        movl    %eax, %edi
        movl    %eax, %ecx
        shrl    $24, %r8d
        shrl    $16, %edi
        shrl    $8, %ecx
        cmpl    $16, %edx
        jne     .L3
        movb    %r8b, (%r9)
        movb    %dil, 1(%r9)
        movb    %cl, 2(%r9)
        movb    %al, 3(%r9)
        addq    $4, %r9
        cmpq    $output+32, %r9
        jne     .L2
        rep
        ret

///

clang version 3.0 (http://llvm.org/git/clang.git
855f41963e545172a935d07b4713d079e258a207)
clang++ -O2 -S

# BB#0:                                 # %entry
        xorl    %eax, %eax
        .align  16, 0x90
.LBB0_1:                                # %for.cond4.preheader
                                        # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_2 Depth 2
        xorl    %esi, %esi
        movq    $-16, %rdx
        .align  16, 0x90
.LBB0_2:                                # %for.body7
                                        #   Parent Loop BB0_1 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        movl    %esi, %ecx
        movzbl  input+16(%rdx,%rax), %edi
        andl    $3, %edi
        leal    (,%rcx,4), %esi
        orl     %edi, %esi
        incq    %rdx
        jne     .LBB0_2
# BB#3:                                 # %for.inc44
                                        #   in Loop: Header=BB0_1 Depth=1
        movb    %sil, output+3(%rax)
        movl    %ecx, %edx
        shrl    $6, %edx
        movb    %dl, output+2(%rax)
        movl    %ecx, %edx
        shrl    $14, %edx
        movb    %dl, output+1(%rax)
        shrl    $22, %ecx
        movb    %cl, output(%rax)
        addq    $4, %rax
        cmpq    $32, %rax
        jne     .LBB0_1
# BB#4:                                 # %for.end47
        ret
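
///

For comparison, the source-level fix mentioned in the report would look
roughly like this: the four stores are moved by hand from the inner to the
outer loop. The name f_fixed is made up for this sketch. The result is the
same as for f(), because only the stores of the last inner iteration survive.

```cpp
unsigned char input[100];
unsigned char output[100];

// Same computation as f(), with the stores hoisted out of the inner loop.
void f_fixed() {
    for (int i = 0; i < 32; i += 4) {
        unsigned tmp = 0;
        // Pack the low 2 bits of 16 consecutive input bytes into 32 bits.
        for (int j = 0; j < 16; ++j) {
            tmp = (tmp << 2) | (input[i + j] & 0x03);
        }
        // Store the packed value, most significant byte first.
        output[i + 0] = (tmp >> 24) & 0xFF;
        output[i + 1] = (tmp >> 16) & 0xFF;
        output[i + 2] = (tmp >>  8) & 0xFF;
        output[i + 3] = (tmp >>  0) & 0xFF;
    }
}
```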


[Bug tree-optimization/48764] New: wrong-code bug in gcc-4.5.x, related to __restrict

2011-04-25 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48764

   Summary: wrong-code bug in gcc-4.5.x, related to __restrict
   Product: gcc
   Version: 4.5.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


I originally posted this on gcc-help because I wasn't sure whether it was an
actual compiler bug or undefined behavior. Ian Lance Taylor replied that he
didn't see any undefined behavior, so I'm now reporting it as a bug.

Here's the original message:
   http://gcc.gnu.org/ml/gcc-help/2011-04/msg00476.html
But I'll repeat it below:





Hi all,

I believe I found a wrong-code bug. The problem triggers when using
gcc-4.5.1, 4.5.2 or 4.5.3, but not when using 4.4.5 or 4.7.0 (snapshot
20110419). It also only triggers with certain optimization levels/flags.
I wonder if this is a known problem and already fixed in 4.7.0, or that
the problem still exists but for some reason doesn't trigger in 4.7.0
(I couldn't easily find something in bugzilla).


Below is a reduced test-case that shows the problem. I tried, but I
couldn't get it smaller than these 4 files (combined about 60 lines).


While reducing this problem I realized that it *might* not be a compiler
bug, but undefined behaviour with the usage of __restrict in
Buffer::read(). What I wanted to express there is that the memory write
done by memcpy() can never overwrite the member variable 'p'. At the
moment I still believe it's a compiler bug, but I'm not 100% sure
anymore.


So is this a compiler bug or undefined behavior in my program? In case
of the latter I would appreciate if someone could explain what the
problem is and maybe suggest a way to fix it.


Thanks.

Wouter


BTW: The code for gcc-4.7.0 is correct but contains some useless extra
instructions (which I tried to avoid with __restrict). I'd also appreciate
hints on how to improve the generated code.
I do realize that the code in this reduced test-case may look a bit silly
and that suggestions to optimize the code may be hard because of this.






/// FooBar.hh /

struct Loader;
struct FooBar {
    void load(Loader& l);
    char c1, c2;
};




/// Loader.hh /

#include <cstring>

struct Buffer {
    Buffer(const char* data) : p(data) {}
    void read(void* __restrict out) __restrict {
        memcpy(out, p, 1);
        ++p;
    }
    const char* p;
};


template<typename Derived> struct Base {
    void load2(char& t) {
        Derived& self = static_cast<Derived&>(*this);
        self.load1(t);
    }
    int dummy;
};


struct Loader : Base<Loader> {
    Loader(const char* data) : buffer(data) {}
    void load1(char& t) { buffer.read(&t); }
    Buffer buffer;
};




/// FooBar.cc /

#include "FooBar.hh"
#include "Loader.hh"
#include <cstdio>


void FooBar::load(Loader& l)
{
    l.load1(c1);
    //printf("This print hides the bug\n");
    l.load2(c2);
}




/// main.cc ///

#include "FooBar.hh"
#include "Loader.hh"
#include <cstdio>


int main()
{
    char data[2] = { 3, 5 };
    Loader loader(data);
    FooBar fb;
    fb.load(loader);


    if ((fb.c1 == 3) && (fb.c2 == 5)) {
        printf("Ok\n");
    } else {
        printf("Wrong!\n");
    }
}




> g++ --version
g++ (GCC) 4.5.3 20110423 (prerelease)


> uname -a
Linux argon 2.6.35-28-generic #49-Ubuntu SMP Tue Mar 1 14:39:03 UTC 2011 x86_64
GNU/Linux


> g++ -O3 FooBar.cc -c
> g++ -O3 main.cc -c
> g++ -o bug FooBar.o main.o


> ./bug
Wrong!






> objdump -d FooBar.o  (gcc-4.5.3 prerelease)
  mov    0x8(%rsi),%rdx
  lea    0x8(%rsi),%rax
  movzbl (%rdx),%edx
  mov    %dl,(%rdi)
  mov    0x8(%rsi),%rdx   <-- WRONG: still uses original value of Buffer::p
  addq   $0x1,(%rax)      <-- it is only increased here (for the 1st time)
  movzbl (%rdx),%edx
  mov    %dl,0x1(%rdi)
  addq   $0x1,(%rax)
  retq



> objdump -d FooBar.o  (gcc-4.7.0 20110419)
  mov    0x8(%rsi),%rax
  movzbl (%rax),%edx
  mov    %dl,(%rdi)
  lea    0x1(%rax),%rdx   <-- correct, but I know this is not
  mov    %rdx,0x8(%rsi)   <-- required for my application
  movzbl 0x1(%rax),%edx
  add    $0x2,%rax
  mov    %dl,0x1(%rdi)
  mov    %rax,0x8(%rsi)
  retq


[Bug lto/48354] New: internal compiler error: in splice_child_die, at dwarf2out.c:8064

2011-03-30 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48354

   Summary: internal compiler error: in splice_child_die, at
dwarf2out.c:8064
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


I got this ICE when trying to compile the openMSX package using -flto. I
managed to reduce it to this:

> cat bug.ii
template<typename T> struct Identity { typedef T type; };
struct S {
    typedef void (S::*FP)();
    FP fp;
};
void g();
void f() {
    typedef Identity<S>::type Dummy;
    S s;
    g();
}

> g++-snapshot -r -nostdlib -g -flto bug.ii
...
bug.ii:11:1: internal compiler error: in splice_child_die, at dwarf2out.c:8064
...

I'm using revision trunk@171714.


This may or may not be a duplicate of bug 46135, though the testcase looks very
different.


[Bug tree-optimization/46780] New: -fgraphite-identity ICE in refs_may_alias_p_1, at tree-ssa-alias.c:1081

2010-12-03 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46780

   Summary: -fgraphite-identity  ICE in refs_may_alias_p_1, at
tree-ssa-alias.c:1081
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


> cat bug.ii
extern "C" { double cos(double x); }

static int outbuf[8][8];

int main() {
    double buf1[9];
    double* buf1_1 = &buf1[1];
    for (int i = 0; i < 8; ++i) {
        buf1_1[i] *= cos(i);
    }

    int buf2[64];
    for (int i = 0; i < 8; ++i) {
        int* buf2_i = &buf2[i];
        for (int j = 0; j < 8; ++j) {
            outbuf[i][j] = buf2_i[8 * j];
        }
    }
}


> g++ -O2 -fgraphite-identity bug.ii
bug.ii: In function ‘int main()’:
bug.ii:19:1: internal compiler error: in refs_may_alias_p_1, at
tree-ssa-alias.c:1081


I'm using SVN revision tr...@167414 on linux x86_64.


[Bug 45764] (tree-optimization) New: wrong code -O2 vs -O3 (problem in vectorizer???)

2010-09-23 Thread wouter.vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45764

   Summary: wrong code  -O2 vs -O3(problem in vectorizer???)
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: wouter.vermae...@scarlet.be


> cat bug.cc
int result[64][16];

int main()
{
    double dbuf[1000] = {0.0};
    int ibuf[900];

    double d1 = 0.0;
    double d2 = 0.0;
    for (int i = 0; i < 900; ++i) {
        ibuf[i] = int(d2 - d1);
        d1 += dbuf[i];
        d2 += dbuf[i + 64];
    }

    for (int i = 0; i < 64; ++i) {
        for (int j = 0; j < 8; ++j) {
            result[i][     j] = ibuf[64 - i + 64 * j];
            result[i][15 - j] = ibuf[     i + 64 * j];
        }
    }
}

> g++ -O2 bug.cc
> ./a.out
> g++ -O3 bug.cc
> ./a.out
Segmentation fault (core dumped)

I'm using SVN revision 164570 on linux_x86_64.
