[Bug c++/77896] Object vtable lookups are not hoisted out of loops

2016-10-08 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77896

--- Comment #5 from Ryan Johnson  ---
In an ideal world, C++ would disallow such behavior by default, with a function
attribute of some kind that flags cases where a type change might occur (kind
of like how C++11 assumes `noexcept` for destructors unless you specify
otherwise). Not only would it allow better optimizations, it would be safer,
because the compiler could then detect and forbid (or at least warn about)
problematic usage of such a class (like stack-allocating it, or calling a
type-change function when cast as the type that's about to change).

[Bug c++/77896] Object vtable lookups are not hoisted out of loops

2016-10-08 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77896

--- Comment #4 from Ryan Johnson  ---
Yikes. That explains it, all right. 

I never would have thought of an object destroying itself and changing its type
with placement new... I guess it must be subject to the same restrictions as
`delete this` [1], because things don't turn out well if the compiler thinks it
knows the type of the object:

==== alter.cpp ====
#include <cstdio>
#include <new>

struct AlterEgo {
   virtual ~AlterEgo() { }
   virtual void toggle()=0;
};

struct Jekyl : AlterEgo {
   ~Jekyl() { puts("~Jekyl"); }
   void toggle();
};

struct Hyde : AlterEgo {
   ~Hyde() { puts("~Hyde"); }
   void toggle();
};

void Jekyl::toggle()
{ this->~AlterEgo(); new (this) Hyde; }
void Hyde::toggle()
{ this->~AlterEgo(); new (this) Jekyl; }

void whatami(AlterEgo* x)
{
   printf("Jekyl? %p\n", (void*)dynamic_cast<Jekyl*>(x));
   x->toggle();
   printf("Jekyl? %p\n", (void*)dynamic_cast<Jekyl*>(x));
}

int main()
{
   puts("\nWorks ok-ish:");
   Jekyl* x = new Jekyl;
   whatami(x);
   puts("\nJekyl?");
   delete x;

   puts("\nBad idea:");
   Jekyl j;
   j.toggle();
   j.toggle();
   whatami(&j);

   puts("\nJekyl?");
}


$ g++ -Wall alter.cpp && ./a.out

Works ok-ish:
Jekyl? 0x6000104c0
~Jekyl
Jekyl? 0x0

Jekyl?
~Hyde

Bad idea:
~Jekyl
~Hyde
Jekyl? 0x0
~Hyde
Jekyl? 0xcbf0

Jekyl?
~Jekyl

[1] https://isocpp.org/wiki/faq/freestore-mgmt#delete-this

[Bug c++/77896] Object vtable lookups are not hoisted out of loops

2016-10-07 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77896

--- Comment #1 from Ryan Johnson  ---
It appears that multiple calls to different virtual functions of the same
object are not optimized, either (each performs the same load-load-jump
operation).

[Bug c++/77896] New: Object vtable lookups are not hoisted out of loops

2016-10-07 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77896

Bug ID: 77896
   Summary: Object vtable lookups are not hoisted out of loops
   Product: gcc
   Version: 6.2.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com
  Target Milestone: ---

C++ virtual function calls normally require two memory loads followed by an
indirect call: one load fetches the vtable pointer from the object, a second
fetches the function address from the vtable, and the indirect call then
invokes the function.

Given that an object's vtable is fixed over its lifetime, and the contents of a
given vtable are compile-time constant, I would expect the vtable lookups to be
hoisted out of loops when appropriate. For example:

==== foo.cpp ====
struct Foo { virtual void frob(int i)=0; };
void frobN(Foo* f, int n) {
   for (int i=0; i < n; i++)
      f->frob(i);
}


Compiles at -O2 to substantially the same x86 assembly code for gcc-4.9,
gcc-5.2 and gcc-6.2:

_Z5frobNP3Fooi:
        testl   %esi, %esi
        jle     .L10
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        xorl    %ebx, %ebx
.L5:
        movq    0(%rbp), %rax
        movl    %ebx, %esi
        addl    $1, %ebx
        movq    %rbp, %rdi
        call    *(%rax)
        cmpl    %ebx, %r12d
        jne     .L5
        popq    %rbx
        popq    %rbp
        popq    %r12
.L10:
        rep ret

I would have expected to see something more like this (obtained using the bound
member function extension):

_Z5frobNP3Fooi:
.LFB12:
        pushq   %r13
        pushq   %r12
        pushq   %rbp
        pushq   %rbx
        subq    $8, %rsp
        movq    (%rdi), %rax
        testl   %esi, %esi
        movq    (%rax), %r13
        jle     .L1
        movq    %rdi, %r12
        movl    %esi, %ebp
        xorl    %ebx, %ebx
.L5:
        movl    %ebx, %esi
        addl    $1, %ebx
        movq    %r12, %rdi
        call    *%r13
        cmpl    %ebx, %ebp
        jne     .L5
.L1:
        addq    $8, %rsp
        popq    %rbx
        popq    %rbp
        popq    %r12
        popq    %r13
        ret
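
For readers unfamiliar with it, GCC's "bound member function" extension resolves a virtual call once and keeps the result as a plain function pointer. The following is a minimal sketch of that kind of manual hoisting, not the exact source behind the listing above: `frob_fn`, `frobN_hoisted` and `Summer` are names invented here, and compilers other than GCC simply take the ordinary virtual-call branch. The hoisting is only valid on the assumption that nothing inside the loop changes the dynamic type of `*f`.

```cpp
#include <cassert>

struct Foo { virtual void frob(int i) = 0; };

// Plain function pointer that takes the object explicitly (hypothetical name).
typedef void (*frob_fn)(Foo*, int);

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpmf-conversions"
#endif

void frobN_hoisted(Foo* f, int n) {
    assert(f != nullptr);
#if defined(__GNUC__) && !defined(__clang__)
    // GCC extension: converting a bound pointer-to-member-function to a plain
    // function pointer performs the vtable lookup once, outside the loop.
    void (Foo::*pmf)(int) = &Foo::frob;
    frob_fn fn = (frob_fn)(f->*pmf);
    for (int i = 0; i < n; i++)
        fn(f, i);
#else
    // Fallback: ordinary virtual dispatch on every iteration.
    for (int i = 0; i < n; i++)
        f->frob(i);
#endif
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif

// Small concrete class used only to exercise the loop (illustration only).
struct Summer : Foo {
    int total = 0;
    void frob(int i) override { total += i; }
};
```

Calling `frobN_hoisted(&s, 5)` on a `Summer s` invokes `Summer::frob` with 0 through 4 on either branch.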

Altering the test case to trigger speculative devirtualization as follows:

==== bug2.cpp ====
#include <cstdio>
struct Foo { virtual void frob(int i)=0; };
void frobN(Foo* f, int n)
{
   for (int i=0; i < n; i++)
      f->frob(i);
}
struct Bar : Foo {
   void frob(int i) { printf("Bar:%d\n", i); }
};
int main()
{
   Bar b;
   frobN(&b, 10);
}
====

Shows that even the speculative devirtualization is stuck inside the loop body:

_Z5frobNP3Fooi:
        testl   %esi, %esi
        jle     .L13
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        xorl    %ebx, %ebx
        jmp     .L8
.L16:
        xorl    %eax, %eax
        movl    $.LC0, %edi
        addl    $1, %ebx
        call    printf
        cmpl    %ebx, %r12d
        je      .L15
.L8:
        movq    0(%rbp), %rax
        movl    %ebx, %esi
        movq    (%rax), %rax
        cmpq    $_ZN3Bar4frobEi, %rax
        je      .L16
        addl    $1, %ebx
        movq    %rbp, %rdi
        call    *%rax
        cmpl    %ebx, %r12d
        jne     .L8
.L15:
        popq    %rbx
        popq    %rbp
        popq    %r12
.L13:
        rep ret

If the vtable lookup could be hoisted, the speculative de-virt could become
very powerful by replicating the loop, something like this:

_Z5frobNP3Fooi:
        testl   %esi, %esi
        jle     .L10
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        xorl    %ebx, %ebx
        movq    0(%rbp), %rax
        pushq   %r12
        movq    (%rax), %r13
        cmpq    $_ZN3Bar4frobEi, %r13
        je      .L16
.L5:
        movl    %ebx, %esi
        addl    $1, %ebx
        movq    %rbp, %rdi
        call    *%r13
        cmpl    %ebx, %r12d
        jne     .L5
        jmp     .L10
.L16:
        xorl    %eax, %eax
        movl    $.LC0, %edi
        movl    %ebx, %esi
        addl    $1, %ebx
        call    printf
        cmpl    %ebx, %r12d
        jne     .L16
        popq    %r13
.L10:
        popq    %rbx
        popq    %rbp
        popq    %r12
        rep ret

[Bug c++/68859] New: Add a less strict/smarter version of -Wreorder

2015-12-11 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68859

Bug ID: 68859
   Summary: Add a less strict/smarter version of -Wreorder
   Product: gcc
   Version: 5.2.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com
  Target Milestone: ---

I am working with a large legacy code base that triggers a huge number of
warnings when compiled with -Wreorder (or -Wall, which enables it). 

I am not making any excuses for that code, but still it would be nice to have a
weaker-but-smarter variant of -Wreorder that only triggers when the
initialization order actually matters. For example, in the .cpp below, the
warning triggered by struct `definitely_bad` is helpful and identifies a real
bug. The warning for struct `not_a_problem`, on the other hand, is
significantly less interesting because each member is initialized completely
independently of the others (***). The middle example is evil because it calls
a member function of a partially constructed object, so I don't think it much
matters whether this smarter warning would trip or not in that case.

==== example.cpp ====
struct definitely_bad {
    int val;
    int *ptr;
    definitely_bad(int *p) : ptr(p), val(*ptr) { }
};
struct bad_for_a_different_reason {
    int val;
    int *ptr;
    bad_for_a_different_reason(int *p)
        : ptr(p), val(do_something()) { }
    int do_something();
};
struct not_a_problem {
    int val;
    int *ptr;
    not_a_problem(int* p, int v) : ptr(p), val(v) { }
};


Basically, I could imagine building a dependency graph that tracks which member
initializers depend on other members, and then trigger the warning only if the
true initialization order is not a valid partial order in that graph.

(***) I realize that members could have constructors with global side effects
(e.g. calls to printf or changes to global variables). In that case changing the
initialization order would still be observable, but this seems like a rare
enough case that the proposed warning could ignore it (leaving the existing
-Wreorder to flag it if the user desires).
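
The dependency-graph check described above can be sketched outside the compiler. This is only an illustration under stated assumptions: `MemberInit` and `init_order_matters` are hypothetical names rather than GCC internals, and the list of members read by each initializer is assumed to have been extracted already. Members are initialized in declaration order, so the order matters exactly when some initializer reads a member at the same or a later declaration position.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One data member, listed in declaration order, plus the other members that
// its initializer expression reads. (Hypothetical model, not a GCC structure.)
struct MemberInit {
    std::string name;
    std::vector<std::string> reads;
};

// Returns true if some initializer reads a member that has not been
// initialized yet, i.e. declaration order is not a valid order for the
// dependency graph and a reorder warning would be meaningful.
bool init_order_matters(const std::vector<MemberInit>& decl_order) {
    std::map<std::string, std::size_t> position;
    for (std::size_t i = 0; i < decl_order.size(); ++i)
        position[decl_order[i].name] = i;
    assert(position.size() == decl_order.size());  // member names are unique

    for (std::size_t i = 0; i < decl_order.size(); ++i) {
        for (const std::string& dep : decl_order[i].reads) {
            std::map<std::string, std::size_t>::const_iterator it = position.find(dep);
            // A dependency at the same or a later position is read before
            // its own initializer has run.
            if (it != position.end() && it->second >= i)
                return true;
        }
    }
    return false;
}
```

For `definitely_bad`, `val` is declared first and reads `ptr`, so the check fires; for `not_a_problem` no initializer reads another member and the check stays quiet. Calls to member functions, as in the middle example, are outside what this sketch models.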

[Bug c++/68859] Add a less strict/smarter version of -Wreorder

2015-12-11 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68859

--- Comment #1 from Ryan Johnson  ---
(I would be happy to do some legwork on this if somebody is willing to send a
few pointers by PM. I know the code in gcc/cp/init.c is relevant, particularly
the functions `perform_member_init` and `sort_mem_initializers`, but I would
need some help figuring out how to traverse trees and pick out uses of member
variables from other members' initializers.)

[Bug c++/67866] New: False positive -Wshift-count-overflow on template code that checks for shift count overflow

2015-10-06 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67866

Bug ID: 67866
   Summary: False positive -Wshift-count-overflow on template code
that checks for shift count overflow
   Product: gcc
   Version: 5.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com
  Target Milestone: ---

The following code snippet evokes an obviously unhelpful warning:

==== bug.cpp ====
template <int M>
unsigned long int m()
{
    unsigned long int max_value = 1;
    if (M < 64)
        max_value = (max_value << M) - 1;
    else
        max_value = ~(max_value - 1);

    return max_value;
}

int main()
{
    m<64>();
}
====

Output is:
g++-5.2 -Wshift-count-overflow bug.cpp -O3
bug.cpp: In instantiation of 'long unsigned int m() [with int M = 64]':
bug.cpp:14:11:   required from here
bug.cpp:6:32: warning: left shift count >= width of type
[-Wshift-count-overflow]
 max_value = (max_value << M) - 1;
^
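
One possible way to sidestep the warning, sketched here and not taken from the report, is to clamp the shift count so that it is in range in every instantiation, including the one where the guarded branch is dead. `low_bits_mask` is a hypothetical name, a fixed 64-bit type replaces `unsigned long int` as an assumption, and whether any particular GCC release stays silent on this form has not been checked here.

```cpp
#include <cstdint>

// Mask with the low M bits set, for M in [0, 64].
template <int M>
constexpr std::uint64_t low_bits_mask() {
    static_assert(M >= 0 && M <= 64, "M must be in [0, 64]");
    // (M < 64 ? M : 0) keeps the shift count below the type width even when
    // the template is instantiated with M == 64; in that case the outer
    // conditional selects the all-ones value instead.
    return M < 64 ? (std::uint64_t{1} << (M < 64 ? M : 0)) - 1
                  : ~std::uint64_t{0};
}
```

The function returns the same values as `m<M>()` from the test case on a target where `unsigned long int` is 64 bits wide.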


[Bug c++/61991] Destructors not always called for statically initialized thread_local objects

2015-07-10 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61991

--- Comment #1 from Ryan Johnson scovich at gmail dot com ---
C++14 (N3652 [1]) specifically alters the Standard to state that a thread_local
object with static or constexpr initialization may have a non-trivial
destructor (implying that such a destructor should actually run):

 Variables with static storage duration (3.7.1) or thread storage duration
 (3.7.2) shall be zero-initialized (8.5) before any other initialization takes
 place. A constant initializer for an object o is an expression that is a
 constant expression, except that it may also invoke constexpr constructors for
 o and its subobjects even if those objects are of non-literal class types
 [ Note: such a class may have a non-trivial destructor ].

[1] https://isocpp.org/files/papers/N3652.html


[Bug c++/65656] __builtin_constant_p should always be constexpr

2015-06-25 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65656

--- Comment #3 from Ryan Johnson scovich at gmail dot com ---
(In reply to Jason Merrill from comment #2)
> Author: jason
> Date: Tue Apr 28 14:43:59 2015
> New Revision: 222531
>
> URL: https://gcc.gnu.org/viewcvs?rev=222531&root=gcc&view=rev
> Log:
>   PR c++/65656
>   * constexpr.c (cxx_eval_builtin_function_call): Fix
>   __builtin_constant_p.
>
> Added:
> trunk/gcc/testsuite/g++.dg/cpp0x/constexpr-builtin3.C
> Modified:
> trunk/gcc/cp/ChangeLog
> trunk/gcc/cp/constexpr.c

Any reason this bug should not be closed as 'fixed' ?

[Bug c++/65656] New: __builtin_constant_p should be constexpr

2015-04-01 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65656

Bug ID: 65656
   Summary: __builtin_constant_p should be constexpr
   Product: gcc
   Version: 4.8.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com

Consider the following program compiled with `gcc -std=c++11'
==== bug.cpp ====
#include <cstdio>
int main(int argc, char *argv[]) {
  constexpr bool x = __builtin_constant_p(argc);
  std::printf("x=%d\n", x);
}
====

With optimizations disabled, it correctly treats __builtin_constant_p() as
constexpr and prints 0 as expected (because the value of argc is not a
compile-time constant).

With optimizations enabled (-O1 or higher), compilation fails:
bug.cpp: In function ‘int main(int, char**)’:
bug.cpp:3:48: error: ‘argc’ is not a constant expression
constexpr bool x = __builtin_constant_p(argc);
^
Clang 3.4 handles the case just fine. 

While I can 100% understand that the return value of __builtin_constant_p()
might change depending on what information the optimizer has available, I'm
pretty sure __builtin_constant_p() should always return a value computable at
compile time.

NOTE: this issue is *NOT* the same as Bug #54021, in spite of the two sharing
the same title. The latter is mis-named: It actually requests support for
constant folding for ternary expressions involving __builtin_constant_p, even
when optimizations are disabled and such folding would not normally occur.

[Bug c++/61991] New: Destructors not always called for statically initialized thread_local objects

2014-08-01 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61991

Bug ID: 61991
   Summary: Destructors not always called for statically
initialized thread_local objects
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com

If a thread_local object is statically initialized---trivial or constexpr
constructor---but has a non-trivial destructor, the destructor will only run if
some *other* thread_local object needs dynamic initialization *and* if at least
one such object is accessed by the thread during its lifetime. Accessing
members of the object itself does nothing, because it is statically
initialized.

Example:

#include <cstdio>
static thread_local struct X {
    ~X() {
        printf("bye!\n");
    }
} x;
static thread_local int y = printf("initialized y\n");
int main() {
   //printf("%d\n", y);
}

Compiling the above with `g++ -std=gnu++11 bug.cpp' gives an executable that
produces no output when run. 

Uncomment the printf in main() and recompile, and the resulting executable
prints:

initialized y
14
bye!

The only hint of trouble at compile time is that the compiler may warn about an
unused variable. However, that warning only comes if the offending object is
never accessed otherwise (perhaps because it is an exit guard of some type),
has static storage class, *and* no other dynamic thread_local storage exists...
an unlikely combination.

Looking at the assembly code output, the problem is obvious: X::~X is only
registered with __cxa_thread_atexit if __tls_init is called, and the latter is
only called if the thread accesses a TLS object that needs dynamic
initialization.

Under Cygwin, I hit the further problem that __tls_init doesn't even contain
any calls to __cxa_thread_atexit. That's probably a separate bug, though, and
I don't know whose problem it might be.

If there's no easy fix, might I suggest that a loud warning somewhere in the
docs would be appropriate, so people have a way to know about the limitation?
I tried searching for this online, but Google didn't turn anything up.

(Discovered in 4.8.3, still there in 4.9.0, and given the nature of the bug I
suspect it's in more recent versions, too).


[Bug inline-asm/49611] Inline asm should support input/output of flags

2014-06-02 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611

--- Comment #9 from Ryan Johnson scovich at gmail dot com ---
(In reply to Andi Kleen from comment #7)
> You can do many of these things these days with asm goto, however it
> typically requires non-structured control flow (goto labels).
I filed this bug after determining that asm goto was unsuitable for this
purpose.

Goto labels are not a problem per se (actually kind of slick), but asm goto
requires all outputs to pass through memory and so is only good for control
flow (not computation plus exceptional case). It also requires the actual
branching and all attendant glue to happen in assembly. Both limitations
increase bulk and hamper the optimizer, and go against (what I thought was) the
intention that inline asm normally be used for very small snippets of code the
compiler can't handle. At some point you may as well just setcc and do a new
comparison/branch outside the asm block; less bug-prone and would probably
yield faster and cleaner code, too.


[Bug inline-asm/49611] Inline asm should support input/output of flags

2014-05-30 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611

--- Comment #6 from Ryan Johnson scovich at gmail dot com ---
(In reply to Jeremy from comment #5)
> It may not be possible, but perhaps a simpler thing might be for
> the asm() to notionally return a single boolean value which
> reflects ONE flag only.

Interesting!

Ironically, limiting it to one flag opens the way to cleanly specify branching
based on multiple flags. The optimizer just needs to recognize that when it
sees two otherwise-equivalent (non-volatile) asm statements with different
asm_return attribute, it's really just one asm statement that sets multiple
flags. Thus: 

#ifdef USE_ASM
#define CMP(a,b) asm("cmp %0 %1" : : "r"(a), "r"(b))
#define BELOW(a,b) (__attribute__((asm_return(cc_carry))) CMP(a,b))
#define EQUAL(a,b) (__attribute__((asm_return(cc_zero))) CMP(a,b))
#else
#define BELOW(a,b) ((a) < (b))
#define EQUAL(a,b) ((a) == (b))
#endif
int do_it(unsigned int a, unsigned int b, int c, int d, int e, int f) {
    int x;
    if (BELOW(a,b))
        x = c+d;
    else if (EQUAL(a,b))
        x = d+e;
    else
        x = c+e;
    return x+f;
}

Would produce the same assembly code output---with only one
comparison---whether USE_ASM was defined or not.

Even more fun would be if the optimizer could recognize conditionals that
depend on multiple flags (like x86 less or equal) and turn this:

if (__attribute__((asm_return(cc_zero))) CMP(a,b)
    || __attribute__((asm_return(cc_overflow))) CMP(a,b)
       != __attribute__((asm_return(cc_sign))) CMP(a,b))
    do_less_or_equal();
do_something_else();

into:

cmp %[a] %[b]
jg 1f
call do_less_or_equal
1:
call do_something_else

Much of the flag-wrangling machinery seems to already exist, because the
compiler emits the above asm if you replace the inline asm with either a <= b
or a < b || a == b (assuming now that a and b are signed ints).
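
For comparison with the hypothetical `asm_return` attribute above: later GCC releases document "flag output operands" for x86 inline asm, which cover much of the same ground. The sketch below assumes that feature and is guarded by the `__GCC_ASM_FLAG_OUTPUTS__` macro, falling back to plain comparisons elsewhere. The helper names `below`, `equal` and `self_check` are invented here, and whether the compiler merges the two `cmp` instructions into one, as proposed above, is not something this sketch demonstrates.

```cpp
#include <cassert>

// True when a < b as unsigned values.
inline bool below(unsigned a, unsigned b) {
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
    bool carry;
    // AT&T syntax: "cmp %2, %1" computes a - b and sets the flags;
    // "=@ccb" hands the "below" condition (carry set) back to the compiler.
    asm("cmp %2, %1" : "=@ccb"(carry) : "r"(a), "r"(b));
    return carry;
#else
    return a < b;
#endif
}

// True when a == b.
inline bool equal(unsigned a, unsigned b) {
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
    bool zero;
    // "=@ccz" exposes the zero flag, which cmp sets when a - b == 0.
    asm("cmp %2, %1" : "=@ccz"(zero) : "r"(a), "r"(b));
    return zero;
#else
    return a == b;
#endif
}

// Cross-check the flag-based helpers against the built-in operators.
inline void self_check(unsigned a, unsigned b) {
    assert(below(a, b) == (a < b));
    assert(equal(a, b) == (a == b));
}
```

Because the flag is an ordinary output operand, the branch itself stays in C++ and remains visible to the optimizer, which is the property asm goto lacks.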


[Bug c++/61372] New: Add warning to detect noexcept functions that might throw

2014-05-30 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61372

Bug ID: 61372
   Summary: Add warning to detect noexcept functions that might
throw
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com

The C++11 standard adds the noexcept specification that lets the programmer
assert that a function does not throw any exceptions (terminating execution if
that assertion ever turns out to be false at runtime). 

Unfortunately, there is currently no reliable way for a programmer to validate,
at compile time, her assertion that a function does or does not throw. 

The closest thing is -Wnoexcept, which detects the (very narrow) case where the
following all apply to some function F:
1. F lacks the noexcept declaration or has declared noexcept(false)
2. The compiler has determined that F cannot throw
3. F causes some noexcept operator to evaluate to false

Unfortunately, that narrow formulation makes it really hard to validate much of
anything (see example and further discussion below).

It would be very helpful to have a warning flag which tells the compiler to
report cases where a function's noexcept specification contradicts the
compiler's analysis of the function body. Perhaps -Wnoexcept-mismatch={1,2,3}?

1 (high priority): functions declared noexcept(true) but which contain
expressions that might throw. This validates stated noexcept assumptions,
helping to avoid issues like PR #56166.

2 (medium priority): Also report functions declared noexcept(false) that in
fact cannot throw (e.g. cases #1 and #2 for -Wnoexcept). This improves the
accuracy of noexcept  validation, and also improves performance in general (by
eliminating unwind handlers). And makes it easier to avoid/fix things like PR
#52562. 

3 (low priority): Also report functions which lack any noexcept declaration but
which cannot throw (similar to -Wsuggest-attribute for const, pure, etc.). This
also improves accuracy of noexcept, but the programmer would have to decide
whether to make the API change (marking the function noexcept) or whether it's
important to retain the ability to throw in the future.

Probably none of the above warnings should be enabled by default, but it might
make sense to enable -Wnoexcept-mismatch=1 with -Wall and -Wnoexcept-mismatch=2
with -Wextra.

To implement the warning, the compiler would make a pass over each function
body (after applying most optimizations, especially inlining and dead code
elimination). It would then infer a noexcept value by examining all function
calls that remain, and compare that result with the function's actual noexcept
specification (or lack thereof). No need for any kind of IPA: if a callee lies
about its noexcept status, it's the callee's problem.

===
Workaround using -Wnoexcept
===

One might try to combine static_assert with noexcept, e.g:

// example.cpp 
void might_throw(int); // lacks noexcept
void also_might_throw(); // lacks noexcept
void never_throw(int a) noexcept(noexcept(might_throw(a)) &&
                                 noexcept(also_might_throw())) {
    if (a)
        might_throw(a);
    also_might_throw();
}
void foo(int a) noexcept(noexcept(might_throw(a))) {
    might_throw(a);
}
static_assert(noexcept(foo(0)), "never_throw might throw");


There are two glaring problems with that approach, however:

1. Every expression in the function body must be part of the noexcept clause,
effectively replicating the function body in its signature (but without the
ability to declare local variables).

   - Maintaining the noexcept chain across code changes would be tedious and
error-prone for all but the smallest and most stable functions (i.e. the ones
least in need of verification).

   - Operator overloading means you can't even assume basic expressions like
a+b are nothrow. To get complete coverage would require either a very careful
analysis (error prone) or cracking the entire function body into an AST of
atomic expressions (tedious *and* error prone).

   - Macro expansions would add even more headaches, because they may expand to
more than one statement and/or include control flow. 

2. The static_assert must choose one set of inputs for each function call it
passes to operator noexcept. 

   - An optimizer (especially after inlining and constant propagation) could
conceivably report that the function is noexcept for that particular input,
when in fact other inputs exist that could cause an exception to be thrown
(this does not seem to be the case currently). 

   - There may not be any easy way to come up with a valid input (objects that
lack a default constructor, etc.). Using hacks like (*(T*)0) would violate all
sorts of compiler/optimizer assumptions and risks breaking the analysis.
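
On the last point, the standard library's `std::declval` can stand in for a value of a type that has no convenient constructor, without resorting to (*(T*)0). The sketch below uses invented names (`NoDefault`, `safe`, `risky`) and addresses only the problem of finding an input; it still checks declared exception specifications, not what a function body actually does.

```cpp
#include <utility>

// A type with no default constructor (hypothetical, for illustration).
struct NoDefault {
    explicit NoDefault(int) {}
};

void safe(const NoDefault&) noexcept;  // promises not to throw
void risky(const NoDefault&);          // makes no promise

// std::declval<T>() is only usable in unevaluated operands such as the
// noexcept operator, so no NoDefault object is ever created here and the
// two functions never need a definition.
constexpr bool safe_is_noexcept =
    noexcept(safe(std::declval<const NoDefault&>()));
constexpr bool risky_is_noexcept =
    noexcept(risky(std::declval<const NoDefault&>()));

static_assert(safe_is_noexcept, "safe() is declared noexcept");
static_assert(!risky_is_noexcept, "risky() is potentially throwing");
```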

[Bug c++/14932] [3.4/4.0 Regression] cannot use offsetof to get offsets of array elements in g++ 3.4.0 prerelease

2014-05-28 Thread scovich at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14932

Ryan Johnson <scovich at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |scovich at gmail dot com

--- Comment #16 from Ryan Johnson scovich at gmail dot com ---
A very similar problem arises with gcc-4.8.2 (and 4.9.0):

#include <stdio.h>
#include <stddef.h>
struct foo {
    char data[10];
};
int main() {
    int x = 4;
    printf("%zd\n", offsetof(struct foo, data[x]));
    return 0;
}


gcc-4.8.2 accepts it (with -xc), as does clang-3.0. In both cases, the
resulting binary prints 4 as expected. g++-4.8.2 rejects:

bug.cpp: In function ‘int main()’:
bug.cpp:9:47: error: ‘x’ cannot appear in a constant-expression
 printf("%zd\n", offsetof(struct foo, data[x]));

So again, I don't think this bug is fixed... but I'll happily file a new PR if
that's preferred.
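
Until g++ accepts a non-constant index inside `offsetof`, the same value can be computed by adding the element offset by hand. A minimal sketch, assuming the `foo` layout from the test case; `data_offset` is a hypothetical helper name.

```cpp
#include <cstddef>

struct foo {
    char data[10];
};

// Offset of foo::data[i] from the start of foo: the constant offset of the
// array itself plus i elements. Only the array member is named inside
// offsetof, so the index never has to be a constant expression.
constexpr std::size_t data_offset(std::size_t i) {
    return offsetof(foo, data) + i * sizeof(char);  // char: element type of data
}
```

The function works with a runtime index as well, so `data_offset(x)` can replace the rejected `offsetof(struct foo, data[x])` in the example.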

[Bug rtl-optimization/10474] shrink wrapping for functions

2013-11-25 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474

--- Comment #18 from Ryan Johnson scovich at gmail dot com ---
(In reply to Martin Jambor from comment #17)
> The testcase is now shrink-wrapped on ppc64 and x86_64, it is not on
> others such as i?86 because parameter-passing ABI basically prevents
> it.  If any of the three testcases pass also on any other platform
> (e.g. Ramana claimed it also works on AArch32 [1]), feel free to add
> it to the dg target in the testcase(s).
>
> For my part, I now consider this to be fixed.
>
> [1] http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02726.html

Great! Does this mean shrink-wrapping will be in gcc-4.9, at least for x86_64
and ppc64?


[Bug rtl-optimization/10474] shrink wrapping for functions

2013-11-25 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474

--- Comment #20 from Ryan Johnson scovich at gmail dot com ---
Hi Martin,

(PM reply because I don't have up-to-date information to file a proper 
bug report with)

On 25/11/2013 9:57 AM, jamborm at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474
>
> --- Comment #19 from Martin Jambor <jamborm at gcc dot gnu.org> ---
> (In reply to Ryan Johnson from comment #18)
>> Great! Does this mean shrink-wrapping will be in gcc-4.9, at least for
>> x86_64 and ppc64?
> Well, a fairly basic (but not altogether unreasonable) shrink-wrapping
> was in gcc 4.8 (and earlier versions) too and that has not changed
> at all.  The problem with this and similar testcases was that the
> register allocator made decisions which made shrink-wrapping
> impossible (or at least too difficult to perform).  The change I
> committed and which will be a part of gcc 4.9 fixes this for a class
> of pseudo-registers which commonly result in this problem but other
> cases will still remain unresolved, for example PR 51982.  For some
> statistics about what impact the implemented technique has, see the
> email accompanying the first submission of the patch:
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01719.html
>
> If you find another similar example which is important and clearly
> possible to shrink-wrap but we don't do it, feel free to submit a
> new missed-optimization bug and CC me.

One that comes to mind right off, but is from several years ago and 
possibly no longer true: on platforms like solaris/sparc, accesses to 
thread-local storage require a function call to retrieve the base of 
thread-local storage; the compiler seems to emit the call once, in the 
function prologue. I strongly suspect (but can't confirm, since I no 
longer have access to Solaris/sparc) that such a 
function-call-in-prologue would confound subsequent efforts at shrink 
wrapping. I don't know how often this sort of scenario arises any more, 
though. It may be that the new emutls stuff has changed everything, 
because on cygwin and gcc-4.8 I now see separate calls into emutls for 
every TLS access.

As for PR 51982, it looks like having flow-sensitive local analysis 
could go a long way: just as it can be useful know that an escaped 
pointer has not *yet* escaped (e.g. PR 50346), here it would be useful 
to know that the stack frame, though perhaps eventually needed, is not 
needed just yet. Then, generation of the stack frame can be pushed down 
to the first basic block(s) where the need for a stack frame is 
undisputed, after any conditions that gate it. But I've been told that 
teaching gcc to think that way would not be easy...

In any case, thanks for the improvement to a hairy problem.

Regards,
Ryan


[Bug c++/58050] New: RVO fails when calling static function through unnamed temporary

2013-08-01 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58050

Bug ID: 58050
   Summary: RVO fails when calling static function through unnamed
temporary
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com

Return value optimization is not applied when calling a static member function
via an unnamed temporary (value or pointer, it doesn't matter). Calling the
function directly, or through a named value/pointer, works as expected:

// --- bug.cpp ---
extern "C" int puts(char const*);
struct B {
    ~B() { puts("\t~B"); }
};
struct A {
    static B make() { return B(); }
} a;
A *ap() { return &a; }
int main () {
    puts("b1");
    {B b = A::make();}
    puts("b2");
    {B b = a.make();}
    puts("b3");
    {B b = ap()->make();}
    puts("b4");
    {B b = A().make();}
}
// --- end bug.cpp ---

Output is (same for both 4.8.1 and 4.6.3):
$ g++ bug.cpp && ./a.out
b1
        ~B
b2
        ~B
b3
        ~B
        ~B
b4
        ~B
        ~B

The workaround is simple enough to apply, if you happen to notice all the extra
object copies being made; I isolated the test case from an app that used 5x
more malloc bandwidth than necessary because a single static function called
the wrong way returned a largish STL object by value.
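
The workaround amounts to naming the class, or a named object, when calling the static function. A minimal sketch with a destructor counter in place of `puts`; the counter and the helper `destructor_calls_via_class_name` are additions for illustration, and the count of one assumes copy elision is in effect (the default for GCC and Clang, and guaranteed for this pattern from C++17 on).

```cpp
struct B {
    static int destroyed;  // counts destructor calls (illustration only)
    ~B() { ++destroyed; }
};
int B::destroyed = 0;

struct A {
    static B make() { return B(); }
};

// Calls the static member through the class name, which is the form the
// report found to elide the copy, and reports how many B objects were
// destroyed along the way.
int destructor_calls_via_class_name() {
    B::destroyed = 0;
    { B b = A::make(); }
    return B::destroyed;
}
```

With elision, only the single object `b` is ever constructed and destroyed, matching the "b1" case in the output above.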


[Bug c++/58051] New: No named return value optimization when returned object is implicitly converted

2013-08-01 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58051

Bug ID: 58051
   Summary: No named return value optimization when returned
object is implicitly converted
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com

The following test case introduces an extra object copy, even though none
should be required:

// --- bug.cpp ---
extern "C" void puts(char const *);
struct A {
    A()=default;
    A(A &&)=default;
    A(A const &) { puts("copy"); }
    ~A() { puts("~A"); }
};
struct B {
    A _a;
    B(A a) : _a((A&&)(a)) { }
};
B go() {
    A rval;
    return rval;
}
int main () { go(); }
// --- end bug.cpp ---

(when compiled with both `gcc-4.8.1 -std=gnu++11' and `gcc-4.6.3 -std=gnu++0x')

RVO works properly if go() returns A() or std::move(rval).
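
The `std::move(rval)` variant can be sketched as follows, with a copy counter in place of `puts`. The counter and the names `go_moved` and `copies_with_explicit_move` are additions for illustration, and `B` is assumed to take its argument by value, as the test case above appears to.

```cpp
#include <utility>

struct A {
    static int copies;  // counts copy constructions (illustration only)
    A() = default;
    A(A&&) = default;
    A(A const&) { ++copies; }
};
int A::copies = 0;

struct B {
    A _a;
    B(A a) : _a(std::move(a)) { }
};

// The explicit std::move makes the operand an rvalue, so the parameter of
// B's converting constructor is move-constructed instead of copied.
B go_moved() {
    A rval;
    return std::move(rval);
}

// Runs go_moved() once and reports how many A copies it caused.
int copies_with_explicit_move() {
    A::copies = 0;
    go_moved();
    return A::copies;
}
```

Every `A` constructed on this path uses the move constructor, so the copy constructor is never selected.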


[Bug c++/58022] New: Compiler rejects abstract class in template class with friend operator

2013-07-29 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58022

Bug ID: 58022
   Summary: Compiler rejects abstract class in template class with
friend operator
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com

First, apologies for the vague subject line, I really don't know what to call
this bug...

Consider the following test case:

// --- begin bug.cpp ---
#include <iostream>
using namespace std;
template <class T> class foo;
template <class T> ostream & operator<<(ostream& o, const foo<T> &l);
template <class T> class foo  {
    friend ostream& operator<< <T> (ostream& o, const foo<T> &l);
};
class bar;
foo<bar> fb;
class bar { virtual void baz()=0; };
// --- end bug.cpp ---

The test case was isolated using multidelta on a large code base that compiles
cleanly with gcc-4.7 and earlier.

Compiling it with gcc-4.8.1 gives the error "cannot allocate an object of
abstract type ‘bar’", and identifies this function in <ostream>:

  template<typename _CharT, typename _Traits>
    inline basic_ostream<_CharT, _Traits>&
    operator<<(basic_ostream<_CharT, _Traits>& __out, _CharT __c)
    { return __ostream_insert(__out, &__c, 1); }

Replacing `using namespace std` with std::ostream everywhere allows it to
compile, as does moving the definition of bar above the friend declaration.

I'm not 100% certain the code is valid C++, seeing as how it instantiates a
template using an incomplete type, but there are still several issues:

1. The compiler gives no hint whatsoever where the real problem is, leaving the
user to infer the context in some other way; it took 2h with multidelta to
isolate the above test case and finally see what had happened.

2. The declaration of operator<< (which accepts a const ref) should not
interfere with the one in <ostream> (which accepts a value); without the const
ref declaration the compiler (rightfully!) complains that template-id
‘operator<< <bar>’ for ‘std::ostream& operator<<(std::ostream&, const
foo<bar>&)’ does not match any template declaration.

3. At no point is bar actually instantiated, passed by value, or its members
accessed; even if operator<< did do one of those things, operator<< is never
actually called with foo<bar> as an argument, so the template shouldn't be
instantiated.

For now, the workaround seems to be ensuring that bar is fully defined before
any template class mentions it, but that's not going to be easy given how hard
it is to find the problem (and the fact that the foo template is in a utility
library and really should be included first under normal circumstances).
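
For reference, here is a minimal sketch of that workaround as I understand it:
bar is fully defined before foo<bar> is first mentioned. The operator<< body
and the show_fb() helper are illustrative additions of mine (the reduced test
case never defines or calls the operator), and I spell out std::ostream rather
than relying on `using namespace std'.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

template <class T> class foo;
template <class T> std::ostream& operator<<(std::ostream& o, const foo<T>& l);

template <class T> class foo {
    friend std::ostream& operator<< <T>(std::ostream& o, const foo<T>& l);
};

// Illustrative body (not part of the report): print a fixed tag.
template <class T>
std::ostream& operator<<(std::ostream& o, const foo<T>&) { return o << "foo"; }

// Workaround: the abstract class is complete before foo<bar> is instantiated.
class bar { virtual void baz() = 0; };

foo<bar> fb;

// Hypothetical helper so that the friend operator is actually exercised.
std::string show_fb() {
    std::ostringstream out;
    out << fb;
    return out.str();
}
```

Nothing here ever creates a bar, so the class being abstract should not matter
to the operator at all.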

[Bug c++/58022] [4.8 Regression] Compiler rejects abstract class in template class with friend operator

2013-07-29 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58022

--- Comment #3 from Ryan Johnson <scovich at gmail dot com> ---
(In reply to Paolo Carlini from comment #1)
> Please try to reduce the testcase further, no includes. You have a number of
> options here: http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction

Sorry, I thought ostream was an important part of the bug and did some work
to put it back in... Here's the fully reduced case:

// --- begin bug.cpp ---
template<typename _CharT>
class basic_ostream;

typedef basic_ostream<char> ostream;

template<typename T>
basic_ostream<T>& operator<<(basic_ostream<T>&, T);

template <class T> class foo;

template <class T> ostream & operator<<(ostream&, const foo<T>&);

template <class T> class foo  {
    friend ostream& operator<< <T> (ostream&, const foo<T>&);
};

class bar;

foo<bar> fb;

class bar { virtual void baz()=0; };
// --- end bug.cpp ---


[Bug c++/57971] New: Improve copy elision when returning structs by value

2013-07-24 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57971

Bug ID: 57971
   Summary: Improve copy elision when returning structs by value
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scovich at gmail dot com

Hi all,

In the testcase below, bar() and baz() perform copy elision as expected, but
blah() does not, in spite of its being functionally identical to baz():

#include <cstdio>
struct foo {
    foo() { printf("make\n"); }
    foo(foo const &) { printf("copy\n"); }
    void frob() { printf("frob\n"); }
};

foo bar(bool) {
    foo f;
    f.frob();
    return f;
}
foo baz(bool mknew) {
    if (mknew)
        return foo();
    return bar(mknew);
}
foo blah(bool mknew) {
    if (mknew)
        return foo();
    foo f = bar(mknew);
    return f;
}

int main() {
    printf("*** bar ***\n");
    bar(false);
    printf("*** baz ***\n");
    baz(false);
    printf("*** blah ***\n");
    blah(false);
}

Output is:

$ g++ -Wall bug.cpp && ./a.out
*** bar ***
make
frob
*** baz ***
make
frob
*** blah ***
make
frob
copy

I assume that bar() and baz() exploit the named and unnamed return value
optimizations, respectively, but blah() is missed because it needs both
optimizations together.


[Bug c++/55288] New: Improve handling/suppression of maybe-uninitialized warnings

2012-11-12 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55288

 Bug #: 55288
   Summary: Improve handling/suppression of maybe-uninitialized
warnings
Classification: Unclassified
   Product: gcc
   Version: 4.7.1
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


Created attachment 28669
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28669
maybe-uninitialized false positive

Enabling -Wmaybe-uninitialized (part of -Wall) can result in false positives,
which is fine (the warning is still quite useful). However, there is currently
no way to disable such warnings on a per-variable basis.

It is possible, but ineffective, to push a diagnostic pragma to ignore such
warnings: Warnings are generated where the uninitialized value (!= variable) is
eventually consumed, and that can easily happen outside the range covered by
the pragma. Inlining makes the problem much worse [1].

The attached test case (reduced from actual code) illustrates the problem
clearly, failing to compile with `-O2 -Wall -Werror' even though (a) the value
*is* always written before being read and (b) the containing function has
maybe-uninitialized warnings disabled. Adding -DWORKS allows it to compile by
disabling the warning for the call site, even though the offending variable is
not in scope at any part of the source code where the pragma is in effect.

Since the compiler can clearly track which variable was the problem, I would
instead propose a new variable attribute, ((maybe_uninitialized)), to suppress
all maybe-uninitialized warnings the marked variable might trigger for its
consumers. That way, known false positives can be whitelisted without disabling
a useful warning for large swaths of unrelated code [2].

[1] First, it can vastly expand the number of problematic end points that lie
outside the pragma (they may even reside in different files). Second, the
resulting error message is extremely unhelpful, because it names the variable
that was originally uninitialized, rather than the variable that ended up
holding the poisoned value at the point of use (the former might not even be
in the same file, let alone be in scope, and there's no easy way to figure out
which of its uses causes the problem). It would be much better in this case if
the diagnostic listed the call site(s) and/or assignments that led to the
identified line of code depending on the potentially-uninitialized value,
similar to how template substitution failures or errors in included headers are
handled today.

[2] Another potential solution would be to propagate the pragma to inlined call
sites, but that seems like a horrifically hacky and error prone solution.
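
To make the pragma mechanism concrete, here is a minimal sketch of the
push/ignore/pop pattern discussed above. The consume() function is a
hypothetical stand-in of mine, not the attached test case, and whether gcc
warns about it at all depends on version and optimization level.

```cpp
#include <cassert>

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
// Hypothetical function: `value` is written only on the `ok` path and read
// only on that same path, so any maybe-uninitialized report would be a
// false positive.
int consume(bool ok) {
    int value;
    if (ok)
        value = 42;
    return ok ? value : -1;
}
#pragma GCC diagnostic pop
```

The pragma only covers diagnostics reported at source locations between the
push and the pop. If the read of the value ends up being attributed to a line
outside that range (for example after inlining into a caller), the suppression
no longer applies, which is the weakness described above.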


[Bug c++/55288] Improve handling/suppression of maybe-uninitialized warnings

2012-11-12 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55288

--- Comment #2 from Ryan Johnson <scovich at gmail dot com> 2012-11-12 21:11:43
UTC ---
(In reply to comment #1)
> Why don't just initialize the variable? It seems simpler than implementing yet
> another special attribute in GCC.

In the original program, the variable is a largish struct, the function is
hot, and the 'valid' execution path is not the most common one. Avoiding
unnecessary initialization there has a measurable impact on performance.

Note that, in other parts of the code that gcc understands better, the
initialization is unnecessary (no warning) and gets optimized away even if I do
have it in place... much to my chagrin once, after I did a lot of work to
refactor a complex function, only to realize that gcc emitted *exactly* the
same machine code afterward, because it had already noticed and eliminated the
dead stores.

There's also a philosophical argument to be made... if we agree that all
warnings subject to false positives should be suppressible, the current
mechanism for maybe-uninitialized is inadequate, and a variable attribute would
resolve the issue very nicely. There's precedent for this: you *could* use
#ifndef NDEBUG (or even pragma diagnostic) to avoid unused-variable warnings
for helper variables used by multiple assertions scattered over a region of
code, but setting ((unused)) on the offending variable is much easier to read
and maintain, while still allowing other unused variables to be flagged
properly.


[Bug inline-asm/49611] Inline asm should support input/output of flags

2012-04-12 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611

--- Comment #3 from Ryan Johnson <scovich at gmail dot com> 2012-04-12 16:39:32
UTC ---
FYI: based on a discussion from quite some time ago [1], it seems that the
Linux kernel folks would be tickled pink to have this feature, and discussed
several potential ways to implement it.

[1] http://lkml.indiana.edu/hypermail/linux/kernel/0111.2/0256.html


[Bug middle-end/32074] Optimizer does not exploit assertions

2012-03-28 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32074

--- Comment #5 from Ryan Johnson <scovich at gmail dot com> 2012-03-29 02:46:50
UTC ---
(In reply to comment #4)
> We have __builtin_unreachable() now which should allow for this optimization.

I've been using __builtin_unreachable() for some time now, and it's very nice
for its intended purpose (telling gcc when it's safe to produce better code).
I've noticed, though, that the ``x'' passed to assert(x) in already-existing
code is often too expensive (or side effect-ful) to optimize away when
converted to ``if(!(x)) { __builtin_unreachable(); }'' 

I would therefore advise against executing the expression passed to assertions
under NDEBUG. I use the following in my own code instead:

#ifdef NDEBUG
#define ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#else
#define ASSUME assert
#endif

The idea is to state assumptions that might help the compiler generate better
code... treating them like assertions in debug mode to catch faulty
assumptions. Assertions, meanwhile, should retain their traditional purpose of
debugging aid or sanity test and continue to disappear completely in NDEBUG
mode.
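
A small usage sketch of the macro, for illustration only: last_digit() is a
hypothetical helper, and whether gcc actually produces better code for it
depends on the version and flags.

```cpp
#include <cassert>

#ifdef NDEBUG
#define ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#else
#define ASSUME assert
#endif

// Hypothetical helper: the caller promises n >= 0. In a debug build the
// promise is checked; under NDEBUG it becomes a hint that lets the compiler
// ignore the negative-operand case of the signed remainder.
inline int last_digit(int n) {
    ASSUME(n >= 0);
    return n % 10;
}
```

The condition passed to ASSUME is cheap and free of side effects on purpose,
which is exactly what cannot be taken for granted for an arbitrary existing
assert(x).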

Recommend to close as WONTFIX, unless there are other reasons to keep it open.


[Bug c++/52637] New: ICE producing debug info for c++11 code using templates/decltype/lambda

2012-03-20 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52637

 Bug #: 52637
   Summary: ICE producing debug info for c++11 code using
templates/decltype/lambda
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


The following code snippet produces an ICE when compiled by gcc-4.7.0-RC1 with
flags `-std=gnu++11 -g -c' (gcc-4.6.2 and 4.5.3 accept it):

=== bug.cpp ===
template <typename T>
struct foo {
    foo(T fn) { }
};

template <class T, typename V>
void bar(T*, V) {
    auto x = [] { };
    auto y = foo<decltype(x)>(x);
}

template <typename T>
void bar(T* t) { bar(t, [] { }); }

struct baz {
    void bar() { ::bar(this); }
};
===

$ ~/apps/gcc-4.7-RC1/bin/g++ -std=gnu++11 -g bug.cpp
bug.cpp:17:2: internal compiler error: in output_die, at dwarf2out.c:8463
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

The testcase was distilled from a multi-thousand line app with help from
multidelta.

My platform is i686-pc-cygwin, in case that matters.


[Bug bootstrap/52513] gcc-4.7.0-RC-20120302 fails to build for i686-pc-cygwin

2012-03-07 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52513

--- Comment #2 from Ryan Johnson <scovich at gmail dot com> 2012-03-07 13:02:50
UTC ---
(In reply to comment #1)
> 4.6 should be broken as well for you?
Oops. I reported wrong in my OP. I've actually been using a home-built 4.6.2
for some time now... and it is the host compiler for this build:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/Ryan/apps/gcc-4.6.2/libexec/gcc/i686-pc-cygwin/4.6.2/lto-wrapper.exe
Target: i686-pc-cygwin
Configured with: ../gcc-4.6.2-src/configure --prefix=/home/Ryan/apps/gcc-4.6.2
Thread model: single
gcc version 4.6.2 (GCC)


 
> Can you check why configure thinks spawnve is available in process.h
> (contrary to the warning we see in your snippet)?
Sorry, I may not have been clear on this. Google reported that spawnve lives in
process.h. A quick search on my filesystem shows that spawnve actually lives in
cygwin/process.h, not process.h as expected by pex-unix.c. Configure
probably only tested linker status for the function and therefore wouldn't have
noticed. 

Perhaps the file moved recently (since 1.7.9 or 10)? I've sent mail to the
cygwin list to see if anybody there knows. Meanwhile, soft-linking process.h to
where gcc expects it lets the compile continue. I'll report back if any further
issues arise.

> What Windows version are you using?
W7-x64


[Bug tree-optimization/50346] Function call foils VRP/jump-threading of redundant predicate on struct member

2012-03-07 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346

--- Comment #6 from Ryan Johnson <scovich at gmail dot com> 2012-03-07 13:31:19
UTC ---
(In reply to comment #5)
> On Wed, 12 Oct 2011, scovich at gmail dot com wrote:
>
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346
> >
> > --- Comment #4 from Ryan Johnson <scovich at gmail dot com> 2011-10-12
> > 12:40:25 UTC ---
> > (In reply to comment #3)
> > > Well, it's a tree optimization issue.  It's simple - the local aggregate f
> > > escapes the function via the member function call to baz:
> > >
> > > <bb 5>:
> > >   foo::baz (&f);
> > >
> > > and as our points-to analysis is not flow-sensitive for memory/calls this
> > > causes f to be clobbered by the call to bar
> >
> > Is flow-sensitive analysis within single functions prohibitively expensive?
> > All the papers I can find talk about whole-program analysis, where it's very
> > expensive in both time and space; the best I could find (CGO'11 best paper)
> > gets it down to 20-30ms and 2-3MB per kLoC for up to ~300kLoC.
>
> It would need a complete rewrite, it isn't integratable into the current
> solver (which happens to be shared between IPA and non-IPA modes).
That makes sense...

Wild idea: would it be possible to annotate references as "escaped" or "not
escaped yet"? Anything global or passed into the function would be marked as
escaped, while anything allocated locally would start out as not escaped;
assigning it to an escaped location or passing it to a function would mark it
as escaped if it wasn't already. The status could be determined in linear time
using local information only (= scalable), and would benefit strongly as
inlining (IPA or not) eliminates escape points.

Alternatively (or maybe it's really the same thing?), I could imagine an SSA
operation which moves the non-escaped variable into an escaped one (which
happens to live at the same address) just before it escapes? That might give
the same effect with no changes to the current flow-insensitive algorithm, as
long as the optimizer knew how to adjust things to account for inlining.
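
To make the "escaped / not escaped yet" idea a bit more concrete, here is a toy
sketch over a straight-line statement list. Everything in it (the Op and Stmt
types, escaped_at_use()) is hypothetical and has nothing to do with gcc's
actual data structures; it ignores control flow entirely, and the std::map
makes it O(n log n) rather than strictly linear.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy statement kinds for a single function body without branches.
enum class Op { AllocLocal, PassToCall, StoreToEscaped, Use };

struct Stmt {
    Op op;
    std::string var;
};

// One forward pass: for every Use, record whether the variable had already
// escaped at that point. Variables never seen as AllocLocal (globals,
// parameters) are conservatively treated as escaped.
std::vector<bool> escaped_at_use(const std::vector<Stmt>& body) {
    std::map<std::string, bool> escaped;
    std::vector<bool> result;
    for (const Stmt& s : body) {
        switch (s.op) {
        case Op::AllocLocal:
            escaped[s.var] = false;
            break;
        case Op::PassToCall:
        case Op::StoreToEscaped:
            escaped[s.var] = true;
            break;
        case Op::Use: {
            auto it = escaped.find(s.var);
            result.push_back(it == escaped.end() ? true : it->second);
            break;
        }
        }
    }
    return result;
}
```

For the test case in this bug, the interesting part is that the read of f.b
happens before f is passed to foo::baz(), so a pass like this would still
classify f as not escaped at that read.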


[Bug tree-optimization/50346] Function call foils VRP/jump-threading of redundant predicate on struct member

2012-03-07 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346

--- Comment #8 from Ryan Johnson <scovich at gmail dot com> 2012-03-07 14:28:29
UTC ---
(In reply to comment #7)
> On Wed, 7 Mar 2012, scovich at gmail dot com wrote:
>
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346
> >
> > --- Comment #6 from Ryan Johnson <scovich at gmail dot com> 2012-03-07
> > 13:31:19 UTC ---
> > (In reply to comment #5)
> > > On Wed, 12 Oct 2011, scovich at gmail dot com wrote:
> > >
> > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346
> > > >
> > > > --- Comment #4 from Ryan Johnson <scovich at gmail dot com> 2011-10-12
> > > > 12:40:25 UTC ---
> > > > (In reply to comment #3)
> > > > > Well, it's a tree optimization issue.  It's simple - the local
> > > > > aggregate f escapes the function via the member function call to baz:
> > > > >
> > > > > <bb 5>:
> > > > >   foo::baz (&f);
> > > > >
> > > > > and as our points-to analysis is not flow-sensitive for memory/calls
> > > > > this causes f to be clobbered by the call to bar
> > > >
> > > > Is flow-sensitive analysis within single functions prohibitively
> > > > expensive? All the papers I can find talk about whole-program analysis,
> > > > where it's very expensive in both time and space; the best I could find
> > > > (CGO'11 best paper) gets it down to 20-30ms and 2-3MB per kLoC for up
> > > > to ~300kLoC.
> > >
> > > It would need a complete rewrite, it isn't integratable into the current
> > > solver (which happens to be shared between IPA and non-IPA modes).
> > That makes sense...
> >
> > Wild idea: would it be possible to annotate references as escaped or not
> > escaped yet ? Anything global or passed into the function would be marked
> > as escaped, while anything allocated locally would start out as not escaped;
> > assigning to an escaped location or passing to a function would mark it as
> > escaped if it wasn't already. The status could be determined in linear time
> > using local information only (= scalable), and would benefit strongly as
> > inlining (IPA or not) eliminates escape points.
>
> Well, you can compute the clobber/use sets of individual function calls,
> IPA PTA computes a simple mod-ref analysis this way.  You can also
> annotate functions whether they make arguments escape or whether it
> reads from them or clobbers them.
>
> The plan is to do some simple analysis and propagate that up the
> callgraph, similar to pure-const analysis.  The escape part could
> be integrated there.

That sounds really slick to have in general... but would it actually catch the
test case above? What you describe seems to depend on test() having information
about foo::baz() -- which it does not -- while analyzing the body of test()
could at least identify the part of f's lifetime where it cannot possibly have
escaped.

Or does the local analysis come for free once those IPA changes are in place?


[Bug c++/52529] New: Compiler rejects template code inconsistently

2012-03-07 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52529

 Bug #: 52529
   Summary: Compiler rejects template code inconsistently
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


The following code does or does not compile (with varying errors) under
4.7.0-RC20120302 and 4.6.2 depending on the choice of FIRST..FOURTH; all four
variants compile under 4.5.3. 

I've narrowed down the test case as far as I could, but I don't really
understand what's going wrong. Does the code break some subtle lookup rule
which gcc recently became more strict about? There seem to be two issues,
because either A<M> or foo() is enough to trigger an error; only A<1>::foo<T>
compiles.

=== bug.cpp ===
template <long N> struct A {
    template <typename T> long foo(typename T::X *x);
};

template <typename T> struct B {
    typedef typename T::X X;
    enum { M=1 };
    static void bar(X *x);
};

struct C { struct X; };
int main() { B<C>::bar(0); }

template <typename T> void B<T>::bar(X *x) {
#if defined(FIRST)
    A<M>().foo(x);
#elif defined(SECOND)
    A<M>().foo<T>(x);
#elif defined(THIRD)
    A<1>().foo(x);
#elif defined(FOURTH)
    A<1>().foo<T>(x);
#endif
}
=== end bug.cpp ===


Sample error message from 4.7.0, since it has the clearest error messages
(4.6.2 gets confused by earlier errors when attempting to suggest a candidate):

=== FIRST ===
bug.cpp: In instantiation of ‘static void B<T>::bar(B<T>::X*) [with T = C;
B<T>::X = C::X]’:
bug.cpp:12:20:   required from here
bug.cpp:16:5: error: no matching function for call to ‘A<1l>::foo(B<C>::X*)’
bug.cpp:16:5: note: candidate is:
bug.cpp:2:32: note: template<class T> long int A::foo(typename T::X*) [with T =
T; long int N = 1l]
bug.cpp:2:32: note:   template argument deduction/substitution failed:
bug.cpp:16:5: note:   couldn't deduce template parameter ‘T’

=== SECOND ===
bug.cpp: In static member function ‘static void B<T>::bar(B<T>::X*)’:
bug.cpp:18:17: error: expected primary-expression before ‘>’ token

=== THIRD ===
bug.cpp: In instantiation of ‘static void B<T>::bar(B<T>::X*) [with T = C;
B<T>::X = C::X]’:
bug.cpp:12:20:   required from here
bug.cpp:20:5: error: no matching function for call to ‘A<1l>::foo(B<C>::X*)’
bug.cpp:20:5: note: candidate is:
bug.cpp:2:32: note: template<class T> long int A::foo(typename T::X*) [with T =
T; long int N = 1l]
bug.cpp:2:32: note:   template argument deduction/substitution failed:
bug.cpp:20:5: note:   couldn't deduce template parameter ‘T’

=== FOURTH ===
[successful compilation]
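
For comparison, here is a sketch of a variant that I would expect any
conforming compiler to accept. My reading (an assumption, not something
confirmed in this report) is that A<M> is a dependent type inside B<T>::bar
because M is an enumerator of the class template, so the member template call
needs the `template' disambiguator; and T cannot be deduced from `typename
T::X*', so it has to be passed explicitly. The function bodies and the long
return type of bar() are illustrative additions so the result can be checked.

```cpp
#include <cassert>

template <long N> struct A {
    // Illustrative body: just report N.
    template <typename T> long foo(typename T::X*) { return N; }
};

template <typename T> struct B {
    typedef typename T::X X;
    enum { M = 1 };
    static long bar(X* x);
};

struct C { struct X; };

template <typename T> long B<T>::bar(X* x) {
    // `template' tells the parser that foo names a member template of the
    // dependent type A<M>, so the following `<' opens a template argument list.
    return A<M>().template foo<T>(x);
}
```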


[Bug bootstrap/52513] New: gcc-4.7.0-RC-20120302 fails to build for i686-pc-cygwin

2012-03-06 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52513

 Bug #: 52513
   Summary: gcc-4.7.0-RC-20120302 fails to build for
i686-pc-cygwin
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


The RC doesn't build on i686-pc-cygwin:

gcc -c -DHAVE_CONFIG_H -g -fkeep-inline-functions  -I.
-I../../gcc-4.7.0-RC-20120302/libiberty/../include  -W -Wall -Wwrite-strings
-Wc++-compat -Wstrict-prototypes -pedantic 
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c -o pex-unix.o
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c: In function
‘pex_unix_exec_child’:
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c:549:2: warning: implicit
declaration of function ‘spawnvpe’ [-Wimplicit-function-declaration]
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c:549:18: error: ‘_P_NOWAITO’
undeclared (first use in this function)
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c:549:18: note: each undeclared
identifier is reported only once for each function it appears in
../../gcc-4.7.0-RC-20120302/libiberty/pex-unix.c:551:2: warning: implicit
declaration of function ‘spawnve’ [-Wimplicit-function-declaration]
Makefile:892: recipe for target `pex-unix.o' failed
make[3]: *** [pex-unix.o] Error 1
make[3]: Leaving directory `/home/Ryan/apps/gcc-4.7.0-obj/libiberty'
Makefile:8642: recipe for target `all-stage1-libiberty' failed
make[2]: *** [all-stage1-libiberty] Error 2
make[2]: Leaving directory `/home/Ryan/apps/gcc-4.7.0-obj'
Makefile:15771: recipe for target `stage1-bubble' failed
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/home/Ryan/apps/gcc-4.7.0-obj'
Makefile:897: recipe for target `all' failed
make: *** [all] Error 2

The needed declarations seem to live in Windows headers (process.h?)

This is using all official (and latest) cygwin packages:
binutils-2.22.51
cygwin-1.7.11s(0.259/5/3)
gcc-4.5.3
gmp-4.3.2
make-3.82.90
mpfr-3.0.1

Configure command: ../gcc-4.7.0-RC-20120302/configure
--prefix=$HOME/apps/gcc-4.7 --enable-languages=c,c++,lto


[Bug c++/52166] New: c++0x required to import standard c++ headers?

2012-02-07 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52166

 Bug #: 52166
   Summary: c++0x required to import standard c++ headers?
Classification: Unclassified
   Product: gcc
   Version: 4.6.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


Several of the standard C++ wrapper versions of C headers can only be imported
with c++0x support enabled (tested on both cygwin and x86_64-linux):

#error This file requires compiler and library support for the upcoming ISO C++
standard, C++0x. This support is currently experimental, and must be enabled
with the -std=c++0x or -std=gnu++0x compiler options.

Workaround is to #include <foo.h> instead of <cfoo>, but it's annoying given
that the latter is supposedly The Right Way for C++ programs to bring in the
header.

Affected files:

ccomplex
cfenv
cinttypes
cstdint


[Bug tree-optimization/50346] Function call foils VRP/jump-threading of redundant predicate on struct member

2011-10-12 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346

--- Comment #4 from Ryan Johnson <scovich at gmail dot com> 2011-10-12 12:40:25
UTC ---
(In reply to comment #3)
> Well, it's a tree optimization issue.  It's simple - the local aggregate f
> escapes the function via the member function call to baz:
>
> <bb 5>:
>   foo::baz (&f);
>
> and as our points-to analysis is not flow-sensitive for memory/calls this
> causes f to be clobbered by the call to bar

Is flow-sensitive analysis within single functions prohibitively expensive? All
the papers I can find talk about whole-program analysis, where it's very
expensive in both time and space; the best I could find (CGO'11 best paper)
gets it down to 20-30ms and 2-3MB per kLoC for up to ~300kLoC. 


> as neither the bodies of baz nor bar are visible there is nothing we can do

Would knowing the body of bar() help if the latter cannot be inlined?


[Bug c++/50346] New: Function call foils VRP/jump-threading of redundant predicate on struct member

2011-09-09 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50346

 Bug #: 50346
   Summary: Function call foils VRP/jump-threading of redundant
predicate on struct member
Classification: Unclassified
   Product: gcc
   Version: 4.6.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


When compiling the following code with options `-O3 -DBUG' :

// === bug.cpp ===
struct foo {
bool b;
foo() : b(false) { }
void baz();
};

bool bar();
void baz();

void test() {
foo f;
bool b = false;
if (bar()) b = f.b = true;
#ifndef BUG
if (f.b != b) __builtin_unreachable();
#endif
if (f.b) f.baz();
}
// === end ==

gcc fails to eliminate the second (redundant) if statement:

_Z4testv:
.LFB3:
        subq    $24, %rsp
        movb    $0, 15(%rsp)        <=== assign f.b = 0
        call    _Z3barv             <=== cannot access f.b
        testb   %al, %al
        je      .L2
        movb    $1, 15(%rsp)
.L3:
        leaq    15(%rsp), %rdi
        call    _ZN3foo3bazEv
        addq    $24, %rsp
        ret
.L2:
        cmpb    $0, 15(%rsp)        <=== always compares equal
        jne     .L3
        addq    $24, %rsp
        ret

Compiling with `-O3 -UBUG' gives the expected results:

_Z4testv:
.LFB3:
        subq    $24, %rsp
        movb    $0, 15(%rsp)
        call    _Z3barv
        testb   %al, %al
        je      .L1
        leaq    15(%rsp), %rdi
        movb    $1, 15(%rsp)
        call    _ZN3foo3bazEv
.L1:
        addq    $24, %rsp
        ret

This sort of scenario comes up a lot with RAII-related code, particularly when
some code paths clean up the object manually before the destructor runs
(obviating the need for the destructor to do it again).

While it should be possible to give hints using __builtin_unreachable(), it's
not always easy to tell where to put it, and it may need to be placed multiple
times to be effective.


[Bug c++/50312] New: ICE when calling offsetof() illegally on incomplete template class

2011-09-06 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50312

 Bug #: 50312
   Summary: ICE when calling offsetof() illegally on incomplete
template class
Classification: Unclassified
   Product: gcc
   Version: 4.6.1
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


The following (admittedly illegal) code:

//== begin ==
#include <cstddef>
#ifdef BUG
template <typename T=int>
#define EXTRA <>
#else
#define EXTRA
#endif
struct foo {
    int bar;
    enum { END = offsetof(foo, bar) };
};
foo EXTRA a;
//== end ===

Causes an ICE when compiled with -DBUG:
$ g++ -DBUG bug.cpp
bug.cpp: In instantiation of ‘foo<int>’:
bug.cpp:12:11:   instantiated from here
bug.cpp:10:10: internal compiler error: in tree_low_cst, at tree.h:4233
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

The above is for g++-4.6.1; the same error occurs for g++-4.5.0 (but at
tree.c:6202).

Compiling without -DBUG triggers a much more helpful diagnostic:
$ g++ bug.cpp
bug.cpp:10:18: error: invalid use of incomplete type ‘struct foo {aka struct
foo}’
bug.cpp:8:8: error: forward declaration of ‘struct foo {aka struct foo}’


[Bug inline-asm/49611] Inline asm should support input/output of flags

2011-07-04 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611

--- Comment #2 from Ryan Johnson <scovich at gmail dot com> 2011-07-04 20:32:01
UTC ---
(In reply to comment #1)
> Making this work reliably is probably more work than making GCC use the flags
> from more cases from regular C code.

Does that mean each such case would need to be identified individually and then
hard-wired into i386.md? The existence of modes like CCGC, CCGOC, CCNO, etc. in
i386-modes.def made me hope that some high-level mechanism existed for
reasoning about the semantics of condition codes. Or does that mechanism exist,
and is just difficult to expose to inline asm for some reason?


[Bug inline-asm/49611] New: Inline asm should support input/output of flags

2011-07-01 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49611

   Summary: Inline asm should support input/output of flags
   Product: gcc
   Version: 4.5.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: inline-asm
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


The main reason I find myself writing inline asm is to do clever things with
the flags register, especially in conjunction with unusual instructions. 

Some examples:

1. Using the sparc brz instruction if the compiler doesn't emit it (e.g. bug
#40067). 

2. Using the carry flag in x86 to determine whether the unsigned comparison a
!= b was greater or less than, using subtract-with-borrow (seen in gnu libc):
`sbb %eax, %eax; sbb $-1, %eax' leaves %eax containing -1 if a < b and +1 if a
> b.

3. AMD's Advanced Synchronization Facility, which proposes a jmp-like
instruction for starting hardware transactions. Its effect is similar to
fork(): the first time past, it sets flags and eax to zero; a transaction
failure resumes from the same PC, but with eax and flags set to reflect an
error code.

4. In my experience, the main reason people would want asm goto to allow
outputs is because they can't export flags (otherwise the goto can become
control flow in C/C++).

In all of these cases the inline asm becomes needlessly long, simply because
uses of the flags generated within the asm block only work reliably within that
asm block (including branches, loops, etc.).

Consider the following concrete example:

#define EOL "\n"
#define EOLT EOL "\t"
long pstrcmp(unsigned char const* a, unsigned char const* b, long* pout, long
pin=0) {
    long delta, tmp;
    asm("#" EOL
        "1:"                              EOLT
        "movzb  (%[a], %[n]), %k[tmp]"    EOLT
        "movzb  (%[b], %[n]), %k[delta]"  EOLT
        "cmpb   %b[delta], %b[tmp]"       EOLT
        "jnz    2f"                       EOLT
        "testb  %b[tmp], %b[tmp]"         EOLT
        "jz     3f"                       EOLT
        "sub    %[m1], %[n]"              EOLT
        "jmp    1b"                       EOL
        "2:"                              EOLT
        "sbb    %[delta], %[delta]"       EOLT
        "sbb    %[m1], %[delta]"          EOL
        "3:"
        : [a] "+r"(a), [b] "+r"(b), [n] "+r"(pin),
          /* early-clobber: both are written before the inputs are dead */
          [delta] "=&q"(delta), [tmp] "=&q"(tmp)
        : [m1] "i"(-1)
        );
    *pout = pin;
    return delta;
}
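
For readers who don't want to trace the asm, here is a plain C++ sketch of what
I believe the block above computes: the index of the first differing (or
terminating) byte is stored through pout, and the return value is -1, 0 or +1
according to the unsigned comparison at that index. The name pstrcmp_ref is
hypothetical; this is a reference model, not a replacement.

```cpp
#include <cassert>

long pstrcmp_ref(unsigned char const* a, unsigned char const* b, long* pout,
                 long pin = 0) {
    // Advance while the bytes match and the terminator has not been reached.
    while (a[pin] == b[pin] && a[pin] != 0)
        ++pin;
    *pout = pin;
    // Same result the sbb/sbb pair derives from the carry flag:
    // -1 if a[pin] < b[pin], +1 if a[pin] > b[pin], 0 if both are NUL.
    return (a[pin] > b[pin]) - (a[pin] < b[pin]);
}
```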

With inline asm support for flags it would look more like this:

long pstrcmp(unsigned char const* a, unsigned char const* b, long* pout, long
pin=0) {
long delta, tmp;
  again:
if (a[pin] == b[pin]) {
if (a[pin] != 0) {
pin++;
goto again;
}
else {
delta = b[pin];
}
}
else {
        asm("sbb    %[delta], %[delta]" EOLT
            "sbb    %[m1], %[delta]"
            : [delta] "=r"(delta)
            : [m1] "i"(-1), "flags"(a[pin] != b[pin])
            );
}
*pout = pin;
return delta;
}

The intent is that the "flags" input specifier tells the compiler to arrange
for flags to be set at entry to the asm block as if the expression passed to it
had just completed (the compiler would warn or error if the effect of
evaluating the expression on flags were unclear). In theory the optimizer
should be able to eliminate common expressions and shuffle code to avoid
materializing the flags at all.

Using flags as output (perhaps to pass as input to another inline asm block)
might look like this:

asm("cmp %0, %1" : "=flags"(flags) : "r"(a), "r"(b));
...
asm("jz 1f" : : "flags"(flags));

The flags should probably take type 'int' in C.

Ideally, the compiler could even recognize and optimize patterns like this:

asm("cmp %0, %1" : "=flags"(flags) : "r"(a), "r"(b));
enum { CF=1 };
if (flags & CF) {
    ...
}
else if (flags & ZF) {
    ...
}


[Bug middle-end/49035] New: Avoid setting up stack frame for short, hot code paths

2011-05-17 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49035

   Summary: Avoid setting up stack frame for short, hot code paths
   Product: gcc
   Version: 4.5.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


I often find myself writing functions of the following form:

void foo () {
if (common_case) {
/* do a little work and return */
}
/* uncommon case: do a lot of work, call functions, etc. */
}

The resulting assembly code always sets up a stack frame in the function
prologue, even though the function usually executes as a leaf using few (or
zero) of the callee-save registers and stack slots it saves. 

Here's an example which is only slightly contrived:

=== rfe.cpp 
struct link {
    link* prev;
    long go_slow;
    void frob(link* parent, link* grandparent);
};

link* foo(link* list) {
    link* prev = list->prev;
    while (__builtin_expect(prev->go_slow, 0)) {
        link* pprev = __sync_lock_test_and_set(&prev->prev, 0);
        pprev->frob(prev, list);
        prev = pprev;
    }
    return prev;
}
=== rfe.cpp 

Compiling the above with `x86_64-unknown-linux-gnu-g++-4.5.2 -O3 -S' yields

_Z3fooP4link:
.LFB0:
        movq    %rbx, -24(%rsp)
        movq    %rbp, -16(%rsp)
        movq    %rdi, %rbx
        movq    %r12, -8(%rsp)
        subq    $24, %rsp
        movq    (%rdi), %rax
        cmpq    $0, 8(%rax)
        jne     .L8
.L2:
        movq    (%rsp), %rbx
        movq    8(%rsp), %rbp
        movq    16(%rsp), %r12
        addq    $24, %rsp
        ret
.L8:
        xorl    %r12d, %r12d
.L6:
        movq    %r12, %rbp
        xchgq   (%rax), %rbp
        movq    %rbx, %rdx
        movq    %rax, %rsi
        movq    %rbp, %rdi
        call    _ZN4link4frobEPS_S0_
        cmpq    $0, 8(%rbp)
        jne     .L4
        movq    %rbp, %rax
        jmp     .L2
.L4:
        movq    %rbp, %rax
        jmp     .L6


Ideally, it would look like this instead:

_Z3fooP4link:
.LFB0:
        ;; *** hot path executes as leaf ***
        movq    (%rdi), %rax
        cmpq    $0, 8(%rax)
        jne     .L8
        ret
.L8:
        ;; *** set up stack frame ***
        movq    %rbx, -24(%rsp)
        movq    %rbp, -16(%rsp)
        movq    %rdi, %rbx
        movq    %r12, -8(%rsp)
        subq    $24, %rsp
        ;; ***
        xorl    %r12d, %r12d
.L6:
        movq    %r12, %rbp
        xchgq   (%rax), %rbp
        movq    %rbx, %rdx
        movq    %rax, %rsi
        movq    %rbp, %rdi
        call    _ZN4link4frobEPS_S0_
        cmpq    $0, 8(%rbp)
        jne     .L4
        ;; *** tear down stack frame ***
        movq    %rbp, %rax
        movq    (%rsp), %rbx
        movq    8(%rsp), %rbp
        movq    16(%rsp), %r12
        addq    $24, %rsp
        ;; ***
        ret
.L4:
        movq    %rbp, %rax
        jmp     .L6


The effect can sometimes be simulated using an inlined foo which includes the
fast path and a call to the (non-inlined) slow path, but the whims of function
inlining often conspire against it even when callers are able to inline foo
(e.g. foo is not a library function).

There's probably some overlap with partial inlining here: the ideal case
essentially splits the slow path off into its own function (called using tail
recursion); presumably partial inlining would inline the fast path while having
all callers jump to the same copy of the slow path function. However, the
optimization is arguably useful even if foo is never inlined at all.

Thoughts?
Ryan


[Bug middle-end/49035] Avoid setting up stack frame for short, hot code paths

2011-05-17 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49035

--- Comment #1 from Ryan Johnson scovich at gmail dot com 2011-05-18 02:56:23 
UTC ---
Update: using __attribute__((noinline)) it is actually possible to force the
compiler to do the right thing, though it makes the code significantly less
readable:

=== example.cpp 
struct link {
    link* prev;
    long go_slow;
    void frob(link* parent, link* grandparent);
};

link* __attribute__((noinline)) foo_slow(link* list, link* prev) {
    do {
        link* pprev = __sync_lock_test_and_set(&prev->prev, 0);
        pprev->frob(prev, list);
        prev = pprev;
    } while(__builtin_expect(prev->go_slow, 0));
    return prev;
}

link* foo_fast(link* list) {
    link* prev = list->prev;
    if (__builtin_expect(prev->go_slow, 0))
        return foo_slow(list, prev);
    return prev;
}
=== example.cpp 

The above compiles down to something much better, though the calling convention
requires an extra movq and there are more jumps than required (the compiler
probably doesn't ever perform tail recursion using a conditional jump):

_Z8foo_fastP4link:
        movq    (%rdi), %rax
        cmpq    $0, 8(%rax)
        jne     .L7
        rep
        ret
.L7:
        movq    %rax, %rsi
        jmp     _Z8foo_slowP4linkS0_

_Z8foo_slowP4linkS0_:
        movq    %rbp, -16(%rsp)
        movq    %r12, -8(%rsp)
        xorl    %ebp, %ebp
        movq    %rbx, -24(%rsp)
        movq    %rdi, %r12
        subq    $24, %rsp
.L2:
        movq    %rbp, %rbx
        xchgq   (%rsi), %rbx
        movq    %r12, %rdx
        movq    %rbx, %rdi
        call    _ZN4link4frobEPS_S0_
        cmpq    $0, 8(%rbx)
        jne     .L3
        movq    %rbx, %rax
        movq    8(%rsp), %rbp
        movq    (%rsp), %rbx
        movq    16(%rsp), %r12
        addq    $24, %rsp
        ret
.L3:
        movq    %rbx, %rsi
        jmp     .L2


[Bug c++/46143] New: __attribute__((optimize)) emits wrong code

2010-10-22 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46143

   Summary: __attribute__((optimize)) emits wrong code
   Product: gcc
   Version: 4.5.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


Created attachment 22129
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22129
Test case showing wrong code with __attribute__((optimize(0)))

Applying '__attribute__((optimize(0)))' to a function causes it to call the
wrong variant/clone of an optimized callee that returns a struct by value.

The attached test case reproduces the problem when compiled with `g++ -O3 -DBUG
bug.cpp' 

The problem seems to be the way gcc optimizes return-by-value. The statement:

iterator it = v.begin()

becomes

tmp = alloca(sizeof(iterator))
vector::begin(tmp, v)
iterator it(*(iterator*)tmp)

However, gcc actually calls the wrong variant of vector::begin: a clone that
expects its first argument to be v._M_impl._M_start (an iterator to be copied)
and that has optimized away the struct completely, returning only a pointer.
The clone therefore allocates a temporary, initializes it from the
(uninitialized) return-value slot it was passed, and returns the temporary's
contents to the caller (main). As a result, 'it' points to whatever happened to
be on the stack at the time of the call.

Note that the test case smashes the stack only to make the symptoms consistent;
the bug remains with or without it.

The relevant disassembly follows:

main:
        # call vector::begin(rval_ptr, v)
        subq    $24, %rsp        # allocate hidden tmp1
        movq    v(%rip), %rdx
        movq    %rdx, %rsi       # second arg is v
        movq    %rsp, %rdi       # first arg is tmp1
        call    _ZNSt6vectorIP3fooSaIS1_EE5beginEv.clone.1
        ...

_ZNSt6vectorIP3fooSaIS1_EE5beginEv.clone.1:
        subq    $24, %rsp        # allocate hidden tmp2
        movq    %rdi, %rsi       # second arg expects v but gets tmp1
        movq    %rsp, %rdi       # first arg is tmp2
        call    _ZN9__gnu_cxx17__normal_iteratorIPP3fooSt6vectorIS2_SaIS2_EEEC2ERKS3_.clone.0
        movq    (%rsp), %rax     # return the contents of tmp2
        addq    $24, %rsp
        ret

_ZN9__gnu_cxx17__normal_iteratorIPP3fooSt6vectorIS2_SaIS2_EEEC2ERKS3_.clone.0:
        movq    %rsi, (%rdi)     # tmp2 = tmp1
        ret


[Bug c++/46143] __attribute__((optimize)) emits wrong code

2010-10-22 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46143

Ryan Johnson scovich at gmail dot com changed:

   What|Removed |Added

  Attachment #22129|0   |1
is obsolete||

--- Comment #1 from Ryan Johnson scovich at gmail dot com 2010-10-22 22:18:16 
UTC ---
Created attachment 22130
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22130
Test case showing wrong code with __attribute__((optimize(0)))

Oops... the previous version had stray marks from emacs+gdb.


[Bug c++/46143] __attribute__((optimize)) emits wrong code

2010-10-22 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46143

--- Comment #4 from Ryan Johnson scovich at gmail dot com 2010-10-22 23:06:53 
UTC ---
As I said, the stack smashing was only there to make the behavior consistent.
If the offending stack location happens to contain zero, the bug would go
unnoticed (try adding 'long n[1]' as another local; for me it makes the symptom
go away unless the stack smash exposes it).

In any case, here's a minimal testcase which doesn't do anything evil:

#include <vector>
#include <cassert>

typedef std::vector<int> intv;

int
#ifdef BUG
__attribute__((optimize(0)))
#endif
main() {
    intv v;
    intv::iterator it = v.begin();
    assert(it == v.begin());
    return 0;
}


[Bug lto/45959] [4.6 Regression] ICE: tree code 'template_type_parm' is not supported in gimple streams with -flto/-fwhopr

2010-10-11 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45959

Ryan Johnson scovich at gmail dot com changed:

   What|Removed |Added

 CC||scovich at gmail dot com

--- Comment #6 from Ryan Johnson scovich at gmail dot com 2010-10-11 13:53:42 
UTC ---
Actually, this isn't a regression -- not on 4.6, at least. The following
minimal test case makes x86_64-unknown-linux-gnu-gcc-4.5.1 die with the same
error message:

$ cat > lto-bug.h <<EOF
#pragma interface
template<class T> struct foo;
template<class T>
struct foo<T*> : foo<T> {
    foo<T*>(T* t) : foo<T>(*t) { }
};
EOF

$ cat > lto-bug.C <<EOF
#pragma implementation "lto-bug.h"
#include "lto-bug.h"
EOF

$ gcc-4.5.1 -flto lto-bug.C
In file included from lto-bug.C:2:0:
lto-bug.h:6:2: internal compiler error: tree code ‘template_type_parm’ is not
supported in gimple streams
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Removing anything at all makes the ICE disappear.

I don't have a copy of the 4.6 sources to test whether the just checked-in fix
takes care of this... reopen?


[Bug c++/45968] New: ICE: tree code 'template_type_parm' is not supported in gimple streams with -flto

2010-10-11 Thread scovich at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45968

   Summary: ICE: tree code 'template_type_parm' is not supported
in gimple streams with -flto
   Product: gcc
   Version: lto
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: scov...@gmail.com


The following minimal test case makes x86_64-unknown-linux-gnu-gcc-4.5.1 ICE
when the -flto flag is supplied:

$ cat > lto-bug.h <<EOF
#pragma interface
template<class T> struct foo;
template<class T>
struct foo<T*> : foo<T> {
    foo<T*>(T* t) : foo<T>(*t) { }
};
EOF

$ cat > lto-bug.C <<EOF
#pragma implementation "lto-bug.h"
#include "lto-bug.h"
EOF

$ gcc-4.5.1 -flto lto-bug.C
In file included from lto-bug.C:2:0:
lto-bug.h:6:2: internal compiler error: tree code ‘template_type_parm’ is not
supported in gimple streams
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Removing anything at all makes the ICE disappear.

This seemed similar to bug #45959, but applying the patch mentioned there to
gcc-4.5.1-src/gcc/cp/pt.c:11322 does not help so I'm filing a new bug for this.


[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces

2010-05-07 Thread scovich at gmail dot com


--- Comment #5 from scovich at gmail dot com  2010-05-07 20:12 ---
Belated follow-up: I just tried to use sparc-sun-solaris2.10-gcc-4.4.0 (built
from sources) and it does not emit the DW_AT_call_* debug attributes which gdb
expects in order to unwind inlined functions. 

I have searched both the gdb and gcc docs and cannot find any mention of
(modern) machines/systems/situations where this is not supported; given that
the required attributes are missing it seems like a gcc problem (feeding the .s
file to gas doesn't help, so I doubt it's the sun assembler/linker, either)

gcc -v
Using built-in specs.
Target: sparc-sun-solaris2.10
Configured with: ../gcc-4.4.0/configure
--prefix=/export/home/ryanjohn/apps/gcc-4.4.0
--with-gmp=/export/home/ryanjohn/apps --with-mpfr=/export/home/ryanjohn/apps
--without-gnu-ld --without-gnu-as
Thread model: posix
gcc version 4.4.0 (GCC) 


-- 

scovich at gmail dot com changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
   GCC host triplet||sparc-sun-solaris2.10
  Known to fail||4.4.0
 Resolution|WORKSFORME  |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828



[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces

2010-05-07 Thread scovich at gmail dot com


--- Comment #6 from scovich at gmail dot com  2010-05-07 21:20 ---
Aha! The problem is not that gcc fails to emit the proper debug info, it's that
it doesn't always track well which instructions came from which function. 

For example, if we compile this toy program:

int  volatile global;
int foo(int a) {
return a + global;
}
int bar(int a) {
return global + foo(a);
}
int baz(int a) {
return global + bar(a);
}
int main(int argc, char const* argv[]) {
return global + baz(argc);
}

Running it in gdb will seem to begin execution at exit from bar:
Dump of assembler code for function main:
   0x000106cc <+0>:     sethi  %hi(0x20800), %g1
   0x000106d0 <+4>:     ld  [ %g1 + 0x124 ], %g4        ! 0x20924 <global>
=> 0x000106d4 <+8>:     ld  [ %g1 + 0x124 ], %g3
   0x000106d8 <+12>:    ld  [ %g1 + 0x124 ], %g2
   0x000106dc <+16>:    ld  [ %g1 + 0x124 ], %g1
   0x000106e0 <+20>:    add  %g4, %g1, %g1
   0x000106e4 <+24>:    add  %g1, %g3, %g1
   0x000106e8 <+28>:    add  %g1, %g2, %g1
   0x000106ec <+32>:    retl
   0x000106f0 <+36>:    add  %g1, %o0, %o0
End of assembler dump.

Apparently someone made the reasonable judgment call that it was better to only
enter inlined functions once rather than jumping around, and even then only
if code from later in the containing function hasn't already run. Putting a
printf in foo() gave the expected result. 


-- 

scovich at gmail dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||WORKSFORME


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828



[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces

2010-04-23 Thread scovich at gmail dot com


--- Comment #4 from scovich at gmail dot com  2010-04-23 23:29 ---
> Try the -i option of addr2line.

Ah, very nice. It turns out I was using a 4.0-series gcc, which according to
gdb's docs doesn't output quite enough debug information to reconstruct inlined
stack traces; 4.1 and later do. Time for an upgrade!

Thanks!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828



[Bug debug/43828] New: Emit debug info allowing inlined functions to show in stack traces

2010-04-21 Thread scovich at gmail dot com
It would be very nice if gcc emitted debug information that allowed profilers
and debuggers the option to extract a stack trace which included calls to
inlined functions. This would allow developers much greater insight into the
behavior of optimized code. 

C++ programs would benefit disproportionately, especially those which use the
STL heavily -- disabling inlining produces a very different executable which
makes profiling worse than useless and can mask heisenbugs.

Profiling would become significantly more accurate because it could determine
how much of a function's overheads remain even after inlining, which is pretty
much impossible right now. It would also allow profilers to generate
functional call graphs which show all uses of a function, inlined or not. 

Debugging would also improve because the user would be able to navigate a stack
trace which corresponds to the code they're trying to debug, even if the actual
calls were optimized away. Questions like "which of this function's 10 calls to
std::vector::begin seg faulted?" would suddenly be *much* easier to answer, and
in an intuitive way. With some work it would probably even be possible to
maintain mappings for local vars/params (assuming they exist at the time). 

All this virtual stack trace functionality would need to remain separate (and
probably not the default) so as to not confuse (impede) folks who are used to
(prefer) the current behavior. 

NOTE: I realize that full support for this would require changes to other
projects like gdb and gprof, but gcc could solve the chicken-and-egg problem by
emitting appropriate debug info as a first step; perhaps the new debug info
changes introduced with 4.5.0 already do (some of) this?


-- 
   Summary: Emit debug info allowing inlined functions to show in
stack traces
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: debug
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828



[Bug debug/43828] Emit debug info allowing inlined functions to show in stack traces

2010-04-21 Thread scovich at gmail dot com


--- Comment #1 from scovich at gmail dot com  2010-04-21 09:29 ---
(In reply to comment #0)
One more way debugging would improve: it would become possible to set
breakpoints in inlined functions


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43828



[Bug bootstrap/43301] New: top-level configure script ignores ---with-build-time-tools

2010-03-08 Thread scovich at gmail dot com
./configure ... --with-build-time-tools=$MY_TOOLS ignores $MY_TOOLS (though it
correctly warns when $MY_TOOLS is not an absolute path).

Let's just say this led to extremely frustrating behavior until I decided to
start digging...

Suggested patch to correct the problem:

Index: /home/Ryan/apps/gcc-4.5-src/configure.ac
===
--- /home/Ryan/apps/gcc-4.5-src/configure.ac(revision 157227)
+++ /home/Ryan/apps/gcc-4.5-src/configure.ac(working copy)
@@ -3221,7 +3221,9 @@
   [  --with-build-time-tools=PATH
   use given path to find target tools during the
build],
   [case x$withval in
- x/*) ;;
+ x/*)
+   with_build_time_tools=$withval
+   ;;
  *)
with_build_time_tools=
AC_MSG_WARN([argument to --with-build-time-tools must be an absolute
path])


-- 
   Summary: top-level configure script ignores ---with-build-time-
tools
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
  GCC host triplet: i686-pc-cygwin
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43301



[Bug bootstrap/43301] top-level configure script ignores ---with-build-time-tools

2010-03-08 Thread scovich at gmail dot com


--- Comment #1 from scovich at gmail dot com  2010-03-09 01:04 ---
(In reply to comment #0)
> Let's just say this led to extremely frustrating behavior until I decided to
> start digging...

To be more specific, the gcc/as wrapper is generated with:

ORIGINAL_AS_FOR_TARGET=
ORIGINAL_LD_FOR_TARGET=
ORIGINAL_PLUGIN_LD_FOR_TARGET=
ORIGINAL_NM_FOR_TARGET=

Which causes the building of libgcc to fail later on at gcc/as line 83 with a
message about "exec: not found".


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43301



[Bug c/35503] Warning about restricted pointers?

2009-11-26 Thread scovich at gmail dot com


--- Comment #2 from scovich at gmail dot com  2009-11-27 07:45 ---
I've also run into this. Perhaps the machinery which tracks strict aliasing
(and generates best-effort warnings) could be used here?

... adding this comment instead of filing a duplicate :P


-- 

scovich at gmail dot com changed:

   What|Removed |Added

 CC||scovich at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35503



[Bug middle-end/42077] New: std::set: dereferencing pointer '__x.15' does break strict-aliasing rules

2009-11-17 Thread scovich at gmail dot com
With gcc-4.4.2 the following code generates warnings about strict aliasing:

=| bug.cpp |===
#include <set>
#ifdef SHOW_BUG
struct foo {
    int i;
    bool operator<(foo const& o) const { return i < o.i; }
};
#else
typedef int foo;
#endif
int main() { std::set<foo>().insert((foo) {0}); }
=| bug.cpp |===

$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.4.2-src/configure --prefix=/home/rjo/apps/gcc-4.4.2
--with-gmp=/home/rjo/apps --with-mpfr=/home/rjo/apps --disable-nls
--disable-multilib
Thread model: posix
gcc version 4.4.2 (GCC)

$ gcc -Wall -O3 -DSHOW_BUG bug.cpp
bug.cpp: In function 'int main()':
bug.cpp:5: warning: dereferencing pointer '__x.15' does break strict-aliasing
rules
/home/rjo/apps/gcc-4.4.2/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../include/c++/4.4.2/bits/stl_tree.h:525:
note: initialized from here
/home/rjo/experiments/scratch.cpp:5: warning: dereferencing pointer '__x.15'
does break strict-aliasing rules
/home/rjo/apps/gcc-4.4.2/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../include/c++/4.4.2/bits/stl_tree.h:525:
note: initialized from here

In addition to the problem of the STL (appearing to?) break strict-aliasing
rules, it looks like bug #38477 is back

Below is the (hopefully) relevant snippet of CFG. It looks exactly like the
issue described in bug #38477, with static cast voodoo:

static const _Val& std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::_S_value(const std::_Rb_tree_node_base*) [with _Key = foo, _Val = foo,
_KeyOfValue = std::_Identity<foo>, _Compare = std::less<foo>, _Alloc =
std::allocator<foo>] (const struct _Rb_tree_node_base * __x)
{
  const struct _Rb_tree_node * __x.15;
  const struct foo & D.8747;

<bb 2>:
  __x.15 = (const struct _Rb_tree_node *) __x;
  D.8747 = &__x.15->_M_value_field;
  return D.8747;

}

Also, is there a particular reason the diagnostic says "does break" instead of
"breaks"? The former may be technically correct English but sounds strange,
even if there's a corresponding "may break" diagnostic.


-- 
   Summary: std::set: dereferencing pointer '__x.15' does break
strict-aliasing rules
   Product: gcc
   Version: 4.4.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
  GCC host triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42077



[Bug tree-optimization/39390] [4.4 Regression] Bogus aliasing warning with std::set

2009-11-17 Thread scovich at gmail dot com


--- Comment #10 from scovich at gmail dot com  2009-11-17 11:16 ---
(In reply to comment #3)
> the warning is for dead code.  Thus this is not a
> wrong-code problem.

Just to verify, does this (and comment #7) mean that the warning is harmless
and can be ignored?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39390



[Bug bootstrap/42028] New: Bootstrap fails for mpfr/gmp not in LD_LIBRARY_PATH

2009-11-13 Thread scovich at gmail dot com
Bootstrapping of gcc-4.4.2 fails on my machine because the stage 1 compiler has
a runtime dependency on mpfr and gmp, which are not in my LD_LIBRARY_PATH
because I only built them in order to compile gcc. 

Using --with-gmp, --with-mpfr and --with-build-libsubdir at configure time lets
it compile but doesn't help it run. 

Given that the inputs to configure make it pretty clear mpfr and gmp are not in
standard locations, and the finished compiler won't have any dependencies on
those libraries, I would expect the build system to ensure they are accessible
to any dependent intermediate binaries it runs.

The workaround is to set up an LD_LIBRARY_PATH, so this is more annoyance than
anything else.


-- 
   Summary: Bootstrap fails for mpfr/gmp not in LD_LIBRARY_PATH
   Product: gcc
   Version: 4.4.2
Status: UNCONFIRMED
  Severity: trivial
  Priority: P3
 Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
  GCC host triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42028



[Bug bootstrap/42028] Bootstrap fails for mpfr/gmp not in LD_LIBRARY_PATH

2009-11-13 Thread scovich at gmail dot com


--- Comment #1 from scovich at gmail dot com  2009-11-13 10:35 ---
Hmm.. it seems the final executable depends on mpfr and gmp as well... I could
have sworn the docs said it was a build-time dependency only.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42028



[Bug inline-asm/40124] Inline asm should support limited control flow

2009-05-13 Thread scovich at gmail dot com


--- Comment #9 from scovich at gmail dot com  2009-05-13 07:55 ---
RE: __builtin_expect -- Thanks! It did help quite a bit, even though the
compiler was already emitting not-taken branch hints on its own.

RE: Filing bugs -- I have. This RFE arose out of Bug #40078, which was
triggered by attempts to work around Bug #40067. I still have some issues with
overconservative use of branch delay slots and possibly loop pipelining, but
haven't gotten to filing them yet. I've also filed other bugs in the past where
it would have been nice to work around using inline asm but control flow was a
pain.

In the end, is there any particular reason *not* to make inline asm easier to
use and more transparent to the compiler, given points #1 and #2? Invoking
point #3, what significant uses of computed gotos exist, other than to work
around switch statements that compile suboptimally? The docs don't mention any,
and yet we have them instead of (or in addition to) bug reports. 

I'd take a stab at implementing this myself -- it's probably a one-liner -- but
I've never hacked gcc before and have no clue where that one line might lurk. 

BTW, how does one exploit the compiler's overflow catching? I tried testing
a+b < a and a+b < b (for unsigned ints) with no luck, and there's no __builtin
test for overflow or carry.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124



[Bug inline-asm/40124] Inline asm should support limited control flow

2009-05-13 Thread scovich at gmail dot com


--- Comment #11 from scovich at gmail dot com  2009-05-13 09:51 ---
> If you allow inline asms to change control flow, even just
> to labels whose address has been taken through &&label, you
> penalize a lot of code which doesn't change the control
> flow, as the compiler will have to assume each inline asm
> which could possibly get at an label address (not just
> directly, but through global variables, pointers etc.) can
> jump to it.

I'm going to invoke #3 again to respond to these concerns:

a. This RFE is specifically limited to local control flow only, so the compiler
can safely ignore any label not in the asm's enclosing function, as well as
labels whose addresses are never taken (or provably never used). Computed gotos
appear to make the same assumptions, based on the docs' strong warning not to
allow labels to leak out of their enclosing function in any way.

b. While it's always possible that an asm could jump to a value loaded from an
arbitrary, dynamically-generated address, the same is true for computed gotos.
Either way, compiler analysis or not, doing so would almost certainly send you
to la-la land because label values aren't known until assembler time or later
and have no guaranteed relationship with each other. The only way to get a
valid label address is using one directly, or computing it with some sort of
base+(label-base). Either way requires taking the address of the desired label
at some point and tipping off the compiler.

c. It's pretty easy to write functions whose computed gotos defy static
analysis, but most of the time the compiler does pretty well. Well-written asm
blocks should access memory via "m" constraints -- which the compiler can
analyze -- rather than manually dereferencing a pointer passed in with an "r"
constraint. This is especially true for asm blocks with no internal control
flow, which this RFE encourages. 

d. If a code path is short/simple enough that incoming jumps penalize it
heavily (whether from computed gotos or jumps from asm), it's probably also
small enough that the compiler (or programmer, if need be) can duplicate it for
the short path. A big, ugly code path probably wouldn't even notice an extra
control flow arc or two.

In the end, a big goal of this RFE is to allow programmers to make the compiler
aware of control flow arcs they're already adding (or tempted to add) behind
its back. It therefore wouldn't strike me as much of a limitation if jumps to
labels not explicitly passed to the asm are unsupported and may lead to
undefined behavior.
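
Note for later readers: to my knowledge, the `asm goto` extension introduced in GCC 4.5 covers this case, since the asm may jump only to labels that are listed explicitly in the statement. The sketch below assumes an x86 target and falls back to plain C++ elsewhere; the function name is_zero is hypothetical.

```cpp
// Hypothetical helper: returns 1 when x is zero. On x86 with a GNU-compatible
// compiler the branch is taken inside the asm; the label it may jump to is
// declared in the fourth operand section, so the compiler sees the extra
// control flow arc.
inline int is_zero(int x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    asm goto("test %0, %0\n\t"
             "jz %l[zero]"
             : /* no outputs */
             : "r"(x)
             : "cc"
             : zero);
    return 0;
zero:
    return 1;
#else
    return x == 0 ? 1 : 0;  // fallback without inline asm
#endif
}
```

Jumping to any label that is not listed in the statement remains unsupported, which matches the limitation proposed above.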


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124



[Bug inline-asm/40124] New: Inline asm should support limited control flow

2009-05-12 Thread scovich at gmail dot com
)
        movq    labels.1894(%rip), %rax
        jmp     *%rax
        .p2align 4,,7
.L5:
        leaq    12(%rsp), %rdx
        call    handle_overflow
.L4:
        movl    12(%rsp), %eax
        movq    16(%rsp), %rbx
        movq    24(%rsp), %rbp
        addq    $32, %rsp
        ret


-- 
   Summary: Inline asm should support limited control flow
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124



[Bug inline-asm/40124] Inline asm should support limited control flow

2009-05-12 Thread scovich at gmail dot com


--- Comment #2 from scovich at gmail dot com  2009-05-12 16:13 ---
Overflow and adc were only examples. Other instructions that set cc, or other
conditions (e.g. parity) would not have that optimization.

Another use is the ability to jump out of an inline asm to handle an uncommon
case (if writing hand-tuned asm for speed, for example).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124



[Bug inline-asm/40124] Inline asm should support limited control flow

2009-05-12 Thread scovich at gmail dot com


--- Comment #4 from scovich at gmail dot com  2009-05-12 16:36 ---
I'm actually running sparcv9-sun-solaris2.10 (the examples used x86 because
more people know it and its asm is easier to read).

My use case is the following: I'm implementing high-performance synchronization
primitives and the compiler isn't generating good enough code -- partly because
it doesn't pipeline spinloops, and partly because it has no way to know what
stuff is truly critical path and what just needs to happen eventually.

Here's a basic idea of what I've been looking at:

long mcs_lock_acquire(mcs_lock* lock, mcs_qnode* me) {
 again:
    /* initialize qnode, etc */
    membar_producer();
    mcs_qnode* pred = atomic_swap(&lock->tail, me);
    if(pred) {
        pred->next = me;
        while(int flags=me->wait_flags) {
            if(flags & ERROR) {
                /* recovery code */
                goto again;
            }
        }
    }
    membar_enter();
    return (long) pred;
}

This code is absolutely performance-critical because every instruction on the
critical path delays O(N) other threads -- even a single extra load or store
causes noticeable delays. I was trying to rewrite just the while loop above in
asm to be more efficient, but it is hard because of that goto inside.
Basically, the error isn't going anywhere once it shows up, so we don't have to
check it nearly as often as the flags==0 case, and it can be interleaved across
as many loop iterations as needed to make its overhead disappear. Manually
unrolling and pipelining the loop helped a bit, but the compiler still tended
to cluster things together more than was strictly necessary (leading to bursts
of saturated pipeline alternating with slack).
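
For readers unfamiliar with the lock being discussed, here is a minimal, portable sketch of an MCS queue lock using C++11 std::atomic. It is not the code from this report: the type and function names are hypothetical, the error/recovery path is omitted, and no attempt is made at the hand-tuned pipelining described above.

```cpp
#include <atomic>

// Hypothetical types; the report's mcs_lock/mcs_qnode definitions are not shown.
struct McsNode {
    std::atomic<McsNode*> next{nullptr};
    std::atomic<bool> waiting{false};
};

struct McsLock {
    std::atomic<McsNode*> tail{nullptr};
};

// Enqueue `me` and spin on its own flag until the predecessor hands over.
// Returns the predecessor (nullptr if the lock was free), as the original does.
inline McsNode* mcs_acquire(McsLock& lock, McsNode& me) {
    me.next.store(nullptr, std::memory_order_relaxed);
    me.waiting.store(true, std::memory_order_relaxed);
    McsNode* pred = lock.tail.exchange(&me, std::memory_order_acq_rel);
    if (pred) {
        pred->next.store(&me, std::memory_order_release);
        while (me.waiting.load(std::memory_order_acquire)) {
            // spin: each waiter polls only its own node
        }
    }
    return pred;
}

// Hand the lock to the successor, or reset the tail if nobody is queued.
inline void mcs_release(McsLock& lock, McsNode& me) {
    McsNode* succ = me.next.load(std::memory_order_acquire);
    if (!succ) {
        McsNode* expected = &me;
        if (lock.tail.compare_exchange_strong(expected, nullptr,
                                              std::memory_order_acq_rel))
            return;  // no waiter arrived
        // A waiter swapped the tail but has not linked itself yet.
        while (!(succ = me.next.load(std::memory_order_acquire))) {
        }
    }
    succ->waiting.store(false, std::memory_order_release);
}
```

The spin loop in mcs_acquire corresponds to the while loop in the code above; it is the part every queued thread sits in, which is why extra instructions there are so costly.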

For CC stuff, especially x86-related, I bet places like fftw and gmp are good
sources of frustration to mine.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124



[Bug inline-asm/40124] Inline asm should support limited control flow

2009-05-12 Thread scovich at gmail dot com


--- Comment #7 from scovich at gmail dot com  2009-05-12 17:01 ---
Isn't __builtin_expect a way to send branch prediction hints? I'm not having
trouble with that AFAIK. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40124



[Bug middle-end/37722] destructors not called on computed goto

2009-05-09 Thread scovich at gmail dot com


--- Comment #2 from scovich at gmail dot com  2009-05-09 08:16 ---
Computed gotos can easily make it impossible for the compiler to call
constructors and destructors consistently. This is a major gotcha of computed
gotos for people who have used normal gotos in C++ and expect destructors to be
handled properly. Consider this program, for instance:

#include <stdio.h>
template<int i>
struct foo {
    foo() { printf("%s%d\n", __FUNCTION__, i); }
    ~foo() { printf("%s%d\n", __FUNCTION__, i); }
};
enum {RETRY, INSIDE, OUTSIDE, EVIL};
int bar(int idx) {
    static void* const gotos[] = {&&RETRY, &&INSIDE, &&OUTSIDE, &&EVIL};
    bool first = true;
    {
    RETRY:
        foo<1> f1;
        if(first) {
            first = false;
            goto *gotos[idx];
        }
    INSIDE:
        return 1;
    }
    if(0) {
        foo<2> f2;
    EVIL:
        return 2;
    }
 OUTSIDE:
    return 0;
}
int main() {
    for(int i=RETRY; i <= EVIL; i++)
        printf("%d\n", bar(i));
    return 0;
}

Not only does it let you jump out of a block without calling destructors, it
lets you jump into one without calling constructors:

$ g++-4.4.0 -Wall -O3 scratch.cpp && ./a.out
foo1
foo1
~foo1
1
foo1
~foo1
1
foo1
0
foo1
~foo2
2

Ideally, the compiler could analyze possible destinations of the goto
(best-effort, of course) and emit suitable diagnostics:

scratch.cpp:16: warning: computed goto bypasses destructor of 'foo<1> f1'
scratch.cpp:13: warning:   declared here

scratch.cpp:23: warning: possible jump to label 'EVIL'
scratch.cpp:16: warning:   from here
scratch.cpp:22: warning:   crosses initialization of 'foo<2> f2'

In this particular example the compiler should be able to figure out that no
labels reach a live f1 and call its destructor properly. If it's not feasible
to analyze the possible destinations of the computed goto, regular control flow
analysis should at least be able to identify potentially dangerous labels and
gotos, e.g.:

scratch.cpp:16: warning: computed goto may bypass destructor of 'foo<1> f1'
scratch.cpp:13: warning:   declared here

scratch.cpp:23: warning: jump to label 'EVIL'
scratch.cpp:8:  warning:   using a computed goto
scratch.cpp:22: warning:   may cross initialization of 'foo<2> f2'
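
For contrast, here is a minimal sketch (not from the original test case) of
what an ordinary goto does when it leaves a block: the target is known at
compile time, so the destructor runs before control is transferred. The names
trace_log, Tracer and leave_block are made up for illustration.

```cpp
#include <string>

// Hypothetical trace buffer used only by this sketch.
static std::string trace_log;

struct Tracer {
    Tracer()  { trace_log += "ctor "; }
    ~Tracer() { trace_log += "dtor "; }
};

// Leaves the inner block with an ordinary (non-computed) goto.
// The compiler knows the destination, so it destroys `t` first.
int leave_block(bool early) {
    {
        Tracer t;
        if (early)
            goto OUTSIDE;
        trace_log += "body ";
    }
OUTSIDE:
    return early ? 1 : 0;
}
```

This is the behaviour people coming from normal gotos would expect a computed
goto to match.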


-- 

scovich at gmail dot com changed:

   What|Removed |Added

 CC||scovich at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37722



[Bug c/40067] New: gcc should use brz(brnz) instead of cmp/be(bne) when possible

2009-05-08 Thread scovich at gmail dot com
Compiling the following function with -O3 gives the following assembly output:

void spin(int volatile* ptr) {
while(*ptr);
return;
}

spin:
.LLFB1:
        .register       %g2, #scratch
        lduw    [%o0], %g1      ! 8     *zero_extendsidi2_insn_sp64/2   [length = 1]
        cmp     %g1, 0          ! 9     *cmpsi_insn     [length = 1]
        be,pn   %icc, .LL3      ! 10    *normal_branch  [length = 1]
         mov    0, %g1          ! 17    *movdi_insn_sp64/1      [length = 1]
.LL6:
        lduw    [%o0], %g2      ! 20    *zero_extendsidi2_insn_sp64/2   [length = 1]
        cmp     %g2, 0          ! 22    *cmpsi_insn     [length = 1]
        bne,pt  %icc, .LL6      ! 23    *normal_branch  [length = 1]
         add    %g1, 1, %g1     ! 19    *adddi3_sp64/1  [length = 1]
.LL3:
        jmp     %o7+8           ! 55    *return_internal        [length = 1]
         mov    %g1, %o0        ! 30    *movdi_insn_sp64/1      [length = 1]

Manually replacing the cmp/b* pairs with br* instructions gives 10-11% more
iterations/sec on my machine:

        .global spin_brz
spin_brz:
        .register %g2, #scratch
        ld      [%o0], %g1
        brz,pn  %g1, spin_brz_done
        clr     %g1
spin_brz_again:
        ld      [%o0], %g2
        brnz,pt %g2, spin_brz_again
        add     %g1, 0x1, %g1
spin_brz_done:
        retl
        mov     %g1, %o0
        .size   spin_brz, .- spin_brz


-- 
   Summary: gcc should use brz(brnz) instead of cmp/be(bne) when
possible
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: sparc-sun-solaris2.10


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40067



[Bug middle-end/40067] gcc should use brz(brnz) instead of cmp/be(bne) when possible

2009-05-08 Thread scovich at gmail dot com


--- Comment #1 from scovich at gmail dot com  2009-05-08 09:38 ---
Sorry, the C code should have been:

long spin(int volatile* ptr) {
long rval=0;
while(*ptr) rval++;
return rval;
}


-- 

scovich at gmail dot com changed:

   What|Removed |Added

  Component|c   |middle-end
Version|unknown |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40067



[Bug target/40067] use brz instead of cmp/be with 32-bit values

2009-05-08 Thread scovich at gmail dot com


--- Comment #3 from scovich at gmail dot com  2009-05-08 11:30 ---
   What|Removed |Added

 GCC target triplet|sparc-sun-solaris2.10   |sparc64-sun-solaris2.10

I think this affects 32-bit sparc as well, unless the br* instructions are new
in sparcv9 (they don't seem to be). The only difference with v9 seems to be
that 32-bit code needs to use ldsw to sign-extend if needed.
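
The extension point can be modelled in plain C++. This is only a sketch: it
assumes the br* instructions test the whole 64-bit register, and it models
that register as a std::uint64_t. The helper names are made up. A
register-wide zero test agrees with the 32-bit test as long as the load zero-
or sign-extended the value, and disagrees if the upper half holds stale bits.

```cpp
#include <cstdint>

// Models lduw: zero-extend a 32-bit value into a 64-bit register.
std::uint64_t load_zero_extended(std::int32_t v) {
    return static_cast<std::uint32_t>(v);
}

// Models ldsw: sign-extend a 32-bit value into a 64-bit register.
std::uint64_t load_sign_extended(std::int32_t v) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// Models a register-wide zero test (the assumed behaviour of brz/brnz).
bool register_is_zero(std::uint64_t reg) {
    return reg == 0;
}
```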


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40067



[Bug middle-end/40078] New: passing label to inline asm i constraint generates bad code

2009-05-08 Thread scovich at gmail dot com
Somewhat to my surprise, gcc accepts the following inline asm syntax:

asm("jmp %0" : : "i"(&&some_label));

The output is what you'd expect: assuming some_label (in C/C++) is associated
with the assembler label .LLBF4, it gives:

jmp .LLBF4

Unfortunately, the optimizer plays havoc with things by happily eliminating the
code associated with that label if it is otherwise unused. Consider the
following code: 

static inline int foo() { return 10; }
int could_be_anything;
long test_label(int volatile* ptr) {
    int rval = 0;
    int dummy = 5;
    static void* const gotos[2] = {&&DONE, &&ERROR};
    asm volatile("jmp %0\n\t nop" : : "i"(&&ERROR), "i"(foo), "r"(dummy));
    //goto *gotos[could_be_anything];
 DONE:
    return rval;
 ERROR:
    rval = 1;
    goto DONE;
}

This function should return 1 after jumping from ERROR to DONE. Instead, the
code for ERROR is eliminated by the optimizer; you either get a return value of
zero or an infinite loop depending on whether the label started below or above
the asm block (I get the latter):

foo:
        jmp     %o7+8
         mov    10, %o0
        .size   foo, .-foo
test_label:
.LL4:
.LL5:
        mov     5, %g1
! 8 "test_label.c" 1
        jmp     .LL4
         nop
! 0 "" 2
        jmp     %o7+8
         mov    0, %o0
        .size   test_label, .-test_label


The compiler correctly recognizes that both dummy and foo are in use and does
not eliminate them, but ERROR gets the axe unless the computed goto is enabled. 

In theory the inconsistency should be easy to fix -- just mark the label as
in-use if it gets passed to a live inline asm block (exactly how functions and
variables are currently treated). If for some reason that's impossible or
undesirable, it should at least generate a diagnostic.
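
For comparison, a sketch of the computed-goto variant (the commented-out line
above, made live). It relies on the GNU "labels as values" extension, so it
needs GCC or a compatible compiler. The function name and the idx parameter
are made up for this example. Here the compiler sees both labels in use and
keeps the ERROR path.

```cpp
// Requires the GNU "labels as values" extension (&&label, goto *ptr).
// idx must be 0 (jump to DONE) or 1 (jump to ERROR).
int test_label_computed(int idx) {
    static void* const gotos[2] = {&&DONE, &&ERROR};
    int rval = 0;
    goto *gotos[idx];
DONE:
    return rval;
ERROR:
    rval = 1;
    goto DONE;
}
```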


-- 
   Summary: passing label to inline asm i constraint  generates
bad code
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: sparc-sun-solaris2.10


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40078



[Bug middle-end/40078] passing label to inline asm i constraint generates bad code

2009-05-08 Thread scovich at gmail dot com


--- Comment #2 from scovich at gmail dot com  2009-05-08 23:24 ---
Sorry to bring this back up, but I'm not sure if comments show up in a
meaningful way on closed bugs...

1. Where is it documented that inline asm can't change control flow? I
can't find it in the info pages, nor anywhere in google except this bug and
another which was also resolved-invalid with a comment that it's clearly
commented. The docs say you can do control flow within a single asm (if your
assembler supports local labels), but the only other mention is the part that
says you can't jump between asm blocks because the compiler has no way to know
that you did it.

2. It makes sense that anything related to stack frames (ret, call) would be a
snake pit, but is there some reason why local gotos are inherently unsafe?
Unlike an asm-asm jump, the compiler knows all the places control might go (you
can only jump out once, after all), and presumably users wouldn't pass in
labels they don't intend to use. It seems like the compiler could just treat
the asm block accepting labels as a basic block containing a computed goto --
control could fall out the bottom of the block or jump to any of the labels which
were passed in. 

3. Supporting local gotos would help work around the annoyance of getting
condition codes out of an asm block efficiently -- pass in the label to a
branch instruction and voila!

In any case, I'm happy to accept a "go away", but would be extremely
interested to hear the reasons behind this limitation given that it seems so
close to working by accident.
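
For reference, GCC 4.5 and later document an `asm goto` form that takes a list
of C labels, which is close to what points 2 and 3 ask for. Below is a hedged
sketch of the condition-code case from point 3. The function name is made up,
the asm is x86-only (AT&T syntax), and other targets fall back to a plain
comparison so the example still builds.

```cpp
// Returns true when x is zero. On x86 the branch is taken directly inside
// the asm and lands on a C label; elsewhere a plain comparison stands in.
bool is_zero(int x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    asm goto("testl %0, %0\n\t"
             "jz %l[zero]"
             : /* no outputs */
             : "r"(x)
             : "cc"
             : zero);
    return false;   // fell out the bottom of the asm
zero:
    return true;    // the asm jumped here
#else
    return x == 0;
#endif
}
```

Because the label appears in the asm's label list, the compiler treats it as a
possible destination and does not discard the code after it.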


-- 

scovich at gmail dot com changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40078



[Bug libgomp/29986] testsuite failures

2008-04-09 Thread scovich at gmail dot com


--- Comment #4 from scovich at gmail dot com  2008-04-09 15:18 ---
If it's any help, adding some inline asm to the file makes the Sun toolchain
croak on my machine.

SunOS 5.10 Generic_118833-23 sun4v sparc SUNW,Sun-Fire-T200
Sun C 5.9 SunOS_sparc Patch 124867-01 2007/07/12
Solaris Link Editors: 5.10-1.482

// begin tls-bug.c
void membar_producer() { asm volatile("membar #StoreStore"); }
static __thread bool val;
int main() { return val; }
// end tls-bug.c

This bug seems to show up in arbitrary ways for each of the three compilers on
my machine:
$ cc -V
cc: Sun C 5.9 SunOS_sparc Patch 124867-01 2007/07/12
$ gcc -v
Reading specs from /usr/sfw/lib/gcc/sparc-sun-solaris2.10/3.4.3/specs
Configured with:
/gates/sfw10/builds/sfw10-gate/usr/src/cmd/gcc/gcc-3.4.3/configure
--prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as
--with-ld=/usr/ccs/bin/ld --without-gnu-ld --enable-languages=c,c++
--enable-shared
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
$ ~/apps/gcc/4.3/bin/gcc-4.3 -v
Using built-in specs.
Target: sparc64-sun-solaris2.10
Configured with: ../configure --prefix=/export/home/ryanjohn/apps/gcc/4.3
--build=sparc64-sun-solaris2.10 --program-suffix=-4.3
--with-mpfr=/export/home/ryanjohn/apps --with-gmp=/export/home/ryanjohn/apps
--disable-multilib --with-as=/usr/ccs/bin/as --without-gnu-as
--with-ld=/usr/ccs/bin/ld --without-gnu-ld
Thread model: posix
gcc version 4.3.0 (GCC) 

Note that all three use the same copy of ld

$ cc tls-bug.c
$ cc -g tls-bug.c

$ CC tls-bug.c
ld: fatal: relocation error: R_SPARC_TLS_GD_HI22: file tls-bug.o:
symbol <unknown>: bad symbol type SECT: symbol type must be TLS
$ CC -g tls-bug.c

$ gcc -m64 tls-bug.c
$ gcc -m64 -g tls-bug.c
ld: fatal: relocation error: R_SPARC_TLS_DTPOFF64: file /var/tmp//ccuJHWqp.o:
symbol done: offset 0x7d901c33 is non-aligned
collect2: ld returned 1 exit status

$ gcc-4.3 tls-bug.c
ld: fatal: relocation error: R_SPARC_TLS_LE_HIX22: file /var/tmp//ccUeK1AZ.o:
symbol <unknown>: bad symbol type SECT: symbol type must be TLS
collect2: ld returned 1 exit status
$ gcc-4.3 tls-bug.c -g
ld: fatal: relocation error: R_SPARC_TLS_LE_HIX22: file /var/tmp//cceRP4ZP.o:
symbol <unknown>: bad symbol type SECT: symbol type must be TLS
collect2: ld returned 1 exit status


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29986



[Bug c++/34184] Scope broken for inherited members inside template class?

2007-12-11 Thread scovich at gmail dot com


--- Comment #4 from scovich at gmail dot com  2007-12-11 17:27 ---
(In reply to comment #3)
> Note you can declare a specialization of foo<T>::bar which shows that the code
> is really dependent.
> 

Duh! That's perfect. Thanks.
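
A sketch of what that specialization looks like, as I understand the
suggestion. The member names get and j are made up for this example. Once
foo<char>::bar is specialized without a member i, the same baz template
inherits from a different class, so the lookup of i really does depend on T.

```cpp
template <class T>
struct foo {
    struct bar {
        int i;
    };
    struct baz : bar {
        // this-> defers the lookup of i until instantiation time.
        int get() { return this->i; }
    };
};

// Explicit specialization of the member class for T = char.
// This bar has no member i at all.
template <>
struct foo<char>::bar {
    int j;
};
```

With this in place foo<int>::baz::get() works, while calling
foo<char>::baz::get() would fail to compile because that base has no i.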


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34184



[Bug target/34115] atomic builtins not supported on i686?

2007-11-28 Thread scovich at gmail dot com


--- Comment #9 from scovich at gmail dot com  2007-11-28 14:20 ---
(In reply to comment #8)
> (In reply to comment #7)
> > Too bad they aren't defined for any machine I've tried so far...
> 
> The explanation is very simple: the new macros are implemented only in
> mainline (would be 4.3.0).
> 
Any chance of backporting? (I know, probably not)

The only question left is whether the compiler is supposed to emit a warning
when it doesn't support the intrinsics (like the docs say) or whether the user
should just be ready for linker errors.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115



[Bug target/34115] atomic builtins not supported on i686?

2007-11-27 Thread scovich at gmail dot com


--- Comment #7 from scovich at gmail dot com  2007-11-28 01:56 ---
(In reply to comment #2)
> I think this is essentially invalid. Note that now we also have the various
> __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* macros:
> 
>   http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
> 

Too bad they aren't defined for any machine I've tried so far...
ia64-linux-gnu (4.1.2 Debian)
x86_64-unknown-linux-gnu (4.2.0)
sparc-sun-solaris2.10 (4.1.1)
powerpc64-unknown-linux-gnu (4.1.2 Gentoo)
i686-pc-cygwin (4.2.2)

All these actually *do* support CAS, and emit perfectly respectable .asm... as
long as you don't wrap them in any #ifdef's.
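
For what it's worth, this is the kind of guard I have in mind. It is a sketch
only: cas_int is a made-up name, it assumes a 4-byte int, and the fallback
branch is deliberately NOT atomic (it just keeps the example compiling when
the macro is missing).

```cpp
static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

// Compare-and-swap on an int; returns the value seen before the operation.
int cas_int(int* addr, int expected, int desired) {
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
    // The compiler advertises a native 4-byte CAS, so use the builtin.
    return __sync_val_compare_and_swap(addr, expected, desired);
#else
    // Non-atomic stand-in: fine for a single thread, wrong for real use.
    int old = *addr;
    if (old == expected)
        *addr = desired;
    return old;
#endif
}
```

On a compiler that supports CAS but does not define the macro, the #else
branch gets picked silently, which is exactly the problem described above.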


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115



[Bug c++/34184] Scope broken for inherited members inside template class?

2007-11-22 Thread scovich at gmail dot com


--- Comment #2 from scovich at gmail dot com  2007-11-23 02:06 ---
Subject: Re:  Scope broken for inherited members inside template class?

On 22 Nov 2007 21:03:11 -, pinskia at gcc dot gnu dot org
[EMAIL PROTECTED] wrote:
> The issue comes down to if bar is dependent here and if so is baz's base.
>
> The namelookup rules for being dependent are weird and hard to understand
> really and actually changes namelookup in some cases so we have to go what the
> standard says.


Somehow I'm not surprised the standard is confusing on this point...
my (unqualified) opinion is that bar and baz don't depend on foo's
template parameter. While foo<int>::bar is certainly not the same
class as foo<char>::bar, bar's relationship to baz is always the same
for any instantiation -- part of the template itself.

Imagine that, instead of declaring bar and baz inside template<class
T> foo we put them inside an anonymous namespace. What aspect of name
lookup has changed so that baz can suddenly find 'bar::i' without
hand-holding?

I just found an interesting phenomenon -- if baz declares it is using
bar::i life is suddenly good. Apparently the compiler is able to
determine that bar is, in fact, a parent class of baz, and that the
using clause is valid. However, it doesn't notice a problem with
using bar::j until you actually instantiate a foo<*>::baz. Given
that the compiler does know bar is a base class of baz, shouldn't any
name lookup within baz check for matching members of bar?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34184



[Bug c++/34184] New: Scope broken for inherited members inside template class?

2007-11-21 Thread scovich at gmail dot com
The following code fails to compile 

this.cpp: In member function 'int foo<T>::baz::foo()':
this.cpp:8: error: 'i' was not declared in this scope

// begin this.cpp
template <class T>
struct foo {
  struct bar {
    int i;
  };
  struct baz : bar {
    int foo() { return i; }
  };
};

int main() { }
// end this.cpp

Changing it to 'this->i' solves the problem, but is unwieldy for classes with
lots of members. I'm unsure what the Standard says, but I thought you only
needed 'this->' when the member depends on information the compiler won't have
until template instantiation time. However, that doesn't really apply here --
foo and bar do not depend on the template's type, so the compiler should be
able to figure things out well before the template gets instantiated.

FWIW Sun's CC accepts the code with no warnings. It's usually much more strict
than gcc (to the point of being really frustrating). Even if the Standard says
gcc is right, it would be very convenient if gcc matched CC on this
extension.
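
For completeness, a sketch of the spellings that gcc does accept for this
layout. The member k and the function names via_this, via_qualified and
via_using are made up for the example.

```cpp
template <class T>
struct foo {
    struct bar {
        int i;
        int k;
    };
    struct baz : bar {
        using bar::k;  // using-declaration brings k into baz's scope

        int via_this()      { return this->i; }  // explicit this->
        int via_qualified() { return bar::i; }   // qualified with the base
        int via_using()     { return k; }        // found via the using above
    };
};
```

All three defer the lookup to instantiation time. The using-declaration is the
least intrusive when a member is referenced many times, but it has to be
repeated for every inherited name.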


-- 
   Summary: Scope broken for inherited members inside template
class?
   Product: gcc
   Version: 4.2.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34184



[Bug c/34115] New: atomic builtins not supported on i686?

2007-11-15 Thread scovich at gmail dot com
Linking fails for the program below, with the error:

undefined reference to `___sync_val_compare_and_swap_4'

// gcc -Wall atomic.c
int main() {
  int *a, b, c;
  return __sync_val_compare_and_swap(a, b, c);
}

According to the atomic builtins docs, "Not all operations are supported by
all target processors. If a particular operation cannot be implemented on the
target processor, a warning will be generated and a call an external function
will be generated. The external function will carry the same name as the
builtin, with an additional suffix `_n' where n is the size of the data type."

If CAS is not supported, how come I don't get a warning? Why would i686 *not*
support compare and swap? The cmpxchg instruction has been around since 80486,
according to the intel IA-32 processor manual. 

Also, does an unsupported builtin mean the user is responsible to write that
function, or simply that the compiler must make a function call to synthesize
its behavior?

FWIW, my x86_64 cross-compile gcc 4.2.0 handles it fine, emitting a
lock+cmpxchg pair.


-- 
   Summary: atomic builtins not supported on i686?
   Product: gcc
   Version: 4.2.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
  GCC host triplet: i686-pc-cygwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115



[Bug target/34115] atomic builtins not supported on i686?

2007-11-15 Thread scovich at gmail dot com


--- Comment #5 from scovich at gmail dot com  2007-11-16 01:00 ---
Subject: Re:  atomic builtins not supported on i686?

On 15 Nov 2007 23:53:06 -, joseph at codesourcery dot com
[EMAIL PROTECTED] wrote:
> > Because the default arch for i686-linux-gnu is i386.
> Which is a stupid inconsistency and arguably a bug.

++

BTW, -march=i686 works beautifully. Close the bug? or rename it as a
RFE to have i686-* default to -march=i686?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115



[Bug target/34115] atomic builtins not supported on i686?

2007-11-15 Thread scovich at gmail dot com


--- Comment #6 from scovich at gmail dot com  2007-11-16 01:04 ---
(In reply to comment #5)
> Subject: Re:  atomic builtins not supported on i686?
> 
> On 15 Nov 2007 23:53:06 -, joseph at codesourcery dot com
> [EMAIL PROTECTED] wrote:
> > > Because the default arch for i686-linux-gnu is i386.
> > Which is a stupid inconsistency and arguably a bug.
> 
> ++
> 
> BTW, -march=i686 works beautifully. Close the bug? or rename it as a
> RFE to have i686-* default to -march=i686?
> 

Oh, and is there supposed to be a warning about unsupported atomic ops or not?
If not the docs should say to expect a linker error instead (and also
mention/link those macros Paolo pointed out).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34115



[Bug debug/32990] [Regression] gdb has symbol table issues

2007-08-13 Thread scovich at gmail dot com


--- Comment #7 from scovich at gmail dot com  2007-08-13 21:10 ---
(In reply to comment #6)
> Sorry, my mistake.  I meant readelf -wi (lowercase I).
> 

Unfortunately, I recompiled with 4.1 to get on with debugging, and also updated
to 20070810 later that day. Now the bug won't cooperate and show up any more.
Maybe the changes over the last three weeks fixed the problem? 

Also unfortunately, I will lose access to the code once my internship ends this
week.  It might be best to close this bug or leave it in WAITING as a
placeholder in case anyone else sees the same thing in an easier-to-replicate
context...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990



[Bug debug/32990] [Regression] gdb has symbol table issues

2007-08-10 Thread scovich at gmail dot com


--- Comment #3 from scovich at gmail dot com  2007-08-10 16:20 ---
Created an attachment (id=14050)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14050&action=view)
Output of readelf -wf

I'm attaching the output of `readelf -wf'. This time around, some of the
offending PCs are 0x41ac8c, 0x41bc1c, 0x41bc2d, 0x41bc44, 0x41bc45, 0x41bc55,
0x41bc56, 0x41bc63, 0x41bc64.

Also in case it helps, `readelf -a' prints the following warning/error
messages:
readelf: Warning: There is a hole [0xe1fc - 0xe238] in .debug_loc section.
readelf: Warning: There is a hole [0x100dc - 0x10118] in .debug_loc section.
readelf: Warning: There is a hole [0x13860 - 0x1389c] in .debug_loc section.
readelf: Warning: There is a hole [0x138ac - 0x138e8] in .debug_loc section.
readelf: Warning: There is a hole [0x13c3c - 0x13c78] in .debug_loc section.
readelf: Warning: There is a hole [0x13f34 - 0x13f70] in .debug_loc section.
readelf: Warning: There is a hole [0x13f80 - 0x13fbc] in .debug_loc section.
readelf: Warning: There is a hole [0x14148 - 0x14184] in .debug_loc section.
readelf: Warning: There is a hole [0x15908 - 0x15944] in .debug_loc section.
readelf: Warning: There is a hole [0x16618 - 0x16654] in .debug_loc section.
readelf: Warning: There is a hole [0x17f54 - 0x17f90] in .debug_loc section.
readelf: Warning: There is a hole [0x17fec - 0x18028] in .debug_loc section.
readelf: Warning: There is a hole [0x1824c - 0x18288] in .debug_loc section.
readelf: Warning: There is a hole [0x184ac - 0x184e8] in .debug_loc section.
readelf: Warning: There is a hole [0x18590 - 0x185cc] in .debug_loc section.
readelf: Warning: There is a hole [0x22a08 - 0x22a44] in .debug_loc section.
readelf: Warning: There is a hole [0x232f0 - 0x2332c] in .debug_loc section.
readelf: Warning: There is a hole [0x26944 - 0x26980] in .debug_loc section.
readelf: Warning: There is a hole [0x29320 - 0x2935c] in .debug_loc section.
readelf: Warning: There is a hole [0x29878 - 0x298b4] in .debug_loc section.
readelf: Warning: There is a hole [0x29910 - 0x2994c] in .debug_loc section.
readelf: Error: Range lists in .debug_info section aren't in ascending order!
readelf: Warning: There is a hole [0x50 - 0xb0] in .debug_ranges section.
readelf: Warning: There is an overlap [0x2fe0 - 0x50] in .debug_ranges section.
readelf: Warning: There is a hole [0xb0 - 0x3010] in .debug_ranges section.
readelf: Warning: There is an overlap [0x30b0 - 0x2fe0] in .debug_ranges
section.
readelf: Warning: There is a hole [0x3010 - 0x56e0] in .debug_ranges section.
readelf: Warning: There is a hole [0x7610 - 0x76d0] in .debug_ranges section.
readelf: Warning: There is an overlap [0x7700 - 0x7610] in .debug_ranges
section.
readelf: Warning: There is a hole [0x76d0 - 0x9b40] in .debug_ranges section.
readelf: Warning: There is an overlap [0xd700 - 0x9a20] in .debug_ranges
section.
readelf: Warning: There is a hole [0x9b40 - 0xd700] in .debug_ranges section.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990



[Bug debug/32990] [Regression] gdb has symbol table issues

2007-08-10 Thread scovich at gmail dot com


--- Comment #5 from scovich at gmail dot com  2007-08-10 16:50 ---
Murphy strikes again -- 5 minutes after closing this bug it popped back up in
spite of a clean compile. Apparently `make clean' can change which PC causes
complaints but doesn't necessarily fix the problem. 


-- 

scovich at gmail dot com changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990



[Bug debug/32990] [Regression] gdb has symbol table issues

2007-08-10 Thread scovich at gmail dot com


--- Comment #4 from scovich at gmail dot com  2007-08-10 16:39 ---
The problem comes from adding a member function to a header file and only
recompiling some of the source files that include it (make depend missed
something). It looked like a regression because changing versions of gcc
required a clean recompile.


-- 

scovich at gmail dot com changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution||INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990



[Bug c++/32990] New: [Regression] gdb has symbol table issues

2007-08-04 Thread scovich at gmail dot com
When debugging code produced by g++-4.3.0-20070716 the debugger regularly
outputs the following error message when stopping at breakpoints or examining
stack frames:

error: warning: (Internal error: pc 0x419e59 in read in psymtab, but not in
symtab.) 

Compiling the same code with g++-4.1.2 and running the same breakpoints results
in no problems. I'm using gdb-6.6-debian, if that's any help.

Unfortunately I have no idea how to narrow the test case down, and am not
allowed to submit my program (it's from work). Some searching on Google
indicates that .linkonce functions might be part of the issue (I have tons of
those), but other than that I'm at a loss to narrow down the problem.  If
anyone has ideas I'm happy to try them out and report back.


-- 
   Summary: [Regression] gdb has symbol table issues
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32990



[Bug c++/32992] New: Incorrect code generated for anonymous union

2007-08-04 Thread scovich at gmail dot com
Compiling and running the code below produces the following output:
$ g++ -Wall -DT=long -g3 union-bug.C && ./a.out
array=   0x7fff11782ef0
a=   0x7fff11782f20
B={1,3,5}
A={-1719443200,4196007,-1719451616}

A and B should contain the same values, but A contains garbage instead because
the two members of the union do not reside at the same address.

Changing T to 'int' produces the expected output:
$ g++ -Wall -DT=int -g3 union-bug.C && ./a.out
array=   0x7fff1fbf6370
a=   0x7fff1fbf6370
B={1,3,5}
A={1,3,5}

'char' and 'short' also work correctly; '__m128i' (SSE register) breaks. 

This bug affects both gcc-4.1.2 and gcc-4.3.

// union-bug.C
#include <cstdio>
struct A {
  T _a;
  T _b;
  T _c;
};
struct B {
  T _array[3];
  operator A() {
    union {
      T array[3];
      A a;
    };
    printf("array=   %p\na=   %p\n", array, &a);
    for(int i=0; i < 3; i++)
      array[i] = _array[i];
    return a;
  }
};
int main() {
  B b = {{1,3,5}};
  A a = b;
  printf("B={%d,%d,%d}\n", (int) b._array[0], (int) b._array[1], (int) b._array[2]);
  printf("A={%d,%d,%d}\n", (int) a._a, (int) a._b, (int) a._c);
  return 0;
}
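
As a possible workaround until this is fixed, here is a sketch of the same
conversion done with std::memcpy instead of an anonymous union. It fixes T to
long (my assumption, standing in for -DT=long) and asserts that A has no
padding. It sidesteps the bug rather than exercising it.

```cpp
#include <cstring>

typedef long T;  // assumption: stands in for -DT=long

struct A {
    T _a;
    T _b;
    T _c;
};

struct B {
    T _array[3];
    operator A() const {
        // The byte-wise copy is only meaningful if the layouts match.
        static_assert(sizeof(A) == sizeof(T[3]), "A must have no padding");
        A a;
        std::memcpy(&a, _array, sizeof a);
        return a;
    }
};
```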


-- 
   Summary: Incorrect code generated for anonymous union
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32992



[Bug c++/32912] New: [Regression] ICE with C99 compound literal expression

2007-07-27 Thread scovich at gmail dot com
Compiling the test case below gives the following ICE:

 bug.C: In function 'void bar()':
 bug.C:30: internal compiler error: in build_int_cst_wide, at tree.c:890
 Please submit a full bug report,

I think this might be related to bug 20103, except that gcc-4.1 handles the
test case just fine. The test case also compiles in 4.3 with -O{0,1,s} instead
of -O{2,3}. -ftree-pre is the culprit flag -- removing it from -O2 or adding it
to -O1 toggles the bug.

// g++-4.3-20070716 -msse3 -O2 bug.C
#include <emmintrin.h>

// Must be a vector, not a scalar   
#if 0
typedef long v2d;
#else
typedef __m128i v2d;
#endif

v2d rval();
v2d g;
struct A { // Must have 2+ members  
   v2d a;
   v2d b;
};
struct B { // Need a struct containing an A 
   A a;
};
struct C {
   operator A() {
  v2d l;
  A a;
  // Must compute (a ^ ~b). Neither (a ^ b) nor (a + ~b) breaks.
  a.a ^= ~a.b; // globals, locals, and rvals don't break
  a.a ^= ~(v2d) {0,0}; // members and compound literals do  
  return a;
   }
};
void foo(B);
void bar() {
   foo((B){C()});
}


-- 
   Summary: [Regression] ICE with C99 compound literal expression
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32912



[Bug rtl-optimization/32725] Unnecessary reg-reg moves

2007-07-26 Thread scovich at gmail dot com


--- Comment #6 from scovich at gmail dot com  2007-07-26 22:51 ---
I've observed several more pieces of code where this bug comes up, and it
always seems to be a case of (a) the compiler duplicating a register to
preserve the value after a 2-operand insn clobbers the original, then (b)
later failing to notice that the other use(s) got optimized away, never
existed, or were reads that got scheduled before the clobber. 

Perhaps a register renaming pass later in the compilation process might solve
the issue? (I don't know how much that would slow down compilation, though). 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32725



[Bug c++/32870] New: Unclear error message when declaring struct in wrong namespace

2007-07-23 Thread scovich at gmail dot com
Compiling this code:

struct Foo {
  struct Bar;
  Foo();
};
namespace Baz {
  Foo::Foo() { }
  struct Foo::Bar { };
}

Gives the following two error messages:
test.C:6: error: definition of 'void Foo::Foo()' is not in namespace enclosing
'Foo'
test.C:7: error: declaration of 'struct Foo::Bar' in 'Baz' which does not
enclose 'struct Foo'

The first error is nice and clear; the second would be much easier to
understand quickly if it also identified 'Baz' as a namespace.

Note: this bug dates back at least as far as g++-3.4.4 (tested on 3.4.4, 4.1.2,
4.2.0 and 4.3-20070716)
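
For reference, a sketch of the placement the compiler is asking for: both
definitions at global scope, which is the namespace that encloses Foo. The
member value and the function Baz::use are made up for the example.

```cpp
struct Foo {
    struct Bar;
    Foo();
};

// Both definitions live at global scope, which encloses Foo.
Foo::Foo() { }
struct Foo::Bar {
    int value;
};

namespace Baz {
    // Code inside Baz can still use the types; it just cannot define them.
    inline int use() {
        Foo f;
        Foo::Bar b = {42};
        (void)f;
        return b.value;
    }
}
```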


-- 
   Summary: Unclear error message when declaring struct in wrong
namespace
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: trivial
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32870



[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args

2007-07-11 Thread scovich at gmail dot com


--- Comment #2 from scovich at gmail dot com  2007-07-11 15:03 ---
(In reply to comment #1)
> Confirmed, not a regression.
> 

Also affects 4.3. Changing target


-- 

scovich at gmail dot com changed:

   What|Removed |Added

Version|4.1.2   |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661



[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args

2007-07-11 Thread scovich at gmail dot com


--- Comment #3 from scovich at gmail dot com  2007-07-11 15:10 ---
This bug also causes _mm_cvtsi128_si64x() (which calls
__builtin_ia32_vec_ext_v2di) to emit suboptimal code.

// g++-4.3-070710 -mtune=core2 -O3 -S -dp
#include <emmintrin.h>
long vector2long(__m128i* src) { return _mm_cvtsi128_si64x(*src); }

Becomes

_Z11vector2longPU8__vectorx:
.LFB529:
        movdqa  (%rdi), %xmm0   # 6     *movv2di_internal/2     [length = 3]
        movd    %xmm0, %rax     # 25    *movdi_1_rex64/14       [length = 4]
        ret                     # 28    return_internal [length = 1]

This might be related to bug 32708 (and therefore have a similar fix?)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661



[Bug middle-end/32729] New: Loop unrolling not performed with large constant loop bound

2007-07-11 Thread scovich at gmail dot com
Consider the following functions:

// g++ -mtune=core2 -O3 -S -dp
void loop(int* dest, int* src, int count) {
  for(int i=0; i < count; i++)
    dest[i] = src[i];
}
void loop_few(int* dest, int* src) { loop(dest, src, 8); }
void loop_many(int* dest, int* src) { loop(dest, src, 64); }
loop() unrolls 8x, as expected. loop_few() peels completely, as expected.
However, loop_many() neither peels nor unrolls. 

_Z9loop_manyPiS_:
        xorl    %edx, %edx              # 34    *movdi_xor_rex64        [length = 2]
.L47:
        movl    (%rsi,%rdx,4), %eax     # 11    *movsi_1/1      [length = 3]
        movl    %eax, (%rdi,%rdx,4)     # 12    *movsi_1/2      [length = 3]
        incq    %rdx                    # 13    *adddi_1_rex64/1        [length = 3]
        cmpq    $64, %rdx               # 15    cmpdi_1_insn_rex64/1    [length = 4]
        jne     .L47                    # 16    *jcc_1  [length = 2]
        rep ; ret                       # 35    return_internal_long    [length = 1]

Ideally the optimizer would unroll 8x, then notice that (count%8==0) and
eliminate the partial unroll code. However, even a stock unroll would be better
than nothing.
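
To make the request concrete, this is roughly the shape I'd expect after an 8x
unroll, written by hand as a sketch (copy_unrolled is a made-up name, not
compiler output). With a constant count of 64 the trailing remainder loop
never runs, so the optimizer could drop it entirely.

```cpp
// Hand-unrolled copy: 8 elements per iteration, then a remainder loop.
void copy_unrolled(int* dest, const int* src, int count) {
    int i = 0;
    for (; i + 8 <= count; i += 8) {
        dest[i + 0] = src[i + 0];
        dest[i + 1] = src[i + 1];
        dest[i + 2] = src[i + 2];
        dest[i + 3] = src[i + 3];
        dest[i + 4] = src[i + 4];
        dest[i + 5] = src[i + 5];
        dest[i + 6] = src[i + 6];
        dest[i + 7] = src[i + 7];
    }
    // Remainder: dead code whenever count is a known multiple of 8.
    for (; i < count; i++)
        dest[i] = src[i];
}
```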


-- 
   Summary: Loop unrolling not performed with large constant loop
bound
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32729



[Bug middle-end/32729] Loop unrolling not performed with large constant loop bound

2007-07-11 Thread scovich at gmail dot com


--- Comment #1 from scovich at gmail dot com  2007-07-11 16:36 ---
(In reply to comment #0)
> // g++ -mtune=core2 -O3 -S -dp
Oops... that doesn't actually unroll loop() at all, though it still peels
loop_few().

Adding -funroll-loops (supposedly enabled by -O3?) unrolls loop().
Adding -funroll-all-loops does nothing.

Nested loops also have issues:

void nested_loop(int* dest, int* src) {
  for(int i=0; i < 2; i++)
    for(int j=0; j < 2; j++)
      dest[4*i+j] = src[4*j+i];
}

becomes

_Z11nested_loopPiS_:
.LFB533:
        xorl    %edx, %edx              # 39 *movdi_xor_rex64 [length = 2]
.L47:
        movl    (%rsi), %ecx            # 13 *movsi_1/1 [length = 2]
        movl    %ecx, (%rdi,%rdx,4)     # 14 *movsi_1/2 [length = 3]
        movl    16(%rsi), %eax          # 15 *movsi_1/1 [length = 3]
        addq    $4, %rsi                # 17 *adddi_1_rex64/1 [length = 4]
        movl    %eax, 4(%rdi,%rdx,4)    # 16 *movsi_1/2 [length = 4]
        addq    $4, %rdx                # 18 *adddi_1_rex64/1 [length = 4]
        cmpq    $8, %rdx                # 20 cmpdi_1_insn_rex64/1 [length = 4]
        jne     .L47                    # 21 *jcc_1 [length = 2]
        rep ; ret                       # 40 return_internal_long [length = 1]
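
For comparison, here is what complete peeling of both loops amounts to at the source level. This is a hand-expanded sketch, not compiler output; `nested_loop_peeled` is a hypothetical name.

```cpp
// Straight-line equivalent of nested_loop(): i and j each take the values
// 0 and 1, so dest[4*i+j] = src[4*j+i] expands to four independent copies.
void nested_loop_peeled(int* dest, const int* src) {
    dest[0] = src[0];  // i=0, j=0
    dest[1] = src[4];  // i=0, j=1
    dest[4] = src[1];  // i=1, j=0
    dest[5] = src[5];  // i=1, j=1
}
```

Four loads and four stores with no counter, compare, or branch would be the expected result for trip counts this small.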


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32729



[Bug middle-end/32729] Regression: Loop unrolling not performed with large constant loop bound

2007-07-11 Thread scovich at gmail dot com


--- Comment #2 from scovich at gmail dot com  2007-07-11 16:37 ---
Regression: gcc-4.1.2 outputs the expected code for all test cases


-- 

scovich at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Loop unrolling not performed|Regression: Loop unrolling
                   |with large constant loop    |not performed with large
                   |bound                       |constant loop bound


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32729



[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args

2007-07-11 Thread scovich at gmail dot com


--- Comment #6 from scovich at gmail dot com  2007-07-11 20:27 ---
(In reply to comment #5)
> SImode moves will be a bit harder, because shufps insn pattern is involved in
> the vector expansion.

IIRC, shufps takes 3 cycles on Core2
(http://www.agner.org/optimize/instruction_tables.pdf), even without the
operand type mismatch (does that still exist?). That's >=4 cycles.

Storing the vector to the stack and loading the desired entry would take <=4
cycles, even without Intel's store-load optimizations, and I imagine the
optimizer would be able to deal with it better.
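
At the source level, the store-then-load approach looks roughly like the sketch below. It only illustrates the idea; `extract_via_stack` is a hypothetical helper, and whether the compiler actually keeps the spill or turns it back into a shuffle depends on the version and flags.

```cpp
#include <emmintrin.h>
#include <cstdint>

// Extract 32-bit element `index` (0..3) by spilling the vector to a stack
// buffer and reloading one lane, instead of shuffling within the register.
std::int32_t extract_via_stack(__m128i v, int index) {
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), v);  // one aligned store
    return lanes[index & 3];                                // one scalar load
}
```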


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661



[Bug middle-end/32725] New: Unnecessary reg-reg moves

2007-07-10 Thread scovich at gmail dot com
Compiling the following code

// g++-4.3-070710 -O3 -msse3 -mtune=core2 -S
#include <emmintrin.h>
typedef unsigned long long u64;
void foo(int* dest, unsigned short* src, long* indexes,
         __m128i _m1, __m128i _e, __m128i _m2) {
   // required by the API, and makes the bug worse
   u64 e = _mm_cvtsi128_si64x(_e);
   u64 m1 = _mm_cvtsi128_si64x(_m1);
   u64 m2 = _mm_cvtsi128_si64x(_m2);

   for(long i=0; i < 3; i++) {
      u64 data = src[indexes[i]];
      __uint128_t result = (__uint128_t) (data & m1) * e;
      dest[i] = (result >> 64) & m2;
   }
}

Produces redundant reg-reg moves

_Z3fooPiPtPlU8__vectorxS2_S2_:
.LFB527:
        pushq   %rbx
.LCFI0:
        movq    %rdx, %r11
        movd    %xmm1, %r10
        movd    %xmm0, %r8
        movd    %xmm2, %r9
        movq    (%rdx), %rax
        movzwl  (%rsi,%rax,2), %eax
        movq    %rax, %rbx              # (1)
        andq    %r8, %rbx
        movq    %rbx, %rax              # (2)
        mulq    %r10
        movl    %r9d, %eax              # (3)
        andl    %edx, %eax
        movl    %eax, (%rdi)
        movq    8(%r11), %rax
        movzwl  (%rsi,%rax,2), %eax
        movq    %rax, %rbx              # (1)
        andq    %r8, %rbx
        movq    %rbx, %rax              # (2)
        popq    %rbx
        mulq    %r10
        movl    %r9d, %eax              # (3)
        andl    %edx, %eax
        movl    %eax, 4(%rdi)
        movq    16(%r11), %rax
        movzwl  (%rsi,%rax,2), %eax
        andq    %rax, %r8               # Almost what (1) should be
        movq    %r8, %rax               # (2)
        mulq    %r10
        andl    %edx, %r9d              # Essentially what (3) should be
        movl    %r9d, 8(%rdi)
        ret

The output of a single iteration should look something like this (33% fewer
instructions):
        movq    8(%r11), %rax
        movzwl  (%rsi,%rax,2), %eax
        andq    %r8, %rax
        mulq    %r10
        andl    %r9d, %edx
        movl    %edx, 4(%rdi)

Methinks cases (2) and (3) are related to bug 15158 and bug 21202, but that
case (1) is something else.

There's also that odd choice to use %rbx (which has to be pushed and popped),
even though there are plenty of call-clobber regs to use instead...
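
To make the mapping between the source and the six-instruction sequence explicit, here is one loop iteration written as a standalone scalar function. This is a sketch for illustration only; `scale_one` is a hypothetical name and `unsigned __int128` is a GCC/Clang extension, used here as in the report.

```cpp
#include <cstdint>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension

// One iteration of the loop body in foo(): mask the input, do a widening
// 64x64 multiply, keep the high 64 bits of the product, and mask again.
inline int scale_one(u64 data, u64 m1, u64 e, u64 m2) {
    u128 product = static_cast<u128>(data & m1) * e;       // andq + mulq
    u64 high = static_cast<u64>(product >> 64);            // high half (%rdx)
    return static_cast<int>(high & m2);                    // andl
}
```

Together with the two loads and the final store, nothing in this computation requires copying a value from one general-purpose register to another.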


-- 
   Summary: Unnecessary reg-reg moves
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32725



[Bug middle-end/32708] New: _mm_cvtsi64x_si128() and _mm_cvtsi128_si64x() inefficient

2007-07-09 Thread scovich at gmail dot com
Consider the following functions (compiled with g++-4.1.2 -msse3 -O3):
#include <emmintrin.h>
__m128i int2vector(int i) { return _mm_cvtsi32_si128(i); }
int vector2int(__m128i i) { return _mm_cvtsi128_si32(i); }
__m128i long2vector(long long i) { return _mm_cvtsi64x_si128(i); }
long long vector2long(__m128i i) { return _mm_cvtsi128_si64x(i); }

They become:

_Z10int2vectori:
        movd    %edi, %xmm0
        ret
_Z10vector2intU8__vectorx:
        movd    %xmm0, %rax
        movq    %xmm0, -16(%rsp)
        ret
_Z11long2vectorx:
        movd    %rdi, %mm0
        movq    %rdi, -8(%rsp)
        movq2dq %mm0, %xmm0
        ret
_Z11vector2longU8__vectorx:
        movd    %xmm0, %rax
        movq    %xmm0, -16(%rsp)
        ret

long2vector() should use a simple MOVQ instruction the way int2vector() uses
MOVD. It appears that the reason for the stack access is that the original code
used a reg64->mem->mm->xmm path, which the optimizer partly noticed;
gcc-4.3-20070617 leaves the full path in place.

Also, do the vector2X() functions really need to access the stack?

Finally, I've noticed several places where instructions involving 64-bit values
use the d/l suffix (e.g. long i = 0 ==> xorl %eax, %eax), or 32-bit
operations that use 64-bit registers (e.g. movd %xmm0, %rax above). Are those
generally features, bugs, or a "who cares"?
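
A small round-trip check of the 64-bit conversions may help when comparing compiler versions. The sketch below assumes an x86-64 target and uses the `_mm_cvtsi64_si128` / `_mm_cvtsi128_si64` spellings (without the "x"), which current GCC and Clang headers also provide; the function names are the ones from the report.

```cpp
#include <emmintrin.h>

// x86-64 only: move a 64-bit integer into the low lane of an XMM register
// (the upper lane is zeroed) and move the low lane back out again.
__m128i long2vector(long long i) { return _mm_cvtsi64_si128(i); }
long long vector2long(__m128i v) { return _mm_cvtsi128_si64(v); }
```

Compiling this with -O3 -S shows whether a given compiler still routes either conversion through the stack or an MMX register.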


-- 
   Summary: _mm_cvtsi64x_si128() and _mm_cvtsi128_si64x()
inefficient
   Product: gcc
   Version: 4.1.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32708



[Bug middle-end/32711] New: Regression: ICE when using inline asm constraint X

2007-07-09 Thread scovich at gmail dot com
Compiling the following functions with gcc-4.{1,2,3} results in an ICE.
gcc-3.4.4 does not ICE:

#include <emmintrin.h>
static inline
__m128i my_asm(__m128i a, __m128i b) {
   __m128i result;
   asm("pshufb\t%1,%0" : "=x"(result) : "X"(b), "0"(a));
   return result;
}
__m128i foo(__m128i src) {
  return my_asm(src, _mm_set1_epi32(1));
}

If the inline asm is called directly (not through an inline function) or if the
"X" constraint changes to "mx" everything works fine.


-- 
   Summary: Regression: ICE when using inline asm constraint X
   Product: gcc
   Version: 4.1.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32711



[Bug middle-end/32711] Regression: ICE when using inline asm constraint X

2007-07-09 Thread scovich at gmail dot com


--- Comment #2 from scovich at gmail dot com  2007-07-09 23:27 ---
(In reply to comment #1)
> "X" constraint means anything matches.  Now why we are ICEing is a bit weird.

I started using it because "g" doesn't seem to allow xmm references.
Fortunately, "xm" seems to have the desired effect.
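
For reference, a minimal sketch of the "xm" form. It swaps in paddd (SSE2) for pshufb (SSSE3) so that it builds on x86-64 without extra -m flags; that substitution and the name `add_epi32_asm` are assumptions made for this illustration, not part of the original test case.

```cpp
#include <emmintrin.h>

// Adds packed 32-bit integers with inline asm (AT&T syntax, GCC/Clang on x86).
// "0" ties `a` to the output register; "xm" lets the compiler pass `b` either
// in an XMM register or as a memory operand.
static inline __m128i add_epi32_asm(__m128i a, __m128i b) {
    __m128i result;
    asm("paddd\t%2, %0" : "=x"(result) : "0"(a), "xm"(b));
    return result;
}
```

Unlike "X", the "xm" constraint restricts the operand to forms the instruction can actually encode.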


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32711



[Bug middle-end/32662] Significant extra code generation for 64x64=128-bit multiply

2007-07-07 Thread scovich at gmail dot com


--- Comment #3 from scovich at gmail dot com  2007-07-07 14:55 ---
While it's nice that the new optimization framework can eliminate the redundant
IMUL instruction(s), why were they being generated in the first place? 

Compiling the simpler foo() without optimizations gives:

_Z3fooPyPKyy:
.LFB2:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        pushq   %rbx
.LCFI2:
        movq    %rdi, -16(%rbp)
        movq    %rsi, -24(%rbp)
        movq    %rdx, -32(%rbp)
        movq    -24(%rbp), %rax
        movq    (%rax), %rax
        movq    %rax, %rcx
        movl    $0, %ebx                # <-- here
        movq    -32(%rbp), %rax
        movl    $0, %edx                # <-- here
        movq    %rbx, %rsi
        imulq   %rax, %rsi              # <-- here
        movq    %rdx, %rdi
        imulq   %rcx, %rdi              # <-- here
        addq    %rdi, %rsi
        mulq    %rcx
        addq    %rdx, %rsi
        movq    %rsi, %rdx
        movq    %rdx, %rax
        xorl    %edx, %edx
        movq    %rax, %rdx
        movq    -16(%rbp), %rax
        movq    %rdx, (%rax)
        popq    %rbx
        leave
        ret


Barring something really strange it seems like this problem could/should be
fixed at its source, even for 4.1/4.2

Reopen?
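
One plausible reading of the marked instructions (an assumption, not something confirmed against the GCC source) is that the multiply is expanded as a generic 128x128-bit product whose high halves happen to be zero. The sketch below spells that expansion out; `mul128_generic` and `mul64x64` are hypothetical names, and `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cstdint>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension

// Low 128 bits of (a_hi:a_lo) * (b_hi:b_lo), written in 64-bit pieces.
// The cross terms only affect the upper 64 bits of the result.
u128 mul128_generic(u64 a_hi, u64 a_lo, u64 b_hi, u64 b_lo) {
    u128 low = static_cast<u128>(a_lo) * b_lo;   // one widening MUL
    u64 cross = a_hi * b_lo + a_lo * b_hi;       // two IMULs and an ADD
    return low + (static_cast<u128>(cross) << 64);
}

// What a zero-extended 64x64=128 multiply actually needs: the single MUL.
u128 mul64x64(u64 a, u64 b) {
    return static_cast<u128>(a) * b;
}
```

When both high halves are the constant 0, `cross` is always 0, so the two IMULs and the additions that follow contribute nothing to the result.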


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32662



[Bug middle-end/32660] New: ICE using __builtin_ia32_vec_ext_v2di()

2007-07-06 Thread scovich at gmail dot com
Attempting to compile the following function causes an ICE:

#include emmintrin.h
long foo(__m128i val) {
return __builtin_ia32_vec_ext_v2di(val, 1);
}

Changing the call to any of the following compiles just fine:
__builtin_ia32_vec_ext_v2di(val, 0)
__builtin_ia32_vec_ext_v4si(val, 1)
__builtin_ia32_vec_ext_v8hi(val, 1)


-- 
   Summary: ICE using __builtin_ia32_vec_ext_v2di()
   Product: gcc
   Version: 4.1.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: scovich at gmail dot com
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32660



[Bug middle-end/32660] ICE using __builtin_ia32_vec_ext_v2di()

2007-07-06 Thread scovich at gmail dot com


--- Comment #1 from scovich at gmail dot com  2007-07-06 23:11 ---
Oops.. forgot to include the error message

g++ -Wall -msse3 -O3 -S union-bug.C
union-bug.C: In function ‘long int foo(long long int __vector__)’:
union-bug.C:4: error: unrecognizable insn:
(insn 12 7 13 0 (set (reg:DI 61)
        (vec_select:DI (reg/v:V2DI 60 [ val ])
            (parallel [
                    (const_int 1 [0x1])
                ]))) -1 (nil)
    (nil))
union-bug.C:4: internal compiler error: in extract_insn, at recog.c:2084


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32660


