from:"sgunderson at bigfoot dot com"

[Bug tree-optimization/86214] [8/9 Regression] Strongly increased stack usage

2018-07-04 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214

--- Comment #2 from sgunderson at bigfoot dot com ---
OK, starting a reduce that also checks for no -Wreturn-type warnings.

[Bug c++/81668] LTO ODR warnings are not helpful

2018-06-19 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668

--- Comment #12 from sgunderson at bigfoot dot com ---
The spurious warning seems to be gone in GCC 8.

[Bug tree-optimization/86214] New: [8 Regression] Strongly increased stack usage

2018-06-19 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214

Bug ID: 86214
   Summary: [8 Regression] Strongly increased stack usage
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Created attachment 44296
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44296&action=edit
Test case

Hi,

We noticed that MySQL does not pass its test suite when compiled with GCC 8; it
runs out of stack. (GCC 7 is fine.) A reduced test case is included (mostly by
C-Reduce, but it needed some help by hand); most of it appears to be fluff that
keeps the compiler from just optimizing away the entire thing, but the gist of
it seems to be that it inlines the bg::bl() function several times without
caring that it balloons the stack size, and then doesn't manage to shrink the
stack again by overlapping variables. Putting the noinline attribute on
bg::bl() seems to be a workaround for now.
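
A minimal sketch of the workaround described above, assuming a helper with a large on-stack buffer. The names log_error and report_all are hypothetical stand-ins, not the functions from the test case or from MySQL; only the 8192-byte buffer size is taken from the report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for a logging helper with a large on-stack buffer
// (the report mentions an 8192-byte buffer). The attribute asks GCC not to
// inline it, so callers do not each receive their own copy of the buffer.
__attribute__((noinline)) std::size_t log_error(const char *msg) {
    char buf[8192];
    std::snprintf(buf, sizeof buf, "[error] %s", msg);
    return std::strlen(buf);
}

// Three call sites; without noinline, each one could be inlined separately.
std::size_t report_all() {
    return log_error("a") + log_error("bb") + log_error("ccc");
}
```

Compiling a file like this with -O2 -Wstack-usage=1, with and without the attribute, is one way to compare the reported frame sizes; the exact numbers depend on the compiler version.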

For comparison:

> g++-7 -O2 -Wstack-usage=1 -Wno-return-type -Wno-unused-result -c stack.i
stack.i: In function ‘void c()’:
stack.i:34:6: warning: stack usage is 8240 bytes [-Wstack-usage=]
 void c() {
  ^

> g++-8 -O2 -Wstack-usage=1 -Wno-return-type -Wno-unused-result -c stack.i
stack.i: In function ‘void c()’:
stack.i:34:6: warning: stack usage is 32816 bytes [-Wstack-usage=]
 void c() {
  ^

The actual, unreduced file can be found at
https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/row/row0ins.cc#L926
(the link points at a function where adding noinline helps, although I
don't think it corresponds directly to bg::bl; I think bg::bl might be
ib::error, and the 8192-byte buffer comes from ib::logger::msg).

[Bug libstdc++/80335] perf of copying std::optional

2018-06-16 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80335

--- Comment #3 from sgunderson at bigfoot dot com ---
Appears to have been fixed in GCC 8, indeed.

#include <optional>

std::optional<int> func()
{
    return 3;
}

GCC 7 (-O2) compiles to:

   0:   48 89 f8                mov    %rdi,%rax
   3:   c7 07 03 00 00 00       movl   $0x3,(%rdi)
   9:   c6 47 04 01             movb   $0x1,0x4(%rdi)
   d:   c3                      retq

GCC 8 (-O2):

   0:   48 b8 03 00 00 00 01    movabs $0x100000003,%rax
   7:   00 00 00
   a:   c3                      retq

This is an ABI break, but I'll happily take it. :-)
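
As a worked check of the GCC 8 output: the immediate bytes 03 00 00 00 01 00 00 00 are a little-endian 64-bit constant. A small sketch, assuming the layout implied by the GCC 7 stores (int payload at offset 0, engaged flag byte at offset 4); pack_optional_int is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, taken from the GCC 7 stores above: the int payload sits at
// offset 0 and the "engaged" flag byte at offset 4 (little-endian x86-64).
constexpr std::uint64_t pack_optional_int(std::uint32_t value, bool engaged) {
    return (static_cast<std::uint64_t>(engaged ? 1 : 0) << 32) | value;
}

// Value 3 with the flag set gives the constant loaded into %rax.
static_assert(pack_optional_int(3, true) == 0x100000003ULL,
              "value 3 with the engaged flag set");
```

So the whole object travels in %rax, rather than being written through the hidden pointer passed in %rdi.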

[Bug c++/84076] [6/7/8 Regression] Warning about objects through POD mistakenly claims the object is a pointer

2018-01-30 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84076

--- Comment #5 from sgunderson at bigfoot dot com ---
Ah, so it's allowed to send structs and classes, just not non-PODs. So that's
why the conversion to a pointer happens.

[Bug c++/84076] [6/7/8 Regression] Warning about objects through POD mistakenly claims the object is a pointer

2018-01-30 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84076

--- Comment #3 from sgunderson at bigfoot dot com ---
printf aside, is this thing actually supported in varargs? I thought non-PODs
were not allowed in varargs, period. (If it's not allowed, I'm not sure why the
compiler even tries.)

[Bug c++/84076] New: [5/6/7/8 Regression] Warning about objects through POD mistakenly claims the object is a pointer

2018-01-27 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84076

Bug ID: 84076
   Summary: [5/6/7/8 Regression] Warning about objects through POD
mistakenly claims the object is a pointer
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Test program:

#include <stdio.h>
#include <string>

int main(void)
{
    std::string str;
    printf("%s\n", str);
}

GCC 4.9 and older gives:

test.cpp: In function ‘int main()’:
test.cpp:7:20: error: cannot pass objects of non-trivially-copyable type
‘std::string {aka class std::basic_string<char>}’ through ‘...’
  printf("%s\n", str);
^

GCC 5.0 and newer (including 7.3.0) prints:

test.cpp: In function ‘int main()’:
test.cpp:7:20: warning: format ‘%s’ expects argument of type ‘char*’, but
argument 2 has type ‘std::__cxx11::string* {aka
std::__cxx11::basic_string<char>*}’ [-Wformat=]
  printf("%s\n", str);
^

This is a confusing warning, since it claims I'm sending a std::string * when
I'm sending a std::string. In particular, in the real program I first tried to
fix this by adding ->c_str(), but .c_str() was the correct choice.
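
For reference, a short sketch of the corrected call. The helper name format_line is hypothetical, and snprintf is used instead of printf only so that the result can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a std::string through a C varargs function.
// The std::string object itself cannot go through "..."; the pointer
// returned by its c_str() member can.
int format_line(char *out, std::size_t size, const std::string &str) {
    return std::snprintf(out, size, "%s\n", str.c_str());
}
```

Because str is an object and not a pointer, the member is reached with a dot, which is exactly what the pointer-flavoured warning text obscures.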

[Bug c++/83227] New: [7 Regression] internal compiler error: in process_init_constructor_array

2017-11-30 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83227

Bug ID: 83227
   Summary: [7 Regression] internal compiler error: in
process_init_constructor_array
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

I believe this is distinct from #82593, so I'm filing it as a separate bug.

The following test program dies with GCC 7.2.0 with -std=c++17:

#include 
#include 

struct Direction {
 Direction() {}
};

struct Front_back : public Direction {
  Front_back() : Direction() {}
};

void foo(const std::vector );

void bar() {
  foo({ Front_back{} });
}

test.cc: In function ‘void bar()’:
test.cc:15:23: internal compiler error: in process_init_constructor_array, at
cp/typeck2.c:1308
   foo({ Front_back{} });
   ^
Please submit a full bug report,
with preprocessed source if appropriate.

It works with GCC 6.4.0, and also with -std=c++14. It's still there in the
20171109 snapshot.

Reduced preprocessed case:

namespace std {
template <typename a> class initializer_list {
  const a *b;
  unsigned long c;
};
struct e {
  e(int);
};
template  class f : e {
public:
  f(initializer_list, int g = int()) : e(g) {}
};
}
struct h {};
struct i : h {
  i();
};
void foo(std::f) { foo({i{}}); }

[Bug c++/83226] New: [7 Regression] std::map with reference T breaks in C++17 mode

2017-11-30 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83226

Bug ID: 83226
   Summary: [7 Regression] std::map with reference T breaks in
C++17 mode
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

The following code works under GCC for -std=c++14, but breaks under -std=c++17:

#include <map>
#include <utility>

int main(void)
{
  std::map<int, const int &> m;
  std::pair<int, const int &> val(3, 4);
  m.insert(val);  // Compile error.
  m.emplace(3, 4);  // Works.
}

I've looked briefly through the standard, but I can't see anything that
indicates you can't have a const reference as value type (not that I'd
recommend it!). The error messages given are:

In file included from /usr/include/c++/7/bits/stl_iterator.h:66:0,
 from /usr/include/c++/7/bits/stl_algobase.h:67,
 from /usr/include/c++/7/bits/stl_tree.h:63,
 from /usr/include/c++/7/map:60,
 from test.cc:1:
/usr/include/c++/7/bits/ptr_traits.h: In substitution of ‘template
template using rebind = _Up* [with _Up = const int&; _Tp =
std::_Rb_tree_node<std::pair >]’:
/usr/include/c++/7/bits/ptr_traits.h:147:77:   required by substitution of
‘template using __ptr_rebind = typename
std::pointer_traits::rebind<_Tp> [with _Ptr =
std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair > > >::pointer; _Tp = const int&]’
/usr/include/c++/7/bits/node_handle.h:203:69:   required by substitution of
‘template template using
__pointer = std::__ptr_rebind::pointer, _Tp> [with _Tp = std::pair::second_type; _Key = int; _Value = std::pair; _NodeAlloc = std::allocator<std::_Rb_tree_node<std::pair > >]’
/usr/include/c++/7/bits/node_handle.h:206:60:   required from ‘class
std::_Node_handle<int, std::pair,
std::allocator<std::_Rb_tree_node<std::pair > > >’
test.cc:8:15:   required from here
/usr/include/c++/7/bits/ptr_traits.h:133:28: error: forming pointer to
reference type ‘const int&’
 using rebind = _Up*;
^

Confirmed with 20171109 snapshot. Clang 5.0.0 with the same libstdc++ gives a
similar error, so I believe this is about the standard library, not the
compiler (unless it's an invalid program).

GCC 6.4.0 does not give an error here, so I'm marking this as a regression.
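
Not part of the report, but for code that cannot wait for a fix: a sketch of a common alternative, assuming std::reference_wrapper is an acceptable substitute for a reference as the mapped type. The function name and the static variable are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Stores a rebindable reference wrapper instead of a raw reference, so the
// node type never has to form a pointer to a reference.
int lookup_via_reference_wrapper() {
    static const int four = 4;  // must outlive the map entry that refers to it
    std::map<int, std::reference_wrapper<const int>> m;
    std::pair<int, std::reference_wrapper<const int>> val(3, std::cref(four));
    m.insert(val);
    return m.at(3).get();
}
```

Unlike const int &, the wrapper does not extend the lifetime of a temporary, so the referenced object has to be kept alive separately.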

[Bug target/54589] struct offset add should be folded into address calculation

2017-11-03 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #3 from sgunderson at bigfoot dot com ---
Still there in GCC 7.2.1 (exact same assembler output), and in 8.0 snapshot
20171017.

[Bug c++/82799] New: [8 Regression] -Wunused-but-set-variable false positive

2017-11-01 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82799

Bug ID: 82799
   Summary: [8 Regression] -Wunused-but-set-variable false
positive
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

Reduced testcase (automatically; it might be possible to reduce further):

enum a { b }; 
struct c {
template < a > int d() {
const bool is_ident = 0;
const int ret = is_ident ? 7 : 9;
return ret;
}
void e() {
d < b > ();
}
};

When compiled with -Wall, yields:

test.cc: In instantiation of 'int c::d() [with a  = (a)0]':
test.cc:9:12:   required from here
test.cc:4:14: warning: variable 'is_ident' set but not used
[-Wunused-but-set-variable]
   const bool is_ident = 0;
  ^~~~

even though is_ident is clearly used on the line below.

gcc version 8.0.0 20171017 (experimental) [trunk revision 253812] (Debian
20171017-1) 

This does not happen with GCC 7.2.1.

[Bug c++/81716] Bogus -Wlto warning with forward-declared pointers

2017-10-31 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81716

--- Comment #2 from sgunderson at bigfoot dot com ---
Still there in:

gcc version 8.0.0 20171017 (experimental) [trunk revision 253812] (Debian
20171017-1)

[Bug c++/82780] [8 Regression] ICE on compiling Boost

2017-10-31 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82780

--- Comment #1 from sgunderson at bigfoot dot com ---
Here's a version that's valid C++:

class a {
};
template  class c {  c(c &)   : a(static_cast
(e.d)) {}  a d; };

[Bug c++/82780] New: [8 Regression] ICE on compiling Boost

2017-10-31 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82780

Bug ID: 82780
   Summary: [8 Regression] ICE on compiling Boost
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

Reduced test case below. Regression happens when compiling a part of MySQL
which uses Boost (1.65.0); original code was valid but reduced case is not.
(Reduction also independently found #82050.) GCC 7 does not complain.

gcc version 8.0.0 20171017 (experimental) [trunk revision 253812] (Debian
20171017-1) 

atum17:~> cat ~/reduce2/tmp.i   
class a {  
} template  class c {  c(c &)   : (static_cast a && e.d;  
   a d

atum17:~> /usr/lib/gcc-snapshot/bin/g++  -c ~/reduce2/tmp.i 
[...]
/srv/sesse/reduce2/tmp.i:2:72: internal compiler error: tree check: expected
tree that contains 'decl common' structure, have 'identifier_node' in
get_inner_reference, at expr.c:6999
 } template  class c {  c(c &)   : (static_cast a && e.d; 
a d
^

[Bug c++/82269] -Wignored-qualifiers should not trigger on templated code

2017-09-20 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82269

--- Comment #4 from sgunderson at bigfoot dot com ---
This one is perhaps a better case:

../sql/parse_tree_column_attrs.h: In constructor
'PT_blob_type::PT_blob_type(Blob_type, const CHARSET_INFO*, bool)':
../sql/parse_tree_column_attrs.h:548:59: warning: type qualifiers ignored on
cast result type [-Wignored-qualifiers]
   : PT_type(static_cast<decltype(PT_type::type)>(blob_type)),
   ^

I looked for a bug before filing but didn't find any; which one should I
subscribe to?

[Bug c++/82269] -Wignored-qualifiers should not trigger on templated code

2017-09-20 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82269

sgunderson at bigfoot dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #1 from sgunderson at bigfoot dot com ---
OK, actually looking at the specific caller, perhaps it should :-) I'm still a
bit torn, though. I'll close for now and see if I can find a better example.

[Bug c++/82269] New: -Wignored-qualifiers should not trigger on templated code

2017-09-20 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82269

Bug ID: 82269
   Summary: -Wignored-qualifiers should not trigger on templated
code
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

When compiling MySQL with GCC 8.0.0 20170917, I get

In file included from ../include/my_byteorder.h:53:0,
 from ../include/m_ctype.h:29,
 from ../sql/parse_tree_helpers.h:24,
 from ../unittest/gunit/opt_ref-t.cc:23:
../include/template_utils.h: In instantiation of 'T pointer_cast(void*) [with T
= unsigned char* const]':
../sql/sql_optimizer.cc:9973:60:   required from here
../include/template_utils.h:70:10: warning: type qualifiers ignored on cast
result type [-Wignored-qualifiers]
   return static_cast<T>(p);
  ^

I think this is a bit too aggressive. The function in question reads

template<class T>
inline T pointer_cast(void *p)
{
  return static_cast<T>(p);
}

Sure, it's possible to put std::remove_cv_t around the type, but should it
really be needed?
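
A sketch of the std::remove_cv_t variant mentioned above. The name pointer_cast_nocv is hypothetical, chosen to keep it apart from the real MySQL helper.

```cpp
#include <cassert>
#include <type_traits>

// Same cast as pointer_cast, but top-level const/volatile is stripped from
// the target type, so there are no qualifiers on the cast result to ignore.
template <class T>
inline std::remove_cv_t<T> pointer_cast_nocv(void *p) {
    return static_cast<std::remove_cv_t<T>>(p);
}
```

With T = unsigned char *const, the cast and the return type both become plain unsigned char *, which loses nothing since top-level qualifiers on a prvalue of pointer type have no effect.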

[Bug libstdc++/80335] perf of copying std::optional

2017-08-31 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80335

sgunderson at bigfoot dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sgunderson at bigfoot dot com

--- Comment #1 from sgunderson at bigfoot dot com ---
This also affects _returning_ a std::optional; since it is not trivially copy
constructible, std::optional<T> must be returned (at least on amd64) by means
of a hidden parameter instead of in registers.

Since this affects the return type ABI, it can't be changed easily
after-the-fact, so if possible, it should be fixed before C++17 support becomes
non-experimental.
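
A rough way to probe this from user code is to look at the type traits. The sketch below is an approximation, not the ABI's full rule (which also considers move constructors and object size), and the variable template name is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>

// Approximate precondition for a class type to be returned in registers on
// amd64: trivial copy construction and trivial destruction.
template <class T>
constexpr bool trivially_copyable_and_destructible =
    std::is_trivially_copy_constructible_v<T> &&
    std::is_trivially_destructible_v<T>;
```

Evaluating trivially_copyable_and_destructible<std::optional<int>> against a given standard library shows whether that library's optional can qualify; for std::optional<std::string> it is false regardless, since the payload itself is not trivial.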

[Bug c/81980] Spurious -Wmissing-format-attribute warning in 32-bit mode

2017-08-25 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81980

--- Comment #1 from sgunderson at bigfoot dot com ---
Forgot to say: Also present in trunk r251306 and all the way back to at least
4.8.

[Bug c/81980] New: Spurious -Wmissing-format-attribute warning in 32-bit mode

2017-08-25 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81980

Bug ID: 81980
   Summary: Spurious -Wmissing-format-attribute warning in 32-bit
mode
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

The following (reduced) code gives a warning if and only if I add -m32:

atum17:~> cat test.c
#include <stdarg.h>

char a;
void set_message(const char *fmt, va_list ap)
  __attribute__((format(printf, 1, 0)));
void set_message_by_errcode(va_list ap) { set_message(&a, ap); }

atum17:~> gcc -Wmissing-format-attribute -c test.c 
atum17:~> gcc -Wmissing-format-attribute -c test.c -m32
test.cc: In function ‘void set_message_by_errcode(va_list)’:
test.cc:6:61: warning: function ‘void set_message_by_errcode(va_list)’ might be
a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
 void set_message_by_errcode(va_list ap) { set_message(&a, ap); }
 ^

I believe the warning is spurious, since there's no way you could construct a
valid printf format attribute for set_message_by_errcode (it doesn't take in a
string parameter).
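
For contrast, a sketch of the case where a format attribute does make sense: the wrapper takes the format string itself. Here set_messagef and the message buffer are hypothetical additions; only set_message and its attribute mirror the test case.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstring>

static char message[64];

// va_list flavour: argument 1 is the format string; 0 means the variadic
// arguments are not available for checking.
__attribute__((format(printf, 1, 0)))
void set_message(const char *fmt, va_list ap) {
    std::vsnprintf(message, sizeof message, fmt, ap);
}

// Variadic flavour: the format string is argument 1 and the arguments to
// check start at argument 2.
__attribute__((format(printf, 1, 2)))
void set_messagef(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    set_message(fmt, ap);
    va_end(ap);
}
```

A function like set_message_by_errcode, which receives only a va_list, has no parameter that the first index of the attribute could name.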

This holds for both C and C++.

atum17:~> gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 7.2.0-1'
--with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-7
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none --without-cuda-driver
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 7.2.0 (Debian 7.2.0-1)

[Bug c++/81668] LTO ODR warnings are not helpful

2017-08-09 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668

--- Comment #9 from sgunderson at bigfoot dot com ---
(In reply to Manuel López-Ibáñez from comment #8)
> Actually, what would be more useful is to detect that the difference in type
> comes from S and point out where S has been declared as different types.

Yes, that would be even better. But save for that :-)

> Note that this is not the same bug I pointed out for 
> 
> ../include/violite.h:288:8: warning: type ‘struct st_vio’ violates the C++
> One Definition Rule [-Wodr]
> ../include/violite.h:288:0: note: a different type is defined in another
> translation unit
> 
> The :0: indicates something wrong with the location info. If the location is
> unknown, it would be better to use UNKNOWN_LOCATION.

Yes, I know. It's a bit odd, but it doesn't bother me as much in this case.

[Bug c++/81668] LTO ODR warnings are not helpful

2017-08-08 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668

--- Comment #7 from sgunderson at bigfoot dot com ---
(In reply to Manuel López-Ibáñez from comment #6)
>> fts0pars.y:62:0: note: a field with different name is defined in another
>> translation unit
> Did you cut the above? It looks like a note without a previous warning.
> Also, GCC will have trouble to point out the correct location when compiling
> a generated file that contains linemarkers, unless the linemarkers exactly
> point out to the original file AND the original file is available to read.

Sorry, yes, it was cut (I didn't intend to include it, as it is related to
another and very real warning).

Let me make a more minimal example to illustrate my issue (adapted from the
case in 81716). I thought I'd pasted it already, but evidently it never made
Bugzilla.

atum17:~> cat test1.cc
#include "test.h"

void foo(S *t)
{
  q[0] = nullptr;
}
atum17:~> cat test2.cc
#include <stdio.h>

#include "test.h"

class S {
  int m;
};

void bar(S *t)
{
printf("%p\n", q[0]);
}
atum17:~> cat test.h  
class S;
extern S *q[10];

atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wall -O2 -flto  -o test.so test1.cc
test2.cc
test.h:2:11: warning: type of 'q' does not match original declaration
[-Wlto-type-mismatch]
 extern S *q[10];
   ^
test.h:2:11: note: 'q' was previously declared here
 extern S *q[10];
   ^
test.h:2:11: note: code may be misoptimized unless -fno-strict-aliasing is used
/usr/lib/x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status


What I'd like is some sort of indication about where test.h came in from
(test1.cc and test2.cc).

[Bug c++/81668] LTO ODR warnings are not helpful

2017-08-07 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668

--- Comment #5 from sgunderson at bigfoot dot com ---
(In reply to Markus Trippelsdorf from comment #3)
> I don't see any bug, all relevant information is in the warnings.

My point is that all relevant information _isn't_ in the warnings.

In particular: The context of the .h file (which .o/.cc file it was compiled as
part of in the two cases) is nowhere to be found. If I had that, it would be a
lot easier to preprocess the two files and try to find what the difference is.

Seemingly at least one of these was a GCC bug (#81716); with some luck, the
others I cannot figure out are, too.

[Bug c++/81716] New: Bogus -Wlto warning with forward-declared pointers

2017-08-04 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81716

Bug ID: 81716
   Summary: Bogus -Wlto warning with forward-declared pointers
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

It seems that if you forward-declare a class in one translation unit (and use
pointers to it), it will count as a different type for LTO detection purposes,
which doesn't sound right. Might it be that it implicitly gets a type of
nullptr_t? Or something else?

gcc version 8.0.0 20170618 (experimental) [trunk revision 249349] (Debian
20170618-1) 

atum17:~> cat test1.cc
class S;
extern S *q[10];

void foo(S *t)
{
  q[0] = nullptr;
}
atum17:~> cat test2.cc
#include <stdio.h>

class S {
  int m;
};
extern S *q[10];

void bar(S *t)
{
printf("%p\n", q[0]);
}
atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wall -O2 -flto  -o test.so test1.cc
test2.cc   
test2.cc:6:11: warning: type of 'q' does not match original declaration
[-Wlto-type-mismatch]
 extern S *q[10];
   ^
test1.cc:2:11: note: 'q' was previously declared here
 extern S *q[10];
   ^
test1.cc:2:11: note: code may be misoptimized unless -fno-strict-aliasing is
used
/usr/lib/x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status

[Bug c++/81668] LTO ODR warnings are not helpful

2017-08-04 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668

--- Comment #2 from sgunderson at bigfoot dot com ---
Running with -fno-diagnostics-show-caret does not help any:

../include/violite.h:288:8: warning: type ‘struct st_vio’ violates the C++ One
Definition Rule [-Wodr]
../include/violite.h:288:0: note: a different type is defined in another
translation unit
../include/violite.h:339:46: note: the first difference of corresponding
definitions is field ‘viodelete’
../include/violite.h:339:0: note: a field of same name but different type is
defined in another translation unit

It's hard for me to look at the preprocessed source code, because I don't know
what to preprocess. Like I said, there's probably a thousand translation units
including this .h file; how would I know which one to look through to find the
two differing definitions?

[Bug c++/81668] New: LTO ODR warnings are not helpful

2017-08-02 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668

Bug ID: 81668
   Summary: LTO ODR warnings are not helpful
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

I'm trying to make MySQL compile with LTO. There are a lot of ODR violations
(which I'm trying to fix), but sometimes, the warnings are too vague to give
any real information. An example:

[797/1336] Building CXX object
unittest/gunit/CMakeFiles/merge_large_tests-t.dir/opt_ref-t.cc.o
In file included from ../include/my_byteorder.h:53:0,
 from ../include/m_ctype.h:29,
 from ../include/my_compare.h:25,
 from ../sql/field.h:22,
 from ../unittest/gunit/fake_table.h:27,
 from ../unittest/gunit/opt_ref-t.cc:23:
../include/template_utils.h: In instantiation of 'T pointer_cast(void*) [with T
= unsigned char* const]':
../sql/sql_optimizer.cc:9901:60:   required from here
../include/template_utils.h:70:10: warning: type qualifiers ignored on cast
result type [-Wignored-qualifiers]
   return static_cast<T>(p);
  ^
[852/1336] Linking CXX executable runtime_output_directory/pfs-t
../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not
match original declaration [-Wlto-type-mismatch]
 extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
^
../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously
declared here
 thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
 ^
../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless
-fno-strict-aliasing is used
[854/1336] Linking CXX executable runtime_output_directory/pfs_instr-t
../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not
match original declaration [-Wlto-type-mismatch]
 extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
^
../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously
declared here
 thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
 ^
../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless
-fno-strict-aliasing is used
[855/1336] Linking CXX executable runtime_output_directory/pfs_instr_class-t
../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not
match original declaration [-Wlto-type-mismatch]
 extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
^
../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously
declared here
 thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
 ^
../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless
-fno-strict-aliasing is used
[856/1336] Linking CXX executable runtime_output_directory/pfs_account-oom-t
../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not
match original declaration [-Wlto-type-mismatch]
 extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
^
../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously
declared here
 thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
 ^
../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless
-fno-strict-aliasing is used
[857/1336] Linking CXX executable runtime_output_directory/pfs_host-oom-t
../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not
match original declaration [-Wlto-type-mismatch]
 extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
^
../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously
declared here
 thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
 ^
../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless
-fno-strict-aliasing is used
[858/1336] Linking CXX executable runtime_output_directory/pfs_user-oom-t
../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not
match original declaration [-Wlto-type-mismatch]
 extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
^
../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously
declared here
 thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS];
 ^
../storage/perfschema/pfs.cc:2072:33: note

[Bug c++/81277] New: assert() in multiversioned functions causes compilation error

2017-07-02 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81277

Bug ID: 81277
   Summary: assert() in multiversioned functions causes
compilation error
   Product: gcc
   Version: 7.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

As the bug title says:

#include 
#include 

__attribute__((target("default")))
void foo(int x)
{
assert(x >= 0);
}

__attribute__((target("arch=haswell")))
void foo(int x)
{
assert(x >= 0);
}

When compiled:

/tmp/ccCQK5gF.s: Assembler messages:
/tmp/ccCQK5gF.s:71: Error: symbol `_ZZ3fooiE19__PRETTY_FUNCTION__' is already
defined

[Bug c++/81276] New: Function multiversioning doesn't work with C++ templates

2017-07-02 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81276

Bug ID: 81276
   Summary: Function multiversioning doesn't work with C++
templates
   Product: gcc
   Version: 7.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

Seemingly one can't ask for a template to be multiversioned:

template<class T>
__attribute__ ((target("default")))
void func(T *x) {}

template<class T>
__attribute__ ((target("arch=haswell")))
void func(T *x) {}

gives, when compiled:

func.cpp:7:6: error: ambiguating new declaration ‘template<class T> void
func(T*)’
 void func(T *x) {}
  ^~~~
func.cpp:3:6: note: old declaration ‘template<class T> void func(T*)’
 void func(T *x) {}
  ^~~~

target_clones works, but is useless for me because it turns off inlining.

[Bug c++/80858] When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong

2017-05-22 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858

--- Comment #4 from sgunderson at bigfoot dot com ---
I think this should work as a reduction:

struct Empty {
};

template<class T>
struct A
{
    A& operator=(const A&)
    {
        T t(3);
        return *this;
    }
};

class B
{
    A<Empty> a;
};

int main(void)
{
B b1, b2;
b1 = b2;
}


The error is attributed to the line with “class B”, without ever mentioning the
“b1 = b2;” line.

[Bug c++/80858] When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong

2017-05-22 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858

--- Comment #2 from sgunderson at bigfoot dot com ---
Yes, I mean that the error message isn't clear (and it's basically the same
error message in 4.8, so no regression).

I don't think I understand the difficulties involved. Doesn't the error come as
a direct result of my copying? If I do this with e.g. std::vector, I get a much
clearer error message, which directly points to the line in question:

[…]
/usr/include/c++/6/bits/vector.tcc:195:19:   required from ‘std::vector<_Tp,
_Alloc>& std::vector<_Tp, _Alloc>::operator=(const std::vector<_Tp, _Alloc>&)
[with _Tp = std::unique_ptr; _Alloc = std::allocator<std::unique_ptr >]’
test.cc:7:7:   required from here

My main gripe is that with unordered_map, the error traceback stops in the
internal details of _Hashtable:

/usr/include/c++/7/bits/unordered_map.h:101:11:   required from here

In the real-world case in question, I eventually had to go into unordered_map.h
 (yes, in /usr/include) and replace “= default;” with “= delete;” to figure out
who was calling the copy constructor.

[Bug c++/80858] New: When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong

2017-05-22 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858

Bug ID: 80858
   Summary: When trying to copy std::unordered_map illegally,
error message doesn't tell what's wrong
   Product: gcc
   Version: 7.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Using gcc version 7.1.0 (Debian 7.1.0-5) (but the error goes back to at least
4.8, and amazingly, also in Clang), on this piece of code, simplified from a
much bigger test case:

#include <memory>
#include <unordered_map>

int main(void)
{
  std::unordered_map<int, std::unique_ptr> a, b;
  a = b;
}

The code is wrong, and GCC correctly rejects it, but the error message is less
than helpful, since it doesn't mention the line with the assignment on, or
really anything hinting at who asked the copy constructor to be invoked:

$ g++-7 -c test.cc
In file included from
/usr/include/x86_64-linux-gnu/c++/7/bits/c++allocator.h:33:0,
 from /usr/include/c++/7/bits/allocator.h:46,
 from /usr/include/c++/7/memory:63,
 from test.cc:1:
/usr/include/c++/7/ext/new_allocator.h: In instantiation of ‘void
__gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up =
std::pair >; _Args = {const std::pair > >&}; _Tp = std::pair >]’:
/usr/include/c++/7/bits/alloc_traits.h:475:4:   required from ‘static void
std::allocator_traits<std::allocator<_Tp1>
>::construct(std::allocator_traits<std::allocator<_Tp1> >::allocator_type&,
_Up*, _Args&& ...) [with _Up = std::pair >;
_Args = {const std::pair > >&}; _Tp = std::pair
>; std::allocator_traits<std::allocator<_Tp1> >::allocator_type =
std::allocator<std::pair > >]’
/usr/include/c++/7/bits/hashtable_policy.h:2066:37:   required from
‘std::__detail::_Hashtable_alloc<_NodeAlloc>::__node_type*
std::__detail::_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args&& ...)
[with _Args = {const std::pair > >&}; _NodeAlloc =
std::allocator<std::__detail::_Hash_node<std::pair >, false> >;
std::__detail::_Hashtable_alloc<_NodeAlloc>::__node_type =
std::__detail::_Hash_node<std::pair >, false>]’
/usr/include/c++/7/bits/hashtable.h:1023:54:   required from
‘std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>::operator=(const std::_Hashtable<_Key, _Value, _Alloc,
_ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>&)::<lambda(const
__node_type*)> [with _Key = int; _Value = std::pair >; _Alloc = std::allocator<std::pair > >; _ExtractKey = std::__detail::_Select1st; _Equal =
std::equal_to; _H1 = std::hash; _H2 =
std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash;
_RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits<false, false, true>; std::_Hashtable<_Key,
_Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy,
_Traits>::__node_type = std::__detail::_Hash_node<std::pair >, false>; typename _Traits::__hash_cached =
std::integral_constant<bool, false>]’
/usr/include/c++/7/bits/hashtable.h:1022:9:   required from ‘struct
std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>::operator=(const std::_Hashtable<_Key, _Value, _Alloc,
_ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>&) [with _Key =
int; _Value = std::pair >; _Alloc =
std::allocator<std::pair > >; _ExtractKey =
std::__detail::_Select1st; _Equal = std::equal_to; _H1 = std::hash;
_H2 = std::__detail::_Mod_range_hashing; _Hash =
std::__detail::_Default_ranged_hash; _RehashPolicy =
std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits<false, false, true>]::<lambda(const
__node_type*)>’
/usr/include/c++/7/bits/hashtable.h:1021:14:   required from
‘std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>& std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey,
_Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::operator=(const
std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>&) [with _Key = int; _Value = std::pair >; _Alloc = std::allocator<std::pair > >; _ExtractKey = std::__detail::_Select1st; _Equal =
std::equal_to; _H1 = std::hash; _H2 =
std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash;
_RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits<false, false, true>]’
/usr/include/c++/7/bits/unordered_map.h:101:11:   required from here
/usr/include

[Bug c++/79746] [7 Regression] Confusing -Wunused-but-set-parameter warning with virtual inheritance

2017-03-01 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79746

--- Comment #6 from sgunderson at bigfoot dot com ---
Thanks. But I'm still curious; is the second code snippet well-formed or not?

[Bug c++/79746] Confusing -Wunused-but-set-parameter warning with virtual inheritance

2017-02-28 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79746

--- Comment #1 from sgunderson at bigfoot dot com ---
Actually this is interesting; this code (derived from the previous one)
compiles without warning in GCC 7.0 and Clang, but gives an error in GCC 6.3:

struct Base { 
Base(const char *foo) : m_foo(foo) {}
virtual int func() = 0;

const char *m_foo;
};

struct Derived : public virtual Base {
Derived(const char *foo) { (void)foo; }
};

Which compiler is right?

[Bug c++/79750] New: -Wimplicit-fallthrough= comment detection gets confused by #if

2017-02-28 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79750

Bug ID: 79750
   Summary: -Wimplicit-fallthrough= comment detection gets
confused by #if
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

It seems that fallthrough comments are not properly parsed if they are followed
by a preprocessor statement. Minified test case:


atum17:~> /usr/lib/gcc-snapshot/bin/g++ -v  
Using built-in specs.
COLLECT_GCC=/usr/lib/gcc-snapshot/bin/g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 20170226-1'
--with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs
--enable-languages=c,ada,c++,go,brig,fortran,objc,obj-c++
--prefix=/usr/lib/gcc-snapshot --with-gcc-major-version-only --program-prefix=
--enable-shared --enable-linker-build-id --disable-nls --with-sysroot=/
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib
--enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic --enable-checking=yes --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.0.1 20170226 (experimental) [trunk revision 245744] (Debian
20170226-1) 

atum17:~> cat test.cc
int func(int x)
{
switch (x) {
case 0:
x = 1;
//-fallthrough
#if 1
case 1:
#endif
case 2:
++x;
}
return x;
}

atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wall -Wextra -c test.cc
test.cc: In function 'int func(int)':
test.cc:5:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
   x = 1;
   ~~^~~
test.cc:8:2: note: here
  case 1:
  ^~~~

If I remove the #if 1, there is no warning.
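
A comment-independent way to mark the fallthrough, as a minimal sketch: the standard C++17 [[fallthrough]] attribute is a statement rather than a comment, so it does not depend on how comments next to preprocessor lines are detected. This is a swapped-in alternative, not something the report tested against this snapshot; the names func_attr and demo_fallthrough are made up for illustration.

```cpp
#include <cassert>

// Same switch as test.cc, with the C++17 attribute instead of a
// fallthrough comment. func_attr is a hypothetical name.
int func_attr(int x)
{
    switch (x) {
    case 0:
        x = 1;
        [[fallthrough]];  // a statement, not a comment
#if 1
    case 1:
#endif
    case 2:
        ++x;
    }
    return x;
}

// Small self-check of the control flow.
void demo_fallthrough()
{
    assert(func_attr(0) == 2);  // sets x = 1, then falls into ++x
    assert(func_attr(2) == 3);
}
```

The behaviour of the switch is unchanged; only the annotation differs.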

[Bug c++/79746] New: Confusing -Wunused-but-set-parameter warning with virtual inheritance

2017-02-28 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79746

Bug ID: 79746
   Summary: Confusing -Wunused-but-set-parameter warning with
virtual inheritance
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

This is a minified testcase of MySQL when trying to compile with a 7.0
snapshot:

atum17:~> /usr/lib/gcc-snapshot/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/lib/gcc-snapshot/bin/g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 20170226-1'
--with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs
--enable-languages=c,ada,c++,go,brig,fortran,objc,obj-c++
--prefix=/usr/lib/gcc-snapshot --with-gcc-major-version-only --program-prefix=
--enable-shared --enable-linker-build-id --disable-nls --with-sysroot=/
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib
--enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic --enable-checking=yes --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.0.1 20170226 (experimental) [trunk revision 245744] (Debian
20170226-1) 

atum17:~> cat test2.cc
struct Base {
Base(const char *foo) : m_foo(foo) {}
virtual int func() = 0;

const char *m_foo;
};

struct Derived : public virtual Base {
Derived(const char *foo) : Base(foo) {}
};

atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wunused-but-set-parameter -c test2.cc
test2.cc: In constructor 'Derived::Derived(const char*)':
test2.cc:9:22: warning: parameter 'foo' set but not used
[-Wunused-but-set-parameter]
  Derived(const char *foo) : Base(foo) {}

I think the warning is actually somehow correct, but it's very confusing until
you see what's going on. I think the logic goes something like: Virtual bases
are always set through the most derived class. Since Derived has a pure virtual
(func()), it can't be the most derived class, and thus, its call to Base() can
never actually happen. Thus, “foo” is unused.

Perhaps a better warning would be something like

test2.cc:9:22: warning: class 'Derived' inherits virtually from 'Base' but is
not possible to instantiate by itself, so it can never be the most derived
class, and the call to 'Base::Base(const char *foo)' is always ignored

but that might be too wordy.
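
The rule described above can be seen in a small sketch. It assumes a non-abstract intermediate class (so that it can also be instantiated on its own), and every name in it (VBase, Middle, Leaf, demo_virtual_base) is made up for illustration rather than taken from MySQL.

```cpp
#include <cassert>
#include <string>

struct VBase {
    explicit VBase(const char *tag) : m_tag(tag) {}
    virtual ~VBase() = default;
    std::string m_tag;
};

// Not abstract here, so it can also be the most derived class.
struct Middle : public virtual VBase {
    explicit Middle(const char *tag) : VBase(tag) {}
};

struct Leaf : public Middle {
    // The most derived class initializes the virtual base; the VBase(tag)
    // in Middle's constructor is skipped when building a Leaf.
    Leaf() : VBase("from Leaf"), Middle("from Middle") {}
};

void demo_virtual_base()
{
    Leaf leaf;
    assert(leaf.m_tag == "from Leaf");

    // When Middle itself is the most derived class, its initializer is used.
    Middle middle("from Middle");
    assert(middle.m_tag == "from Middle");
}
```

If Middle were abstract, as Derived is in test2.cc, only the first case could ever occur, which is presumably why the parameter is reported as unused.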

[Bug c++/79727] -Wimplicit-fallthrough=3 doesn't seem to match any comments

2017-02-28 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79727

--- Comment #2 from sgunderson at bigfoot dot com ---
Wait, it can't do with a substring match? That wasn't clear at all from the
documentation, and it makes the default a lot more strict than I assumed. Some
of the regexes are rather strange, then; one would assume that the ones
starting with [ \t.!]* are to capture word boundaries; why would . and ! be
there otherwise? To capture strange comment syntaxes like this?

  // !else, fallthrough-
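
To make the distinction concrete, here is a rough sketch of whole-comment matching versus substring matching using std::regex. It assumes the comment body has to match the pattern in full, as the reply seems to imply; the pattern is a simplified stand-in, and the helper names are hypothetical. It does not reproduce GCC's actual implementation.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical, simplified stand-in for one of the documented patterns.
static const std::regex kPattern("-fallthrough");

// True only if the entire comment body matches the pattern.
bool whole_match(const std::string &comment_body)
{
    return std::regex_match(comment_body, kPattern);
}

// True if the pattern occurs anywhere in the comment body.
bool substring_match(const std::string &comment_body)
{
    return std::regex_search(comment_body, kPattern);
}

void demo_matching()
{
    // Body of "// -fallthrough": note the leading space.
    assert(!whole_match(" -fallthrough"));
    assert(substring_match(" -fallthrough"));
    // Body of "//-fallthrough": no leading space.
    assert(whole_match("-fallthrough"));
}
```

Under that assumption, a single space after the comment marker is enough to make a comment stop matching.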

[Bug c++/79727] New: -Wimplicit-fallthrough=3 doesn't seem to match any comments

2017-02-27 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79727

Bug ID: 79727
   Summary: -Wimplicit-fallthrough=3 doesn't seem to match any
comments
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

I can't get the supposed fallthrough comments (on
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html) to work:

gcc version 7.0.1 20170226 (experimental) [trunk revision 245744] (Debian
20170226-1) 

atum17:~> cat test.cc
#include 
#include 

int main(int argc, char **argv)
{
switch (argc) {
case 2:
printf("something\n");
// -fallthrough
case 3:
printf("whatever\n");
break;
}
}

atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wimplicit-fallthrough=3 -c test.cc
test.cc: In function 'int main(int, char**)':
test.cc:8:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
   printf("something\n");
   ~~^~~
test.cc:10:2: note: here
  case 3:
  ^~~~

I've tried a variety of the other patterns that are supposed to match, but I
can't get it to work on level 3. Level 2 appears to work as documented.

[Bug target/71993] __builtin_cpu_supports() does not support "f16c"

2016-07-26 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71993

--- Comment #2 from sgunderson at bigfoot dot com ---
Right.

[Bug target/71993] New: __builtin_cpu_supports() does not support "f16c"

2016-07-25 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71993

Bug ID: 71993
   Summary: __builtin_cpu_supports() does not support "f16c"
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

As the summary says. You just get:

lib.cc:4:1: error: Parameter to builtin not valid: f16c

Tested with:

gcc version 7.0.0 20160707 (experimental) [trunk revision 238117] (Debian
20160707-1)

[Bug tree-optimization/71990] Function multiversioning prohibits inlining

2016-07-25 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71990

--- Comment #4 from sgunderson at bigfoot dot com ---
OK, so it would have to be a special kind of cloning, not the one you can do
yourself from code as of today?

As a user, I suppose there's no really good way of dealing with this currently,
right? Short of maybe doing manual multiversioning with
__builtin_cpu_supports() and hoping that the compiler can hoist that out of all
the loops.
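
A rough sketch of what such manual multiversioning could look like, with the CPU check done once outside the loop instead of relying on the compiler to hoist it. The names foo_default, foo_avx, bar_impl and cpu_has_avx are made up for illustration, and the two variants are plain stand-ins. Real AVX code would still need compatible target options on the caller before it could be inlined, so this sidesteps that question rather than answering it.

```cpp
#include <cassert>

// Stand-ins for the two code paths. Both are ordinary inline functions,
// so nothing prevents the compiler from inlining them.
inline int foo_default() { return 0; }
inline int foo_avx() { return 1; }

// The loop is templated on the variant, so the choice is a compile-time
// constant inside the loop body.
template<bool UseAvx>
int bar_impl()
{
    int sum = 0;
    for (int i = 0; i < 100; ++i) {
        sum += UseAvx ? foo_avx() : foo_default();
    }
    return sum;
}

// Hypothetical runtime check; falls back to false on non-x86 targets.
inline bool cpu_has_avx()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

// The dispatch happens once per call, outside the loop.
int bar()
{
    return cpu_has_avx() ? bar_impl<true>() : bar_impl<false>();
}
```

The cost is one extra instantiation of the loop per variant, which is what target_clones would have generated anyway.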

[Bug tree-optimization/71990] Function multiversioning prohibits inlining

2016-07-25 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71990

--- Comment #2 from sgunderson at bigfoot dot com ---
Would pushing the mv automatically upwards into callers really help? There's
still no way that I can see to inline the function; I mean, pushing upwards is
what I've been trying to do here manually with the target clones.

[Bug tree-optimization/71990] New: Function multiversioning prohibits inlining

2016-07-25 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71990

Bug ID: 71990
   Summary: Function multiversioning prohibits inlining
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Hi,

I'm trying to write a library that uses F16C instructions in certain places,
and since they're not really universally accessible (and ld.so hardware
capabilities seem to have been long abandoned), I've tried to use function
multiversioning for it. However, trying to combine it with inlining seems to
draw a blank; a very simplified example:

klump:~> /usr/lib/gcc-snapshot/bin/g++ -v 
Using built-in specs.  
COLLECT_GCC=/usr/lib/gcc-snapshot/bin/g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/7.0.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 20160707-1'
--with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs
--enable-languages=c,ada,c++,java,go,fortran,objc,obj-c++
--prefix=/usr/lib/gcc-snapshot --enable-shared --enable-linker-build-id
--disable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-libmpx
--enable-plugin --with-system-zlib --disable-browser-plugin
--enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-7-snap-amd64/jre
--enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-7-snap-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-7-snap-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--disable-werror --enable-checking=yes --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.0.0 20160707 (experimental) [trunk revision 238117] (Debian
20160707-1) 

klump:~> cat test.cc 
#include <stdio.h>

__attribute__ ((target("default")))
inline int foo()
{
return 0;
}

__attribute__ ((target("avx")))
inline int foo()
{
return 1;
}

int bar()
{
int sum = 0;
for (int i = 0; i < 100; ++i) {
sum += foo();
}
return sum;
}

int main(void)
{
printf("%d\n", bar());
}

klump:~> /usr/lib/gcc-snapshot/bin/g++ -O2 -o test test.cc
klump:~> nm --demangle test | egrep 'foo|bar' 
0000000000400c40 i _Z3foov.ifunc()
0000000000400bf0 T bar()
0000000000400c20 W foo()
0000000000400c30 W foo() [clone .avx]
0000000000400c40 W foo() [clone .resolver]

Of course, in reality, my foo() would do something more complicated, like call
_cvtss_sh() or similar; this is a toy example. But it illustrates that the
function multiversioning blocks inlining.

If I compile with -mavx, the entire multiversioning goes away (only the AVX
version is emitted), so I hoped that I could use target cloning on bar():

__attribute__ ((target_clones("avx", "default")))
int bar()
{
// same code...

but unfortunately, no. There's a bar() clone for AVX emitted, but it still
calls the resolving function for foo(); no inlining.

So I really can't find any usable way of using this feature if your
architecture switch is in inlined functions (in my case, convert to/from fp16).

[Bug rtl-optimization/68282] Optimization fails to remove unnecessary sign extension instruction

2015-11-11 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68282

sgunderson at bigfoot dot com changed:

   What|Removed |Added

 CC||sgunderson at bigfoot dot com

--- Comment #2 from sgunderson at bigfoot dot com ---
Shouldn't it be possible to fold the incl into the mov, too?

shrl    $2, %edi
movl    table+4(,%rdi,4), %eax
retq

[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-03 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

sgunderson at bigfoot dot com changed:

   What|Removed |Added

 CC||sgunderson at bigfoot dot com

--- Comment #6 from sgunderson at bigfoot dot com ---
Without knowing anything about the GCC internals here, I could perhaps also
point out that GCC should know that these have limited range. As a trivial
example:

int foo(int x)
{
int z = __builtin_ctz(x);
if (z > 2000) {
return 1;
} else {
return 0;
}
}

There's no way this function can return anything but 0, and VRP should probably
be taught that. (I wonder if this would fix the unnecessary sign extension
too?)
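
The range claim can be written down as a small sketch. It assumes a nonzero argument (the builtin is undefined for zero) and uses a made-up function name, foo_checked; the bound is simply the bit width of the argument type.

```cpp
#include <cassert>
#include <climits>

// Mirrors foo() above, with the value range spelled out as assertions.
int foo_checked(unsigned x)
{
    assert(x != 0);  // __builtin_ctz(0) is undefined
    int z = __builtin_ctz(x);
    // The count can never reach the bit width of the argument.
    assert(z >= 0 && z < (int)(sizeof(unsigned) * CHAR_BIT));
    return z > 2000 ? 1 : 0;
}
```

Since the second assertion always holds, the comparison against 2000 is dead code, which is the information a range-propagation pass would need.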

[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-03 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #7 from sgunderson at bigfoot dot com ---
Wait, sorry, someone's already pointed that out. Ignore me, then...

I can at least confirm it still happens with GCC 4.8.1.

[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

--- Comment #6 from sgunderson at bigfoot dot com ---
BZHI seems to have the same problem.

[Bug target/57624] BZHI instrinsic is missing

2013-06-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57624

--- Comment #2 from sgunderson at bigfoot dot com ---
Shouldn't really the documentation say so, then? The entire GCC manual seems to
make no note of this header at all, as far as I can see.

[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

--- Comment #8 from sgunderson at bigfoot dot com ---
I really did spot the BZHI problem in actual code; that's how I found out :-) I
rewrote it slightly and the problem disappeared, though.

[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

--- Comment #10 from sgunderson at bigfoot dot com ---
On Thu, Jun 27, 2013 at 12:27:02PM +0000, jakub at gcc dot gnu.org wrote:
> Then please provide preprocessed testcase for it (plus command line options).
> Because I'm really curious how it could have been matched.

Sorry, the code is a) not so easy to make public right now, and b) this
particular edit has been lost in the mists of time (like I said, I wrote it
slightly differently and then it was gone). But the scrollback in my terminal
still has this for “proof”:

sesse@gruessi:~/addie$ g++-4.8 -O2 -march=native -o addie addie.cc 
/tmp/ccJweT2R.s: Assembler messages:
/tmp/ccJweT2R.s:82: Error: operand size mismatch for `bzhi'

Sorry I couldn't be more helpful. :-)

/* Steinar */

[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

--- Comment #11 from sgunderson at bigfoot dot com ---
On Thu, Jun 27, 2013 at 12:32:18PM +0000, sgunderson at bigfoot dot com wrote:
> Sorry, the code is a) not so easy to make public right now, and b) this
> particular edit has been lost in the mists of time (like I said, I wrote it
> slightly differently and then it was gone). But the scrollback in my terminal
> still has this for “proof”:

Hah, I reproduced it. I'll try to distill it down to a small test case.

/* Steinar */

[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

--- Comment #12 from sgunderson at bigfoot dot com ---
Created attachment 30389
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30389&action=edit
BZHI bug example (compile with g++-4.8 -O2 -mbmi2 -c foo.cc)

[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

--- Comment #13 from sgunderson at bigfoot dot com ---
There. That ought to satisfy your curiosity. :-) I get:

sesse@gruessi:~/addie$ g++-4.8 -O2 -mbmi2 -c foo.cc
/tmp/ccX2oEfE.s: Assembler messages:
/tmp/ccX2oEfE.s:21: Error: operand size mismatch for `bzhi'

due to

bzhi    _ZL5shift(,%rax,8), %rdx, %rdx

[Bug target/57623] New: BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

Bug ID: 57623
   Summary: BEXTR intrinsic has memory operands switched around
(fails to compile code)
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com

Hi,

I'm on gcc 4.8.1 (Debian 4.8.1-2). Given the following test program:

sesse@gruessi:~$ cat bextr-test.c 
#include <stdint.h>

uint64_t func(uint64_t x, uint64_t *y)
{
return __builtin_ia32_bextr_u64(x, *y);
}

trying to compile it fails:

sesse@gruessi:~$ gcc-4.8 -O2 -mbmi -c bextr-test.c --save-temps
bextr-test.s: Assembler messages:
bextr-test.s:9: Error: operand size mismatch for `bextr'

seemingly because GCC's idea of r/m is broken for this instruction:


sesse@gruessi:~$ cat bextr-test.s
        .file   "bextr-test.c"
        .text
        .p2align 4,,15
        .globl  func
        .type   func, @function
func:
.LFB0:
        .cfi_startproc
        bextr   (%rsi), %rdi, %rax
        ret
        .cfi_endproc
.LFE0:
        .size   func, .-func
        .ident  "GCC: (Debian 4.8.1-2) 4.8.1"
        .section        .note.GNU-stack,"",@progbits

As far as I understand, the second operand can be r/m64, but the first can only
be r64.

[Bug target/57624] New: BZHI instrinsic is missing

2013-06-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57624

Bug ID: 57624
   Summary: BZHI instrinsic is missing
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com

Hi,

The GCC documentation
(http://gcc.gnu.org/onlinedocs/gcc/X86-Built_002din-Functions.html) claims
there should be such an intrinsic, added in gcc 4.7:

 unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)

Yet, with gcc 4.8.1 (Debian 4.8.1-2), nothing of the sort exists:

sesse@gruessi:~$ gcc-4.8 -Wall -O2 -mbmi2 -c bzhi-test.c
bzhi-test.c: In function ‘func’:
bzhi-test.c:5:2: warning: implicit declaration of function ‘_bzhi_u64’
[-Wimplicit-function-declaration]
  return _bzhi_u64(x, y);
  ^

A function call is generated instead, which was obviously not what I intended.
:-)

I thought this was maybe just a documentation error, but
__builtin_ia32_bzhi_u64 also does not exist.
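
For reference, the operation the intrinsic is expected to perform can be emulated portably. This is only a sketch of the instruction's semantics as I understand them (zero the bits from the given index upwards, using the low 8 bits of the index); the helper name bzhi_u64_emul is hypothetical and is not a GCC builtin.

```cpp
#include <cassert>
#include <cstdint>

// Portable emulation of the BZHI operation (hypothetical helper name).
uint64_t bzhi_u64_emul(uint64_t x, uint64_t index)
{
    unsigned n = index & 0xff;  // only the low 8 bits of the index are used
    if (n >= 64) {
        return x;               // nothing to clear
    }
    return x & ((UINT64_C(1) << n) - 1);
}

void demo_bzhi()
{
    assert(bzhi_u64_emul(0xff, 4) == 0x0f);
    assert(bzhi_u64_emul(0x1234, 0) == 0);
}
```

A compiler that recognizes the instruction would turn the intrinsic into a single bzhi instead of the shift, subtract and mask sequence.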

[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

--- Comment #2 from sgunderson at bigfoot dot com ---
On Sat, Jun 15, 2013 at 04:33:14PM +0000, jakub at gcc dot gnu.org wrote:
> The fix for the compiler is easy, but at least the AVX2 spec documents that
> _bextr_u{32,64} intrinsics actually take 3 arguments (source, start and
> length),
> with the latter two always unsigned int, while our intrinsic has only two
> arguments (where the latter is expected to be (start & 255) | (length << 8)).
> Not sure if we want to change this, and if so, just for 4.9+, or also for
> 4.8.2+ and 4.7.4+?

If you decide to change it, at least consider keeping the old version around;
for instance, the start/length combination could come from a table. In
general, if you actually have to do shifting and stuff to create this
operand, the gain of the instruction is already lost.

/* Steinar */
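
As an illustration of the table-driven use case mentioned above, here is a portable sketch of the two-argument form: start and length are packed once into a control word, stored in a table, and looked up with a non-constant index. All names (bextr_control, bextr_u64_emul, kControls, extract_field) are hypothetical, and the emulation only mirrors the instruction's semantics as I understand them.

```cpp
#include <cassert>
#include <cstdint>

// Pack start and length into the two-argument control form.
constexpr uint64_t bextr_control(unsigned start, unsigned length)
{
    return (start & 255u) | ((uint64_t)(length & 255u) << 8);
}

// Portable emulation of BEXTR with a packed control word.
uint64_t bextr_u64_emul(uint64_t x, uint64_t control)
{
    unsigned start = control & 0xff;
    unsigned length = (control >> 8) & 0xff;
    if (start >= 64 || length == 0) {
        return 0;
    }
    uint64_t shifted = x >> start;
    if (length >= 64) {
        return shifted;
    }
    return shifted & ((UINT64_C(1) << length) - 1);
}

// Controls precomputed in a table; no shifting is needed at lookup time.
static const uint64_t kControls[] = {
    bextr_control(0, 8), bextr_control(8, 8), bextr_control(4, 12),
};

uint64_t extract_field(uint64_t x, unsigned field)
{
    assert(field < sizeof(kControls) / sizeof(kControls[0]));
    return bextr_u64_emul(x, kControls[field]);
}
```

With a three-argument intrinsic, the packing would have to happen at every call unless the compiler can prove the operands constant or loop-invariant.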

[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)

2013-06-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623

--- Comment #4 from sgunderson at bigfoot dot com ---
On Sat, Jun 15, 2013 at 05:10:57PM +0000, jakub at gcc dot gnu.org wrote:
> If both start and length are constants, then it will be folded by the
> compiler,
> similarly if you use it inside of loop and start/length will be loop
> invariants, the computation can be hoisted out of the loop.

Sure, but again, neither of these match my situation. I really need to do a
lookup into a table (with a non-constant index) to get the value.

/* Steinar */

[Bug tree-optimization/55155] New: Autovectorization does not use unaligned loads/stores

2012-10-31 Thread sgunderson at bigfoot dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55155

 Bug #: 55155
   Summary: Autovectorization does not use unaligned loads/stores
Classification: Unclassified
   Product: gcc
   Version: 4.7.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


Hi,

I am on

  gcc version 4.7.1 (Debian 4.7.1-7) 

and a project of mine had code that looked like this:

beklager:~> cat example.cpp
void func(float * __restrict prod_features, float * __restrict
grad_prod_features, float alpha, unsigned num_prods) {
float *pf = (float *)__builtin_assume_aligned(prod_features, 16);
float *gpf = (float *)__builtin_assume_aligned(grad_prod_features, 16);
for (unsigned i = 0; i < num_prods * 16; ++i) {
prod_features[i] -= alpha * grad_prod_features[i];
//pf[i] -= alpha * gpf[i];
}
}

This would seem like a great case for autovectorization, so I tried:

beklager:~> g++ -Wall -O2 -ftree-vectorize -msse4.1 -c example.cpp
example.cpp: In function ‘void func(float*, float*, float, unsigned int)’:
example.cpp:2:9: warning: unused variable ‘pf’ [-Wunused-variable]
example.cpp:3:9: warning: unused variable ‘gpf’ [-Wunused-variable]

The resulting code, however, is a train wreck:
beklager:~> objdump --disassemble --demangle example.o

example.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <func(float*, float*, float, unsigned int)>:
   0:   55                      push   %rbp
   1:   c1 e2 04                shl    $0x4,%edx
   4:   85 d2                   test   %edx,%edx
   6:   53                      push   %rbx
   7:   0f 84 ef 00 00 00       je     fc <func(float*, float*, float, unsigned int)+0xfc>
   d:   49 89 f8                mov    %rdi,%r8
  10:   41 83 e0 0f             and    $0xf,%r8d
  14:   49 c1 e8 02             shr    $0x2,%r8
  18:   49 f7 d8                neg    %r8
  1b:   41 83 e0 03             and    $0x3,%r8d
  1f:   44 39 c2                cmp    %r8d,%edx
  22:   44 0f 42 c2             cmovb  %edx,%r8d
  26:   83 fa 04                cmp    $0x4,%edx
  29:   0f 87 d0 00 00 00       ja     ff <func(float*, float*, float, unsigned int)+0xff>
  2f:   41 89 d0                mov    %edx,%r8d
  32:   31 c0                   xor    %eax,%eax
  34:   0f 1f 40 00             nopl   0x0(%rax)
  38:   f3 0f 10 14 86          movss  (%rsi,%rax,4),%xmm2
  3d:   8d 48 01                lea    0x1(%rax),%ecx
  40:   f3 0f 59 d0             mulss  %xmm0,%xmm2
  44:   f3 0f 10 0c 87          movss  (%rdi,%rax,4),%xmm1
  49:   f3 0f 5c ca             subss  %xmm2,%xmm1
  4d:   f3 0f 11 0c 87          movss  %xmm1,(%rdi,%rax,4)
  52:   48 83 c0 01             add    $0x1,%rax
  56:   41 39 c0                cmp    %eax,%r8d
  59:   77 dd                   ja     38 <func(float*, float*, float, unsigned int)+0x38>
  5b:   44 39 c2                cmp    %r8d,%edx
  5e:   0f 84 98 00 00 00       je     fc <func(float*, float*, float, unsigned int)+0xfc>
  64:   89 d5                   mov    %edx,%ebp
  66:   45 89 c1                mov    %r8d,%r9d
  69:   44 29 c5                sub    %r8d,%ebp
  6c:   41 89 eb                mov    %ebp,%r11d
  6f:   41 c1 eb 02             shr    $0x2,%r11d
  73:   42 8d 1c 9d 00 00 00    lea    0x0(,%r11,4),%ebx
  7a:   00
  7b:   85 db                   test   %ebx,%ebx
  7d:   74 59                   je     d8 <func(float*, float*, float, unsigned int)+0xd8>
  7f:   0f 28 c8                movaps %xmm0,%xmm1
  82:   49 c1 e1 02             shl    $0x2,%r9
  86:   0f 57 db                xorps  %xmm3,%xmm3
  89:   4e 8d 14 0f             lea    (%rdi,%r9,1),%r10
  8d:   0f c6 c9 00             shufps $0x0,%xmm1,%xmm1
  91:   49 01 f1                add    %rsi,%r9
  94:   31 c0                   xor    %eax,%eax
  96:   45 31 c0                xor    %r8d,%r8d
  99:   0f 28 e1                movaps %xmm1,%xmm4
  9c:   0f 1f 40 00             nopl   0x0(%rax)
  a0:   0f 28 cb                movaps %xmm3,%xmm1
  a3:   41 83 c0 01             add    $0x1,%r8d
  a7:   41 0f 28 14 02          movaps (%r10,%rax,1),%xmm2
  ac:   41 0f 12 0c 01          movlps (%r9,%rax,1),%xmm1
  b1:   41 0f 16 4c 01 08       movhps 0x8(%r9,%rax,1),%xmm1
  b7:   0f 59 cc                mulps  %xmm4,%xmm1
  ba:   0f 5c d1                subps  %xmm1,%xmm2
  bd:   41 0f 29 14 02          movaps %xmm2,(%r10,%rax,1)
  c2:   48 83 c0 10             add    $0x10,%rax
  c6:   45 39 d8                cmp    %r11d,%r8d
  c9:   72 d5                   jb     a0 <func(float*, float*, float, unsigned int)+0xa0>
  cb:   01 d9                   add    %ebx,%ecx
  cd:   39 dd                   cmp    %ebx,%ebp
  cf:   74 2b                   je     fc

[Bug target/54589] struct offset add should be folded into address calculation

2012-09-17 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #2 from sgunderson at bigfoot dot com 2012-09-17 09:18:16 UTC ---
FWIW, in my original code, func() is a part of a loop body (it keeps reading
values from src in a loop). It doesn't really change anything in the generated
code, though.

[Bug tree-optimization/54589] New: [missed-optimization] struct offset add should be folded into address calculation

2012-09-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

 Bug #: 54589
   Summary: [missed-optimization] struct offset add should be
folded into address calculation
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


Hi,

I found this in 4.4 (Ubuntu 10.04), and have confirmed it's still there in

  gcc (Debian 20120820-1) 4.8.0 20120820 (experimental) [trunk revision 190537]

This code:

  #include <emmintrin.h>

  struct param {
      int a, b, c, d;
      __m128i array[256];
  };

  void func(struct param *p, unsigned char *src, int *dst)
  {
      __m128i x = p->array[*src];
      *dst = _mm_cvtsi128_si32(x);
  }

compiles with -O2 on x86-64 to this assembler:

  0000000000000000 <func>:
     0:   0f b6 06                movzbl (%rsi),%eax
     3:   48 83 c0 01             add    $0x1,%rax
     7:   48 c1 e0 04             shl    $0x4,%rax
     b:   8b 04 07                mov    (%rdi,%rax,1),%eax
     e:   89 02                   mov    %eax,(%rdx)
    10:   c3                      retq

The add should be folded into the address calculation here. (The shl can't,
because it's too big.) Curiously enough, if I misalign the struct element by
removing c and d, and declaring the struct __attribute__((packed)), GCC will do
that; the mov will then be from $8(%rdi,%rax,1),%eax and there is no redundant
add.
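To make the request concrete, here is a small sketch of the two address computations. It is only an illustration: `vec16` is a stand-in I made up for `__m128i` so that it builds without the SSE headers, and the helper names are hypothetical. The point is that `(index + 1) << 4` and `16 + (index << 4)` are the same offset, so the `+ 1` could be carried as a constant displacement in the load.

```c
#include <stddef.h>

/* Stand-in for __m128i (assumption): 16 bytes, 16-byte aligned. */
typedef struct { _Alignas(16) unsigned char bytes[16]; } vec16;

struct param {
    int a, b, c, d;      /* 16 bytes in front of the array */
    vec16 array[256];    /* therefore starts at offset 16 */
};

/* What the generated code computes: add 1 to the index, then shift. */
size_t offset_with_add(unsigned char index)
{
    return ((size_t)index + 1) << 4;
}

/* What the report asks for: shift only, struct offset as a displacement. */
size_t offset_with_displacement(unsigned char index)
{
    return offsetof(struct param, array) + ((size_t)index << 4);
}
```

Both helpers return the same value for every possible index, which is why the separate add looks redundant.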

[Bug tree-optimization/54592] New: [4.8 Regression] [missed-optimization] Cannot fuse SSE move and add together

2012-09-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54592

 Bug #: 54592
   Summary: [4.8 Regression] [missed-optimization] Cannot fuse SSE
move and add together
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


Hi,

I have, on x86-64,

  gcc version 4.7.1 (Debian 4.7.1-9) 
  gcc version 4.8.0 20120820 (experimental) [trunk revision 190537] (Debian
20120820-1) 

Given the following test program:

  #include <emmintrin.h>

  void func(__m128i *foo, size_t a, size_t b, int *dst)
  {
      __m128i x = foo[a];
      __m128i y = foo[b];
      __m128i sum = _mm_add_epi32(x, y);
      *dst = _mm_cvtsi128_si32(sum);
  }

GCC 4.8 with -O2 compiles it to

     0:   48 c1 e6 04             shl    $0x4,%rsi
     4:   48 c1 e2 04             shl    $0x4,%rdx
     8:   66 0f 6f 0c 17          movdqa (%rdi,%rdx,1),%xmm1
     d:   66 0f 6f 04 37          movdqa (%rdi,%rsi,1),%xmm0
    12:   66 0f fe c1             paddd  %xmm1,%xmm0
    16:   66 0f 7e 01             movd   %xmm0,(%rcx)
    1a:   c3                      retq

The mov into %xmm1 here doesn't seem to make sense; it should rather be
paddd-ed in directly. And indeed, GCC 4.7 with -O2 gets this right:

     0:   48 c1 e6 04             shl    $0x4,%rsi
     4:   48 c1 e2 04             shl    $0x4,%rdx
     8:   66 0f 6f 04 37          movdqa (%rdi,%rsi,1),%xmm0
     d:   66 0f fe 04 17          paddd  (%rdi,%rdx,1),%xmm0
    12:   66 0f 7e 01             movd   %xmm0,(%rcx)
    16:   c3                      retq

This would seem like a regression to me.

[Bug target/42778] Superfluous stack management code is generated

2012-09-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42778

sgunderson at bigfoot dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sgunderson at bigfoot dot
                   |                            |com

--- Comment #3 from sgunderson at bigfoot dot com 2012-09-15 16:02:37 UTC ---
This seems to be no longer wrong in 4.8.

[Bug target/54593] New: [missed-optimization] Move from SSE to integer register goes through the stack without -march=native

2012-09-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593

 Bug #: 54593
   Summary: [missed-optimization] Move from SSE to integer
register goes through the stack without -march=native
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


Hi,

I have reproduced this on 4.4, 4.6, 4.7 and 4.8 (Debian 20120820-1, trunk
version 190537). Given the following code:

  #include <x86intrin.h>

  int test1(__m128i v) {
 return _mm_cvtsi128_si32(v);
  }

GCC generates

     0:   66 0f 7e 44 24 f4       movd   %xmm0,-0xc(%rsp)
     6:   8b 44 24 f4             mov    -0xc(%rsp),%eax
     a:   c3                      retq

Shouldn't it go directly to %eax instead of through the stack? Granted, on
Netburst this takes ten cycles or so, but this is x86-64. It appears to be some
sort of tuning issue, since if I use -mtune=native (I am on an Atom) I get:

     0:   66 0f 7e c0             movd   %xmm0,%eax
     4:   90                      nop
     5:   90                      nop
     6:   90                      nop
     7:   90                      nop
     8:   90                      nop
     9:   90                      nop
     a:   c3                      retq

which is sort-of what I expect. Well, the NOPs are a bit weird, but... :-)

[Bug target/54593] [missed-optimization] Move from SSE to integer register goes through the stack without -march=native

2012-09-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593

--- Comment #2 from sgunderson at bigfoot dot com 2012-09-15 16:38:34 UTC ---
Interesting. So it's a conscious choice that “generic” does this?

[Bug target/54593] [missed-optimization] Move from SSE to integer register goes through the stack without -march=native

2012-09-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593

--- Comment #4 from sgunderson at bigfoot dot com 2012-09-15 16:54:28 UTC ---
I'm not sure if I understand the comment very well; it talks about Pentium 4,
but none of them run 64-bit code, do they?

[Bug target/54593] [missed-optimization] Move from SSE to integer register goes through the stack without -march=native

2012-09-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593

--- Comment #6 from sgunderson at bigfoot dot com 2012-09-15 20:28:02 UTC ---
Ah. So basically it hurts AMD enough (the opposite doesn't hit Intel enough)
that the choice was made to make it that way generic too. Well, as long as it's
a deliberate choice, I assume it's a reasonable tradeoff, so thanks for the
enlightenment. :-)

[Bug tree-optimization/51513] New: [missed optimization] Only partially optimizes away unreachable switch default case

2011-12-12 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51513

 Bug #: 51513
   Summary: [missed optimization] Only partially optimizes away
unreachable switch default case
Classification: Unclassified
   Product: gcc
   Version: 4.6.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


Hi,

I have code that looks like this:

pannekake:~> cat test.c
void foo();
void bar();
void baz();

void func(int i)
{
        switch (i) {
        case 0: foo(); break;
        case 1: bar(); break;
        case 2: baz(); break;
        case 3: baz(); break;
        case 4: bar(); break;
        case 5: foo(); break;
        case 6: foo(); break;
        case 7: bar(); break;
        case 8: baz(); break;
        case 9: baz(); break;
        case 10: bar(); break;
        default: __builtin_unreachable(); break;
        }
}

Compiling this yields:

pannekake:~> gcc-4.6 -O2 -c test.c && objdump --disassemble test.o

test.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <func>:
   0:   83 ff 0a                cmp    $0xa,%edi
   3:   76 03                   jbe    8 <func+0x8>
   5:   0f 1f 00                nopl   (%rax)
   8:   89 ff                   mov    %edi,%edi
   a:   31 c0                   xor    %eax,%eax
   c:   ff 24 fd 00 00 00 00    jmpq   *0x0(,%rdi,8)
  13:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  18:   e9 00 00 00 00          jmpq   1d <func+0x1d>
  1d:   0f 1f 00                nopl   (%rax)
  20:   e9 00 00 00 00          jmpq   25 <func+0x25>
  25:   0f 1f 00                nopl   (%rax)
  28:   e9 00 00 00 00          jmpq   2d <func+0x2d>

The first compare is, as you can see, unneeded; the code for the default case
itself (a repz ret) has been optimized away due to the __builtin_unreachable(),
but the compare and branch remains.

I've also seen it sometimes be able to remove the jump instruction itself, but
not the compare.
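For comparison, this is roughly the dispatch I would expect once the range check is gone: an unconditional indexed jump. The sketch below does it by hand with a function pointer table. Everything in it (`func_table`, the `calls` counters, the stub bodies of foo/bar/baz) is made up for illustration and is not what GCC generates.

```c
#include <assert.h>

/* Hypothetical stubs that only count how often they were called. */
int calls[3];
void foo(void) { calls[0]++; }
void bar(void) { calls[1]++; }
void baz(void) { calls[2]++; }

typedef void (*handler)(void);

/* Same case-to-function mapping as the switch in test.c. */
static const handler table[11] = {
    foo, bar, baz, baz, bar, foo, foo, bar, baz, baz, bar
};

void func_table(int i)
{
    /* Debug-only precondition; with NDEBUG defined no compare is emitted. */
    assert(i >= 0 && i <= 10);
    table[i]();
}
```

This trades the jump table for a table of calls, so it is not equivalent code, only a way to see the dispatch without the bounds check.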

[Bug tree-optimization/51513] [missed optimization] Only partially optimizes away unreachable switch default case

2011-12-12 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51513

--- Comment #1 from sgunderson at bigfoot dot com 2011-12-12 10:54:16 UTC ---
Forgot this:

pannekake:~> gcc-4.6 -v
Using built-in specs.
COLLECT_GCC=gcc-4.6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-5'
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
--program-suffix=-4.6 --enable-shared --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
--with-arch-32=i586 --with-tune=generic --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-5)

[Bug tree-optimization/49872] Missed optimization: Could coalesce neighboring memsets

2011-07-28 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49872

--- Comment #2 from sgunderson at bigfoot dot com 2011-07-28 10:09:51 UTC ---
I'm not sure if I've seen exactly this construction in real-world code, but
I've certainly seen examples of the hybrid I sketched out (looking at one was
what motivated me to file the bug), ie. something like:

struct S {
int f[1024];
int g;
};

void func(struct S* s)
{
        memset(s->f, 0, sizeof(s->f));
        s->g = 0;
}

which I would argue should be rewritten to

void func(struct S* s)
{
        memset(s->f, 0, sizeof(s->f) + sizeof(s->g));
}

I'd argue that programmers should not be doing this kind of optimization
themselves, since it's very prone to break when changing the structure,
especially as alignment etc. comes into play.
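To illustrate why I think the hand-written version is fragile, here is a sketch of what a programmer would have to add to make it safe. The function name `func_coalesced` and the static assertion are mine, and it assumes a C11 compiler; the assertion fails to compile as soon as someone inserts a member or padding between f and g.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S {
    int f[1024];
    int g;
};

/* The single memset is only valid if g sits directly after f. */
static_assert(offsetof(struct S, g) == sizeof(((struct S *)0)->f),
              "g must directly follow f");

void func_coalesced(struct S *s)
{
    /* Start from the struct itself and clear up to the end of g. */
    memset(s, 0, offsetof(struct S, g) + sizeof(s->g));
}
```

The compiler already knows the layout, so it could do this merge without any of the bookkeeping above.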

[Bug target/49865] New: Unnecessary reload causes small size regression from 4.6.1

2011-07-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865

   Summary: Unnecessary reload causes small size regression from
4.6.1
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com
Target: i?86-*-*


Comparing 4.6.1 with gcc-snapshot from Debian:

gcc version 4.7.0 20110709 (experimental) [trunk revision 176106] (Debian
20110709-1) 

Given this code:

fugl:~> cat test.cpp
#include <string.h>

class MyClass {
        void func();

        float f[1024];
        int i;
};

void MyClass::func()
{
        memset(f, 0, sizeof(f));
        i = 0;
}

and compiling with

fugl:~> /usr/lib/gcc-snapshot/bin/g++ -Os -c test.cpp

g++ produces, according to objdump:

00000000 <_ZN7MyClass4funcEv>:
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   b9 00 04 00 00          mov    $0x400,%ecx
   a:   57                      push   %edi
   b:   8b 7d 08                mov    0x8(%ebp),%edi
   e:   f3 ab                   rep stos %eax,%es:(%edi)
  10:   8b 45 08                mov    0x8(%ebp),%eax
  13:   c7 80 00 10 00 00 00    movl   $0x0,0x1000(%eax)
  1a:   00 00 00
  1d:   5f                      pop    %edi
  1e:   5d                      pop    %ebp
  1f:   c3                      ret

while 4.6.1 has a more efficient sequence:

00000000 <_ZN7MyClass4funcEv>:
   0:   55                      push   %ebp
   1:   b9 00 04 00 00          mov    $0x400,%ecx
   6:   89 e5                   mov    %esp,%ebp
   8:   31 c0                   xor    %eax,%eax
   a:   8b 55 08                mov    0x8(%ebp),%edx
   d:   57                      push   %edi
   e:   89 d7                   mov    %edx,%edi
  10:   f3 ab                   rep stos %eax,%es:(%edi)
  12:   c7 82 00 10 00 00 00    movl   $0x0,0x1000(%edx)
  19:   00 00 00
  1c:   5f                      pop    %edi
  1d:   5d                      pop    %ebp
  1e:   c3                      ret

It seems 4.6 is able to take a copy of the this pointer from a register
before the rep stos operation, which is one byte smaller than reloading it
from the stack when it needs to clear i.

Of course, the _most_ efficient code sequence here would be doing the i = 0
before the memset, but I'm not sure if this is legal. However, eax should still
contain zero, so the mov could be done from eax instead of from a constant.

[Bug target/49865] Unnecessary reload causes small size regression from 4.6.1

2011-07-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865

--- Comment #1 from sgunderson at bigfoot dot com 2011-07-27 11:38:57 UTC ---
(In reply to comment #0)
> Of course, the _most_ efficient code sequence here would be doing the i = 0
> before the memset, but I'm not sure if this is legal. However, eax should
> still contain zero, so the mov could be done from eax instead of from a
> constant.

Actually, thinking about it, the most efficient code sequence would be just
giving 4100 to memset instead of 4096, but that's for an enhancement request at
some point.

[Bug tree-optimization/49872] New: Missed optimization: Could coalesce neighboring memsets

2011-07-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49872

   Summary: Missed optimization: Could coalesce neighboring
memsets
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


Given the following code:

#include <string.h>

struct S {
        int f[1024];
        int g[1024];
};

void func(struct S* s)
{
        memset(s->f, 0, sizeof(s->f));
        memset(s->g, 0, sizeof(s->g));
}

GCC currently generates two memsets. The code with -O2 is a bit hard to read,
so I'm just pasting the -Os assembly for clarity:

00000000 <func>:
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   b9 00 04 00 00          mov    $0x400,%ecx
   a:   57                      push   %edi
   b:   8b 7d 08                mov    0x8(%ebp),%edi
   e:   f3 ab                   rep stos %eax,%es:(%edi)
  10:   8b 55 08                mov    0x8(%ebp),%edx
  13:   66 b9 00 04             mov    $0x400,%cx
  17:   81 c2 00 10 00 00       add    $0x1000,%edx
  1d:   89 d7                   mov    %edx,%edi
  1f:   f3 ab                   rep stos %eax,%es:(%edi)
  21:   5f                      pop    %edi
  22:   5d                      pop    %ebp
  23:   c3                      ret

Ideally GCC should also be able to coalesce this together with memsets not
written as memset, e.g. s->g[0] = 0;.

[Bug target/49865] [4.7 Regression] Unnecessary reload causes small bloat

2011-07-27 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865

--- Comment #2 from sgunderson at bigfoot dot com 2011-07-27 17:28:19 UTC ---
(In reply to comment #1)
> Actually, thinking about it, the most efficient code sequence would be just
> giving 4100 to memset instead of 4096, but that's for an enhancement request
> at some point.

Filed as bug #49872.

[Bug target/49715] New: Could do more efficient unsigned-to-float conversions based on range information

2011-07-12 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

   Summary: Could do more efficient unsigned-to-float
conversions based on range information
   Product: gcc
   Version: 4.6.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


I have code that looks vaguely like this:

float func(unsigned x)
{
        return (x & 0xfffff) * 0.01f;
}

When I compile it, GCC gives a long and relatively slow sequence:

fugl:~> gcc-4.6 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc-4.6
COLLECT_LTO_WRAPPER=/usr/lib/i386-linux-gnu/gcc/i486-linux-gnu/4.6.1/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.1-3'
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
--program-suffix=-4.6 --enable-shared --enable-multiarch
--with-multiarch-defaults=i386-linux-gnu --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib/i386-linux-gnu
--without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib/i386-linux-gnu
--enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
--enable-targets=all --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
--target=i486-linux-gnu
Thread model: posix
gcc version 4.6.1 (Debian 4.6.1-3) 

fugl:~> gcc-4.6 -O2 -march=pentium3 -msse2 -mfpmath=sse -c test.c
fugl:~> objdump --disassemble test.o

test.o: file format elf32-i386


Disassembly of section .text:

00000000 <func>:
   0:   83 ec 04                sub    $0x4,%esp
   3:   8b 54 24 08             mov    0x8(%esp),%edx
   7:   89 d0                   mov    %edx,%eax
   9:   81 e2 ff ff 00 00       and    $0xffff,%edx
   f:   25 ff ff 0f 00          and    $0xfffff,%eax
  14:   c1 e8 10                shr    $0x10,%eax
  17:   f3 0f 2a c0             cvtsi2ss %eax,%xmm0
  1b:   f3 0f 2a ca             cvtsi2ss %edx,%xmm1
  1f:   f3 0f 59 05 00 00 00    mulss  0x0,%xmm0
  26:   00
  27:   f3 0f 58 c1             addss  %xmm1,%xmm0
  2b:   f3 0f 59 05 04 00 00    mulss  0x4,%xmm0
  32:   00
  33:   f3 0f 11 04 24          movss  %xmm0,(%esp)
  38:   d9 04 24                flds   (%esp)
  3b:   58                      pop    %eax
  3c:   c3                      ret
  3d:   8d 76 00                lea    0x0(%esi),%esi

I assume this is because x is unsigned (I cannot easily change this, as I
depend on wraparound). However, if I insert a cast to int after the and
operation, I get the same results, and a much better sequence:

00000040 <func2>:
  40:   83 ec 04                sub    $0x4,%esp
  43:   8b 44 24 08             mov    0x8(%esp),%eax
  47:   25 ff ff 0f 00          and    $0xfffff,%eax
  4c:   f3 0f 2a c0             cvtsi2ss %eax,%xmm0
  50:   f3 0f 59 05 04 00 00    mulss  0x4,%xmm0
  57:   00
  58:   f3 0f 11 04 24          movss  %xmm0,(%esp)
  5d:   d9 04 24                flds   (%esp)
  60:   5a                      pop    %edx
  61:   c3                      ret

In other words, the modified code looks like this:

float func2(unsigned x)
{
        return (int)(x & 0xfffff) * 0.01f;
}

This should be possible for GCC to do when it has range information that says
the sign bit cannot be set.
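As a sanity check that the two sequences really agree, the sketch below mirrors both conversions in C. The names `convert_split` and `convert_signed` are mine; the mask is the 20-bit value from the `25 ff ff 0f 00` encoding, and reading the first `mulss` constant as 65536.0f is my assumption. The split form converts the high and low 16-bit halves separately and recombines them.

```c
#include <stdint.h>

#define MASK 0xfffffu  /* 20-bit mask, read from the and encoding above */

/* Unsigned conversion the long way: two signed 16-bit halves, recombined. */
float convert_split(uint32_t x)
{
    uint32_t m = x & MASK;
    float hi = (float)(int32_t)(m >> 16);
    float lo = (float)(int32_t)(m & 0xffffu);
    return hi * 65536.0f + lo;
}

/* Single signed conversion; safe because bit 31 can never be set here. */
float convert_signed(uint32_t x)
{
    return (float)(int32_t)(x & MASK);
}
```

Since a 20-bit value is exactly representable in a float, both functions give bit-identical results, so the shorter sequence loses nothing.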

[Bug tree-optimization/49715] Could do more efficient unsigned-to-float conversions based on range information

2011-07-12 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

--- Comment #3 from sgunderson at bigfoot dot com 2011-07-12 15:19:51 UTC ---
Wow, answer in record time :-)

I don't know anything about GCC internals, so I can't comment much on the
patch; my only worry here is what would happen if you had a very narrow mask,
e.g. (x & 0xf) and you try to coerce it into the minimum possible type (a
char); wouldn't you end up doing some sort of expansion with movzbl again?

[Bug target/49583] Reloading stack operands in the wrong order, so needs to insert fxch

2011-07-03 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49583

--- Comment #3 from sgunderson at bigfoot dot com 2011-07-03 17:20:11 UTC ---
Hi,

My bug report was (as you can see in the title) not about the fstps/fld
sequence; it was about the extraneous fxch instructions. (My original code was
with -ffast-math, but I didn't want to burden the example with too many flags.)

In any case, even if I am to ignore “one or two”, how many fxch are too many? I
can give you code where there are _five_ (in what is a tight inner loop for
me), just by expanding on the example in question.

[Bug tree-optimization/49583] New: Reloading stack operands in the wrong order, so needs to insert fxch

2011-06-29 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49583

   Summary: Reloading stack operands in the wrong order, so needs
to insert fxch
   Product: gcc
   Version: 4.6.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


Created attachment 24638
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24638
Minimal testcase

Hi,

It seems that when generating x87 code, GCC sometimes reloads items from the
stack in the wrong order, and then goes to great lengths to swap them around. I
have an example with six loads and six fxch instructions, but attached is a
minimal example. Compiling with gcc version 4.6.1 (Debian 4.6.1-1) as
follows:

pannekake:~> gcc-4.6 -m32 -Wall -O2 -march=pentium3 -c fxch.c

The odd sequence is around this:

  41:   d9 44 24 48             flds   0x48(%esp)
  45:   dd 5c 24 08             fstpl  0x8(%esp)
  49:   dd 14 24                fstl   (%esp)
  4c:   d9 5c 24 10             fstps  0x10(%esp)
  50:   e8 fc ff ff ff          call   51 <process+0x51>
  55:   d9 5c 24 1c             fstps  0x1c(%esp)
  59:   d9 44 24 1c             flds   0x1c(%esp)
  5d:   d9 44 24 10             flds   0x10(%esp)
  61:   d9 c9                   fxch   %st(1)
  63:   d9 1c b7                fstps  (%edi,%esi,4)
  66:   46                      inc    %esi
  67:   39 ee                   cmp    %ebp,%esi

In particular, why did it use fstps immediately followed by flds of the same
value? And if it really wants to reload (in my more complex example, it really
needs to), why not just do the loads in the right order from the start instead
of doing the fxch?

[Bug target/48139] __builtin_lrintf() becomes a library call, not an cvtss2si instruction

2011-03-16 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48139

--- Comment #2 from sgunderson at bigfoot dot com 2011-03-16 12:03:40 UTC ---
But the lrintf() man page says explicitly that these functions cannot set
errno. Is this the man page being too glibc-specific, or something else?

[Bug target/48139] __builtin_lrintf() becomes a library call, not an cvtss2si instruction

2011-03-16 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48139

--- Comment #6 from sgunderson at bigfoot dot com 2011-03-16 22:59:53 UTC ---
(In reply to comment #5)
> So, there's no glibc bug, but I don't think this makes a compelling case for
> any particular gcc behavior. The implementation is gcc+glibc, so gcc could
> say that its implementation of lrint never sets errno, and all would be
> conforming. Or gcc could say that users will pick a libc based on whether they
> want errno to be set, and so it should emit the call. Or gcc could optimize
> lrint in C99 (where errno-setting is forbidden) but not in C1x (where it's
> allowed).

Well, if C99/C1x _allows_ gcc to do this, I'd say it's a missed optimization
opportunity not to. :-)

FWIW, my code is C++.

> One local workaround is to set __attribute__((optimize("no-math-errno"))) on
> the functions whose assembly contains the undesired call, but that's a bit
> fragile in the face of changing inlining decisions.

Indeed; in my case, the function is pretty much guaranteed to get inlined, so
I'd have to sprinkle those attributes around all the potential callers.

[Bug c/48139] New: __builtin_lrintf() becomes a library call, not an cvtss2si instruction

2011-03-15 Thread sgunderson at bigfoot dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48139

   Summary: __builtin_lrintf() becomes a library call, not an
cvtss2si instruction
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com


Hi,

It seems there is no way on x86-64 (short of an asm() statement) to get direct
access to the cvtss2si instruction, ie. convert a single float to an int in the
current rounding mode. Even __builtin_lrintf() becomes a library call to
glibc's lrintf(), which in itself only contains a single instruction and then a
ret. (If I set -fno-math-errno, I do get the instruction, but this is
unfortunately not an option for me.)

I've been told that this may or may not be correct behavior; it's a bit unclear
if lrintf() should set errno or not according to C99 and glibc's
math_errhandling setting. I guess this either is a missed optimization in GCC
_or_ a bug in glibc, though. It seems to me the former is more likely, though,
given that the entire point of lrint() and friends seems to be being able to do
quick float-to-int without having to deal with special code for NaN and the
likes.

[Bug rtl-optimization/45670] New: Less efficient x86 addressing mode selection on 4.6, causes -Os size regression from 4.5

2010-09-14 Thread sgunderson at bigfoot dot com

Hi,

Given the following test C++ file:

class Class
{
public:
        void func();

        float *buf;
        int size;
};

void Class::func()
{
        for (int i = 0; i < size; ++i) {
                buf[i] = 0;
        }
}

4.6 (see below for exact version) will generate larger code (36 vs. 30 bytes)
than 4.5.1 (Debian 4.5.1-6) given -Os. The output is

00000000 <Class::func()>:
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   8b 4d 08                mov    0x8(%ebp),%ecx
   8:   53                      push   %ebx
   9:   8b 59 04                mov    0x4(%ecx),%ebx
   c:   eb 10                   jmp    1e <Class::func()+0x1e>
   e:   8d 14 85 00 00 00 00    lea    0x0(,%eax,4),%edx
  15:   40                      inc    %eax
  16:   03 11                   add    (%ecx),%edx
  18:   c7 02 00 00 00 00       movl   $0x0,(%edx)
  1e:   39 d8                   cmp    %ebx,%eax
  20:   7c ec                   jl     e <Class::func()+0xe>
  22:   5b                      pop    %ebx
  23:   5d                      pop    %ebp
  24:   c3                      ret

Basically the problem is that the lea is large (due to the zero immediate
taking up 32 bits); 4.5 uses a variation where the address calculation takes
both a base and an index register, which has a shorter form not requiring to
store the zero. (The joys of x86; lea edx, [eax*4 + ecx] takes less space than
lea edx, [eax*4]...)
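To put numbers on the size difference, here are the two encodings side by side as plain byte arrays. The first is copied from offset e in the dump above; the second is my own hand encoding of the base-plus-index form (it is not taken from 4.5 output), and the array and function names are made up for this sketch. With a SIB byte and no base register, x86 requires a 32-bit displacement, which is where the four zero bytes come from.

```c
#include <stddef.h>

/* lea 0x0(,%eax,4),%edx : opcode, ModRM, SIB (no base), mandatory disp32 */
const unsigned char lea_index_only[] = {
    0x8d, 0x14, 0x85, 0x00, 0x00, 0x00, 0x00
};

/* lea (%ecx,%eax,4),%edx : opcode, ModRM, SIB with %ecx as base, no disp */
const unsigned char lea_base_index[] = {
    0x8d, 0x14, 0x81
};

/* Bytes saved per instruction by using a base register. */
size_t lea_bytes_saved(void)
{
    return sizeof lea_index_only - sizeof lea_base_index;
}
```

That is four bytes per lea, which accounts for most of the 36 vs. 30 byte gap once the extra add is counted as well.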

===

Configured with: ../src/configure -v --with-pkgversion='Debian 20100828-1'
--with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs
--enable-languages=c,ada,c++,fortran,objc,obj-c++
--prefix=/usr/lib/gcc-snapshot --enable-shared --enable-multiarch
--enable-linker-build-id --with-system-zlib --disable-nls --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin
--enable-gold --with-plugin-ld=ld.gold --enable-objc-gc --enable-targets=all
--with-arch-32=i586 --with-tune=generic --disable-werror --enable-checking=yes
--build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.6.0 20100828 (experimental) [trunk revision 163616] (Debian
20100828-1)


-- 
   Summary: Less efficient x86 addressing mode selection on 4.6,
causes -Os size regression from 4.5
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: sgunderson at bigfoot dot com
 GCC build triplet: i486-linux-gnu
  GCC host triplet: i486-linux-gnu
GCC target triplet: i486-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45670

[Bug tree-optimization/38328] New: Massive performance regression for jpeg_idct_islow

2008-11-30 Thread sgunderson at bigfoot dot com

First of all, I'm using Debian's gcc-snapshot package:

  gcc version 4.4.0 20081117 (experimental) [trunk revision 141948] (Debian
20081117-1) 

Let me know if I should try to rebuild with another GCC version.

I tested my image scaler (http://bzr.sesse.net/qscale/) and libjpeg with 4.4
vs. 4.3, and got the following oprofile graph for the same load in both cases.

4.3:

  samples  %        app name           symbol name
  5182     21.8484  libjpeg.so.62.0.0  jpeg_idct_islow
  5150     21.7135  libjpeg.so.62.0.0  decode_mcu
  3582     15.1025  qscale             vscale
  1237      5.2154  libjpeg.so.62.0.0  jpeg_fill_bit_buffer
  592       2.4960  qscale             hscale

4.4:

  samples  %        app name           symbol name
  7054     31.9056  qscale             jpeg_idct_islow
  4401     19.9059  qscale             decode_mcu
  3584     16.2106  qscale             vscale
  1352      6.1152  qscale             jpeg_fill_bit_buffer
  606       2.7410  qscale             hscale

Note that decode_mcu is 17% faster (probably due to better register
allocation), but jpeg_idct_islow is 36% slower! jpeg_fill_bit_buffer is also a
tiny bit slower, but that's not as critical. (The overall effect is that the
JPEG decoding as a whole runs slower.) I have not looked at the generated code,
but it's definitely not good.

FWIW, it's repeatable between runs -- the sample counts change very little
(1-2%, perhaps).


-- 
   Summary: Massive performance regression for jpeg_idct_islow
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: sgunderson at bigfoot dot com
 GCC build triplet: i486-linux-gnu
  GCC host triplet: i486-linux-gnu
GCC target triplet: i486-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328

[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow

2008-11-30 Thread sgunderson at bigfoot dot com



--- Comment #2 from sgunderson at bigfoot dot com  2008-11-30 15:06 ---
OK, I looked at the source. The issue here seems to be that 4.4 likes to
compile this:

z3 = ((z3) * (- ((INT32) 16069)));

into this:

    10  0.0403 : 805cc87:   lea    (%ecx,%ecx,4),%ebx
               : 805cc8a:   lea    (%ebx,%ebx,4),%ebx
    20  0.0805 : 805cc8d:   lea    (%ebx,%ebx,4),%ebx
     7  0.0282 : 805cc90:   lea    (%ecx,%ebx,2),%ebx
     3  0.0121 : 805cc93:   shl    $0x4,%ebx
    38  0.1530 : 805cc96:   add    %ecx,%ebx
     8  0.0322 : 805cc98:   lea    (%ecx,%ebx,4),%esi

4.3 uses imul here, which is a lot faster.
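For reference, the seven instructions build up the factor 16069 step by step. The sketch below replays them in C with unsigned arithmetic so that wraparound matches the hardware. The function names are mine, and I am assuming the negation happens somewhere outside the excerpt, since only the positive factor is visible here.

```c
#include <stdint.h>

/* Replays the lea/shl/add chain; x plays the role of %ecx. */
uint32_t mul_16069_lea(uint32_t x)
{
    uint32_t t = x + x * 4;   /* lea (%ecx,%ecx,4),%ebx :     5x */
    t = t + t * 4;            /* lea (%ebx,%ebx,4),%ebx :    25x */
    t = t + t * 4;            /* lea (%ebx,%ebx,4),%ebx :   125x */
    t = x + t * 2;            /* lea (%ecx,%ebx,2),%ebx :   251x */
    t <<= 4;                  /* shl $0x4,%ebx          :  4016x */
    t += x;                   /* add %ecx,%ebx          :  4017x */
    return x + t * 4;         /* lea (%ecx,%ebx,4),%esi : 16069x */
}

/* The single multiply that 4.3 emits as an imul. */
uint32_t mul_16069_imul(uint32_t x)
{
    return x * 16069u;
}
```

Each step depends on the previous one, so the chain is seven dependent instructions where 4.3 needs one multiply.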


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328

[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow

2008-11-30 Thread sgunderson at bigfoot dot com



--- Comment #4 from sgunderson at bigfoot dot com  2008-11-30 20:32 ---
Subject: Re:  Massive performance regression
for jpeg_idct_islow

On Sun, Nov 30, 2008 at 04:23:31PM -0000, rguenth at gcc dot gnu dot org wrote:
> Which tuning are you using?  Try enabling -mtune=generic (possibly by
> default).

The compile flags are -g -O2 -D_REENTRANT, IIRC. No weird compile options.

/* Steinar */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328

[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow

2008-11-30 Thread sgunderson at bigfoot dot com



--- Comment #6 from sgunderson at bigfoot dot com  2008-11-30 20:40 ---
Subject: Re:  Massive performance regression
for jpeg_idct_islow

On Sun, Nov 30, 2008 at 08:37:31PM -0000, rguenth at gcc dot gnu dot org wrote:
> --- Comment #5 from rguenth at gcc dot gnu dot org  2008-11-30 20:37 ---
> What is the gcc output if you append -v?

fugl:~> /usr/lib/gcc-snapshot/bin/gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 20081117-1'
--with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs
--enable-languages=c,c++,java,fortran,objc,obj-c++,ada
--prefix=/usr/lib/gcc-snapshot --enable-shared --with-system-zlib --disable-nls
--enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk
--enable-gtk-cairo --disable-plugin
--with-java-home=/usr/lib/gcc-snapshot/java-1.5.0-gcj-4.4-1.5.0.0/jre
--enable-java-home --with-jvm-root-dir=/usr/lib/gcc-snapshot/jvm
--with-jvm-jar-dir=/usr/lib/gcc-snapshot/jvm-exports
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-mpfr
--enable-targets=all --enable-cld --disable-werror --build=i486-linux-gnu
--host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.0 20081117 (experimental) [trunk revision 141948] (Debian
20081117-1) 

/* Steinar */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328

[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow

2008-11-30 Thread sgunderson at bigfoot dot com



--- Comment #8 from sgunderson at bigfoot dot com  2008-11-30 21:19 ---
Subject: Re:  Massive performance regression
for jpeg_idct_islow

On Sun, Nov 30, 2008 at 09:04:07PM -0000, rguenth at gcc dot gnu dot org wrote:
> Append -v to the command-line you use for compiling ;)  Seriously, if using
> -mtune=generic works then this is a Debian packaging issue of their
> gcc-snapshot compiler.

fugl:~/nmu/libjpeg6b-6b> /usr/lib/gcc-snapshot/bin/gcc -D_REENTRANT -g -Wall
-O2 -g -I. -c ./jidctint.c  -fPIC -DPIC -o .libs/jidctint.o -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 20081117-1'
--with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs
--enable-languages=c,c++,java,fortran,objc,obj-c++,ada
--prefix=/usr/lib/gcc-snapshot --enable-shared --with-system-zlib --disable-nls
--enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk
--enable-gtk-cairo --disable-plugin
--with-java-home=/usr/lib/gcc-snapshot/java-1.5.0-gcj-4.4-1.5.0.0/jre
--enable-java-home --with-jvm-root-dir=/usr/lib/gcc-snapshot/jvm
--with-jvm-jar-dir=/usr/lib/gcc-snapshot/jvm-exports
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-mpfr
--enable-targets=all --enable-cld --disable-werror --build=i486-linux-gnu
--host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.0 20081117 (experimental) [trunk revision 141948] (Debian
20081117-1) 
COLLECT_GCC_OPTIONS='-D_REENTRANT' '-g' '-Wall' '-O2' '-g' '-I.' '-c' '-fPIC'
'-DPIC' '-o' '.libs/jidctint.o' '-v' '-mtune=i486'
 /usr/lib/gcc-snapshot/libexec/gcc/i486-linux-gnu/4.4.0/cc1 -quiet -v -I.
-D_REENTRANT -DPIC ./jidctint.c -quiet -dumpbase jidctint.c -mtune=i486
-auxbase-strip .libs/jidctint.o -g -g -O2 -Wall -version -fPIC -o
/tmp/cc5hqg0m.s
ignoring nonexistent directory /usr/local/include/i486-linux-gnu
ignoring nonexistent directory
/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/../../../../i486-linux-gnu/include
ignoring nonexistent directory /usr/include/i486-linux-gnu
#include "..." search starts here:
#include <...> search starts here:
 .
 /usr/local/include
 /usr/lib/gcc-snapshot/include
 /usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/include
 /usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/include-fixed
 /usr/include
End of search list.
GNU C (Debian 20081117-1) version 4.4.0 20081117 (experimental) [trunk revision
141948] (i486-linux-gnu)
compiled by GNU C version 4.4.0 20081117 (experimental) [trunk revision
141948], GMP version 4.2.2, MPFR version 2.3.2.
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 445209552aa2d93e7e967b7473e83cd6
COLLECT_GCC_OPTIONS='-D_REENTRANT' '-g' '-Wall' '-O2' '-g' '-I.' '-c' '-fPIC'
'-DPIC' '-o' '.libs/jidctint.o' '-v' '-mtune=i486'
 as -V -Qy -o .libs/jidctint.o /tmp/cc5hqg0m.s
GNU assembler version 2.18.0 (i486-linux-gnu) using BFD version (GNU Binutils
for Debian) 2.18.0.20080103
COMPILER_PATH=/usr/lib/gcc-snapshot/libexec/gcc/i486-linux-gnu/4.4.0/:/usr/lib/gcc-snapshot/libexec/gcc/i486-linux-gnu/4.4.0/:/usr/lib/gcc-snapshot/libexec/gcc/i486-linux-gnu/:/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/:/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/:/usr/lib/gcc/i486-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/:/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-D_REENTRANT' '-g' '-Wall' '-O2' '-g' '-I.' '-c' '-fPIC'
'-DPIC' '-o' '.libs/jidctint.o' '-v' '-mtune=i486'

-mtune=generic still produces these long series of leas.

/* Steinar */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328

[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow

2008-11-30 Thread sgunderson at bigfoot dot com



--- Comment #9 from sgunderson at bigfoot dot com  2008-11-30 21:22 ---
Subject: Re: Massive performance regression for jpeg_idct_islow

On Sun, Nov 30, 2008 at 09:19:08PM -, sgunderson at bigfoot dot com wrote:
> -mtune=generic still produces these long series of leas.

Sorry, I objdumped the wrong file. -mtune=generic appears to fix it (although
I haven't checked the performance).

/* Steinar */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328

[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow

2008-11-30 Thread sgunderson at bigfoot dot com



--- Comment #11 from sgunderson at bigfoot dot com  2008-11-30 22:48 ---
Subject: Re: Massive performance regression for jpeg_idct_islow

On Sun, Nov 30, 2008 at 09:29:29PM -, rguenth at gcc dot gnu dot org wrote:
> so it uses -mtune=i486 - this optimizes the multiplication for i486 where imul
> is slow.  The difference to 4.3 is a packaging issue in Debian.

Thanks! I'll file a bug against the package.
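
For anyone reading this later, here is a minimal sketch of the kind of transformation being described: when tuning for a CPU where imul is slow, a multiplication by a constant can be expanded into shifts and adds (which typically show up as runs of lea/shl/add in the disassembly). The constant 4433 and the function names are illustrative assumptions, not taken from the compiler's output, and the hand-written expansion is only one possible decomposition, not necessarily the one GCC picks.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constant only: 4433 = 4096 + 256 + 64 + 16 + 1. */
#define K 4433u

/* Direct form: with a fast multiplier the compiler can emit one imul. */
uint32_t mul_direct(uint32_t x)
{
    return x * K;
}

/* Hand-written shift-and-add expansion of the same multiplication.
   Unsigned arithmetic wraps modulo 2^32, so both forms agree for all x. */
uint32_t mul_shift_add(uint32_t x)
{
    return (x << 12) + (x << 8) + (x << 6) + (x << 4) + x;
}

/* Asserts that the two forms agree for every x in [0, limit). */
void check_forms_agree(uint32_t limit)
{
    for (uint32_t x = 0; x < limit; x++)
        assert(mul_direct(x) == mul_shift_add(x));
}
```

Compiling mul_direct() on x86 with -O2 -S, once with -mtune=i486 and once with -mtune=generic, and comparing the two assembly files should show whether the compiler expands the multiply; the exact instruction sequence depends on the GCC version and target.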

/* Steinar */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
