[Bug tree-optimization/86214] [8/9 Regression] Strongly increased stack usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214 --- Comment #2 from sgunderson at bigfoot dot com --- OK, starting a reduce that also checks for no -Wreturn-type warnings.
[Bug c++/81668] LTO ODR warnings are not helpful
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668 --- Comment #12 from sgunderson at bigfoot dot com --- The spurious warning seems to be gone in GCC 8.
[Bug tree-optimization/86214] New: [8 Regression] Strongly increased stack usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214 Bug ID: 86214 Summary: [8 Regression] Strongly increased stack usage Product: gcc Version: 8.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Created attachment 44296 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44296=edit Test case Hi, We noticed that MySQL does not pass its test suite when compiled with GCC 8; it runs out of stack. (GCC 7 is fine.) A reduced test case is included (mostly by C-Reduce, but it needed some help by hand); most of it appears to be fluff that keeps the compiler from just optimizing away the entire thing, but the gist of it seems to be that it inlines the bg::bl() function several times without caring that it balloons the stack size, and then doesn't manage to shrink the stack again by overlapping variables. Putting the noinline attribute on bg::bl() seems to be a workaround for now. For comparison: > g++-7 -O2 -Wstack-usage=1 -Wno-return-type -Wno-unused-result -c stack.i stack.i: In function ‘void c()’: stack.i:34:6: warning: stack usage is 8240 bytes [-Wstack-usage=] void c() { ^ > g++-8 -O2 -Wstack-usage=1 -Wno-return-type -Wno-unused-result -c stack.i > stack.i: In function ‘void c()’: stack.i:34:6: warning: stack usage is 32816 bytes [-Wstack-usage=] void c() { ^ The actual, unreduced file can be found at https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/row/row0ins.cc#L926 (the line is positioned on a function whose adding noinline helps, although I don't think it corresponds directly to bg::bl; I think bg::bl might be ib::error, and the 8192-sized buffer comes from ib::logger::msg).
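For readers who want to see what the noinline workaround from the report looks like, here is a minimal sketch. The names log_message and report_all are hypothetical stand-ins, not the MySQL or reduced-test-case functions, and the 8192-byte buffer only mirrors the size mentioned above. The attribute syntax is the GCC/Clang one; whether it changes the reported stack usage depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical logging helper that owns a large local buffer.
// noinline asks the compiler to keep the buffer in this function's own
// frame instead of copying it into every caller that inlines the helper.
__attribute__((noinline)) std::size_t log_message(const char *text) {
  char buffer[8192];
  std::snprintf(buffer, sizeof(buffer), "error: %s", text);
  return std::strlen(buffer);
}

// Hypothetical caller with several call sites. If the helper were inlined
// three times and the buffers were not overlapped, this frame could grow
// by roughly three buffers.
std::size_t report_all() {
  std::size_t total = 0;
  total += log_message("first");
  total += log_message("second");
  total += log_message("third");
  return total;
}
```

Compiling a file like this with -O2 -Wstack-usage=1, with and without the attribute, is one way to compare the frame size of report_all between compiler versions.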
[Bug libstdc++/80335] perf of copying std::optional
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80335 --- Comment #3 from sgunderson at bigfoot dot com --- Appears to have been fixed in GCC 8, indeed. #include <optional> std::optional<int> func() { return 3; } GCC 7 (-O2) compiles to: 0: 48 89 f8 mov %rdi,%rax 3: c7 07 03 00 00 00 movl $0x3,(%rdi) 9: c6 47 04 01 movb $0x1,0x4(%rdi) d: c3 retq GCC 8 (-O2): 0: 48 b8 03 00 00 00 01 movabs $0x10003,%rax 7: 00 00 00 a: c3 retq This is an ABI break, but I'll happily take it. :-)
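The property behind the codegen difference above can be probed with a type trait. This is a sketch under assumptions: the element type int is inferred from the 4-byte store in the disassembly, and the helper name optional_int_is_trivially_copy_constructible is made up for illustration. The trait's value depends on the standard library version, so it is reported here rather than asserted.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

// Same shape as the function in the comment; the int element type is an
// assumption.
std::optional<int> func() { return 3; }

// Hypothetical helper: reports whether this standard library makes
// std::optional<int> trivially copy constructible. On amd64 that property
// is what allows the value to be returned in a register rather than
// through a hidden pointer parameter.
constexpr bool optional_int_is_trivially_copy_constructible() {
  return std::is_trivially_copy_constructible<std::optional<int>>::value;
}
```

Printing the result of optional_int_is_trivially_copy_constructible() under different compiler versions should line up with the two disassemblies shown in the comment.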
[Bug c++/84076] [6/7/8 Regression] Warning about objects through POD mistakenly claims the object is a pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84076 --- Comment #5 from sgunderson at bigfoot dot com --- Ah, so it's allowed to send structs and classes, just not non-PODs. So that's why the conversion to a pointer happens.
[Bug c++/84076] [6/7/8 Regression] Warning about objects through POD mistakenly claims the object is a pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84076 --- Comment #3 from sgunderson at bigfoot dot com --- printf aside, is this thing actually supported in varargs? I thought non-PODs were not allowed in varargs, period. (If it's not allowed, I'm not sure why the compiler even tries.)
[Bug c++/84076] New: [5/6/7/8 Regression] Warning about objects through POD mistakenly claims the object is a pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84076 Bug ID: 84076 Summary: [5/6/7/8 Regression] Warning about objects through POD mistakenly claims the object is a pointer Product: gcc Version: 7.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Test program: #include #include int main(void) { std::string str; printf("%s\n", str); } GCC 4.9 and older gives: test.cpp: In function ‘int main()’: test.cpp:7:20: error: cannot pass objects of non-trivially-copyable type ‘std::string {aka class std::basic_string}’ through ‘...’ printf("%s\n", str); ^ GCC 5.0 and newer (including 7.3.0) prints: test.cpp: In function ‘int main()’: test.cpp:7:20: warning: format ‘%s’ expects argument of type ‘char*’, but argument 2 has type ‘std::__cxx11::string* {aka std::__cxx11::basic_string*}’ [-Wformat=] printf("%s\n", str); ^ This is a confusing warning, since it claims I'm sending a std::string * when I'm sending a std::string. In particular, in the program I was trying to fix this by adding ->c_str(), but .c_str() was the correct choice.
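For reference, the fix the report arrives at (.c_str() on the object, not ->c_str()) looks like this in isolation. The function name format_line and the 64-byte buffer are hypothetical; snprintf is used instead of printf only so that the result can be inspected.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Passes a std::string to a C varargs function the supported way: %s
// expects a char pointer, so hand it str.c_str() instead of the object.
std::string format_line(const std::string &str) {
  char buffer[64];  // hypothetical size; longer input is truncated
  std::snprintf(buffer, sizeof(buffer), "%s\n", str.c_str());
  return std::string(buffer);
}
```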
[Bug c++/83227] New: [7 Regression] internal compiler error: in process_init_constructor_array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83227 Bug ID: 83227 Summary: [7 Regression] internal compiler error: in process_init_constructor_array Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- I believe this is distinct from #82593, so I'm filing it as a separate bug. The following test program dies with GCC 7.2.0 with -std=c++17: #include #include struct Direction { Direction() {} }; struct Front_back : public Direction { Front_back() : Direction() {} }; void foo(const std::vector ); void bar() { foo({ Front_back{} }); } test.cc: In function ‘void bar()’: test.cc:15:23: internal compiler error: in process_init_constructor_array, at cp/typeck2.c:1308 foo({ Front_back{} }); ^ Please submit a full bug report, with preprocessed source if appropriate. It works with GCC 6.4.0, and also with -std=c++14. It's still there in the 20171109 snapshot. Reduced preprocessed case: namespace std { template class initializer_list { const a *b; unsigned long c; }; struct e { e(int); }; template class f : e { public: f(initializer_list, int g = int()) : e(g) {} }; } struct h {}; struct i : h { i(); }; void foo(std::f) { foo({i{}}); }
[Bug c++/83226] New: [7 Regression] std::map with reference T breaks in C++17 mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83226 Bug ID: 83226 Summary: [7 Regression] std::map with reference T breaks in C++17 mode Product: gcc Version: 7.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, The following code works under GCC for -std=c++14, but breaks under -std=c++17: #include #include int main(void) { std::map<int, const int &> m; std::pair<int, const int &> val(3, 4); m.insert(val); // Compile error. m.emplace(3, 4); // Works. } I've looked briefly through the standard, but I can't see anything that indicates you can't have a const reference as value type (not that I'd recommend it!). The error messages given are: In file included from /usr/include/c++/7/bits/stl_iterator.h:66:0, from /usr/include/c++/7/bits/stl_algobase.h:67, from /usr/include/c++/7/bits/stl_tree.h:63, from /usr/include/c++/7/map:60, from test.cc:1: /usr/include/c++/7/bits/ptr_traits.h: In substitution of ‘template template using rebind = _Up* [with _Up = const int&; _Tp = std::_Rb_tree_node<std::pair >]’: /usr/include/c++/7/bits/ptr_traits.h:147:77: required by substitution of ‘template using __ptr_rebind = typename std::pointer_traits::rebind<_Tp> [with _Ptr = std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair > > >::pointer; _Tp = const int&]’ /usr/include/c++/7/bits/node_handle.h:203:69: required by substitution of ‘template template using __pointer = std::__ptr_rebind::pointer, _Tp> [with _Tp = std::pair::second_type; _Key = int; _Value = std::pair; _NodeAlloc = std::allocator<std::_Rb_tree_node<std::pair > >]’ /usr/include/c++/7/bits/node_handle.h:206:60: required from ‘class std::_Node_handle<int, std::pair, std::allocator<std::_Rb_tree_node<std::pair > > >’ test.cc:8:15: required from here /usr/include/c++/7/bits/ptr_traits.h:133:28: error: forming pointer to reference type ‘const int&’ using rebind = _Up*; ^ Confirmed with 
20171109 snapshot. Clang 5.0.0 with the same libstdc++ gives a similar error, so I believe this is about the standard library, not the compiler (unless it's an invalid program). GCC 6.4.0 does not give an error here, so I'm marking this as a regression.
[Bug target/54589] struct offset add should be folded into address calculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #3 from sgunderson at bigfoot dot com --- Still there in GCC 7.2.1 (exact same assembler output), and in 8.0 snapshot 20171017.
[Bug c++/82799] New: [8 Regression] -Wunused-but-set-variable false positive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82799 Bug ID: 82799 Summary: [8 Regression] -Wunused-but-set-variable false positive Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, Reduced testcase (automatically; it might be possible to reduce further): enum a { b }; struct c { template < a > int d() { const bool is_ident = 0; const int ret = is_ident ? 7 : 9; return ret; } void e() { d < b > (); } }; When compiled with -Wall, yields: test.cc: In instantiation of 'int c::d() [with a = (a)0]': test.cc:9:12: required from here test.cc:4:14: warning: variable 'is_ident' set but not used [-Wunused-but-set-variable] const bool is_ident = 0; ^~~~ even though is_ident is clearly used on the line below. gcc version 8.0.0 20171017 (experimental) [trunk revision 253812] (Debian 20171017-1) This does not happen with GCC 7.2.1.
[Bug c++/81716] Bogus -Wlto warning with forward-declared pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81716 --- Comment #2 from sgunderson at bigfoot dot com --- Still there in: gcc version 8.0.0 20171017 (experimental) [trunk revision 253812] (Debian 20171017-1)
[Bug c++/82780] [8 Regression] ICE on compiling Boost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82780 --- Comment #1 from sgunderson at bigfoot dot com --- Here's a version that's valid C++: class a { }; template class c { c(c &) : a(static_cast (e.d)) {} a d; };
[Bug c++/82780] New: [8 Regression] ICE on compiling Boost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82780 Bug ID: 82780 Summary: [8 Regression] ICE on compiling Boost Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, Reduced test case below. Regression happens when compiling a part of MySQL which uses Boost (1.65.0); original code was valid but reduced case is not. (Reduction also independently found #82050.) GCC 7 does not complain. gcc version 8.0.0 20171017 (experimental) [trunk revision 253812] (Debian 20171017-1) atum17:~> cat ~/reduce2/tmp.i class a { } template class c { c(c &) : (static_cast a && e.d; a d atum17:~> /usr/lib/gcc-snapshot/bin/g++ -c ~/reduce2/tmp.i [...] /srv/sesse/reduce2/tmp.i:2:72: internal compiler error: tree check: expected tree that contains 'decl common' structure, have 'identifier_node' in get_inner_reference, at expr.c:6999 } template class c { c(c &) : (static_cast a && e.d; a d ^
[Bug c++/82269] -Wignored-qualifiers should not trigger on templated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82269 --- Comment #4 from sgunderson at bigfoot dot com --- This one is perhaps a better case: ../sql/parse_tree_column_attrs.h: In constructor 'PT_blob_type::PT_blob_type(Blob_type, const CHARSET_INFO*, bool)': ../sql/parse_tree_column_attrs.h:548:59: warning: type qualifiers ignored on cast result type [-Wignored-qualifiers] : PT_type(static_cast<decltype(PT_type::type)>(blob_type)), ^ I looked for a bug before filing but didn't find any; which one should I subscribe to?
[Bug c++/82269] -Wignored-qualifiers should not trigger on templated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82269 sgunderson at bigfoot dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #1 from sgunderson at bigfoot dot com --- OK, actually looking at the specific caller, perhaps it should :-) I'm still a bit torn, though. I'll close for now and see if I can find a better example.
[Bug c++/82269] New: -Wignored-qualifiers should not trigger on templated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82269 Bug ID: 82269 Summary: -Wignored-qualifiers should not trigger on templated code Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, When compiling MySQL with GCC 8.0.0 20170917, I get In file included from ../include/my_byteorder.h:53:0, from ../include/m_ctype.h:29, from ../sql/parse_tree_helpers.h:24, from ../unittest/gunit/opt_ref-t.cc:23: ../include/template_utils.h: In instantiation of 'T pointer_cast(void*) [with T = unsigned char* const]': ../sql/sql_optimizer.cc:9973:60: required from here ../include/template_utils.h:70:10: warning: type qualifiers ignored on cast result type [-Wignored-qualifiers] return static_cast<T>(p); ^ I think this is a bit too aggressive. The function in question reads template <typename T> inline T pointer_cast(void *p) { return static_cast<T>(p); } Sure, it's possible to put std::remove_cv_t around the type, but should it really be needed?
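The std::remove_cv_t variant mentioned at the end of the report could look roughly like the following. This is a sketch, not the actual MySQL template_utils.h code; it assumes the template takes a single type parameter T, which is how the diagnostic reads.

```cpp
#include <cassert>
#include <type_traits>

// Casts to the cv-unqualified version of T, so that a call such as
// pointer_cast<unsigned char *const>(p) does not ask static_cast for a
// const-qualified prvalue (the case the warning complains about).
template <typename T>
inline T pointer_cast(void *p) {
  return static_cast<std::remove_cv_t<T>>(p);
}
```

The caller-visible behavior is the same as with a plain static_cast to T, since top-level qualifiers on a cast result are dropped anyway.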
[Bug libstdc++/80335] perf of copying std::optional
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80335 sgunderson at bigfoot dot com changed: What|Removed |Added CC||sgunderson at bigfoot dot com --- Comment #1 from sgunderson at bigfoot dot com --- This also affects the _returning_ std::optional; since it is not trivially copy constructible, std::optional must be returned (at least on amd64) by means of a hidden parameter instead of in registers. Since this affects the return type ABI, it can't be changed easily after-the-fact, so if possible, it should be fixed before C++17 support becomes non-experimental.
[Bug c/81980] Spurious -Wmissing-format-attribute warning in 32-bit mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81980 --- Comment #1 from sgunderson at bigfoot dot com --- Forgot to say: Also present in trunk r251306 and all the way back to at least 4.8.
[Bug c/81980] New: Spurious -Wmissing-format-attribute warning in 32-bit mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81980 Bug ID: 81980 Summary: Spurious -Wmissing-format-attribute warning in 32-bit mode Product: gcc Version: 7.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, The following (reduced) code gives a warning if and only if I add -m32: atum17:~> cat test.c #include char a; void set_message(const char *fmt, va_list ap) __attribute__((format(printf, 1, 0))); void set_message_by_errcode(va_list ap) { set_message(, ap); } atum17:~> gcc -Wmissing-format-attribute -c test.c atum17:~> gcc -Wmissing-format-attribute -c test.c -m32 test.cc: In function ‘void set_message_by_errcode(va_list)’: test.cc:6:61: warning: function ‘void set_message_by_errcode(va_list)’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] void set_message_by_errcode(va_list ap) { set_message(, ap); } ^ I believe the warning is spurious, since there's no way you could construct a valid printf format attribute for set_message_by_errcode (it doesn't take in a string parameter). This holds for both C and C++. atum17:~> gcc -v Using built-in specs. 
COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 7.2.0-1' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Debian 7.2.0-1)
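As background for why the suggestion is considered spurious: a printf format attribute needs a format-string parameter to point at. The sketch below shows the two usual forms. The function set_message mirrors the declaration in the report, while set_messagef, message_buffer and current_message are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

static char message_buffer[128];  // hypothetical storage for the result

// va_list form: the format string is parameter 1, and 0 as the last
// attribute argument means there are no variadic arguments to check.
void set_message(const char *fmt, va_list ap)
    __attribute__((format(printf, 1, 0)));

void set_message(const char *fmt, va_list ap) {
  std::vsnprintf(message_buffer, sizeof(message_buffer), fmt, ap);
}

// Variadic form: format string is parameter 1, checked arguments start at
// parameter 2.
void set_messagef(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

void set_messagef(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  set_message(fmt, ap);
  va_end(ap);
}

std::string current_message() { return message_buffer; }
```

A function like set_message_by_errcode(va_list ap) has no const char * parameter, so neither form of the attribute can be written for it.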
[Bug c++/81668] LTO ODR warnings are not helpful
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668 --- Comment #9 from sgunderson at bigfoot dot com --- (In reply to Manuel López-Ibáñez from comment #8) > Actually, what would be more useful is to detect that the difference in type > comes from S and point out where S has been declared as different types. Yes, that would be even better. But save for that :-) > Note that this is not the same bug I pointed out for > > ../include/violite.h:288:8: warning: type ‘struct st_vio’ violates the C++ > One Definition Rule [-Wodr] > ../include/violite.h:288:0: note: a different type is defined in another > translation unit > > The :0: indicates something wrong with the location info. If the location is > unknown, it would be better to use UNKNOWN_LOCATION. Yes, I know. It's a bit odd, but it doesn't bother me as much in this case.
[Bug c++/81668] LTO ODR warnings are not helpful
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668 --- Comment #7 from sgunderson at bigfoot dot com --- (In reply to Manuel López-Ibáñez from comment #6) >> fts0pars.y:62:0: note: a field with different name is defined in another >> translation unit > Did you cut the above? It looks like a note without a previous warning. > Also, GCC will have trouble to point out the correct location when compiling > a generated file that contains linemarkers, unless the linemarkers exactly > point out to the original file AND the original file is available to read. Sorry, yes, it was cut (I didn't intend to include it, as it is related to another and very real warning). Let me make a more minimal example to illustrate my issue (adapted from the case in 81716). I thought I'd pasted it already, but evidently it never made Bugzilla. atum17:~> cat test1.cc #include "test.h" void foo(S *t) { q[0] = nullptr; } atum17:~> cat test2.cc #include #include "test.h" class S { int m; }; void bar(S *t) { printf("%p\n", q[0]); } atum17:~> cat test.h class S; extern S *q[10]; atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wall -O2 -flto -o test.so test1.cc test2.cc test.h:2:11: warning: type of 'q' does not match original declaration [-Wlto-type-mismatch] extern S *q[10]; ^ test.h:2:11: note: 'q' was previously declared here extern S *q[10]; ^ test.h:2:11: note: code may be misoptimized unless -fno-strict-aliasing is used /usr/lib/x86_64-linux-gnu/crt1.o: In function `_start': (.text+0x20): undefined reference to `main' collect2: error: ld returned 1 exit status What I'd like is some sort of indication about where test.h came in from (test1.cc and test2.cc).
[Bug c++/81668] LTO ODR warnings are not helpful
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668 --- Comment #5 from sgunderson at bigfoot dot com --- (In reply to Markus Trippelsdorf from comment #3) > I don't see any bug, all relevant information is in the warnings. My point is that all relevant information _isn't_ in the warnings. In particular: The context of the .h file (which .o/.cc file it was compiled as part of in the two cases) is nowhere to be found. If I had that, it would be a lot easier to preprocess the two files and try to find what the difference is. Seemingly at least one of these was a GCC bug (#81716); with some luck, the others I cannot figure out are, too.
[Bug c++/81716] New: Bogus -Wlto warning with forward-declared pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81716 Bug ID: 81716 Summary: Bogus -Wlto warning with forward-declared pointers Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, It seems that if you forward-declare a class in one translation unit (and use pointers to it), it will count as a different type for LTO detection purposes, which doesn't sound right. Could it be that it implicitly gets a type of nullptr_t? Or something else? gcc version 8.0.0 20170618 (experimental) [trunk revision 249349] (Debian 20170618-1) atum17:~> cat test1.cc class S; extern S *q[10]; void foo(S *t) { q[0] = nullptr; } atum17:~> cat test2.cc #include class S { int m; }; extern S *q[10]; void bar(S *t) { printf("%p\n", q[0]); } atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wall -O2 -flto -o test.so test1.cc test2.cc test2.cc:6:11: warning: type of 'q' does not match original declaration [-Wlto-type-mismatch] extern S *q[10]; ^ test1.cc:2:11: note: 'q' was previously declared here extern S *q[10]; ^ test1.cc:2:11: note: code may be misoptimized unless -fno-strict-aliasing is used /usr/lib/x86_64-linux-gnu/crt1.o: In function `_start': (.text+0x20): undefined reference to `main' collect2: error: ld returned 1 exit status
[Bug c++/81668] LTO ODR warnings are not helpful
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668 --- Comment #2 from sgunderson at bigfoot dot com --- Running with -fno-diagnostics-show-caret does not help any: ../include/violite.h:288:8: warning: type ‘struct st_vio’ violates the C++ One Definition Rule [-Wodr] ../include/violite.h:288:0: note: a different type is defined in another translation unit ../include/violite.h:339:46: note: the first difference of corresponding definitions is field ‘viodelete’ ../include/violite.h:339:0: note: a field of same name but different type is defined in another translation unit It's hard for me to look at the preprocessed source code, because I don't know what to preprocess. Like I said, there's probably a thousand translation units including this .h file; how would I know which one to look through to find the two differing definitions?
[Bug c++/81668] New: LTO ODR warnings are not helpful
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81668 Bug ID: 81668 Summary: LTO ODR warnings are not helpful Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, I'm trying to make MySQL compile with LTO. There are a lot of ODR violations (which I'm trying to fix), but sometimes, the warnings are too vague to give any real information. An example: [797/1336] Building CXX object unittest/gunit/CMakeFiles/merge_large_tests-t.dir/opt_ref-t.cc.o In file included from ../include/my_byteorder.h:53:0, from ../include/m_ctype.h:29, from ../include/my_compare.h:25, from ../sql/field.h:22, from ../unittest/gunit/fake_table.h:27, from ../unittest/gunit/opt_ref-t.cc:23: ../include/template_utils.h: In instantiation of 'T pointer_cast(void*) [with T = unsigned char* const]': ../sql/sql_optimizer.cc:9901:60: required from here ../include/template_utils.h:70:10: warning: type qualifiers ignored on cast result type [-Wignored-qualifiers] return static_cast(p); ^ [852/1336] Linking CXX executable runtime_output_directory/pfs-t ../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not match original declaration [-Wlto-type-mismatch] extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously declared here thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless -fno-strict-aliasing is used [854/1336] Linking CXX executable runtime_output_directory/pfs_instr-t ../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not match original declaration [-Wlto-type-mismatch] extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously 
declared here thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless -fno-strict-aliasing is used [855/1336] Linking CXX executable runtime_output_directory/pfs_instr_class-t ../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not match original declaration [-Wlto-type-mismatch] extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously declared here thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless -fno-strict-aliasing is used [856/1336] Linking CXX executable runtime_output_directory/pfs_account-oom-t ../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not match original declaration [-Wlto-type-mismatch] extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously declared here thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless -fno-strict-aliasing is used [857/1336] Linking CXX executable runtime_output_directory/pfs_host-oom-t ../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not match original declaration [-Wlto-type-mismatch] extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously declared here thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: code may be misoptimized unless -fno-strict-aliasing is used [858/1336] Linking CXX executable runtime_output_directory/pfs_user-oom-t ../storage/perfschema/pfs.h:72:40: warning: type of 'THR_PFS_contexts' does not match 
original declaration [-Wlto-type-mismatch] extern thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note: 'THR_PFS_contexts' was previously declared here thread_local PFS_table_context *THR_PFS_contexts[THR_PFS_NUM_KEYS]; ^ ../storage/perfschema/pfs.cc:2072:33: note
[Bug c++/81277] New: assert() in multiversioned functions causes copmilation error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81277 Bug ID: 81277 Summary: assert() in multiversioned functions causes copmilation error Product: gcc Version: 7.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- As the bug title says: #include #include __attribute__((target("default"))) void foo(int x) { assert(x >= 0); } __attribute__((target("arch=haswell"))) void foo(int x) { assert(x >= 0); } When compiled: /tmp/ccCQK5gF.s: Assembler messages: /tmp/ccCQK5gF.s:71: Error: symbol `_ZZ3fooiE19__PRETTY_FUNCTION__' is already defined
[Bug c++/81276] New: Function multiversioning doesn't work with C++ templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81276 Bug ID: 81276 Summary: Function multiversioning doesn't work with C++ templates Product: gcc Version: 7.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, Seemingly one can't ask for a template to be multiversioned: template __attribute__ ((target("default"))) void func(T *x) {} template __attribute__ ((target("arch=haswell"))) void func(T *x) {} gives, when compiled: func.cpp:7:6: error: ambiguating new declaration ‘template void func(T*)’ void func(T *x) {} ^~~~ func.cpp:3:6: note: old declaration ‘template void func(T*)’ void func(T *x) {} ^~~~ target_clones works, but is useless for me because it turns off inlining.
[Bug c++/80858] When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858 --- Comment #4 from sgunderson at bigfoot dot com --- I think this should work as reduction: struct Empty { }; template struct A { A =(const A&) { T t(3); return *this; } }; class B { A a; }; int main(void) { B b1, b2; b1 = b2; } The error is attributed to the line with “class B”, without ever mentioning the “b1 = b2;” line.
[Bug c++/80858] When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858 --- Comment #2 from sgunderson at bigfoot dot com --- Yes, I mean that the error message isn't clear (and it's basically the same error message in 4.8, so no regression). I don't think I understand the difficulties involved. Doesn't the error come as a direct result of my copying? If I do this with e.g. std::vector, I get a much clearer error message, which directly points to the line in question: […] /usr/include/c++/6/bits/vector.tcc:195:19: required from ‘std::vector<_Tp, _Alloc>& std::vector<_Tp, _Alloc>::operator=(const std::vector<_Tp, _Alloc>&) [with _Tp = std::unique_ptr; _Alloc = std::allocator<std::unique_ptr >]’ test.cc:7:7: required from here My main gripe is that with unordered_map, the error traceback stops in the internal details of _Hashtable: /usr/include/c++/7/bits/unordered_map.h:101:11: required from here In the real-world case in question, I eventually had to go into unordered_map.h (yes, in /usr/include) and replace “= default;” with “= delete;” to figure out who was calling the copy constructor.
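Once the offending copy has been located, there are two usual ways to make such code well-formed: move the map, or copy the pointees one by one. The sketch below is an illustration only. It assumes the mapped type is std::unique_ptr<int> (the template argument is not visible in the report), and the names PtrMap, transfer and deep_copy are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

using PtrMap = std::unordered_map<int, std::unique_ptr<int>>;

// Move assignment transfers ownership of every element; src is left in a
// valid but unspecified state afterwards.
void transfer(PtrMap &dst, PtrMap &src) { dst = std::move(src); }

// Element-wise deep copy: each non-null pointee is cloned into a fresh
// unique_ptr, so the two maps share no objects.
PtrMap deep_copy(const PtrMap &src) {
  PtrMap out;
  for (const auto &entry : src) {
    std::unique_ptr<int> clone;
    if (entry.second) clone = std::make_unique<int>(*entry.second);
    out.emplace(entry.first, std::move(clone));
  }
  return out;
}
```

Which of the two is appropriate depends on whether the source map is still needed after the assignment.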
[Bug c++/80858] New: When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858 Bug ID: 80858 Summary: When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong Product: gcc Version: 7.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Using gcc version 7.1.0 (Debian 7.1.0-5) (but the error goes back to at least 4.8, and amazingly, also in Clang), on this piece of code, simplified from a much bigger test case: #include #include int main(void) { std::unordered_map<int, std::unique_ptr> a, b; a = b; } The code is wrong, and GCC correctly rejects it, but the error message is less than helpful, since it doesn't mention the line with the assignment on, or really anything hinting at who asked the copy constructor to be invoked: $ g++-7 -c test.cc In file included from /usr/include/x86_64-linux-gnu/c++/7/bits/c++allocator.h:33:0, from /usr/include/c++/7/bits/allocator.h:46, from /usr/include/c++/7/memory:63, from test.cc:1: /usr/include/c++/7/ext/new_allocator.h: In instantiation of ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = std::pair >; _Args = {const std::pair > >&}; _Tp = std::pair >]’: /usr/include/c++/7/bits/alloc_traits.h:475:4: required from ‘static void std::allocator_traits<std::allocator<_Tp1> >::construct(std::allocator_traits<std::allocator<_Tp1> >::allocator_type&, _Up*, _Args&& ...) [with _Up = std::pair >; _Args = {const std::pair > >&}; _Tp = std::pair >; std::allocator_traits<std::allocator<_Tp1> >::allocator_type = std::allocator<std::pair > >]’ /usr/include/c++/7/bits/hashtable_policy.h:2066:37: required from ‘std::__detail::_Hashtable_alloc<_NodeAlloc>::__node_type* std::__detail::_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args&& ...) 
[with _Args = {const std::pair > >&}; _NodeAlloc = std::allocator<std::__detail::_Hash_node<std::pair >, false> >; std::__detail::_Hashtable_alloc<_NodeAlloc>::__node_type = std::__detail::_Hash_node<std::pair >, false>]’ /usr/include/c++/7/bits/hashtable.h:1023:54: required from ‘std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::operator=(const std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>&)::<lambda(const __node_type*)> [with _Key = int; _Value = std::pair >; _Alloc = std::allocator<std::pair > >; _ExtractKey = std::__detail::_Select1st; _Equal = std::equal_to; _H1 = std::hash; _H2 = std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; _RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = std::__detail::_Hashtable_traits<false, false, true>; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__node_type = std::__detail::_Hash_node<std::pair >, false>; typename _Traits::__hash_cached = std::integral_constant<bool, false>]’ /usr/include/c++/7/bits/hashtable.h:1022:9: required from ‘struct std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::operator=(const std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>&) [with _Key = int; _Value = std::pair >; _Alloc = std::allocator<std::pair > >; _ExtractKey = std::__detail::_Select1st; _Equal = std::equal_to; _H1 = std::hash; _H2 = std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; _RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = std::__detail::_Hashtable_traits<false, false, true>]::<lambda(const __node_type*)>’ /usr/include/c++/7/bits/hashtable.h:1021:14: required from ‘std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>& std::_Hashtable<_Key, _Value, 
_Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::operator=(const std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>&) [with _Key = int; _Value = std::pair >; _Alloc = std::allocator<std::pair > >; _ExtractKey = std::__detail::_Select1st; _Equal = std::equal_to; _H1 = std::hash; _H2 = std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; _RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = std::__detail::_Hashtable_traits<false, false, true>]’ /usr/include/c++/7/bits/unordered_map.h:101:11: required from here /usr/include
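For readers who hit the same wall of template errors: the usual fix is to move the map, or to copy it element by element. The sketch below is not from the report. The names `Map`, `transfer` and `deep_copy` are made up for illustration, and it assumes C++14 for `std::make_unique`.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

using Map = std::unordered_map<int, std::unique_ptr<int>>;

// Moves every element from src into dst. Writing "dst = src" instead would
// not compile, because std::unique_ptr is move-only.
inline void transfer(Map &dst, Map &src) {
  assert(&dst != &src);  // self-transfer would just empty the map
  dst = std::move(src);
  src.clear();  // a moved-from map is valid but unspecified; make it empty
}

// Builds an independent copy by cloning each pointee explicitly.
inline Map deep_copy(const Map &src) {
  Map out;
  out.reserve(src.size());
  for (const auto &kv : src) {
    std::unique_ptr<int> clone;
    if (kv.second) {
      clone = std::make_unique<int>(*kv.second);
    }
    out.emplace(kv.first, std::move(clone));
  }
  return out;
}
```

Either helper makes the intent explicit at the call site, which is exactly the information the diagnostic above fails to point at.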
[Bug c++/79746] [7 Regression] Confusing -Wunused-but-set-parameter warning with virtual inheritance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79746 --- Comment #6 from sgunderson at bigfoot dot com --- Thanks. But I'm still curious; is the second code snippet well-formed or not?
[Bug c++/79746] Confusing -Wunused-but-set-parameter warning with virtual inheritance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79746 --- Comment #1 from sgunderson at bigfoot dot com --- Actually this is interesting; this code (derived from the previous one) compiles without warning in GCC 7.0 and Clang, but gives an error in GCC 6.3: struct Base { Base(const char *foo) : m_foo(foo) {} virtual int func() = 0; const char *m_foo; }; struct Derived : public virtual Base { Derived(const char *foo) { (void)foo; } }; Which compiler is right?
[Bug c++/79750] New: -Wimplicit-fallthrough= comment detection gets confused by #if
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79750 Bug ID: 79750 Summary: -Wimplicit-fallthrough= comment detection gets confused by #if Product: gcc Version: 7.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, It seems that fallthrough comments are not properly parsed if they are followed by a preprocessor statement. Minified test case: atum17:~> /usr/lib/gcc-snapshot/bin/g++ -v Using built-in specs. COLLECT_GCC=/usr/lib/gcc-snapshot/bin/g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/7/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 20170226-1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,ada,c++,go,brig,fortran,objc,obj-c++ --prefix=/usr/lib/gcc-snapshot --with-gcc-major-version-only --program-prefix= --enable-shared --enable-linker-build-id --disable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=yes --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 7.0.1 20170226 (experimental) [trunk revision 245744] (Debian 20170226-1) atum17:~> cat test.cc int func(int x) { switch (x) { case 0: x = 1; //-fallthrough #if 1 case 1: #endif case 2: ++x; } return x; } atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wall -Wextra -c test.cc test.cc: In function 'int func(int)': test.cc:5:5: warning: this statement may fall through [-Wimplicit-fallthrough=] x = 1; ~~^~~ test.cc:8:2: note: here case 1: ^~~~ If I remove the #if 1, there is 
no warning.
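One way to sidestep comment parsing altogether is the C++17 `[[fallthrough]]` statement attribute. The sketch below is the test case from the report with the comment replaced by the attribute. Whether the 7.0.1 snapshot handles it correctly next to `#if` was not checked here, so treat that as an assumption. `check_fallthrough` is a made-up helper that only documents the expected results.

```cpp
#include <cassert>

int func(int x) {
  switch (x) {
  case 0:
    x = 1;
    [[fallthrough]];  // a real statement, so it does not depend on comment matching
#if 1
  case 1:
#endif
  case 2:
    ++x;
  }
  return x;
}

// Usage check: 0 is set to 1 and then falls through to the increment.
inline void check_fallthrough() {
  assert(func(0) == 2);
  assert(func(7) == 7);  // no matching case, value unchanged
}
```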
[Bug c++/79746] New: Confusing -Wunused-but-set-parameter warning with virtual inheritance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79746 Bug ID: 79746 Summary: Confusing -Wunused-but-set-parameter warning with virtual inheritance Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, This is a minified testcase of MySQL when trying to compile with a 7.0 snapshot: atum17:~> /usr/lib/gcc-snapshot/bin/g++ -v Using built-in specs. COLLECT_GCC=/usr/lib/gcc-snapshot/bin/g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/7/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 20170226-1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,ada,c++,go,brig,fortran,objc,obj-c++ --prefix=/usr/lib/gcc-snapshot --with-gcc-major-version-only --program-prefix= --enable-shared --enable-linker-build-id --disable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=yes --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 7.0.1 20170226 (experimental) [trunk revision 245744] (Debian 20170226-1) atum17:~> cat test2.cc struct Base { Base(const char *foo) : m_foo(foo) {} virtual int func() = 0; const char *m_foo; }; struct Derived : public virtual Base { Derived(const char *foo) : Base(foo) {} }; atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wunused-but-set-parameter -c test2.cc test2.cc: In constructor 'Derived::Derived(const char*)': test2.cc:9:22: warning: parameter 'foo' set but not used [-Wunused-but-set-parameter] 
Derived(const char *foo) : Base(foo) {} I think the warning is actually somehow correct, but it's very confusing until you see what's going on. I think the logic goes something like: Virtual bases are always set through the most derived class. Since Derived has a pure virtual (func()), it can't be the most derived class, and thus, its call to Base() can never actually happen. Thus, “foo” is unused. Perhaps a better warning would be something like test2.cc:9:22: warning: class 'Derived' inherits virtually from 'Base' but is not possible to instantiate by itself, so it can never be the most derived class, and the call to 'Base::Base(const char *foo)' is always ignored but that might be too wordy.
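The rule described above (virtual bases are initialized by the most derived class) can be seen in a small sketch. `Concrete` and `which_initializer_won` are hypothetical names added for illustration. `Base` and `Derived` follow the test case, with `explicit` and a virtual destructor added.

```cpp
#include <cassert>
#include <cstring>

struct Base {
  explicit Base(const char *foo) : m_foo(foo) {}
  virtual ~Base() = default;
  virtual int func() = 0;
  const char *m_foo;
};

struct Derived : public virtual Base {
  // This Base(foo) only runs when Derived is the most derived class, which
  // cannot happen while func() is still pure virtual.
  explicit Derived(const char *foo) : Base(foo) {}
};

struct Concrete : public Derived {
  // The most derived class initializes the virtual base directly; the
  // Base(...) call in Derived's constructor is skipped.
  Concrete() : Base("from Concrete"), Derived("from Derived") {}
  int func() override { return 0; }
};

// Returns the string that actually reached Base::m_foo.
inline const char *which_initializer_won() {
  Concrete c;
  assert(c.func() == 0);
  return c.m_foo;  // points at a string literal, so it outlives c
}
```

Here `which_initializer_won()` yields "from Concrete", which is why the compiler considers `foo` in `Derived`'s constructor set but never used.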
[Bug c++/79727] -Wimplicit-fallthrough=3 doesn't seem to match any comments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79727 --- Comment #2 from sgunderson at bigfoot dot com --- Wait, it can't do with a substring match? That wasn't clear at all from the documentation, and it makes the default a lot more strict than I assumed. Some of the regexes are rather strange, then; one would assume that the ones starting with [ \t.!]* are to capture word boundaries; why would . and ! be there otherwise? To capture strange comment syntaxes like this? // !else, fallthrough-
[Bug c++/79727] New: -Wimplicit-fallthrough=3 doesn't seem to match any comments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79727 Bug ID: 79727 Summary: -Wimplicit-fallthrough=3 doesn't seem to match any comments Product: gcc Version: 7.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, I can't get the supposed fallthrough comments (on https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html) to work: gcc version 7.0.1 20170226 (experimental) [trunk revision 245744] (Debian 20170226-1) atum17:~> cat test.cc #include #include int main(int argc, char **argv) { switch (argc) { case 2: printf("something\n"); // -fallthrough case 3: printf("whatever\n"); break; } } atum17:~> /usr/lib/gcc-snapshot/bin/g++ -Wimplicit-fallthrough=3 -c test.cc test.cc: In function 'int main(int, char**)': test.cc:8:9: warning: this statement may fall through [-Wimplicit-fallthrough=] printf("something\n"); ~~^~~ test.cc:10:2: note: here case 3: ^~~~ I've tried a variety of the other patterns that are supposed to match, but I can't get it to work on level 3. Level 2 appears to work as documented.
[Bug target/71993] __builtin_cpu_supports() does not support "f16c"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71993 --- Comment #2 from sgunderson at bigfoot dot com --- Right.
[Bug target/71993] New: __builtin_cpu_supports() does not support "f16c"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71993 Bug ID: 71993 Summary: __builtin_cpu_supports() does not support "f16c" Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- As the summary says. You just get: lib.cc:4:1: error: Parameter to builtin not valid: f16c Tested with: gcc version 7.0.0 20160707 (experimental) [trunk revision 238117] (Debian 20160707-1)
[Bug tree-optimization/71990] Function multiversioning prohibits inlining
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71990 --- Comment #4 from sgunderson at bigfoot dot com --- OK, so it would have to be a special kind of cloning, not the one you can do yourself from code as of today? As a user, I suppose there's no really good way of dealing with this currently, right? Short of maybe doing manual multiversioning with __builtin_cpu_supports() and hoping that the compiler can hoist that out of all the loops.
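A rough sketch of the manual multiversioning mentioned above: check the CPU once outside the loop and pick a whole loop variant, so the per-call dispatch disappears. All names here (`foo_default`, `foo_avx`, `bar_impl`, `check_bar`) are made up for illustration. A real AVX variant would also need the `target` attribute or `-mavx` on the code that uses the instructions, which this toy version deliberately leaves out.

```cpp
#include <cassert>

namespace {

inline int foo_default() { return 0; }
inline int foo_avx() { return 1; }  // stand-in for the AVX-specific body

// The callee is a template argument, so each instantiation sees a direct
// call that the compiler is free to inline.
template <int (*Foo)()>
int bar_impl() {
  int sum = 0;
  for (int i = 0; i < 100; ++i) {
    sum += Foo();
  }
  return sum;
}

}  // namespace

int bar() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  __builtin_cpu_init();
  static const bool has_avx = __builtin_cpu_supports("avx") != 0;
#else
  static const bool has_avx = false;  // non-x86 or non-GNU compiler: use the default
#endif
  return has_avx ? bar_impl<foo_avx>() : bar_impl<foo_default>();
}

// Usage check: the result depends on the machine, but only two values are possible.
inline void check_bar() {
  const int r = bar();
  assert(r == 0 || r == 100);
}
```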
[Bug tree-optimization/71990] Function multiversioning prohibits inlining
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71990 --- Comment #2 from sgunderson at bigfoot dot com --- Would pushing the mv automatically upwards into callers really help? There's still no way that I can see to inline the function; I mean, pushing upwards is what I've been trying to do here manually with the target clones.
[Bug tree-optimization/71990] New: Function multiversioning prohibits inlining
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71990 Bug ID: 71990 Summary: Function multiversioning prohibits inlining Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Target Milestone: --- Hi, I'm trying to write a library that uses F16C instructions in certain places, and since they're not really universally accessible (and ld.so hardware capabilities seem to have been long abandoned), I've tried to use function multiversioning for it. However, trying to combine it with inlining seems to draw a blank; a very simplified example: klump:~> /usr/lib/gcc-snapshot/bin/g++ -v Using built-in specs. COLLECT_GCC=/usr/lib/gcc-snapshot/bin/g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/7.0.0/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 20160707-1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,ada,c++,java,go,fortran,objc,obj-c++ --prefix=/usr/lib/gcc-snapshot --enable-shared --enable-linker-build-id --disable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-7-snap-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-7-snap-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-7-snap-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --disable-werror --enable-checking=yes --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu Thread model: posix gcc version 7.0.0 20160707 (experimental) [trunk revision 238117] (Debian 20160707-1) klump:~> cat test.cc #include __attribute__ ((target("default"))) inline int foo() { return 0; } __attribute__ ((target("avx"))) inline int foo() { return 1; } int bar() { int sum = 0; for (int i = 0; i < 100; ++i) { sum += foo(); } return sum; } int main(void) { printf("%d\n", bar()); } klump:~> /usr/lib/gcc-snapshot/bin/g++ -O2 -o test test.cc klump:~> nm --demangle test | egrep 'foo|bar' 00400c40 i _Z3foov.ifunc() 00400bf0 T bar() 00400c20 W foo() 00400c30 W foo() [clone .avx] 00400c40 W foo() [clone .resolver] Of course, in reality, my foo() would do something more complicated, like call _cvtss_sh() or similar; this is a toy example. But it illustrates that the function multiversioning blocks inlining. If I compile with -mavx, the entire multiversioning goes away (only the AVX version is emitted), so I hoped that I could use target cloning on bar(): __attribute__ ((target_clones("avx", "default"))) int bar() { // same code... but unfortunately, no. There's a bar() clone for AVX emitted, but it still calls the resolving function for foo(); no inlining. So I really can't find any usable way of using this feature if your architecture switch is in inlined functions (in my case, convert to/from fp16).
[Bug rtl-optimization/68282] Optimization fails to remove unnecessary sign extension instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68282 sgunderson at bigfoot dot com changed: What|Removed |Added CC||sgunderson at bigfoot dot com --- Comment #2 from sgunderson at bigfoot dot com --- Shouldn't it be possible to fold the incl into the mov, too?
    shrl $2, %edi
    movl table+4(,%rdi,4), %eax
    retq
[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776 sgunderson at bigfoot dot com changed: What|Removed |Added CC||sgunderson at bigfoot dot com --- Comment #6 from sgunderson at bigfoot dot com --- Without knowing anything about the GCC internals here, I could perhaps also point out that GCC should know that these have limited range. As a trivial example: int foo(int x) { int z = __builtin_ctz(x); if (z > 2000) { return 1; } else { return 0; } } There's no way this function can return anything but 0, and VRP should probably be taught that. (I wonder if this would fix the unnecessary sign extension too?)
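To make the range argument concrete, here is a small sketch. The wrapper names are made up. The only facts relied on are that `__builtin_ctz` takes an `unsigned int` and is undefined for 0.

```cpp
#include <cassert>
#include <climits>

// __builtin_ctz is undefined for 0, so callers must pass a nonzero value.
inline int trailing_zeros(unsigned x) {
  assert(x != 0);
  return __builtin_ctz(x);
}

// For a nonzero operand the result lies in [0, bit width - 1], so a
// comparison against something like 2000 can never be true.
inline bool ctz_result_in_range(unsigned x) {
  const int z = trailing_zeros(x);
  return z >= 0 && z < static_cast<int>(sizeof(unsigned) * CHAR_BIT);
}
```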
[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776 --- Comment #7 from sgunderson at bigfoot dot com --- Wait, sorry, someone's already pointed that out. Ignore me, then... I can at least confirm it still happens with GCC 4.8.1.
[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 --- Comment #6 from sgunderson at bigfoot dot com --- BZHI seems to have the same problem.
[Bug target/57624] BZHI instrinsic is missing
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57624 --- Comment #2 from sgunderson at bigfoot dot com --- Shouldn't really the documentation say so, then? The entire GCC manual seems to make no note of this header at all, as far as I can see.
[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 --- Comment #8 from sgunderson at bigfoot dot com --- I really did spot the BZHI problem in actual code; that's how I found out :-) I rewrote it slightly and the problem disappeared, though.
[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 --- Comment #10 from sgunderson at bigfoot dot com --- On Thu, Jun 27, 2013 at 12:27:02PM +, jakub at gcc dot gnu.org wrote: Then please provide preprocessed testcase for it (plus command line options). Because I'm really curious how it could have been matched. Sorry, the code is a) not so easy to make public right now, and b) this particular edit has been lost in the mists of time (like I said, I wrote it slightly differently and then it was gone). But the scrollback in my terminal still has this for “proof”: sesse@gruessi:~/addie$ g++-4.8 -O2 -march=native -o addie addie.cc /tmp/ccJweT2R.s: Assembler messages: /tmp/ccJweT2R.s:82: Error: operand size mismatch for `bzhi' Sorry I couldn't be more helpful. :-) /* Steinar */
[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 --- Comment #11 from sgunderson at bigfoot dot com --- On Thu, Jun 27, 2013 at 12:32:18PM +, sgunderson at bigfoot dot com wrote: Sorry, the code is a) not so easy to make public right now, and b) this particular edit has been lost in the mists of time (like I said, I wrote it slightly differently and then it was gone). But the scrollback in my terminal still has this for “proof”: Hah, I reproduced it. I'll try to distill it down to a small test case. /* Steinar */
[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 --- Comment #12 from sgunderson at bigfoot dot com --- Created attachment 30389 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30389&action=edit BZHI bug example (compile with g++-4.8 -O2 -mbmi2 -c foo.cc)
[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 --- Comment #13 from sgunderson at bigfoot dot com --- There. That ought to satisfy your curiosity. :-) I get: sesse@gruessi:~/addie$ g++-4.8 -O2 -mbmi2 -c foo.cc /tmp/ccX2oEfE.s: Assembler messages: /tmp/ccX2oEfE.s:21: Error: operand size mismatch for `bzhi' due to
    bzhi _ZL5shift(,%rax,8), %rdx, %rdx
[Bug target/57623] New: BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 Bug ID: 57623 Summary: BEXTR intrinsic has memory operands switched around (fails to compile code) Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com

Hi,

I'm on gcc 4.8.1 (Debian 4.8.1-2). Given the following test program:

sesse@gruessi:~$ cat bextr-test.c
#include <stdint.h>

uint64_t func(uint64_t x, uint64_t *y)
{
    return __builtin_ia32_bextr_u64(x, *y);
}

trying to compile it fails:

sesse@gruessi:~$ gcc-4.8 -O2 -mbmi -c bextr-test.c --save-temps
bextr-test.s: Assembler messages:
bextr-test.s:9: Error: operand size mismatch for `bextr'

seemingly because GCC's idea of r/m is broken for this instruction:

sesse@gruessi:~$ cat bextr-test.s
    .file "bextr-test.c"
    .text
    .p2align 4,,15
    .globl func
    .type func, @function
func:
.LFB0:
    .cfi_startproc
    bextr (%rsi), %rdi, %rax
    ret
    .cfi_endproc
.LFE0:
    .size func, .-func
    .ident "GCC: (Debian 4.8.1-2) 4.8.1"
    .section .note.GNU-stack,"",@progbits

As far as I understand, the second operand can be r/m64, but the first can only be r64.
[Bug target/57624] New: BZHI instrinsic is missing
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57624 Bug ID: 57624 Summary: BZHI instrinsic is missing Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: sgunderson at bigfoot dot com Hi, The GCC documentation (http://gcc.gnu.org/onlinedocs/gcc/X86-Built_002din-Functions.html) claims there should be such an intrinsic, added in gcc 4.7: unsigned long long _bzhi_u64 (unsigned long long, unsigned long long) Yet, with gcc 4.8.1 (Debian 4.8.1-2), nothing of the sort exists: sesse@gruessi:~$ gcc-4.8 -Wall -O2 -mbmi2 -c bzhi-test.c bzhi-test.c: In function ‘func’: bzhi-test.c:5:2: warning: implicit declaration of function ‘_bzhi_u64’ [-Wimplicit-function-declaration] return _bzhi_u64(x, y); ^ A function call is generated instead, which was obviously not what I intended. :-) I thought this was maybe just a documentation error, but __builtin_ia32_bzhi_u64 also does not exist.
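For reference, this is what the operation is expected to compute, written as plain C++ so it builds without `-mbmi2`. `bzhi_u64_portable` is a made-up name, not a GCC intrinsic. The semantics follow my reading of the instruction description (only the low 8 bits of the index are used, and an index of 64 or more leaves the value unchanged), so treat it as a sketch rather than a reference implementation.

```cpp
#include <cassert>
#include <cstdint>

// Zeroes all bits of x from bit position (index & 0xff) upwards.
inline std::uint64_t bzhi_u64_portable(std::uint64_t x, std::uint64_t index) {
  const unsigned n = static_cast<unsigned>(index & 0xff);
  if (n >= 64) {
    return x;  // nothing to clear
  }
  return x & ((std::uint64_t{1} << n) - 1);
}

// Usage check: keep only the low byte.
inline void check_bzhi() {
  assert(bzhi_u64_portable(0xffffffffffffffffULL, 8) == 0xffULL);
}
```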
[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 --- Comment #2 from sgunderson at bigfoot dot com --- On Sat, Jun 15, 2013 at 04:33:14PM +, jakub at gcc dot gnu.org wrote: The fix for the compiler is easy, but at least the AVX2 spec documents that _bextr_u{32,64} intrinsics actually take 3 arguments (source, start and length), with the latter two always unsigned int, while our intrinsic has only two arguments (where the latter is expected to be (start & 255) | (length << 8)). Not sure if we want to change this, and if so, just for 4.9+, or also for 4.8.2+ and 4.7.4+? If you decide to change it, at least consider keeping the old version around; for instance, the start/length combination could come from a table. In general, if you actually have to do shifting and stuff to create this operand, the gain of the instruction is already lost. /* Steinar */
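To illustrate the two-argument form and the table use case: the control word packs the start bit into the low byte and the length into the next byte. The sketch below emulates the extraction in portable C++. All names (`bextr_control`, `bextr_u64_portable`, `kFieldControls`) are hypothetical, and the out-of-range handling is my reading of the instruction, not something stated in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Packs start and length into the control-word layout: (start & 255) | (length << 8).
inline std::uint32_t bextr_control(unsigned start, unsigned length) {
  assert(start < 256 && length < 256);  // each field is one byte wide
  return (start & 0xffu) | ((length & 0xffu) << 8);
}

// Extracts `length` bits of src beginning at bit `start`, both taken from control.
inline std::uint64_t bextr_u64_portable(std::uint64_t src, std::uint32_t control) {
  const unsigned start = control & 0xffu;
  const unsigned length = (control >> 8) & 0xffu;
  if (start >= 64) {
    return 0;  // the field starts beyond the operand
  }
  const std::uint64_t shifted = src >> start;
  if (length >= 64) {
    return shifted;  // no masking needed
  }
  return shifted & ((std::uint64_t{1} << length) - 1);
}

// Control words computed once and then looked up by a run-time index.
static const std::uint32_t kFieldControls[] = {
    bextr_control(0, 4),    // bits 0..3
    bextr_control(4, 12),   // bits 4..15
    bextr_control(16, 16),  // bits 16..31
};
```

With the table in place, the per-use cost is one load for the control word, which is the situation the comment describes.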
[Bug target/57623] BEXTR intrinsic has memory operands switched around (fails to compile code)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57623 --- Comment #4 from sgunderson at bigfoot dot com --- On Sat, Jun 15, 2013 at 05:10:57PM +, jakub at gcc dot gnu.org wrote: If both start and length are constants, then it will be folded by the compiler, similarly if you use it inside of loop and start/length will be loop invariants, the computation can be hoisted out of the loop. Sure, but again, neither of these match my situation. I really need to do a lookup into a table (with a non-constant index) to get the value. /* Steinar */
[Bug tree-optimization/55155] New: Autovectorization does not use unaligned loads/stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55155 Bug #: 55155 Summary: Autovectorization does not use unaligned loads/stores Classification: Unclassified Product: gcc Version: 4.7.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: sgunder...@bigfoot.com Hi, I am on gcc version 4.7.1 (Debian 4.7.1-7) and a project of mine had code that looked like this: beklager:~ cat example.cpp void func(float * __restrict prod_features, float * __restrict grad_prod_features, float alpha, unsigned num_prods) { float *pf = (float *)__builtin_assume_aligned(prod_features, 16); float *gpf = (float *)__builtin_assume_aligned(grad_prod_features, 16); for (unsigned i = 0; i < num_prods * 16; ++i) { prod_features[i] -= alpha * grad_prod_features[i]; //pf[i] -= alpha * gpf[i]; } } This would seem like a great case for autovectorization, so I tried: beklager:~ g++ -Wall -O2 -ftree-vectorize -msse4.1 -c example.cpp example.cpp: In function ‘void func(float*, float*, float, unsigned int)’: example.cpp:2:9: warning: unused variable ‘pf’ [-Wunused-variable] example.cpp:3:9: warning: unused variable ‘gpf’ [-Wunused-variable] The resulting code, however, is a train wreck: beklager:~ objdump --disassemble --demangle example.o example.o: file format elf64-x86-64 Disassembly of section .text: func(float*, float*, float, unsigned int): 0:55 push %rbp 1:c1 e2 04 shl$0x4,%edx 4:85 d2test %edx,%edx 6:53 push %rbx 7:0f 84 ef 00 00 00je fc func(float*, float*, float, unsigned int)+0xfc d:49 89 f8 mov%rdi,%r8 10:41 83 e0 0f and$0xf,%r8d 14:49 c1 e8 02 shr$0x2,%r8 18:49 f7 d8 neg%r8 1b:41 83 e0 03 and$0x3,%r8d 1f:44 39 c2 cmp%r8d,%edx 22:44 0f 42 c2 cmovb %edx,%r8d 26:83 fa 04 cmp$0x4,%edx 29:0f 87 d0 00 00 00ja ff func(float*, float*, float, unsigned int)+0xff 2f:41 89 d0 mov%edx,%r8d 32:31 c0xor%eax,%eax 34:0f 1f 40 00 nopl 0x0(%rax) 38:f3 0f 10 14 86 movss (%rsi,%rax,4),%xmm2 3d:8d 48 01 lea0x1(%rax),%ecx 40:f3 0f 59 d0 mulss 
%xmm0,%xmm2 44:f3 0f 10 0c 87 movss (%rdi,%rax,4),%xmm1 49:f3 0f 5c ca subss %xmm2,%xmm1 4d:f3 0f 11 0c 87 movss %xmm1,(%rdi,%rax,4) 52:48 83 c0 01 add$0x1,%rax 56:41 39 c0 cmp%eax,%r8d 59:77 ddja 38 func(float*, float*, float, unsigned int)+0x38 5b:44 39 c2 cmp%r8d,%edx 5e:0f 84 98 00 00 00je fc func(float*, float*, float, unsigned int)+0xfc 64:89 d5mov%edx,%ebp 66:45 89 c1 mov%r8d,%r9d 69:44 29 c5 sub%r8d,%ebp 6c:41 89 eb mov%ebp,%r11d 6f:41 c1 eb 02 shr$0x2,%r11d 73:42 8d 1c 9d 00 00 00 lea0x0(,%r11,4),%ebx 7a:00 7b:85 dbtest %ebx,%ebx 7d:74 59je d8 func(float*, float*, float, unsigned int)+0xd8 7f:0f 28 c8 movaps %xmm0,%xmm1 82:49 c1 e1 02 shl$0x2,%r9 86:0f 57 db xorps %xmm3,%xmm3 89:4e 8d 14 0f lea(%rdi,%r9,1),%r10 8d:0f c6 c9 00 shufps $0x0,%xmm1,%xmm1 91:49 01 f1 add%rsi,%r9 94:31 c0xor%eax,%eax 96:45 31 c0 xor%r8d,%r8d 99:0f 28 e1 movaps %xmm1,%xmm4 9c:0f 1f 40 00 nopl 0x0(%rax) a0:0f 28 cb movaps %xmm3,%xmm1 a3:41 83 c0 01 add$0x1,%r8d a7:41 0f 28 14 02 movaps (%r10,%rax,1),%xmm2 ac:41 0f 12 0c 01 movlps (%r9,%rax,1),%xmm1 b1:41 0f 16 4c 01 08movhps 0x8(%r9,%rax,1),%xmm1 b7:0f 59 cc mulps %xmm4,%xmm1 ba:0f 5c d1 subps %xmm1,%xmm2 bd:41 0f 29 14 02 movaps %xmm2,(%r10,%rax,1) c2:48 83 c0 10 add$0x10,%rax c6:45 39 d8 cmp%r11d,%r8d c9:72 d5jb a0 func(float*, float*, float, unsigned int)+0xa0 cb:01 d9add%ebx,%ecx cd:39 ddcmp%ebx,%ebp cf:74 2bje fc
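For comparison, this is the variant hinted at by the commented-out line in the test case: the loop goes through the pointers returned by `__builtin_assume_aligned`, so the alignment promise is attached to the accesses the vectorizer sees. The name `func_aligned` and the asserts are additions for illustration. No claim is made here about what code any particular GCC version generates for it.

```cpp
#include <cassert>
#include <cstdint>

void func_aligned(float *__restrict prod_features,
                  const float *__restrict grad_prod_features,
                  float alpha, unsigned num_prods) {
  // The promise made below is only valid if the caller really passes
  // 16-byte-aligned buffers.
  assert(reinterpret_cast<std::uintptr_t>(prod_features) % 16 == 0);
  assert(reinterpret_cast<std::uintptr_t>(grad_prod_features) % 16 == 0);

  float *pf =
      static_cast<float *>(__builtin_assume_aligned(prod_features, 16));
  const float *gpf = static_cast<const float *>(
      __builtin_assume_aligned(grad_prod_features, 16));

  // Same arithmetic as the original loop, but through the aligned pointers.
  for (unsigned i = 0; i < num_prods * 16; ++i) {
    pf[i] -= alpha * gpf[i];
  }
}
```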
[Bug target/54589] struct offset add should be folded into address calculation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 --- Comment #2 from sgunderson at bigfoot dot com 2012-09-17 09:18:16 UTC --- FWIW, in my original code, func() is a part of a loop body (it keeps reading values from src in a loop). It doesn't really change anything in the generated code, though.
[Bug tree-optimization/54589] New: [missed-optimization] struct offset add should be folded into address calculation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589 Bug #: 54589 Summary: [missed-optimization] struct offset add should be folded into address calculation Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: sgunder...@bigfoot.com

Hi,

I found this in 4.4 (Ubuntu 10.04), and have confirmed it's still there in gcc (Debian 20120820-1) 4.8.0 20120820 (experimental) [trunk revision 190537]

This code:

#include <emmintrin.h>

struct param {
    int a, b, c, d;
    __m128i array[256];
};

void func(struct param *p, unsigned char *src, int *dst)
{
    __m128i x = p->array[*src];
    *dst = _mm_cvtsi128_si32(x);
}

compiles with -O2 on x86-64 to this assembler:

func:
   0: 0f b6 06       movzbl (%rsi),%eax
   3: 48 83 c0 01    add    $0x1,%rax
   7: 48 c1 e0 04    shl    $0x4,%rax
   b: 8b 04 07       mov    (%rdi,%rax,1),%eax
   e: 89 02          mov    %eax,(%rdx)
  10: c3             retq

The add should be folded into the address calculation here. (The shl can't, because it's too big.) Curiously enough, if I misalign the struct element by removing c and d, and declaring the struct __attribute__((packed)), GCC will do that; the mov will then be from $8(%rdi,%rax,1),%eax and there is no redundant add.
[Bug tree-optimization/54592] New: [4.8 Regression] [missed-optimization] Cannot fuse SSE move and add together
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54592 Bug #: 54592 Summary: [4.8 Regression] [missed-optimization] Cannot fuse SSE move and add together Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: sgunder...@bigfoot.com

Hi,

I have, on x86-64,

gcc version 4.7.1 (Debian 4.7.1-9)
gcc version 4.8.0 20120820 (experimental) [trunk revision 190537] (Debian 20120820-1)

Given the following test program:

#include <emmintrin.h>

void func(__m128i *foo, size_t a, size_t b, int *dst)
{
    __m128i x = foo[a];
    __m128i y = foo[b];
    __m128i sum = _mm_add_epi32(x, y);
    *dst = _mm_cvtsi128_si32(sum);
}

GCC 4.8 with -O2 compiles it to

   0: 48 c1 e6 04       shl    $0x4,%rsi
   4: 48 c1 e2 04       shl    $0x4,%rdx
   8: 66 0f 6f 0c 17    movdqa (%rdi,%rdx,1),%xmm1
   d: 66 0f 6f 04 37    movdqa (%rdi,%rsi,1),%xmm0
  12: 66 0f fe c1       paddd  %xmm1,%xmm0
  16: 66 0f 7e 01       movd   %xmm0,(%rcx)
  1a: c3                retq

The mov into %xmm1 here doesn't seem to make sense; it should rather be paddd-ed in directly. And indeed, GCC 4.7 with -O2 gets this right:

   0: 48 c1 e6 04       shl    $0x4,%rsi
   4: 48 c1 e2 04       shl    $0x4,%rdx
   8: 66 0f 6f 04 37    movdqa (%rdi,%rsi,1),%xmm0
   d: 66 0f fe 04 17    paddd  (%rdi,%rdx,1),%xmm0
  12: 66 0f 7e 01       movd   %xmm0,(%rcx)
  16: c3                retq

This would seem like a regression to me.
[Bug target/42778] Superfluous stack management code is generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42778 sgunderson at bigfoot dot com changed: What|Removed |Added CC||sgunderson at bigfoot dot ||com --- Comment #3 from sgunderson at bigfoot dot com 2012-09-15 16:02:37 UTC --- This seems to be no longer wrong in 4.8.
[Bug target/54593] New: [missed-optimization] Move from SSE to integer register goes through the stack without -march=native
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593 Bug #: 54593 Summary: [missed-optimization] Move from SSE to integer register goes through the stack without -march=native Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: sgunder...@bigfoot.com

Hi,

I have reproduced this on 4.4, 4.6, 4.7 and 4.8 (Debian 20120820-1, trunk version 190537). Given the following code:

#include <x86intrin.h>

int test1(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}

GCC generates

   0: 66 0f 7e 44 24 f4    movd   %xmm0,-0xc(%rsp)
   6: 8b 44 24 f4          mov    -0xc(%rsp),%eax
   a: c3                   retq

Shouldn't it go directly to %eax instead of through the stack? Granted, on Netburst this takes ten cycles or so, but this is x86-64. It appears to be some sort of tuning issue, since if I use -mtune=native (I am on an Atom) I get:

   0: 66 0f 7e c0    movd   %xmm0,%eax
   4: 90             nop
   5: 90             nop
   6: 90             nop
   7: 90             nop
   8: 90             nop
   9: 90             nop
   a: c3             retq

which is sort-of what I expect. Well, the NOPs are a bit weird, but... :-)
[Bug target/54593] [missed-optimization] Move from SSE to integer register goes through the stack without -march=native
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593 --- Comment #2 from sgunderson at bigfoot dot com 2012-09-15 16:38:34 UTC --- Interesting. So it's a conscious choice that “generic” does this?
[Bug target/54593] [missed-optimization] Move from SSE to integer register goes through the stack without -march=native
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593 --- Comment #4 from sgunderson at bigfoot dot com 2012-09-15 16:54:28 UTC --- I'm not sure if I understand the comment very well; it talks about Pentium 4, but none of them run 64-bit code, do they?
[Bug target/54593] [missed-optimization] Move from SSE to integer register goes through the stack without -march=native
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593 --- Comment #6 from sgunderson at bigfoot dot com 2012-09-15 20:28:02 UTC --- Ah. So basically it hurts AMD enough (the opposite doesn't hit Intel enough) that the choice was made to make it that way generic too. Well, as long as it's a deliberate choice, I assume it's a reasonable tradeoff, so thanks for the enlightenment. :-)
[Bug tree-optimization/51513] New: [missed optimization] Only partially optimizes away unreachable switch default case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51513

Bug #: 51513
Summary: [missed optimization] Only partially optimizes away unreachable switch default case
Classification: Unclassified
Product: gcc
Version: 4.6.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com

Hi,

I have code that looks like this:

  pannekake:~ cat test.c
  void foo();
  void bar();
  void baz();

  void func(int i)
  {
      switch (i) {
      case 0: foo(); break;
      case 1: bar(); break;
      case 2: baz(); break;
      case 3: baz(); break;
      case 4: bar(); break;
      case 5: foo(); break;
      case 6: foo(); break;
      case 7: bar(); break;
      case 8: baz(); break;
      case 9: baz(); break;
      case 10: bar(); break;
      default: __builtin_unreachable(); break;
      }
  }

Compiling this yields:

  pannekake:~ gcc-4.6 -O2 -c test.c && objdump --disassemble test.o

  test.o: file format elf64-x86-64

  Disassembly of section .text:

  func:
     0:   83 ff 0a                cmp    $0xa,%edi
     3:   76 03                   jbe    8 <func+0x8>
     5:   0f 1f 00                nopl   (%rax)
     8:   89 ff                   mov    %edi,%edi
     a:   31 c0                   xor    %eax,%eax
     c:   ff 24 fd 00 00 00 00    jmpq   *0x0(,%rdi,8)
    13:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
    18:   e9 00 00 00 00          jmpq   1d <func+0x1d>
    1d:   0f 1f 00                nopl   (%rax)
    20:   e9 00 00 00 00          jmpq   25 <func+0x25>
    25:   0f 1f 00                nopl   (%rax)
    28:   e9 00 00 00 00          jmpq   2d <func+0x2d>

The first compare is, as you can see, unneeded; the code for the default case itself (a repz ret) has been optimized away due to the __builtin_unreachable(), but the compare and branch remain. I've also seen it sometimes be able to remove the jump instruction itself, but not the compare.
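For illustration only, here is a rough sketch of the code shape the report is asking for: an indexed jump with no range check in front of it. The counting bodies of foo, bar and baz, the handlers table and the name func_table are all assumptions made up for this sketch; the report only declares the three functions, and nothing here claims to be what GCC should emit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the report's foo/bar/baz: each just counts calls. */
static int foo_calls, bar_calls, baz_calls;
static void foo(void) { ++foo_calls; }
static void bar(void) { ++bar_calls; }
static void baz(void) { ++baz_calls; }

typedef void (*handler_fn)(void);

/* Same case-to-callee mapping as the switch in the report (cases 0..10). */
static const handler_fn handlers[] = {
    foo, bar, baz, baz, bar, foo, foo, bar, baz, baz, bar
};

/* Table dispatch: the caller promises 0 <= i <= 10, so a release build
 * (NDEBUG defined) performs the indexed call with no compare or branch. */
static void func_table(int i)
{
    assert(i >= 0 && (size_t)i < sizeof handlers / sizeof handlers[0]);
    handlers[i]();
}
```

The assert plays the role of the unreachable default case: it documents the precondition in debug builds and disappears entirely when NDEBUG is defined.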
[Bug tree-optimization/51513] [missed optimization] Only partially optimizes away unreachable switch default case
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51513 --- Comment #1 from sgunderson at bigfoot dot com 2011-12-12 10:54:16 UTC --- Forgot this: pannekake:~ gcc-4.6 -v Using built-in specs. COLLECT_GCC=gcc-4.6 COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.6.2 (Debian 4.6.2-5)
[Bug tree-optimization/49872] Missed optimization: Could coalesce neighboring memsets
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49872

--- Comment #2 from sgunderson at bigfoot dot com 2011-07-28 10:09:51 UTC ---
I'm not sure if I've seen exactly this construction in real-world code, but I've certainly seen examples of the hybrid I sketched out (looking at one was what motivated me to file the bug), i.e. something like:

  struct S { int f[1024]; int g; };
  void func(struct S* s) { memset(s->f, 0, sizeof(s->f)); s->g = 0; }

which I would argue should be rewritten to

  void func(struct S* s) { memset(s->f, 0, sizeof(s->f) + sizeof(s->g)); }

I'd argue that programmers should not be doing this kind of optimization themselves, since it is very prone to breaking when the structure changes, especially as alignment etc. comes into play.
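To make the fragility argument concrete, here is a minimal sketch (assuming C11 for static_assert) of what a programmer has to pin down before merging the two stores by hand. The names func_split and func_merged are hypothetical and not from the report; the merged variant also goes through a char pointer to the whole object rather than through s->f, which is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S { int f[1024]; int g; };

/* The form the comment argues programmers should keep writing. */
static void func_split(struct S *s)
{
    memset(s->f, 0, sizeof(s->f));
    s->g = 0;
}

/* Hand-merged form: only valid while g directly follows f with no padding,
 * so the layout assumption is checked at compile time. */
static void func_merged(struct S *s)
{
    static_assert(offsetof(struct S, g) ==
                  offsetof(struct S, f) + sizeof(((struct S *)0)->f),
                  "g must directly follow f");
    memset((char *)s + offsetof(struct S, f), 0,
           sizeof(s->f) + sizeof(s->g));
}
```

If someone later inserts a member between f and g, or changes the type of g so that padding appears, the static_assert fails instead of the memset silently covering the wrong bytes. Having the compiler do the merge would make that bookkeeping unnecessary.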
[Bug target/49865] New: Unneccessary reload causes small size regression from 4.6.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865

Summary: Unneccessary reload causes small size regression from 4.6.1
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com
Target: i?86-*-*

Comparing 4.6.1 with gcc-snapshot from Debian:

  gcc version 4.7.0 20110709 (experimental) [trunk revision 176106] (Debian 20110709-1)

Given this code:

  fugl:~ cat test.cpp
  #include <string.h>

  class MyClass {
      void func();
      float f[1024];
      int i;
  };

  void MyClass::func()
  {
      memset(f, 0, sizeof(f));
      i = 0;
  }

and compiling with

  fugl:~ /usr/lib/gcc-snapshot/bin/g++ -Os -c test.cpp

g++ produces, according to objdump:

  _ZN7MyClass4funcEv:
     0:   55                      push   %ebp
     1:   31 c0                   xor    %eax,%eax
     3:   89 e5                   mov    %esp,%ebp
     5:   b9 00 04 00 00          mov    $0x400,%ecx
     a:   57                      push   %edi
     b:   8b 7d 08                mov    0x8(%ebp),%edi
     e:   f3 ab                   rep stos %eax,%es:(%edi)
    10:   8b 45 08                mov    0x8(%ebp),%eax
    13:   c7 80 00 10 00 00 00    movl   $0x0,0x1000(%eax)
    1a:   00 00 00
    1d:   5f                      pop    %edi
    1e:   5d                      pop    %ebp
    1f:   c3                      ret

while 4.6.1 has a more efficient sequence:

  _ZN7MyClass4funcEv:
     0:   55                      push   %ebp
     1:   b9 00 04 00 00          mov    $0x400,%ecx
     6:   89 e5                   mov    %esp,%ebp
     8:   31 c0                   xor    %eax,%eax
     a:   8b 55 08                mov    0x8(%ebp),%edx
     d:   57                      push   %edi
     e:   89 d7                   mov    %edx,%edi
    10:   f3 ab                   rep stos %eax,%es:(%edi)
    12:   c7 82 00 10 00 00 00    movl   $0x0,0x1000(%edx)
    19:   00 00 00
    1c:   5f                      pop    %edi
    1d:   5d                      pop    %ebp
    1e:   c3                      ret

It seems 4.6 is able to take a copy of the this pointer from a register before the rep stos operation, which is one byte smaller than reloading it from the stack when it needs to clear i.

Of course, the _most_ efficient code sequence here would be doing the i = 0 before the memset, but I'm not sure if this is legal. However, eax should still contain zero, so the mov could be done from eax instead of from a constant.
[Bug target/49865] Unnecessary reload causes small size regression from 4.6.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865

--- Comment #1 from sgunderson at bigfoot dot com 2011-07-27 11:38:57 UTC ---
(In reply to comment #0)
> Of course, the _most_ efficient code sequence here would be doing the i = 0 before the memset, but I'm not sure if this is legal. However, eax should still contain zero, so the mov could be done from eax instead of from a constant.

Actually, thinking about it, the most efficient code sequence would be just giving 4100 to memset instead of 4096, but that's for an enhancement request at some point.
[Bug tree-optimization/49872] New: Missed optimization: Could coalesce neighboring memsets
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49872

Summary: Missed optimization: Could coalesce neighboring memsets
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com

Given the following code:

  #include <string.h>

  struct S { int f[1024]; int g[1024]; };

  void func(struct S* s)
  {
      memset(s->f, 0, sizeof(s->f));
      memset(s->g, 0, sizeof(s->g));
  }

GCC currently generates two memsets. The code with -O2 is a bit hard to read, so I'm just pasting the -Os assembly for clarity:

  func:
     0:   55                      push   %ebp
     1:   31 c0                   xor    %eax,%eax
     3:   89 e5                   mov    %esp,%ebp
     5:   b9 00 04 00 00          mov    $0x400,%ecx
     a:   57                      push   %edi
     b:   8b 7d 08                mov    0x8(%ebp),%edi
     e:   f3 ab                   rep stos %eax,%es:(%edi)
    10:   8b 55 08                mov    0x8(%ebp),%edx
    13:   66 b9 00 04             mov    $0x400,%cx
    17:   81 c2 00 10 00 00       add    $0x1000,%edx
    1d:   89 d7                   mov    %edx,%edi
    1f:   f3 ab                   rep stos %eax,%es:(%edi)
    21:   5f                      pop    %edi
    22:   5d                      pop    %ebp
    23:   c3                      ret

Ideally GCC should also be able to coalesce this together with memsets not written as memset, e.g. s->g[0] = 0;.
[Bug target/49865] [4.7 Regression] Unnecessary reload causes small bloat
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865

--- Comment #2 from sgunderson at bigfoot dot com 2011-07-27 17:28:19 UTC ---
(In reply to comment #1)
> Actually, thinking about it, the most efficient code sequence would be just giving 4100 to memset instead of 4096, but that's for an enhancement request at some point.

Filed as bug #49872.
[Bug target/49715] New: Could do more efficient unsigned-to-float to conversions based on range information
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

Summary: Could do more efficient unsigned-to-float to conversions based on range information
Product: gcc
Version: 4.6.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com

I have code that looks vaguely like this:

  float func(unsigned x) { return (x & 0xfffff) * 0.01f; }

When I compile it, GCC gives a long and relatively slow sequence:

  fugl:~ gcc-4.6 -v
  Using built-in specs.
  COLLECT_GCC=/usr/bin/gcc-4.6
  COLLECT_LTO_WRAPPER=/usr/lib/i386-linux-gnu/gcc/i486-linux-gnu/4.6.1/lto-wrapper
  Target: i486-linux-gnu
  Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.1-3' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-multiarch --with-multiarch-defaults=i386-linux-gnu --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/i386-linux-gnu --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib/i386-linux-gnu --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
  Thread model: posix
  gcc version 4.6.1 (Debian 4.6.1-3)

  fugl:~ gcc-4.6 -O2 -march=pentium3 -msse2 -mfpmath=sse -c test.c
  fugl:~ objdump --disassemble test.o

  test.o: file format elf32-i386

  Disassembly of section .text:

  func:
     0:   83 ec 04                sub    $0x4,%esp
     3:   8b 54 24 08             mov    0x8(%esp),%edx
     7:   89 d0                   mov    %edx,%eax
     9:   81 e2 ff ff 00 00       and    $0xffff,%edx
     f:   25 ff ff 0f 00          and    $0xfffff,%eax
    14:   c1 e8 10                shr    $0x10,%eax
    17:   f3 0f 2a c0             cvtsi2ss %eax,%xmm0
    1b:   f3 0f 2a ca             cvtsi2ss %edx,%xmm1
    1f:   f3 0f 59 05 00 00 00    mulss  0x0,%xmm0
    26:   00
    27:   f3 0f 58 c1             addss  %xmm1,%xmm0
    2b:   f3 0f 59 05 04 00 00    mulss  0x4,%xmm0
    32:   00
    33:   f3 0f 11 04 24          movss  %xmm0,(%esp)
    38:   d9 04 24                flds   (%esp)
    3b:   58                      pop    %eax
    3c:   c3                      ret
    3d:   8d 76 00                lea    0x0(%esi),%esi

I assume this is because x is unsigned (I cannot easily change this, as I depend on wraparound). However, if I insert a cast to int after the and operation, I get the same results, and a much better sequence:

  0040 func2:
    40:   83 ec 04                sub    $0x4,%esp
    43:   8b 44 24 08             mov    0x8(%esp),%eax
    47:   25 ff ff 0f 00          and    $0xfffff,%eax
    4c:   f3 0f 2a c0             cvtsi2ss %eax,%xmm0
    50:   f3 0f 59 05 04 00 00    mulss  0x4,%xmm0
    57:   00
    58:   f3 0f 11 04 24          movss  %xmm0,(%esp)
    5d:   d9 04 24                flds   (%esp)
    60:   5a                      pop    %edx
    61:   c3                      ret

In other words, the modified code looks like this:

  float func2(unsigned x) { return (int)(x & 0xfffff) * 0.01f; }

This should be possible for GCC to do when it has range information that says the sign bit cannot be set.
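The claim that the cast to int gives the same results can be checked exhaustively, since only 2^20 distinct masked values exist. The sketch below assumes the mask is 0xfffff, which is what the `25 ff ff 0f 00` encoding of the and instruction in the listing indicates; check_conversions_agree is a hypothetical helper name added for this example.

```c
#include <assert.h>

/* Original form from the report: the masked unsigned value is converted to float. */
static float func(unsigned x) { return (x & 0xfffff) * 0.01f; }

/* Modified form from the report: cast to int after masking, then convert. */
static float func2(unsigned x) { return (int)(x & 0xfffff) * 0.01f; }

/* Walk every possible masked value; the upper 12 bits never reach the
 * conversion, so covering 0..0xfffff covers all inputs. */
static void check_conversions_agree(void)
{
    for (unsigned x = 0; x <= 0xfffff; ++x)
        assert(func(x) == func2(x));
}
```

Both conversions are exact here because every masked value is below 2^24 and therefore fits in a float's significand, so the only difference between the two forms is the instruction sequence the compiler picks.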
[Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715 --- Comment #3 from sgunderson at bigfoot dot com 2011-07-12 15:19:51 UTC --- Wow, answer in record time :-) I don't know anything about GCC internals, so I can't comment much on the patch; my only worry here is what would happen if you had a very narrow mask, e.g. (x & 0xf), and you tried to coerce it into the minimum possible type (a char); wouldn't you end up doing some sort of expansion with movzbl again?
[Bug target/49583] Reloading stack operands in the wrong order, so needs to insert fxch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49583 --- Comment #3 from sgunderson at bigfoot dot com 2011-07-03 17:20:11 UTC --- Hi, My bug report was (as you can see in the title) not about the fstps/fld sequence; it was about the extraneous fxch instructions. (My original code was with -ffast-math, but I didn't want to burden the example with too many flags.) In any case, even if I am to ignore “one or two”, how many fxch are too many? I can give you code where there are _five_ (in what is a tight inner loop for me), just by expanding on the example in question.
[Bug tree-optimization/49583] New: Reloading stack operands in the wrong order, so needs to insert fxch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49583

Summary: Reloading stack operands in the wrong order, so needs to insert fxch
Product: gcc
Version: 4.6.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com

Created attachment 24638 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24638
Minimal testcase

Hi,

It seems that when generating x87 code, GCC sometimes reloads items from the stack in the wrong order, and then goes to great lengths to swap them around. I have an example with six loads and six fxch instructions, but attached is a minimal example. Compiling with gcc version 4.6.1 (Debian 4.6.1-1) as follows:

  pannekake:~ gcc-4.6 -m32 -Wall -O2 -march=pentium3 -c fxch.c

The odd sequence is around this:

    41:   d9 44 24 48             flds   0x48(%esp)
    45:   dd 5c 24 08             fstpl  0x8(%esp)
    49:   dd 14 24                fstl   (%esp)
    4c:   d9 5c 24 10             fstps  0x10(%esp)
    50:   e8 fc ff ff ff          call   51 <process+0x51>
    55:   d9 5c 24 1c             fstps  0x1c(%esp)
    59:   d9 44 24 1c             flds   0x1c(%esp)
    5d:   d9 44 24 10             flds   0x10(%esp)
    61:   d9 c9                   fxch   %st(1)
    63:   d9 1c b7                fstps  (%edi,%esi,4)
    66:   46                      inc    %esi
    67:   39 ee                   cmp    %ebp,%esi

In particular, why did it use fstps immediately followed by flds of the same value? And if it really wants to reload (in my more complex example, it really needs to), why not just do the loads in the right order from the start instead of doing the fxch?
[Bug target/48139] __builtin_lrintf() becomes a library call, not an cvtss2si instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48139 --- Comment #2 from sgunderson at bigfoot dot com 2011-03-16 12:03:40 UTC --- But the lrintf() man page says explicitly that these functions cannot set errno. Is this the man page being too glibc-specific, or something else?
[Bug target/48139] __builtin_lrintf() becomes a library call, not an cvtss2si instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48139

--- Comment #6 from sgunderson at bigfoot dot com 2011-03-16 22:59:53 UTC ---
(In reply to comment #5)
> So, there's no glibc bug, but I don't think this makes a compelling case for any particular gcc behavior. The implementation is gcc+glibc, so gcc could say that its implementation of lrint never sets errno, and all would be conforming. Or gcc could say that users will pick a libc based on whether they want errno to be set, and so it should emit the call. Or gcc could optimize lrint in C99 (where errno-setting is forbidden) but not in C1x (where it's allowed).

Well, if C99/C1x _allows_ gcc to do this, I'd say it's a missed optimization opportunity not to. :-) FWIW, my code is C++.

> One local workaround is to set __attribute__((optimize(no-math-errno))) on the functions whose assembly contains the undesired call, but that's a bit fragile in the face of changing inlining decisions.

Indeed; in my case, the function is pretty much guaranteed to get inlined, so I'd have to sprinkle those attributes around all the potential callers.
[Bug c/48139] New: __builtin_lrintf() becomes a library call, not an cvtss2si instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48139

Summary: __builtin_lrintf() becomes a library call, not an cvtss2si instruction
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com

Hi,

It seems there is no way on x86-64 (short of an asm() statement) to get direct access to the cvtss2si instruction, i.e. to convert a single float to an int in the current rounding mode. Even __builtin_lrintf() becomes a library call to glibc's lrintf(), which in itself only contains a single instruction and then a ret. (If I set -fno-math-errno, I do get the instruction, but this is unfortunately not an option for me.)

I've been told that this may or may not be correct behavior; it's a bit unclear if lrintf() should set errno or not according to C99 and glibc's math_errhandling setting. I guess this is either a missed optimization in GCC _or_ a bug in glibc. The former seems more likely to me, given that the entire point of lrint() and friends seems to be being able to do quick float-to-int conversion without having to deal with special code for NaN and the like.
[Bug rtl-optimization/45670] New: Less efficient x86 addressing mode selection on 4.6, causes -Os size regression from 4.5
Hi,

Given the following test C++ file:

  class Class {
  public:
      void func();
      float *buf;
      int size;
  };

  void Class::func()
  {
      for (int i = 0; i < size; ++i) {
          buf[i] = 0;
      }
  }

4.6 (see below for exact version) will generate larger code (36 vs. 30 bytes) than 4.5.1 (Debian 4.5.1-6) given -Os. The output is

  Class::func():
     0:   55                      push   %ebp
     1:   31 c0                   xor    %eax,%eax
     3:   89 e5                   mov    %esp,%ebp
     5:   8b 4d 08                mov    0x8(%ebp),%ecx
     8:   53                      push   %ebx
     9:   8b 59 04                mov    0x4(%ecx),%ebx
     c:   eb 10                   jmp    1e <Class::func()+0x1e>
     e:   8d 14 85 00 00 00 00    lea    0x0(,%eax,4),%edx
    15:   40                      inc    %eax
    16:   03 11                   add    (%ecx),%edx
    18:   c7 02 00 00 00 00       movl   $0x0,(%edx)
    1e:   39 d8                   cmp    %ebx,%eax
    20:   7c ec                   jl     e <Class::func()+0xe>
    22:   5b                      pop    %ebx
    23:   5d                      pop    %ebp
    24:   c3                      ret

Basically the problem is that the lea is large (due to the zero immediate taking up 32 bits); 4.5 uses a variation where the address calculation takes both a base and an index register, which has a shorter form that does not need to store the zero. (The joys of x86; lea edx, [eax*4 + ecx] takes less space than lea edx, [eax*4]...)
=== Configured with: ../src/configure -v --with-pkgversion='Debian 20100828-1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,ada,c++,fortran,objc,obj-c++ --prefix=/usr/lib/gcc-snapshot --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --disable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --with-plugin-ld=ld.gold --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --disable-werror --enable-checking=yes --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu Thread model: posix gcc version 4.6.0 20100828 (experimental) [trunk revision 163616] (Debian 20100828-1) -- Summary: Less efficient x86 addressing mode selection on 4.6, causes -Os size regression from 4.5 Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: sgunderson at bigfoot dot com GCC build triplet: i486-linux-gnu GCC host triplet: i486-linux-gnu GCC target triplet: i486-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45670
[Bug tree-optimization/38328] New: Massive performance regression for jpeg_idct_islow
First of all, I'm using Debian's gcc-snapshot package:

  gcc version 4.4.0 20081117 (experimental) [trunk revision 141948] (Debian 20081117-1)

Let me know if I should try to rebuild with another GCC version.

I tested my image scaler (http://bzr.sesse.net/qscale/) and libjpeg with 4.4 vs. 4.3, and got the following oprofile graph for the same load in both cases.

4.3:

  samples  %        app name           symbol name
  5182     21.8484  libjpeg.so.62.0.0  jpeg_idct_islow
  5150     21.7135  libjpeg.so.62.0.0  decode_mcu
  3582     15.1025  qscale             vscale
  1237      5.2154  libjpeg.so.62.0.0  jpeg_fill_bit_buffer
   592      2.4960  qscale             hscale

4.4:

  samples  %        app name           symbol name
  7054     31.9056  qscale             jpeg_idct_islow
  4401     19.9059  qscale             decode_mcu
  3584     16.2106  qscale             vscale
  1352      6.1152  qscale             jpeg_fill_bit_buffer
   606      2.7410  qscale             hscale

Note that decode_mcu is 17% faster (probably due to better register allocation), but jpeg_idct_islow is 36% slower! jpeg_fill_bit_buffer is also a tiny bit slower, but that's not as critical. (The overall effect is that the JPEG decoding as a whole runs slower.)

I have not looked at the generated code, but it's definitely not good. FWIW, it's repeatable between runs -- the sample counts change very little (1-2%, perhaps).

--
Summary: Massive performance regression for jpeg_idct_islow
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: sgunderson at bigfoot dot com
GCC build triplet: i486-linux-gnu
GCC host triplet: i486-linux-gnu
GCC target triplet: i486-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow
--- Comment #2 from sgunderson at bigfoot dot com 2008-11-30 15:06 ---
OK, I looked at the source. The issue here seems to be that 4.4 likes to compile this:

  z3 = ((z3) * (- ((INT32) 16069)));

into this:

      10  0.0403 : 805cc87:  lea    (%ecx,%ecx,4),%ebx
                 : 805cc8a:  lea    (%ebx,%ebx,4),%ebx
      20  0.0805 : 805cc8d:  lea    (%ebx,%ebx,4),%ebx
       7  0.0282 : 805cc90:  lea    (%ecx,%ebx,2),%ebx
       3  0.0121 : 805cc93:  shl    $0x4,%ebx
      38  0.1530 : 805cc96:  add    %ecx,%ebx
       8  0.0322 : 805cc98:  lea    (%ecx,%ebx,4),%esi

4.3 uses imul here, which is a lot faster.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
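To see that the seven-instruction chain really is a multiplication by 16069, it can be replayed step by step in C. This is only a sketch: mul16069_chain and check_chain are hypothetical names, unsigned arithmetic is used so that wraparound is well defined, and the negation from the source expression is not part of the quoted listing, so it is left out here too.

```c
#include <assert.h>
#include <stdint.h>

/* Replays the lea/shl/add chain from the listing, with x standing in for %ecx. */
static uint32_t mul16069_chain(uint32_t x)
{
    uint32_t t = x + x * 4;   /* lea (%ecx,%ecx,4),%ebx : t = 5x     */
    t = t + t * 4;            /* lea (%ebx,%ebx,4),%ebx : t = 25x    */
    t = t + t * 4;            /* lea (%ebx,%ebx,4),%ebx : t = 125x   */
    t = x + t * 2;            /* lea (%ecx,%ebx,2),%ebx : t = 251x   */
    t <<= 4;                  /* shl $0x4,%ebx          : t = 4016x  */
    t += x;                   /* add %ecx,%ebx          : t = 4017x  */
    return x + t * 4;         /* lea (%ecx,%ebx,4),%esi : 16069x     */
}

/* Compare the chain against a plain multiplication for a range of inputs. */
static void check_chain(void)
{
    for (uint32_t x = 0; x < 100000u; ++x)
        assert(mul16069_chain(x) == x * UINT32_C(16069));
}
```

Each step depends on the result of the previous one, so the chain cannot be overlapped with itself, which fits the observation that a single imul is faster on processors where multiplication is cheap.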
[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow
--- Comment #4 from sgunderson at bigfoot dot com 2008-11-30 20:32 ---
Subject: Re: Massive performance regression for jpeg_idct_islow

On Sun, Nov 30, 2008 at 04:23:31PM -, rguenth at gcc dot gnu dot org wrote:
> Which tuning are you using? Try enabling -mtune=generic (possibly by default).

The compile flags are -g -O2 -D_REENTRANT, IIRC. No weird compile options.

/* Steinar */

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow
--- Comment #6 from sgunderson at bigfoot dot com 2008-11-30 20:40 --- Subject: Re: Massive performance regression for jpeg_idct_islow On Sun, Nov 30, 2008 at 08:37:31PM -, rguenth at gcc dot gnu dot org wrote: --- Comment #5 from rguenth at gcc dot gnu dot org 2008-11-30 20:37 --- What is the gcc output if you append -v? fugl:~ /usr/lib/gcc-snapshot/bin/gcc -v Using built-in specs. Target: i486-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 20081117-1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,c++,java,fortran,objc,obj-c++,ada --prefix=/usr/lib/gcc-snapshot --enable-shared --with-system-zlib --disable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk --enable-gtk-cairo --disable-plugin --with-java-home=/usr/lib/gcc-snapshot/java-1.5.0-gcj-4.4-1.5.0.0/jre --enable-java-home --with-jvm-root-dir=/usr/lib/gcc-snapshot/jvm --with-jvm-jar-dir=/usr/lib/gcc-snapshot/jvm-exports --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-mpfr --enable-targets=all --enable-cld --disable-werror --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu Thread model: posix gcc version 4.4.0 20081117 (experimental) [trunk revision 141948] (Debian 20081117-1) /* Steinar */ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow
--- Comment #8 from sgunderson at bigfoot dot com 2008-11-30 21:19 --- Subject: Re: Massive performance regression for jpeg_idct_islow On Sun, Nov 30, 2008 at 09:04:07PM -, rguenth at gcc dot gnu dot org wrote: Append -v to the command-line you use for compiling ;) Seriously, if using -mtune=generic works then this is a Debian packaging issue of their gcc-snapshot compiler. fugl:~/nmu/libjpeg6b-6b /usr/lib/gcc-snapshot/bin/gcc -D_REENTRANT -g -Wall -O2 -g -I. -c ./jidctint.c -fPIC -DPIC -o .libs/jidctint.o -v Using built-in specs. Target: i486-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 20081117-1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,c++,java,fortran,objc,obj-c++,ada --prefix=/usr/lib/gcc-snapshot --enable-shared --with-system-zlib --disable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk --enable-gtk-cairo --disable-plugin --with-java-home=/usr/lib/gcc-snapshot/java-1.5.0-gcj-4.4-1.5.0.0/jre --enable-java-home --with-jvm-root-dir=/usr/lib/gcc-snapshot/jvm --with-jvm-jar-dir=/usr/lib/gcc-snapshot/jvm-exports --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-mpfr --enable-targets=all --enable-cld --disable-werror --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu Thread model: posix gcc version 4.4.0 20081117 (experimental) [trunk revision 141948] (Debian 20081117-1) COLLECT_GCC_OPTIONS='-D_REENTRANT' '-g' '-Wall' '-O2' '-g' '-I.' '-c' '-fPIC' '-DPIC' '-o' '.libs/jidctint.o' '-v' '-mtune=i486' /usr/lib/gcc-snapshot/libexec/gcc/i486-linux-gnu/4.4.0/cc1 -quiet -v -I. 
-D_REENTRANT -DPIC ./jidctint.c -quiet -dumpbase jidctint.c -mtune=i486 -auxbase-strip .libs/jidctint.o -g -g -O2 -Wall -version -fPIC -o /tmp/cc5hqg0m.s ignoring nonexistent directory /usr/local/include/i486-linux-gnu ignoring nonexistent directory /usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/../../../../i486-linux-gnu/include ignoring nonexistent directory /usr/include/i486-linux-gnu #include ... search starts here: #include ... search starts here: . /usr/local/include /usr/lib/gcc-snapshot/include /usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/include /usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/include-fixed /usr/include End of search list. GNU C (Debian 20081117-1) version 4.4.0 20081117 (experimental) [trunk revision 141948] (i486-linux-gnu) compiled by GNU C version 4.4.0 20081117 (experimental) [trunk revision 141948], GMP version 4.2.2, MPFR version 2.3.2. GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Compiler executable checksum: 445209552aa2d93e7e967b7473e83cd6 COLLECT_GCC_OPTIONS='-D_REENTRANT' '-g' '-Wall' '-O2' '-g' '-I.' '-c' '-fPIC' '-DPIC' '-o' '.libs/jidctint.o' '-v' '-mtune=i486' as -V -Qy -o .libs/jidctint.o /tmp/cc5hqg0m.s GNU assembler version 2.18.0 (i486-linux-gnu) using BFD version (GNU Binutils for Debian) 2.18.0.20080103 COMPILER_PATH=/usr/lib/gcc-snapshot/libexec/gcc/i486-linux-gnu/4.4.0/:/usr/lib/gcc-snapshot/libexec/gcc/i486-linux-gnu/4.4.0/:/usr/lib/gcc-snapshot/libexec/gcc/i486-linux-gnu/:/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/:/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/:/usr/lib/gcc/i486-linux-gnu/ LIBRARY_PATH=/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/:/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc-snapshot/lib/gcc/i486-linux-gnu/4.4.0/../../../:/lib/:/usr/lib/ COLLECT_GCC_OPTIONS='-D_REENTRANT' '-g' '-Wall' '-O2' '-g' '-I.' 
'-c' '-fPIC' '-DPIC' '-o' '.libs/jidctint.o' '-v' '-mtune=i486' -mtune=generic still produces these long series of leas. /* Steinar */ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow
--- Comment #9 from sgunderson at bigfoot dot com 2008-11-30 21:22 ---
Subject: Re: Massive performance regression for jpeg_idct_islow

On Sun, Nov 30, 2008 at 09:19:08PM -, sgunderson at bigfoot dot com wrote:
> -mtune=generic still produces these long series of leas.

Sorry, I objdumped the wrong file. -mtune=generic appears to fix it (although I haven't checked the performance).

/* Steinar */

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328
[Bug tree-optimization/38328] Massive performance regression for jpeg_idct_islow
--- Comment #11 from sgunderson at bigfoot dot com 2008-11-30 22:48 ---
Subject: Re: Massive performance regression for jpeg_idct_islow

On Sun, Nov 30, 2008 at 09:29:29PM -, rguenth at gcc dot gnu dot org wrote:
> so it uses -mtune=i486 - this optimizes the multiplication for i486 where imul is slow. The difference to 4.3 is a packaging issue in Debian.

Thanks! I'll file a bug against the package.

/* Steinar */

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38328