[Bug target/87455] sse_packed_single_insn_optimal is suboptimal on Zen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455 --- Comment #6 from Fanael --- Any hope of getting this fixed in GCC 10? It should just be a matter of removing Zen[12] from X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL.
[Bug target/87455] sse_packed_single_insn_optimal is suboptimal on Zen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455

--- Comment #5 from Fanael ---
Created attachment 44829
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44829&action=edit
WIP patch

> We already have TARGET_SSE_TYPELESS_STORES for stores, so perhaps we want
> something like typeless reg-reg moves and loads flag?

Something along the lines of the attached patch? It's work-in-progress and only very lightly tested, but appears to work.
[Bug target/87455] sse_packed_single_insn_optimal is suboptimal on Zen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455

--- Comment #3 from Fanael ---
> May be we should remove xorps generation part.

If it were up to me, I'd keep it for BDVER[1234] only, because xorps is still one byte shorter than either xorpd or pxor and is as fast there, and introduce a separate tune option for untyped vector *moves* specifically. That option would apply to BD, but also to Zen, Pentium M, Core, Skylake (but not anything in between, i.e. Nehalem to Broadwell, though my data on Ivy Bridge, Haswell and Broadwell is not conclusive) and other µarches where register-to-register vector moves are renamed (as in Zen), untyped (as in Skylake) or always of the same type (as in Core).
[Bug target/87455] sse_packed_single_insn_optimal is suboptimal on Zen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455

--- Comment #1 from Fanael ---
Assembly diff between the two:

--- /dev/fd/63 2018-09-27 17:59:06.120507763 +0200
+++ /dev/fd/62 2018-09-27 17:59:06.120507763 +0200
@@ -7,21 +7,21 @@
 main:
 .LFB5179:
     .cfi_startproc
-    movaps .LC0(%rip), %xmm0
-    movaps .LC1(%rip), %xmm1
+    movdqa .LC0(%rip), %xmm0
+    movdqa .LC1(%rip), %xmm1
     movl $10, %eax
-    movaps .LC2(%rip), %xmm2
+    movdqa .LC2(%rip), %xmm2
     .p2align 4,,15
 .L2:
     paddd %xmm2, %xmm1
-    xorps %xmm0, %xmm2
+    pxor %xmm0, %xmm2
     decl %eax
     paddd %xmm1, %xmm0
-    movaps %xmm2, %xmm3
-    xorps %xmm2, %xmm1
+    movdqa %xmm2, %xmm3
+    pxor %xmm2, %xmm1
     paddd %xmm0, %xmm3
-    xorps %xmm1, %xmm0
-    movaps %xmm3, %xmm2
+    pxor %xmm1, %xmm0
+    movdqa %xmm3, %xmm2
     jne .L2
     movaps %xmm3, -40(%rsp)
     movaps %xmm1, -56(%rsp)
[Bug target/87455] New: sse_packed_single_insn_optimal is suboptimal on Zen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455

            Bug ID: 87455
           Summary: sse_packed_single_insn_optimal is suboptimal on Zen
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fanael4 at gmail dot com
  Target Milestone: ---

GCC by default enables -mtune-ctrl=sse_packed_single_insn_optimal on -mtune=znver1, even though that microarchitecture doesn't like it, for the same reason Intel's microarchitectures don't: there's additional latency for domain-crossing operations. For example, using xorps on integer data costs one cycle more than using pxor.

Example code:

#include <emmintrin.h>

int main()
{
    auto x = _mm_setr_epi32(1, 2, 3, 4);
    auto y = _mm_setr_epi32(5, 6, 7, 8);
    auto z = _mm_setr_epi32(9, 10, 11, 12);

    for(int i = 0; i < 10; ++i) {
        x = _mm_add_epi32(x, y);
        y = _mm_xor_si128(y, z);
        z = _mm_add_epi32(z, x);
        x = _mm_xor_si128(x, y);
        y = _mm_add_epi32(y, z);
        z = _mm_xor_si128(z, x);
    }

    asm volatile("" :: "m"(x), "m"(y), "m"(z));
}

Compiled with GCC 8.2 with -O3 -mtune=znver1, running it yields the following perf counters:

$ perf stat -e task-clock,cycles,instructions ./a.out

 Performance counter stats for './a.out':

          1 193,69 msec task-clock:u       # 0,989 CPUs utilized
     4 040 330 384      cycles:u           # 3386697,723 GHz
    10 002 005 027      instructions:u     # 2,48 insn per cycle

       1,206801245 seconds time elapsed

       1,190625000 seconds user
       0,003995000 seconds sys

However, the code compiled with -O3 -mtune=znver1 -mtune-ctrl=^sse_packed_single_insn_optimal is significantly faster:

$ perf stat -e task-clock,cycles,instructions ./a.out

 Performance counter stats for './a.out':

            894,08 msec task-clock:u       # 0,998 CPUs utilized
     3 012 492 242      cycles:u           # 3369678,123 GHz
    10 002 004 492      instructions:u     # 3,32 insn per cycle

       0,895728255 seconds time elapsed

       0,894688000 seconds user
       0,0 seconds sys

This is on a Ryzen 5 2500U.
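As a rough cross-check of the counters above, the following sketch recomputes the instructions-per-cycle figures and the ratio of cycle counts from the two perf runs. The function names (ipc, cycle_ratio, check_reported_counters) are made up for illustration; only the counter values come from the report.

```cpp
#include <cassert>
#include <cstdint>

// Instructions retired per cycle, as perf reports in its "insn per cycle" column.
double ipc(std::uint64_t instructions, std::uint64_t cycles)
{
    return static_cast<double>(instructions) / static_cast<double>(cycles);
}

// How many times more cycles the first run needed compared to the second.
double cycle_ratio(std::uint64_t cycles_before, std::uint64_t cycles_after)
{
    return static_cast<double>(cycles_before) / static_cast<double>(cycles_after);
}

// Hypothetical helper: plugs in the counter values quoted in the report.
void check_reported_counters()
{
    const std::uint64_t cycles_default = 4040330384ULL;   // -mtune=znver1
    const std::uint64_t insns_default  = 10002005027ULL;
    const std::uint64_t cycles_no_ps   = 3012492242ULL;   // with ^sse_packed_single_insn_optimal
    const std::uint64_t insns_no_ps    = 10002004492ULL;

    // Both values round to what perf printed (2,48 and 3,32).
    assert(ipc(insns_default, cycles_default) > 2.47 && ipc(insns_default, cycles_default) < 2.48);
    assert(ipc(insns_no_ps, cycles_no_ps) > 3.32 && ipc(insns_no_ps, cycles_no_ps) < 3.33);

    // The default tuning needs roughly 1.34 times as many cycles.
    assert(cycle_ratio(cycles_default, cycles_no_ps) > 1.34 &&
           cycle_ratio(cycles_default, cycles_no_ps) < 1.35);
}
```

Since the instruction counts are nearly identical, the whole difference comes from cycles: the default tuning spends about a third more cycles on the same work.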
[Bug target/66655] [5/6 Regression] miscompilation due to ipa-ra on MinGW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66655

Fanael changed:

What       |Removed    |Added
CC         |           |fanael4 at gmail dot com

--- Comment #16 from Fanael ---
The same linker errors happen when building a mingw toolchain.
[Bug c++/61465] New: Bogus parameter set but not used warning in constructor initialization list
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61465

            Bug ID: 61465
           Summary: Bogus parameter set but not used warning in constructor initialization list
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fanael4 at gmail dot com

When compiling the following code as C++11 with -Wunused-but-set-parameter enabled:

struct Foo
{
  Foo(void* x) : y{static_cast<char*>(x)} {}
  char* y;
};

GCC (tested on 4.7.0 and 4.9.1) incorrectly complains:

x.cpp:2:13: warning: parameter 'x' set but not used [-Wunused-but-set-parameter]

This is clear nonsense, as the parameter is used. If the initialization list is changed to use parentheses, or the static_cast is removed and y's type is changed, the warning goes away.
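For comparison, here is a minimal sketch of the first workaround mentioned above (parentheses instead of braces) next to the original form. The names FooBrace, FooParen and check_both_forms are invented for this illustration. Both constructors store the same pointer, so the warning on the brace form cannot be about a real difference in behaviour.

```cpp
#include <cassert>

// Brace form from the report: triggers the bogus warning on the affected GCC versions.
struct FooBrace
{
  FooBrace(void* x) : y{static_cast<char*>(x)} {}
  char* y;
};

// Workaround: same conversion, but the member is initialized with parentheses.
struct FooParen
{
  FooParen(void* x) : y(static_cast<char*>(x)) {}
  char* y;
};

// Hypothetical helper: both forms must end up holding the pointer they were given.
void check_both_forms()
{
  char buffer[1] = {'a'};
  FooBrace b(buffer);
  FooParen p(buffer);
  assert(b.y == buffer);
  assert(p.y == buffer);
}
```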
[Bug bootstrap/60830] [4.9 Regression] ICE on bootstrapping on cygwin
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60830 --- Comment #48 from Fanael fanael4 at gmail dot com --- Is revision 209946 an attempt to fix this?
[Bug libgcc/61003] New: [4.9 Regression] Segfault in __deregister_frame_info_bases when exiting, on i686-mingw32 with dw2 unwinding
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61003

            Bug ID: 61003
           Summary: [4.9 Regression] Segfault in __deregister_frame_info_bases when exiting, on i686-mingw32 with dw2 unwinding
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fanael4 at gmail dot com
            Target: i686-*-mingw32

When targeting i686-mingw32, all programs compiled by GCC with DWARF2 unwinding, including GCC itself, segfault when exiting. The segfaults are null pointer dereferences in __deregister_frame_info_bases, called from the atexit handler runner.

Tested on 4.9.1 r. 209891. Didn't test the trunk.

Backtrace (from Wine; on real Windows it's very similar):

#0  0x010b7810 in __deregister_frame_info_bases (begin=0x7ed5bb6f <MSVCRT__cexit+127>) at ../../../../src/gcc/libgcc/unwind-dw2-fde.c:216
#1  0x05c2fd38 in ?? ()
#2  0x7ed5bb6f in MSVCRT__cexit () from /usr/bin/../lib32/wine/msvcrt.dll.so
#3  0x7ed5be3e in MSVCRT_exit () from /usr/bin/../lib32/wine/msvcrt.dll.so
#4  0x004014c3 in __tmainCRTStartup ()
#5  0x7b86041c in call_process_entry () from /usr/bin/../lib32/wine/kernel32.dll.so
#6  0x7b861563 in ExitProcess () from /usr/bin/../lib32/wine/kernel32.dll.so
#7  0x7bc80490 in call_thread_func_wrapper () from /usr/bin/../lib32/wine/ntdll.dll.so
#8  0x7bc834cf in call_thread_func () from /usr/bin/../lib32/wine/ntdll.dll.so
#9  0x7bc8046e in RtlRaiseException () from /usr/bin/../lib32/wine/ntdll.dll.so
#10 0x7bc53931 in call_dll_entry_point () from /usr/bin/../lib32/wine/ntdll.dll.so
#11 0xf7559b6d in wine_call_on_stack () from /usr/bin/../lib32/libwine.so.1
#12 0xf7559c4e in wine_switch_to_stack () from /usr/bin/../lib32/libwine.so.1
[Bug libgcc/61003] [4.9 Regression] Segfault in __deregister_frame_info_bases when exiting, on i686-mingw32 with dw2 unwinding
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61003 --- Comment #1 from Fanael fanael4 at gmail dot com --- Note: I was compiling with -O2, so the line number may not be very indicative. Should I post a backtrace of -O0?
[Bug libstdc++/59665] New: User code can cause ambiguous references to std in libstdc++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59665

            Bug ID: 59665
           Summary: User code can cause ambiguous references to std in libstdc++
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fanael4 at gmail dot com

The following PURE EVIL but legal code:

namespace foo { namespace std {} }
using namespace foo;
#include <algorithm>

causes the compiler to spew out lots of "error: reference to 'std' is ambiguous" errors deep in the bowels of libstdc++, for rather obvious reasons.

Granted, this code is worse than reload, but it's legal, and as such shouldn't break the standard library.
[Bug c++/58924] Non-member invocation of overload of operator<< when the first argument is a temporary of type std::stringstream
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58924

Fanael <fanael4 at gmail dot com> changed:

What       |Removed    |Added
CC         |           |fanael4 at gmail dot com

--- Comment #1 from Fanael <fanael4 at gmail dot com> ---
That's expected behavior AFAIU.

'operator<<(basic_ostream<charT, traits>&& os, const T& x)' is a better match for const char[K] than 'basic_ostream<charT,traits>& basic_ostream<charT,traits>::operator<<(const void* p)', hence the former gets called, which then forwards the arguments to 'operator<<(basic_ostream<charT, traits>&& os, const charT* x)'.
[Bug c++/58924] Non-member invocation of overload of operator<< when the first argument is a temporary of type std::stringstream
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58924

--- Comment #2 from Fanael <fanael4 at gmail dot com> ---
Er, 'operator<<(basic_ostream<charT, traits>& os, const charT* x)', without the r-value ref, of course.
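A small sketch of the overload resolution described in these two comments, assuming a C++11 or later standard library: streaming a string literal into a temporary stream selects the rvalue-stream inserter, so the characters are written rather than a pointer value. The helper names text_of and stream_literal_into_temporary are hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Recover the buffered text from a stream known to be an ostringstream.
// Taking a const reference lets this accept whatever reference type the
// rvalue inserter returns.
std::string text_of(const std::ostream& os)
{
    return dynamic_cast<const std::ostringstream&>(os).str();
}

// The first operand is a temporary, so the non-member overload taking
// 'basic_ostream&' is not viable; the rvalue-stream template beats the
// member 'operator<<(const void*)' because it needs no pointer conversion.
std::string stream_literal_into_temporary()
{
    return text_of(std::ostringstream() << "hello");
}
```

In C++03 mode, where the rvalue-stream inserter does not exist, the same expression falls back to the const void* member and writes an address instead.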
[Bug c++/58063] default arguments evaluated twice per call
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58063

Fanael <fanael4 at gmail dot com> changed:

What       |Removed    |Added
CC         |           |fanael4 at gmail dot com

--- Comment #10 from Fanael <fanael4 at gmail dot com> ---
My attempt at reducing:

struct basic_ios
{
  virtual ~basic_ios() {}
  bool operator!() const { return false; }
};

struct ostream : virtual basic_ios
{
  ostream() {}
  ~ostream() {}
private:
  ostream(const ostream&);
  ostream& operator=(const ostream&);
};

ostream& operator<<(ostream& os, const char* s)
{
  __builtin_printf("%s", s);
  return os;
}

ostream cout;

void f(bool x = !(cout << "hi!\n"))
{
  __builtin_printf("%d\n", static_cast<int>(x));
}

int main()
{
  f();
}

Seems like virtual inheritance is the culprit.
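A self-checking variant can make the double evaluation visible without reading the output. The sketch below counts how often the default argument is evaluated. All names in it are invented, and whether this exact shape still triggers the bug on an affected compiler has not been verified; on a correct compiler the count is 1 per call.

```cpp
#include <cassert>

// Same class shape as the reduction: a stream-like type with a virtual base.
struct basic_ios_like { virtual ~basic_ios_like() {} };
struct ostream_like : virtual basic_ios_like {};

static int evaluations = 0;   // number of times the default argument ran
ostream_like stream;

// Stand-in for 'cout << "hi!\n"': has a side effect so evaluations can be counted.
bool probe(ostream_like&)
{
  ++evaluations;
  return true;
}

void f(bool x = probe(stream)) { (void)x; }

// Calls f() once with its default argument and reports how often probe() ran.
int count_default_argument_evaluations()
{
  evaluations = 0;
  f();
  return evaluations;
}
```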
[Bug target/57293] [4.8/4.9 Regression] not needed frame pointers on IA-32 (performance regression?)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293

--- Comment #4 from Fanael <fanael4 at gmail dot com> ---
(In reply to Vladimir Makarov from comment #1)
> But I am planning to fix it until end of June.

Any progress on this one? Patching GCC to use Satan^H^H^H^H^Hreload is a workaround, but one I'd rather avoid if at all possible.
[Bug c++/57724] wrong error: returning a value from a constructor
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57724

Fanael <fanael4 at gmail dot com> changed:

What       |Removed    |Added
CC         |           |fanael4 at gmail dot com

--- Comment #1 from Fanael <fanael4 at gmail dot com> ---
12.1/12 (C++03) and 12.1/9 (C++11) state: "A return statement in the body of a constructor shall not specify a return value." It's a bug in Clang, not in GCC.
[Bug target/57293] [4.8/4.9 Regression] not needed frame pointers on IA-32 (performance regression?)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293

Fanael <fanael4 at gmail dot com> changed:

What       |Removed            |Added
Target     |i686-w64-mingw32   |i?86-*-*

--- Comment #3 from Fanael <fanael4 at gmail dot com> ---
Reproduced on x86_64-unknown-linux-gnu (*) with -m32, thus not related to MinGW itself.

(*)
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc-multilib/src/gcc-4.8-20130502/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl --disable-cloog-version-check --enable-lto --enable-gold --enable-ld=default --enable-plugin --with-plugin-ld=ld.gold --with-linker-hash-style=gnu --disable-install-libiberty --enable-multilib --disable-libssp --disable-werror --enable-checking=release
Thread model: posix
gcc version 4.8.0 20130502 (prerelease) (GCC)
[Bug target/57293] New: [4.8/4.9 Regression] not needed frame pointers on IA-32 (performance regression?)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293

            Bug ID: 57293
           Summary: [4.8/4.9 Regression] not needed frame pointers on IA-32 (performance regression?)
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fanael4 at gmail dot com

About a week ago GCC started to emit frame pointers even in the presence of -fomit-frame-pointer whenever there's a call to a function with a "callee cleans up the arguments" calling convention. Normally it wouldn't be a big deal, but IA-32 is seriously register-starved, and since one of the secondary platforms (i.e. Windows) uses __thiscall__ for C++ member functions by default, for some code the availability of ebp can make a noticeable difference. I can provide a benchmark if one is needed.

The culprit seems to be revisions 198140 and 198555.

Testcase:

/* compile with -O2 -fomit-frame-pointer */
__attribute__((__noinline__, __noclone__, __stdcall__))
void g(int a)
{
  __builtin_printf("in g(): %d\n", a);
}

__attribute__((__noinline__, __noclone__, __thiscall__))
void h(int a, int b)
{
  __builtin_printf("in h(): %d %d\n", a, b);
}

void f()
{
  g(0);
  h(0, 1);
  __builtin_puts("in f()");
}

What GCC 4.7.0 and 4.8.1 20130430 produce for f:

_f:
LFB2:
    .cfi_startproc
    subl $28, %esp
    .cfi_def_cfa_offset 32
    movl $0, (%esp)
    call _g@4
    .cfi_def_cfa_offset 28
    xorl %ecx, %ecx
    subl $4, %esp
    .cfi_def_cfa_offset 32
    movl $1, (%esp)
    call _h
    .cfi_def_cfa_offset 28
    subl $4, %esp
    .cfi_def_cfa_offset 32
    movl $LC2, (%esp)
    call _puts
    addl $28, %esp
    .cfi_def_cfa_offset 4
    ret
    .cfi_endproc

What 4.8.1 20130510 produces:

_f:
LFB2:
    .cfi_startproc
    pushl %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl %esp, %ebp
    .cfi_def_cfa_register 5
    subl $24, %esp
    movl $0, (%esp)
    call _g@4
    xorl %ecx, %ecx
    subl $4, %esp
    movl $1, (%esp)
    call _h
    subl $4, %esp
    movl $LC2, (%esp)
    call _puts
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc

Target: i686-w64-mingw32
Configured with: ../../src/gcc-svn/configure --build=x86_64-unknown-linux-gnu --host=i686-w64-mingw32 --target=i686-w64-mingw32 --disable-multilib --disable-multiarch --disable-nls --enable-languages=c,c++,lto --disable-win32-registry --enable-openmp --enable-libgomp --enable-threads=posix --enable-plugins --enable-static --enable-shared --disable-symvers --enable-fully-dynamic-string --disable-sjlj-exceptions --disable-libstdcxx-pch --enable-libstdcxx-time --with-arch=i686 --enable-checking=release --disable-werror --with-gnu-as --with-gnu-ld --disable-rpath --with-gmp=/mingw/i686-final/libs-out-dir --with-mpfr=/mingw/i686-final/libs-out-dir --with-mpc=/mingw/i686-final/libs-out-dir --with-isl=/mingw/i686-final/libs-out-dir --with-cloog=/mingw/i686-final/libs-out-dir --with-libiconv-prefix=/mingw/i686-final/libs-out-dir --with-system-zlib --prefix=/mingw/i686-final/out-dir
Thread model: posix
gcc version 4.8.1 20130510 (prerelease) (GCC)
[Bug lto/55493] [4.8 Regression] LTO always ICEs on native i686-mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55493

Fanael <fanael4 at gmail dot com> changed:

What       |Removed    |Added
Status     |WAITING    |RESOLVED
Resolution |           |WORKSFORME

--- Comment #11 from Fanael <fanael4 at gmail dot com> 2013-02-27 16:39:56 UTC ---
Revision 196313 appears to work.
[Bug lto/55493] [4.8 Regression] LTO always ICEs on i686-mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55493

Fanael <fanael4 at gmail dot com> changed:

What       |Removed    |Added
Status     |RESOLVED   |REOPENED
Resolution |WORKSFORME |

--- Comment #7 from Fanael <fanael4 at gmail dot com> 2013-01-18 18:45:50 UTC ---
Reopening. The build that worked for me was a cross compiler on an x86_64-unknown-linux-gnu host. A native compiler on an i686-mingw32 host still fails when trying to use LTO. So either I'm doing something terribly wrong (I think I'm not, I built everything from scratch with the very same compiler version this time), or there *is* a bug.
[Bug lto/55493] [4.8 Regression] LTO always ICEs on i686-mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55493 --- Comment #6 from Fanael fanael4 at gmail dot com 2013-01-16 18:15:25 UTC --- Oh right. Works for me too. I should've tested it more thoroughly first. Sorry for bothering you guys.
[Bug lto/55493] [4.8 Regression] LTO always ICEs on i686-mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55493

--- Comment #3 from Fanael <fanael4 at gmail dot com> 2013-01-09 14:19:09 UTC ---
(In reply to comment #2)
> Are you sure that you do not somehow pull in LTO objects from older releases?

Yes, it happens even with -nostdlib, even on builds where I'm absolutely sure there's no way an old LTO object could sneak in.
[Bug lto/55493] New: [4.8 Regression] LTO always ICEs on i686-w64-mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55493

             Bug #: 55493
           Summary: [4.8 Regression] LTO always ICEs on i686-w64-mingw32
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: lto
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: fana...@gmail.com

Revision 193777 ICEs with "lto1.exe: internal compiler error: cannot read LTO decls from the object file" when linking with -flto on i686-w64-mingw32. This happens for all inputs, even for "int main() { return 0; }" or "" (that is, an empty file).

gcc -v
Using built-in specs.
COLLECT_GCC=C:\devel\mingw\bin\gcc.exe
COLLECT_LTO_WRAPPER=c:/devel/mingw/bin/../libexec/gcc/i686-w64-mingw32/4.8.0/lto-wrapper.exe
Target: i686-w64-mingw32
Configured with: ../../gcc-4.8-svn/configure --disable-nls --build=i686-w64-mingw32 --disable-multilib --enable-languages=c,c++,lto --disable-win32-registry --enable-openmp --enable-libgomp --enable-graphite --enable-cxx-flags='-fno-function-sections -fno-data-sections' --enable-threads=posix --disable-symvers --enable-fully-dynamic-string --disable-libstdcxx-pch --with-arch=i686 --with-tune=generic --enable-checking=release --disable-werror --disable-sjlj-exceptions --prefix=/c/builds/gcc/toolchain/out --with-gmp=/c/builds/gcc/prerequisites/out --with-mpfr=/c/builds/gcc/prerequisites/out --with-mpc=/c/builds/gcc/prerequisites/out --with-isl=/c/builds/gcc/prerequisites/out --with-cloog=/c/builds/gcc/prerequisites/out --with-libiconv-prefix=/c/builds/gcc/prerequisites/out --with-host-libstdcxx=-lstdc++ --enable-cloog-backend=isl
Thread model: posix
gcc version 4.8.0 20121124 (experimental) (GCC)
[Bug lto/55493] [4.8 Regression] LTO always ICEs on i686-w64-mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55493

--- Comment #1 from fanael4 at gmail dot com 2012-11-27 16:25:58 UTC ---
Forgot to mention:
4.7.2 - works
4.8.0 r193777 - fails