[Bug c++/77396] New: address sanitizer crashes if all static global variables are optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77396

Bug ID: 77396
Summary: address sanitizer crashes if all static global variables are optimized
Product: gcc
Version: 6.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

Following the Stack Overflow discussion:
http://stackoverflow.com/questions/39081183/errors-with-g-5-and-6-when-using-address-sanitizer-and-additional-asan-flags-f/39152524#39152524

Compiling

    static int data = 0;
    static int dummy = data;
    int main() { }

with g++ -O2 -fsanitize=address and running the executable with

    ASAN_OPTIONS=verbosity=0:strict_string_checks=true:detect_odr_violation=2:check_initialization_order=true:detect_stack_use_after_return=true:strict_init_order=true ./a.out

has, since g++ 5, produced an executable for which the sanitizer crashes at run time. The problem is that `data` and `dummy` get optimized away together with the call to __asan_register_globals(), but __asan_before_dynamic_init is still called.
[Bug c++/77397] New: function initializing global static variables not optimized again fully
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77397

Bug ID: 77397
Summary: function initializing global static variables not optimized again fully
Product: gcc
Version: 6.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

Compiling

    static int data = 0;
    static int dummy = data;
    int main() { }

with g++ -Os leads to less than perfect assembler code:

        .file "main.cpp"
        .section .text.startup,"ax",@progbits
        .globl main
        .type main, @function
    main:
    .LFB0:
        .cfi_startproc
        xorl %eax, %eax
        ret
        .cfi_endproc
    .LFE0:
        .size main, .-main
        .type _GLOBAL__sub_I_main, @function
    _GLOBAL__sub_I_main:
    .LFB2:
        .cfi_startproc
        ret
        .cfi_endproc
    .LFE2:
        .size _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
        .ident "GCC: (GNU) 6.1.0"
        .section .note.GNU-stack,"",@progbits

Because the static variables are optimized away, _GLOBAL__sub_I_main (used to initialize them in versions prior to 5.0) is not needed at all, yet it is still emitted as an empty function. This oversight is a likely cause of bug 77396.
[Bug c++/80372] non-optimal handling of copying a std::complex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80372

--- Comment #3 from ead <no...@turm-lahnstein.de> ---
(In reply to Andrew Pinski from comment #2)
> What options are you using? -O2 or -O3 ? -mcpu=native ?

It is compiled with -O3, but the result is the same for -O2 or -Os. If compiled with -march=native, the result uses four vmovsd instead of four movsd, which does not change much: the new version is just as slow and still 36 bytes large.
[Bug c++/80372] New: non-optimal handling of copying a std::complex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80372

Bug ID: 80372
Summary: non-optimal handling of copying a std::complex
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

When a std::complex is copied from one memory location to another, four movsd operations are used. However, it is possible to use two movups, which are faster (at least on some hardware) and need less memory (36 bytes for the movsd version, but only 16 for the movups version).

Consider the following example:

    #include <complex>
    void get(std::complex<double> *res){
        res[1]=res[0];
    }

is compiled to:

    movsd (%rdi), %xmm0
    movsd %xmm0, 16(%rdi)
    movsd 8(%rdi), %xmm0
    movsd %xmm0, 24(%rdi)
    ret

but could be:

    movups (%rdi), %xmm0
    movups %xmm0, 16(%rdi)
    ret

That is, in fact, what clang and icc17 do.
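
As a rough illustration of why a single 16-byte move is a valid way to perform this copy: a std::complex<double> holds exactly two doubles, so the assignment copies 16 contiguous bytes. The sketch below puts the reported function next to a hypothetical byte-wise variant (`get_bytes` is a name introduced here, not part of the report); it makes no claim about which instructions any particular compiler emits for either one.

```cpp
#include <complex>
#include <cstring>

// Function from the report: copy element 0 to element 1.
void get(std::complex<double> *res) {
    res[1] = res[0];
}

// Hypothetical variant: copy the same 16 bytes as one block, which is
// the operation a single unaligned 128-bit load/store pair would perform.
void get_bytes(std::complex<double> *res) {
    static_assert(sizeof(std::complex<double>) == 2 * sizeof(double),
                  "complex<double> is expected to be two doubles");
    std::memcpy(static_cast<void *>(&res[1]),
                static_cast<const void *>(&res[0]),
                sizeof(std::complex<double>));
}
```

Both functions leave `res[1]` equal to `res[0]`; the difference the report cares about is only in the generated code.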
[Bug c++/80373] New: non-optimal handling of copying a std::complex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80373

Bug ID: 80373
Summary: non-optimal handling of copying a std::complex
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

When a std::complex is copied from one memory location to another, four movsd operations are used. However, it is possible to use two movups, which are faster (at least on some hardware) and need less memory (36 bytes for the movsd version, but only 16 for the movups version).

Consider the following example:

    #include <complex>
    void get(std::complex<double> *res){
        res[1]=res[0];
    }

is compiled to:

    movsd (%rdi), %xmm0
    movsd %xmm0, 16(%rdi)
    movsd 8(%rdi), %xmm0
    movsd %xmm0, 24(%rdi)
    ret

but could be:

    movups (%rdi), %xmm0
    movups %xmm0, 16(%rdi)
    ret

That is, in fact, what clang and icc17 do.
[Bug c/87268] New: Missed optimization for a tailcall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87268

Bug ID: 87268
Summary: Missed optimization for a tailcall
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

For simple code like this:

    extern int shared;
    void doit(int *);
    int call_doit(){
        doit(&shared);
    }

when compiled with -O3 the resulting assembler is without tailcall optimization:

    call_doit:
        subq $8, %rsp
        movl $shared, %edi
        call doit
        addq $8, %rsp
        ret

There are two things that are probably not needed:

1. The whole "subq $8, %rsp / addq $8, %rsp" is not really necessary, is it?
2. call instead of a simple jmp, which would be possible due to tailcall optimization. Possibly it was not performed because subq/addq are still hanging around.

If I'm not mistaken, something like:

    call_doit:
        movl $shared, %edi
        jmp doit

should be possible as output.
[Bug middle-end/87268] Missed optimization for a tailcall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87268 --- Comment #3 from ead --- Sorry, I only saw that clang gives me what I expect... and overlooked the warning. call_doit should return void and not int.
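
For reference, a minimal sketch of the example with the return type corrected as described in the comment above. The definitions of `shared` and `doit` are stand-ins added here only so that the snippet is self-contained (the report merely declares them); in a real test they would live in another translation unit so that the call cannot be inlined.

```cpp
// Stand-in definitions; the original report only declares these.
int shared = 0;
void doit(int *p) { *p += 1; }

// Corrected signature: the function returns void, so the call to doit
// is the last action and nothing has to be produced after it returns.
void call_doit() {
    doit(&shared);
}
```

With the `int` return type and no return statement, the compiler warns and the function has no value to hand back, which is the detail the comment says was overlooked.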
[Bug c++/86497] New: Regression for x!=x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86497

Bug ID: 86497
Summary: Regression for x!=x
Product: gcc
Version: 8.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

When compiling

    bool is_nan1(double x){
        return x!=x;
    }

with g++-8.1 -O3 the resulting assembler (https://godbolt.org/g/BBFM3Q) is

    _Z7is_nan1d:
        ucomisd %xmm0, %xmm0
        movl $1, %edx
        setne %al
        cmovp %edx, %eax
        ret

However, for version 7.3 the result (https://godbolt.org/g/tR69jf) was better:

    _Z7is_nan1d:
        ucomisd %xmm0, %xmm0
        setp %al
        ret

Also, for 8.1 with -Os the assembler is somewhat strange:

    _Z7is_nan1d:
        ucomisd %xmm0, %xmm0
        movb $1, %al
        jp .L2
        setne %al
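
As background for the example: under IEEE 754 comparison rules, NaN is the only value that compares unequal to itself, so `x != x` is a NaN test. The sketch below places the reported function next to `std::isnan` for comparison; `is_nan2` is a name introduced here, and the equivalence assumes compilation without options such as -ffast-math or -ffinite-math-only.

```cpp
#include <cmath>
#include <limits>

// Function from the report: true exactly when x is NaN.
bool is_nan1(double x) {
    return x != x;
}

// Hypothetical reference version using the standard library.
bool is_nan2(double x) {
    return std::isnan(x);
}
```

Both functions should agree for every input, including infinities, which compare equal to themselves.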
[Bug c++/84891] New: -fno-signed-zeros leads to optimization which should be possible only if also -ffinite-math-only is on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84891

Bug ID: 84891
Summary: -fno-signed-zeros leads to optimization which should be possible only if also -ffinite-math-only is on
Product: gcc
Version: 7.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

Please consider the following example:

    #include <complex>
    #include <iostream>
    #include <cmath>

    std::complex<double> mult(std::complex<double> c, double im){
        std::complex<double> jomega(0.0, im);
        return c*jomega;
    }

    int main(){
        //(-nan,-nan) expected:
        std::cout<<"case INF: "<<mult(std::complex<double>(INFINITY,0.0), 0.0)<<"\n";
        //(nan,nan) expected:
        std::cout<<"case NAN: "<<mult(std::complex<double>(NAN,0.0), 0.0)<<"\n";
    }

When compiled with -fno-signed-zeros, the compiler seems to make some optimizations which should not be possible without -ffinite-math-only, because the result of 0.0*nan and 0.0*inf isn't 0.0.

Live here: http://coliru.stacked-crooked.com/a/bca026da888d5b5d

    echo "IEEE 754"; g++ -std=c++17 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
    echo "Non IEEE 754"; g++ -std=c++17 -O2 -Wall -fno-signed-zeros -pedantic -pthread main.cpp && ./a.out

gives us:

    IEEE 754
    case INF: (-nan,-nan)
    case NAN: (nan,nan)
    Non IEEE 754
    case INF: (nan,0)
    case NAN: (nan,0)

The resulting assembler is (see also https://godbolt.org/g/TSvcSp)

    mult(std::complex<double>, double):
        mulsd %xmm2, %xmm1
        movapd %xmm0, %xmm3
        mulsd %xmm2, %xmm3
        movapd %xmm1, %xmm0
        movapd %xmm3, %xmm1
        xorpd .LC0(%rip), %xmm0
        ret
    .LC0:
        .long 0
        .long -2147483648
        .long 0
        .long 0

with only two multiplications instead of four. The clang behavior is more similar to that of gcc version 4.6.
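
To make the arithmetic concrete, here is a minimal sketch that writes the textbook product (a+bi)(c+di) = (ac - bd) + (ad + bc)i with all four multiplications. `mult_naive` is a hypothetical helper introduced for illustration, not GCC's or libstdc++'s implementation. With c = 0.0 and a infinite or NaN, the terms a*c and b*c are NaN rather than 0.0, which is why dropping them changes the result.

```cpp
#include <cmath>
#include <complex>

// Textbook complex product with all four multiplications kept.
// Hypothetical helper for illustration only.
std::complex<double> mult_naive(std::complex<double> x,
                                std::complex<double> y) {
    const double a = x.real(), b = x.imag();
    const double c = y.real(), d = y.imag();
    return std::complex<double>(a * c - b * d, a * d + b * c);
}
```

For x = (inf, 0) and y = (0, 0) both components come out as NaN, in line with the "IEEE 754" output in the report (the sign of the NaN aside), whereas a two-multiplication version that treats 0.0*x as 0.0 yields a zero imaginary part.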
[Bug c++/85292] New: multiple definition of default argument emitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85292

Bug ID: 85292
Summary: multiple definition of default argument emitted
Product: gcc
Version: 7.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

Please consider the following code:

    #include <functional>

    namespace timeit{
        typedef std::function<void(void)> Fun;

        //returns time needed for execution in seconds, calls setup once
        template<typename Fun1>
        void timeit(const Fun1 &f, const Fun &setup=[]{})
        {
            setup();
            f();
        }
    }

    int main(){
        timeit::timeit([](){return 0.0;});
        timeit::timeit([](){ return; });
    }

When trying to compile, I get the following error message:

    /tmp/cc4aRGV3.s: Assembler messages:
    /tmp/cc4aRGV3.s:36: Error: symbol `_ZNSt14_Function_base13_Base_managerIN6timeitUlvE_EE10_M_managerERSt9_Any_dataRKS4_St18_Manager_operation' is already defined
    /tmp/cc4aRGV3.s:58: Error: symbol `_ZNSt17_Function_handlerIFvvEN6timeitUlvE_EE9_M_invokeERKSt9_Any_data' is already defined

See it live at http://coliru.stacked-crooked.com/a/da080519413414da

I don't see why this code should not compile (and if it is ill-formed, then the compiler and not the assembler should report the error).
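
One formulation that expresses the same interface without a lambda in a default argument is an overload that supplies the empty setup itself. This is only a sketch of an alternative spelling, using a hypothetical namespace `timeit_sketch`; it has not been checked against the affected GCC versions and is not claimed to avoid the assembler error.

```cpp
#include <functional>

namespace timeit_sketch {
    typedef std::function<void(void)> Fun;

    // Two-argument form: calls setup once, then f.
    template<typename Fun1>
    void timeit(const Fun1 &f, const Fun &setup) {
        setup();
        f();
    }

    // One-argument form: forwards with an empty setup instead of
    // relying on a default argument.
    template<typename Fun1>
    void timeit(const Fun1 &f) {
        timeit(f, Fun([] {}));
    }
}
```

Callers can use it the same way as in the report, for example `timeit_sketch::timeit([](){ return 0.0; });`.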
[Bug middle-end/84891] -fno-signed-zeros leads to optimization which should be possible only if also -ffinite-math-only is on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84891

--- Comment #4 from ead <no...@turm-lahnstein.de> ---
From my naive point of view:

- The C++ standard doesn't define how complex-number multiplication should work; it is implementation-defined/gcc-specific (I'm not a standard scholar, so I might be very wrong about it).
- One can deduce from the results in the IEEE 754 mode how this multiplication (implementation-defined and gcc-specific) is implemented.
- One's expectation with -fno-signed-zeros is that only transformations which honor infs/nans can be performed by the optimizer.

Clearly, we cannot refer to the standard to see what the result with -fno-signed-zeros should be, because -fno-signed-zeros is not covered by the standard but is a gcc-specific option. So in this case, the effect of -fno-signed-zeros is not covered by the description in the man pages (at least as I understand it):

    -fno-signed-zeros
        Allow optimizations for floating point arithmetic that ignore the signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0 and -0.0 values, which then prohibits simplification of expressions such as x+0.0 or 0.0*x (even with -ffinite-math-only). This option implies that the sign of a zero result isn't significant.

        The default is -fsigned-zeros.
[Bug c/90356] New: Missed optimization for variables initialized to 0.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90356

Bug ID: 90356
Summary: Missed optimization for variables initialized to 0.0
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

For the following example:

    float doit(float k){
        float c[2]={0.0};
        c[1]+=k;
        return c[0]+c[1];
    }

the resulting assembler (-O2) is (https://gcc.godbolt.org/z/sSi9OC):

    doit:
        pxor %xmm1, %xmm1
        addss %xmm1, %xmm0
        addss %xmm1, %xmm0
        ret

but should be more like:

    doit:
        # pxor %xmm1, %xmm1 ; or maybe xorps
        addss %xmm1, %xmm0
        retq

because c[0] is 0.0.
[Bug middle-end/90356] Missed optimization for variables initialized to 0.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90356 --- Comment #3 from ead --- I guess -0.0+0.0=0.0 is the reason we have to add it once. I think there is no need to add 0.0 twice. Btw. compiled with -fno-signed-zeros, the code gets optimized to doit: ret as expected.
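
The signed-zero argument can be checked with a small sketch. It assumes IEEE 754 arithmetic in the default rounding mode and compilation without -fno-signed-zeros; `doit_once` is a hypothetical single-addition variant introduced here. For k = -0.0 the first addition yields +0.0, so `x + 0.0` is not an identity, but a second addition of 0.0 can no longer change anything.

```cpp
#include <cmath>

// Function from the report: adds 0.0 to k twice.
float doit(float k) {
    float c[2] = {0.0};
    c[1] += k;
    return c[0] + c[1];
}

// Hypothetical variant with a single addition of 0.0.
float doit_once(float k) {
    return 0.0f + k;
}
```

Comparing the two for k = -0.0f with `std::signbit` shows that both return +0.0, while returning k unchanged would keep the negative sign.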
[Bug c/91348] New: Missed optimization: not passing hidden pointer but copying memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91348

Bug ID: 91348
Summary: Missed optimization: not passing hidden pointer but copying memory
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

For the following example:

    struct Vec3{
        double x, y, z;
    };

    void vadd_v2(struct Vec3* a, struct Vec3* out);

    struct Vec3 use_v1(struct Vec3 *in){
        struct Vec3 out;
        vadd_v2(in, &out);
        return out;
    }

the resulting assembler (-O2 -Wall) is:

    use_v1:
        pushq %r12
        movq %rdi, %r12
        movq %rsi, %rdi
        subq $32, %rsp
        movq %rsp, %rsi
        call vadd_v2
        movq 16(%rsp), %rax
        movdqa (%rsp), %xmm0
        movq %rax, 16(%r12)
        movq %r12, %rax
        movups %xmm0, (%r12)
        addq $32, %rsp
        popq %r12
        ret

However, the hidden pointer could be passed directly into vadd_v2, which is what clang is doing:

    use_v1: # @use_v1
        pushq %rbx
        movq %rdi, %rbx
        movq %rsi, %rdi
        movq %rbx, %rsi
        callq vadd_v2
        movq %rbx, %rax
        popq %rbx
        retq

See also https://godbolt.org/z/rT41Sj
[Bug c/91515] New: missed optimization: no tailcall for types of class MEMORY
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91515

Bug ID: 91515
Summary: missed optimization: no tailcall for types of class MEMORY
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

The assembler produced (-O2) for

    struct Vec3{
        double x, y, z;
    };

    struct Vec3 create(void);

    struct Vec3 use(){
        return create();
    }

looks as follows (live: https://godbolt.org/z/v-HjX0):

    use:
        pushq %r12
        movq %rdi, %r12
        call create
        movq %r12, %rax
        popq %r12
        ret

However, I think that under the System V AMD64 ABI the tailcall optimization

    use:
        jmp create

is possible, as create will move the %rdi value to %rax anyway.
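
As context for "class MEMORY": under the System V AMD64 ABI, an aggregate larger than two eightbytes (unless it is a single vector) is passed and returned in memory, with the caller supplying the address of the result as a hidden first argument in %rdi. The sketch below, written in C++ for consistency with the other examples, only checks the size assumption; the body of `create` is a stand-in added here because the report merely declares it.

```cpp
#include <cstddef>

struct Vec3 {
    double x, y, z;
};

// Three doubles: 24 bytes, i.e. three eightbytes on typical x86-64
// targets, which is more than the two eightbytes that fit in registers.
constexpr std::size_t kVec3Eightbytes = sizeof(Vec3) / 8;

// Stand-in definition; the original report only declares create().
Vec3 create() {
    return Vec3{1.0, 2.0, 3.0};
}

// Function from the report: forwards the result of create().
Vec3 use() {
    return create();
}
```

Since `use` adds nothing after the call, the result object that `create` fills in is the same one `use` was asked to fill in, which is the basis for the suggested `jmp create`.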
[Bug c/91398] Possible missed optimization: Can a pointer be passed as hidden pointer in x86-64 System V ABI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91398 --- Comment #3 from ead --- Thank you for the explanations and your time!
[Bug c/91398] New: Possible missed optimization: Can a pointer be passed as hidden pointer in x86-64 System V ABI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91398

Bug ID: 91398
Summary: Possible missed optimization: Can a pointer be passed as hidden pointer in x86-64 System V ABI
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: no...@turm-lahnstein.de
Target Milestone: ---

For the following example:

    struct Vec3{
        double x, y, z;
    };

    struct Vec3 do_something(void);

    void use(struct Vec3 *restrict out){
        *out = do_something();
    }

The resulting assembly (-O2) is:

    use:
        pushq %rbx
        movq %rdi, %rbx
        subq $32, %rsp
        movq %rsp, %rdi
        call do_something
        movdqu (%rsp), %xmm0
        movq 16(%rsp), %rax
        movups %xmm0, (%rbx)
        movq %rax, 16(%rbx)
        addq $32, %rsp
        popq %rbx
        ret

Here on godbolt: https://godbolt.org/z/kUPFox

However, as out is restrict, it could be passed as the hidden pointer to do_something, which would lead to the following assembler:

    use:
        jmp do_something ; %rdi is now the hidden pointer

So is it a missed optimization, or is there something in the x86-64 System V ABI that would forbid the above?