[Bug c++/77396] New: address sanitizer crashes if all static global variables are optimized

2016-08-28 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77396

Bug ID: 77396
   Summary: address sanitizer crashes if all static global
variables are optimized
   Product: gcc
   Version: 6.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

Following stackoverflow discussion:
http://stackoverflow.com/questions/39081183/errors-with-g-5-and-6-when-using-address-sanitizer-and-additional-asan-flags-f/39152524#39152524

Compiling 

static int data = 0; 
static int dummy = data; 
int main() { }

with g++ -O2 -fsanitize=address and running the executable with
ASAN_OPTIONS=verbosity=0:strict_string_checks=true:detect_odr_violation=2:check_initialization_order=true:detect_stack_use_after_return=true:strict_init_order=true
./a.out

leads, since g++ 5, to an executable in which the sanitizer crashes at run
time.

The problem is that `data` and `dummy` get optimized away together with the
call to __asan_register_globals(), but __asan_before_dynamic_init is still
called.

[Bug c++/77397] New: function initializing global static variables not optimized again fully

2016-08-28 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77397

Bug ID: 77397
   Summary: function initializing global static variables not
optimized again fully
   Product: gcc
   Version: 6.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

Compiling 

static int data = 0;  static int dummy = data; int main() { }

with g++ -Os leads to less-than-perfect assembler code:

.file   "main.cpp"
.section        .text.startup,"ax",@progbits
.globl  main
.type   main, @function
main:
.LFB0:
.cfi_startproc
xorl    %eax, %eax
ret
.cfi_endproc
.LFE0:
.size   main, .-main
.type   _GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB2:
.cfi_startproc
ret
.cfi_endproc
.LFE2:
.size   _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
.ident  "GCC: (GNU) 6.1.0"
.section        .note.GNU-stack,"",@progbits


Because the static variables are optimized away, _GLOBAL__sub_I_main (used to
initialize them in versions prior to 5.0) is not needed at all.

This leftover function is a likely cause of bug 77396.

[Bug c++/80372] non-optimal handling of copying a std::complex

2017-04-09 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80372

--- Comment #3 from ead <no...@turm-lahnstein.de> ---
(In reply to Andrew Pinski from comment #2)
> What options are you using?  -O2 or -O3 ? -mcpu=native ?

It is compiled with -O3, but it is the same for -O2 or -Os. 

If compiled with -march=native, the result uses four vmovsd instead of four
movsd, which does not change much: the new version is just as slow and still
36 bytes large.

[Bug c++/80372] New: non-optimal handling of copying a std::complex

2017-04-08 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80372

Bug ID: 80372
   Summary: non-optimal handling of copying a std::complex
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

While copying a std::complex from one memory location to another, four
movsd operations are used. However, it is possible to use two movups, which are
faster (at least on some hardware) and need less memory (36 bytes for the
movsd version, but only 16 for the movups version).

Consider the following example:

#include <complex>
void get(std::complex<double> *res){
   res[1]=res[0];
}

is compiled to:

movsd   (%rdi), %xmm0
movsd   %xmm0, 16(%rdi)
movsd   8(%rdi), %xmm0
movsd   %xmm0, 24(%rdi)
ret

but could be:

movups  (%rdi), %xmm0
movups  %xmm0, 16(%rdi)
ret

That is, in fact, what clang and icc17 do.
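As a rough illustration of why one 16-byte load/store pair is enough: a std::complex<double> is stored as two adjacent doubles, so the assignment in get() amounts to a single 16-byte block copy. The helper name get_as_block is hypothetical and only serves the comparison; this is a sketch, not a statement about how GCC should lower the copy.

```cpp
#include <complex>
#include <cstring>

// A std::complex<double> is laid out as two adjacent doubles (real, imag).
static_assert(sizeof(std::complex<double>) == 2 * sizeof(double),
              "std::complex<double> is expected to be exactly two doubles");

// The function from the report: copies element 0 over element 1.
void get(std::complex<double>* res) {
    res[1] = res[0];
}

// Hypothetical equivalent: one block copy of the whole element, which is
// what a movups load followed by a movups store would implement.
void get_as_block(std::complex<double>* res) {
    std::memcpy(static_cast<void*>(&res[1]),
                static_cast<const void*>(&res[0]),
                sizeof(std::complex<double>));
}
```

Both functions leave res[1] equal to res[0]; they differ only in how the copy is expressed.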

[Bug c++/80373] New: non-optimal handling of copying a std::complex

2017-04-08 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80373

Bug ID: 80373
   Summary: non-optimal handling of copying a std::complex
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

While copying a std::complex from one memory location to another, four
movsd operations are used. However, it is possible to use two movups, which are
faster (at least on some hardware) and need less memory (36 bytes for the
movsd version, but only 16 for the movups version).

Consider the following example:

#include <complex>
void get(std::complex<double> *res){
   res[1]=res[0];
}

is compiled to:

movsd   (%rdi), %xmm0
movsd   %xmm0, 16(%rdi)
movsd   8(%rdi), %xmm0
movsd   %xmm0, 24(%rdi)
ret

but could be:

movups  (%rdi), %xmm0
movups  %xmm0, 16(%rdi)
ret

That is, in fact, what clang and icc17 do.

[Bug c/87268] New: Missed optimization for a tailcall

2018-09-10 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87268

Bug ID: 87268
   Summary: Missed optimization for a tailcall
   Product: gcc
   Version: 8.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

For a simple code like this:

  extern int shared;
  void doit(int *);
  int call_doit(){
    doit(&shared);
  }

when compiled with -O3 the resulting assembler is without tailcall
optimization:

   call_doit:
subq    $8, %rsp
movl    $shared, %edi
call    doit
addq    $8, %rsp
ret

There are two things that are probably not needed:

   1. The whole "subq $8, %rsp / addq $8, %rsp" pair is not really necessary,
is it?
   2. A call instead of a simple jmp, which would be possible with tail call
optimization. Possibly it was not performed because subq/addq are still
hanging around.

If I'm not mistaken, something like:

   call_doit:
movl    $shared, %edi
jmp     doit

should be possible as output.

[Bug middle-end/87268] Missed optimization for a tailcall

2018-09-10 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87268

--- Comment #3 from ead  ---
Sorry, I only saw that clang gives me what I expect... and overlooked the
warning.

  call_doit should return void and not int.

[Bug c++/86497] New: Regression for x!=x

2018-07-11 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86497

Bug ID: 86497
   Summary: Regression for x!=x
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

When compiling

bool is_nan1(double x){
return x!=x;
}

with g++-8.1  -O3 the resulting assembler (https://godbolt.org/g/BBFM3Q) is 

_Z7is_nan1d:
  ucomisd %xmm0, %xmm0
  movl $1, %edx
  setne %al
  cmovp %edx, %eax
  ret

However, for version 7.3 the result was (https://godbolt.org/g/tR69jf) better:

_Z7is_nan1d:
  ucomisd %xmm0, %xmm0
  setp %al
  ret

Also, with 8.1 -Os the assembler is somewhat strange:

_Z7is_nan1d:
  ucomisd %xmm0, %xmm0
  movb $1, %al
  jp .L2
  setne %al
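For context, a small sketch of the property the shorter 7.3 sequence relies on: comparing a value with itself can only come out "equal" or "unordered", and it is unordered exactly when the value is NaN, which is why testing the parity flag after ucomisd is sufficient. The name is_nan2 is a hypothetical helper added only for comparison with the library function; the sketch assumes default floating-point flags (no -ffast-math).

```cpp
#include <cmath>
#include <limits>

// The function from the report: NaN is the only value that compares
// unequal to itself.
bool is_nan1(double x) {
    return x != x;
}

// Hypothetical counterpart using the standard library, for comparison only.
bool is_nan2(double x) {
    return std::isnan(x);
}
```

With default flags both functions agree for every input, including infinities and signed zeros.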

[Bug c++/84891] New: -fno-signed-zeros leads to optimization which should be possible only if also -ffinite-math-only is on

2018-03-15 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84891

Bug ID: 84891
   Summary: -fno-signed-zeros leads to optimization which should
be possible only if also -ffinite-math-only is on
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

Please consider the following example

   #include <complex>
   #include <iostream>
   #include <cmath>

   std::complex<double> mult(std::complex<double> c, double im){
       std::complex<double> jomega(0.0, im);
       return c*jomega;
   }


   int main(){
       //(-nan,-nan) expected:
       std::cout<<"case INF: "<<mult(std::complex<double>(INFINITY,0.0), 0.0)<<"\n";

       //(nan,nan) expected:
       std::cout<<"case NAN: "<<mult(std::complex<double>(NAN,0.0), 0.0)<<"\n";
   }

when compiled with -fno-signed-zeros the compiler seems to make some
optimizations which should not be possible without -ffinite-math-only, because
the result of 0.0*nan and 0.0*inf isn't 0.0. Live here
http://coliru.stacked-crooked.com/a/bca026da888d5b5d


echo "IEEE 754"; g++ -std=c++17 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
echo "Non IEEE 754"; g++ -std=c++17 -O2 -Wall -fno-signed-zeros -pedantic -pthread main.cpp && ./a.out

gives us:

IEEE 754
case INF: (-nan,-nan)
case NAN: (nan,nan)
Non IEEE 754
case INF: (nan,0)
case NAN: (nan,0)


The resulting assembler is (see also https://godbolt.org/g/TSvcSp)

mult(std::complex<double>, double):
mulsd   %xmm2, %xmm1
movapd  %xmm0, %xmm3
mulsd   %xmm2, %xmm3
movapd  %xmm1, %xmm0
movapd  %xmm3, %xmm1
xorpd   .LC0(%rip), %xmm0
ret
  .LC0:
.long   0
.long   -2147483648
.long   0
.long   0

with only two multiplications instead of four. The clang behavior is more
similar to that of gcc version 4.6.
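The difference between four and two multiplications can be sketched with plain doubles. The helpers mult_four and mult_two are hypothetical and are not the libstdc++ implementation; they only spell out the textbook product (re + i*im) * (0 + i*w) with and without folding the `x * 0.0` terms to 0.0.

```cpp
#include <cmath>
#include <utility>

// Textbook product (re + i*im) * (0 + i*w), keeping all four
// multiplications. Returns {real part, imaginary part}.
std::pair<double, double> mult_four(double re, double im, double w) {
    return {re * 0.0 - im * w, re * w + im * 0.0};
}

// The same product after replacing `x * 0.0` by 0.0. That replacement is
// only value-preserving when x is finite, because inf * 0.0 and nan * 0.0
// are both NaN.
std::pair<double, double> mult_two(double re, double im, double w) {
    return {-(im * w), re * w};
}
```

For finite inputs the two versions agree in value; for re = INFINITY they differ, because the dropped term `re * 0.0` is NaN rather than 0.0.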

[Bug c++/85292] New: multiple definition of default argument emitted

2018-04-08 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85292

Bug ID: 85292
   Summary: multiple definition of default argument emitted
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

Please consider the following code:

#include <functional>


namespace timeit{
typedef std::function<void()> Fun;

//returns time needed for execution in seconds, calls setup once
template<typename Fun1>
void timeit(const Fun1 &f, const Fun &setup=[]{})
{  
   setup();
   f();
}
}

int main(){
   timeit::timeit([](){return 0.0;});
   timeit::timeit([](){ return; });
}

When trying to compile, I get the following error message:

/tmp/cc4aRGV3.s: Assembler messages:

/tmp/cc4aRGV3.s:36: Error: symbol
`_ZNSt14_Function_base13_Base_managerIN6timeitUlvE_EE10_M_managerERSt9_Any_dataRKS4_St18_Manager_operation'
is already defined

/tmp/cc4aRGV3.s:58: Error: symbol
`_ZNSt17_Function_handlerIFvvEN6timeitUlvE_EE9_M_invokeERKSt9_Any_data' is
already defined

See it live at http://coliru.stacked-crooked.com/a/da080519413414da

I don't see why this code should not compile (and if it is ill-formed, then the
compiler, not the assembler, should report the error).

[Bug middle-end/84891] -fno-signed-zeros leads to optimization which should be possible only if also -ffinite-math-only is on

2018-03-19 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84891

--- Comment #4 from ead <no...@turm-lahnstein.de> ---
From my naive point of view:

   - The C++ standard doesn't define how complex-number multiplication should
work; it is implementation-defined/gcc-specific (I'm not a standard scholar, so
I might be very wrong about this).

   - One can deduce from the results in IEEE 754 mode how this
multiplication (implementation-defined and gcc-specific) is implemented.

   - One's expectation with -fno-signed-zeros is that only transformations
which honor infs/nans can be performed by the optimizer. Clearly, we cannot
refer to the standard to see what the result with -fno-signed-zeros should be,
because -fno-signed-zeros is not covered by the standard, but is a gcc-specific
option.

So in this case, the effect of -fno-signed-zeros is not covered by the
description in the man pages (at least as I understand it):



-fno-signed-zeros

Allow optimizations for floating point arithmetic that ignore the
signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0 and
-0.0 values, which then prohibits simplification of expressions such as x+0.0
or 0.0*x (even with -ffinite-math-only). This option implies that the sign of a
zero result isn't significant.

The default is -fsigned-zeros.
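To make the quoted description concrete, here is a minimal sketch of what "the sign of a zero result" means for `0.0*x`. The helper name zero_product_is_negative is hypothetical, and the behavior described assumes default IEEE semantics (no -fno-signed-zeros, no -ffast-math).

```cpp
#include <cmath>

// Returns true when `0.0 * x` yields -0.0, i.e. when replacing the product
// by +0.0 would change the sign of the zero result.
bool zero_product_is_negative(double x) {
    return std::signbit(0.0 * x);
}
```

For a finite negative x the product is -0.0, for a finite positive x it is +0.0; ignoring that difference is the only thing the option text describes.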

[Bug c/90356] New: Missed optimization for variables initialized to 0.0

2019-05-05 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90356

Bug ID: 90356
   Summary: Missed optimization for variables initialized to 0.0
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

For the following example:

float doit(float k){
float c[2]={0.0};
c[1]+=k;
return c[0]+c[1];
}

the resulting assembler (-O2) is (https://gcc.godbolt.org/z/sSi9OC):

doit:
pxor    %xmm1, %xmm1
addss   %xmm1, %xmm0
addss   %xmm1, %xmm0
ret

but should be more like:

doit:
pxor    %xmm1, %xmm1  ; or maybe xorps
addss   %xmm1, %xmm0
retq


because c[0] is 0.0.

[Bug middle-end/90356] Missed optimization for variables initialized to 0.0

2019-05-05 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90356

--- Comment #3 from ead  ---
I guess -0.0+0.0=0.0 is the reason we have to add it once. I think there is no
need to add 0.0 twice.


Btw. compiled with -fno-signed-zeros, the code gets optimized to 

doit:
ret

as expected.
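The claim that one addition of 0.0 is needed but a second one is redundant can be checked with a small sketch. doit_once and same_result are hypothetical helpers, and the default round-to-nearest mode is assumed.

```cpp
#include <cmath>

// The function from the report: effectively adds 0.0f to k twice.
float doit(float k) {
    float c[2] = {0.0f};
    c[1] += k;
    return c[0] + c[1];
}

// Hypothetical reduced form: adds 0.0f only once.
float doit_once(float k) {
    return k + 0.0f;
}

// Compares value and sign of two results, treating two NaNs as equal.
bool same_result(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) {
        return std::isnan(a) && std::isnan(b);
    }
    return a == b && std::signbit(a) == std::signbit(b);
}
```

The single addition cannot be dropped because it turns -0.0f into +0.0f, but once that has happened a second addition of 0.0f changes nothing.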

[Bug c/91348] New: Missed optimization: not passing hidden pointer but copying memory

2019-08-04 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91348

Bug ID: 91348
   Summary: Missed optimization: not passing hidden pointer but
copying memory
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

For the following example:

struct Vec3{
double x, y, z;
};

void vadd_v2(struct Vec3* a, struct Vec3* out);

struct Vec3 use_v1(struct Vec3 *in){
struct Vec3 out;
vadd_v2(in, &out);
return out;
}


the resulting assembler (-O2 -Wall) is:

use_v1:
pushq   %r12
movq    %rdi, %r12
movq    %rsi, %rdi
subq    $32, %rsp
movq    %rsp, %rsi
call    vadd_v2
movq    16(%rsp), %rax
movdqa  (%rsp), %xmm0
movq    %rax, 16(%r12)
movq    %r12, %rax
movups  %xmm0, (%r12)
addq    $32, %rsp
popq    %r12
ret

However, the hidden pointer could be passed directly into vadd_v2, which is
what clang is doing:

use_v1: # @use_v1
pushq   %rbx
movq    %rdi, %rbx
movq    %rsi, %rdi
movq    %rbx, %rsi
callq   vadd_v2
movq    %rbx, %rax
popq    %rbx
retq

See also https://godbolt.org/z/rT41Sj
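The two assembler listings can be read as the following hand-lowered sketch. Vec3 is larger than 16 bytes, so under the x86-64 System V ABI it is returned through a caller-provided buffer whose address arrives as a hidden first argument. The body given to vadd_v2 and the name use_v1_lowered are assumptions made only so the example is self-contained; whether forwarding the hidden pointer is valid in every case (for instance if `in` could alias the return slot) is exactly the open question of this report.

```cpp
struct Vec3 {
    double x, y, z;
};

// Larger than two eightbytes, so it is returned in memory.
static_assert(sizeof(Vec3) > 16, "Vec3 is expected to exceed 16 bytes");

// Hypothetical stand-in for the external vadd_v2 of the report.
void vadd_v2(Vec3* a, Vec3* out) {
    out->x = a->x + 1.0;
    out->y = a->y + 1.0;
    out->z = a->z + 1.0;
}

// As written in the report: fill a local, then copy it to the return slot.
Vec3 use_v1(Vec3* in) {
    Vec3 out;
    vadd_v2(in, &out);
    return out;
}

// Hand-lowered form corresponding to the clang output: `result` plays the
// role of the hidden pointer and is forwarded directly to vadd_v2.
Vec3* use_v1_lowered(Vec3* result, Vec3* in) {
    vadd_v2(in, result);
    return result;
}
```

When `in` and the result buffer are distinct objects, both forms produce the same Vec3; the lowered form just avoids the 24-byte copy.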

[Bug c/91515] New: missed optimization: no tailcall for types of class MEMORY

2019-08-21 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91515

Bug ID: 91515
   Summary: missed optimization: no tailcall for types of class
MEMORY
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

Produced assembler (-O2) for

   struct Vec3{
double x, y, z;
   };

   struct Vec3 create(void);

   struct Vec3 use(){
return create();
   }

looks as follows (live: https://godbolt.org/z/v-HjX0):

use:
pushq   %r12
movq    %rdi, %r12
call    create
movq    %r12, %rax
popq    %r12
ret

However, I think that under the System V AMD64 ABI, the tail call optimization
would be possible:

use:
jmp     create

as create will move the %rdi value to %rax anyway.

[Bug c/91398] Possible missed optimization: Can a pointer be passed as hidden pointer in x86-64 System V ABI

2019-08-09 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91398

--- Comment #3 from ead  ---
Thank you for the explanations and your time!

[Bug c/91398] New: Possible missed optimization: Can a pointer be passed as hidden pointer in x86-64 System V ABI

2019-08-08 Thread no...@turm-lahnstein.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91398

Bug ID: 91398
   Summary: Possible missed optimization: Can a pointer be passed
as hidden pointer in x86-64 System V ABI
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: no...@turm-lahnstein.de
  Target Milestone: ---

For the following example:

struct Vec3{
double x, y, z;
};

struct Vec3 do_something(void);

void use(struct Vec3 *restrict out){
*out = do_something();
}

The resulting assembly (-O2) is:

use:
pushq   %rbx
movq    %rdi, %rbx
subq    $32, %rsp
movq    %rsp, %rdi
call    do_something
movdqu  (%rsp), %xmm0
movq    16(%rsp), %rax
movups  %xmm0, (%rbx)
movq    %rax, 16(%rbx)
addq    $32, %rsp
popq    %rbx
ret

Here on godbolt: https://godbolt.org/z/kUPFox

However, as out is restrict-qualified, it could be passed as the hidden pointer
to do_something, which would lead to the following assembler:

use:
jmp do_something ; %rdi is now the hidden pointer

So is it a missed optimization, or is there something in x86-64 System V ABI
that would forbid the above?