[Bug middle-end/116651] New: Memory allocation elision for std::vector like cases

2024-09-09 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116651

Bug ID: 116651
   Summary: Memory allocation elision for std::vector like cases
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Code close to the following was noted in many user applications:

bool test1(const std::vector& in) {
return in == std::vector{"*"};
}

Here people wish to make sure that the vector contains only a single "*"
element. In other words they assume that the above code snippet would be
optimized to something like:

bool test2(const std::vector& in) {
return in.size() == 1 && in[0] == "*";
}


Unfortunately that does not happen: https://godbolt.org/z/r59a4nobP

Note that all the functions are inlined; however, the new+delete pair is not elided.


Minimized reproducer: https://godbolt.org/z/jvcEd8zo6
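
To make the expected transformation concrete, here is a minimal sketch of the two forms side by side. The element type `std::string` is an assumption (the template arguments are not visible in the snippets above), and all function names are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption: the element type is std::string.
// Form users tend to write: builds a temporary vector (new + delete).
bool is_single_star_temp(const std::vector<std::string>& in) {
    return in == std::vector<std::string>{"*"};
}

// Allocation-free form the optimizer could ideally reduce the above to.
bool is_single_star_direct(const std::vector<std::string>& in) {
    return in.size() == 1 && in[0] == "*";
}

// Sanity check: both forms agree on a few representative inputs.
void check_single_star_forms() {
    const std::vector<std::string> empty;
    const std::vector<std::string> star{"*"};
    const std::vector<std::string> other{"x"};
    const std::vector<std::string> two{"*", "*"};
    assert(is_single_star_temp(empty) == is_single_star_direct(empty));
    assert(is_single_star_temp(star) == is_single_star_direct(star));
    assert(is_single_star_temp(other) == is_single_star_direct(other));
    assert(is_single_star_temp(two) == is_single_star_direct(two));
}
```

Both functions return the same result for every input; only the first one has to materialize a heap-allocated temporary unless the compiler elides it.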

[Bug middle-end/115309] New: Simple coroutine based generator is not optimized well

2024-05-31 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115309

Bug ID: 115309
   Summary: Simple coroutine based generator is not optimized well
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following minimal C++ coroutine based generator: 

#include <coroutine>

namespace {
struct generator {
  struct promise_type {
    using handle = std::coroutine_handle<promise_type>;
    unsigned value{};

    generator get_return_object() noexcept {
      return generator{handle::from_promise(*this)};
    }

    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void return_void() noexcept {}
    void unhandled_exception() { __builtin_abort(); }

    std::suspend_always yield_value(unsigned v) noexcept {
      value = v;
      return {};
    }
  };

  ~generator() noexcept { m_coro.destroy(); }
  unsigned operator*() { return m_coro.promise().value; }
private:
  promise_type::handle m_coro;
  explicit generator(promise_type::handle coro) noexcept: m_coro{coro} {}
};

generator generate_1() { co_yield 1; }
}

unsigned test() {
    auto gen = generate_1();
    return *gen;
}



The expected assembly is:
test():
mov eax, 1
ret

However, trunk GCC with `-O2 -std=c++23` flags generates 60+ instructions with
dynamic memory allocations and function calls.

Godbolt playground: https://godbolt.org/z/6PvfTfx9n


It looks like the main part of the problem is the missing allocation elision
for coroutines.

Note that the same problem arises with the Standard C++ std::generator:
https://godbolt.org/z/EvEPT7d1T

[Bug middle-end/114661] New: Bit operations not optimized to multiplication

2024-04-09 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114661

Bug ID: 114661
   Summary: Bit operations not optimized to multiplication
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

unsigned mul(unsigned char c) {
if (c > 3) __builtin_unreachable();
return c << 18 | c << 15 |
c << 12 | c << 9 |
c << 6 | c << 3 | c;
}

GCC with -O2 generates the following assembly:

mul(unsigned char):
  movzx edi, dil
  lea edx, [rdi+rdi*8]
  lea eax, [0+rdx*8]
  mov ecx, edx
  sal edx, 15
  or eax, edi
  sal ecx, 9
  or eax, ecx
  or eax, edx
  ret

However it could be optimized to just:

mul(unsigned char):
  imul eax, edi, 299593
  ret

Compiling with -Os does not help.

Godbolt playground: https://godbolt.org/z/YszzMbovK

P.S.: without `c << 18 | c << 15 |` the bit operations are transformed to
multiplication.
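
The arithmetic behind the expected `imul` can be checked with a small sketch. The function names are illustrative, and the `assert` stands in for the `__builtin_unreachable()` precondition of the report.

```cpp
#include <cassert>

// Original form: OR of the value shifted by multiples of 3 bits.
unsigned mul_by_shifts(unsigned char c) {
    assert(c <= 3);  // mirrors the precondition from the report
    return c << 18 | c << 15 | c << 12 | c << 9 | c << 6 | c << 3 | c;
}

// Equivalent form: c fits in 2 bits, so the copies placed 3 bits apart never
// overlap, OR equals addition, and the sum factors into a single constant:
// 1 + 8 + 64 + 512 + 4096 + 32768 + 262144 = 299593.
unsigned mul_by_constant(unsigned char c) {
    assert(c <= 3);
    return c * 299593u;
}

// Exhaustive check over the allowed input range.
void check_mul_forms() {
    for (unsigned char c = 0; c <= 3; ++c) {
        assert(mul_by_shifts(c) == mul_by_constant(c));
    }
}
```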

[Bug middle-end/114660] Exponentiating by squaring not performed for x * y * y * y * y

2024-04-09 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114660

--- Comment #1 from Antony Polukhin  ---
The Godbolt link above is for an old version of GCC; here is one for 14.0:
https://godbolt.org/z/dTPYY1T9W

[Bug middle-end/114660] New: Exponentiating by squaring not performed for x * y * y * y * y

2024-04-09 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114660

Bug ID: 114660
   Summary: Exponentiating by squaring not performed for x * y * y
* y * y
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

For the following code:

int mul(int x, int y) {
return x * y * y * y * y;
}


with -O2 GCC produces the following assembly:

mul(int, int):
  mov eax, edi
  imul eax, esi
  imul eax, esi
  imul eax, esi
  imul eax, esi
  ret


However, more efficient code with fewer multiplications could be generated:

mul(int, int):
mov eax, edi
imul esi, esi
imul eax, esi
imul eax, esi
ret

Godbolt playground: https://godbolt.org/z/6dP11jPfx
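
At the source level the suggested sequence corresponds to computing `y * y` once and reusing it. The sketch below uses `unsigned` instead of `int` so that wrap-around is well defined when the two forms are compared; that substitution and the function names are assumptions made for illustration.

```cpp
#include <cassert>

// Straightforward form: four multiplications.
unsigned mul_naive(unsigned x, unsigned y) {
    return x * y * y * y * y;
}

// Squaring form: y * y is computed once, three multiplications in total.
unsigned mul_squared(unsigned x, unsigned y) {
    const unsigned y2 = y * y;
    return x * y2 * y2;
}

// Both forms agree modulo 2^N, including when intermediate results wrap.
void check_mul_squared() {
    assert(mul_naive(3u, 5u) == mul_squared(3u, 5u));
    assert(mul_naive(7u, 65537u) == mul_squared(7u, 65537u));
    assert(mul_naive(0u, 123456u) == mul_squared(0u, 123456u));
}
```

With signed `int` the rewritten source form is not automatically valid, because `y * y` may overflow in cases where the original expression does not (for example `x == 0`); a compiler can still emit the shorter sequence since the hardware multiplication wraps.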

[Bug middle-end/114559] New: After function inlining some optimizations missing

2024-04-02 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114559

Bug ID: 114559
   Summary: After function inlining some optimizations missing
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

template <class Func>
int AtomicUpdate(int& atomic, Func updater) {
  int old_value = atomic;
  while (true) {
    const int new_value = updater(int{old_value});
    if (old_value == new_value) return old_value;
    if (__atomic_compare_exchange_n(&atomic, &old_value, new_value, 1, 5, 5))
      return new_value;
  }
}

int AtomicMin(int& atomic, int value) {
  return AtomicUpdate(atomic, [value](int old_value) {
return value < old_value ? value : old_value;
  });
}


With -O2 GCC produces the assembly:


AtomicMin(int&, int):
mov eax, DWORD PTR [rdi]
.L3:
cmp esi, eax
mov edx, eax
cmovle  edx, esi
jge .L4
lock cmpxchg DWORD PTR [rdi], edx
jne .L3
.L1:
mov eax, edx
ret
.L4:
mov edx, eax
jmp .L1


However, more optimal assembly is possible:


AtomicMin(int&, int):                   # @AtomicMin(int&, int)
mov eax, dword ptr [rdi]
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
cmp eax, esi
jle .LBB0_4
lock cmpxchg dword ptr [rdi], esi
jne .LBB0_1
mov eax, esi
.LBB0_4:
ret


Note that manual inlining of the lambda improves the codegen:

int AtomicMin(int& atomic, int value) {
  int old_value = atomic;
  while (true) {
    const int new_value = (value < old_value ? value : old_value);
    if (old_value == new_value) return old_value;
    if (__atomic_compare_exchange_n(&atomic, &old_value, new_value, 1, 5, 5))
      return new_value;
  }
}

Results in

AtomicMin(int&, int):
mov eax, DWORD PTR [rdi]
.L3:
cmp esi, eax
mov edx, eax
cmovle  edx, esi
jge .L1
lock cmpxchg DWORD PTR [rdi], edx
jne .L3
.L1:
mov eax, edx
ret


Godbolt playground: https://godbolt.org/z/G6YEGb15q
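
For readers less familiar with the `__atomic_compare_exchange_n` builtin: in the calls above the fourth argument (1) requests a weak exchange and 5 is the value of `__ATOMIC_SEQ_CST`. Below is a sketch of the same loop on top of `std::atomic<int>`; it is not the reporter's code, the names with the `Std` suffix are made up, and the initial read is an atomic load here rather than a plain read.

```cpp
#include <atomic>
#include <cassert>

// CAS loop: keep applying `updater` until the stored value is unchanged
// by it or the compare-exchange succeeds.
template <class Func>
int AtomicUpdateStd(std::atomic<int>& atomic, Func updater) {
    int old_value = atomic.load();
    while (true) {
        const int new_value = updater(int{old_value});
        if (old_value == new_value) return old_value;
        // On failure old_value is refreshed with the currently stored value.
        if (atomic.compare_exchange_weak(old_value, new_value)) return new_value;
    }
}

int AtomicMinStd(std::atomic<int>& atomic, int value) {
    return AtomicUpdateStd(atomic, [value](int old_value) {
        return value < old_value ? value : old_value;
    });
}

// Single-threaded sanity check of the min semantics.
void check_atomic_min() {
    std::atomic<int> a{10};
    assert(AtomicMinStd(a, 3) == 3);   // 3 < 10: value is stored
    assert(a.load() == 3);
    assert(AtomicMinStd(a, 7) == 3);   // 7 > 3: nothing changes
    assert(a.load() == 3);
}
```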

[Bug middle-end/114391] catch() and immediate throw; could be optimized to noop

2024-03-19 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114391

--- Comment #2 from Antony Polukhin  ---
> Is there something to optimize when foo() cannot be tail-called?

Yes. Just `catch (...) { throw; }`, no more restrictions. I do not even think
that it has to be the outermost EH region:


void foo();
void bar();

void test() {
try {
foo();
} catch (...) {
throw;
}
bar();
}


is fine to optimize to just


void test() {
foo();
bar();
}

[Bug middle-end/114391] New: catch() and immediate throw; could be optimized to noop

2024-03-19 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114391

Bug ID: 114391
   Summary: catch() and immediate throw; could be optimized to
noop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

void foo();

void test() {
try {
foo();
} catch (...) {
throw;
}
}


At the moment, the compiler at -O2 generates the assembly:

test():
  sub rsp, 24
  call foo()
  add rsp, 24
  ret
  mov rdi, rax
  jmp .L2

test() [clone .cold]:
.L2:
  call __cxa_begin_catch
  call __cxa_rethrow
  mov QWORD PTR [rsp+8], rax
  call __cxa_end_catch
  mov rdi, QWORD PTR [rsp+8]
  call _Unwind_Resume


However, an optimal assembly is:

test():
  jmp foo()


Please, add an optimization that removes catch() + immediate throw.


Such code is often encountered in release builds, because invariant checks or
debug logging are compiled out depending on NDEBUG:

void test() {
    try {
        foo();
    } catch (...) {
#ifndef NDEBUG
        std::cerr << "Unhandled exception!" << std::endl <<
            boost::current_exception_diagnostic_information();
#endif
        throw;
    }
}


Godbolt playground: https://godbolt.org/z/qdG91cMe1
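
The claim that the handler is a no-op can be illustrated with a small sketch: the same exception escapes whether or not the `catch (...) { throw; }` wrapper is present. The callee `foo_may_throw` and the helper names are hypothetical; the report itself only declares `foo()`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical callee used for illustration.
void foo_may_throw(bool fail) {
    if (fail) throw std::runtime_error("boom");
}

// Form from the report: catch everything and rethrow immediately.
void call_with_rethrow(bool fail) {
    try {
        foo_may_throw(fail);
    } catch (...) {
        throw;
    }
}

// Form the optimizer could reduce it to.
void call_plain(bool fail) { foo_may_throw(fail); }

// Returns the message of the exception escaping f(fail), or "" if none.
template <class F>
std::string escaped_message(F f, bool fail) {
    try {
        f(fail);
    } catch (const std::exception& e) {
        return e.what();
    }
    return "";
}

void check_rethrow_equivalence() {
    assert(escaped_message(call_with_rethrow, true) ==
           escaped_message(call_plain, true));
    assert(escaped_message(call_with_rethrow, false) ==
           escaped_message(call_plain, false));
}
```

One caveat: when no handler exists anywhere up the call stack, whether the stack is unwound before `std::terminate` is implementation-defined, so in that situation the presence of the `catch (...)` block can be observable.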

[Bug middle-end/114270] New: Integer multiplication on floating point constant with conversion back to integer is not optimized

2024-03-07 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114270

Bug ID: 114270
   Summary: Integer multiplication on floating point constant with
conversion back to integer is not optimized
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following example:

unsigned test(unsigned x) {
return (unsigned)(x * 0.5);
}

With -O2 GCC generates code that performs an actual conversion to floating
point and a multiplication:

test(unsigned int):
  mov edi, edi
  pxor xmm0, xmm0
  cvtsi2sd xmm0, rdi
  mulsd xmm0, QWORD PTR .LC0[rip]
  cvttsd2si rax, xmm0
  ret

However the multiplication does not overflow and the floating point constant is
a normal number.

More optimal code would look like the following:

test(unsigned int):
  mov eax, edi
  shr eax
  ret

The optimization could probably be applied to
* any multiplication of an integer by a positive fp-number less than or equal to 1.0
* any division of an integer by a positive fp-number greater than or equal to 1.0
if the result is converted back to an integer.
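
For the concrete `0.5` case the equivalence can be checked directly. The sketch assumes a 32-bit `unsigned`, so every value is exactly representable as a `double` and halving it is exact; the function names are illustrative.

```cpp
#include <cassert>
#include <climits>

// Form from the report: round-trips through double.
unsigned half_via_double(unsigned x) {
    return (unsigned)(x * 0.5);
}

// Integer-only form: truncating an exact half equals a right shift by one.
unsigned half_via_shift(unsigned x) {
    return x >> 1;
}

// Spot check on small values and on the upper boundary.
void check_half_forms() {
    const unsigned samples[] = {0u, 1u, 2u, 3u, 1000001u,
                                UINT_MAX - 1u, UINT_MAX};
    for (unsigned x : samples) {
        assert(half_via_double(x) == half_via_shift(x));
    }
}
```

This covers only the power-of-two factor from the example; the more general cases in the list above would need their own exactness argument.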

[Bug middle-end/113959] New: Optimize `__builtin_isnan(x) || __builtin_isinf(x)` to `__builtin_isfinite(x)`

2024-02-16 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113959

Bug ID: 113959
   Summary: Optimize `__builtin_isnan(x) || __builtin_isinf(x)` to
`__builtin_isfinite(x)`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Sometimes people check for a non-finite number using `__builtin_isnan(x) ||
__builtin_isinf(x)`. However, `!__builtin_isfinite(x)` produces better
assembly.

Please, add the optimization.

Godbolt playground: https://godbolt.org/z/5r38169fn
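
A small sketch of the two spellings follows. Note that the two-builtin expression is true for NaN and infinities, so its single-builtin counterpart is the negation of `__builtin_isfinite`. The builtins are GCC/Clang specific and the function names are illustrative.

```cpp
#include <cassert>
#include <limits>

// Form users tend to write: true for NaN and for +/- infinity.
bool not_finite_two_checks(double x) {
    return __builtin_isnan(x) || __builtin_isinf(x);
}

// Single-builtin form with the same truth table.
bool not_finite_one_check(double x) {
    return !__builtin_isfinite(x);
}

// Compare both forms on the interesting classes of values.
void check_not_finite_forms() {
    using lim = std::numeric_limits<double>;
    const double samples[] = {0.0, -0.0, 1.5, lim::max(), lim::lowest(),
                              lim::denorm_min(), lim::infinity(),
                              -lim::infinity(), lim::quiet_NaN()};
    for (double x : samples) {
        assert(not_finite_two_checks(x) == not_finite_one_check(x));
    }
}
```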

[Bug tree-optimization/112683] New: Optimizing memcpy range by extending to word bounds

2023-11-23 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112683

Bug ID: 112683
   Summary: Optimizing memcpy range by extending to word bounds
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the minimized source code from libstdc++

```
struct string {
unsigned long _M_string_length;
enum { _S_local_capacity = 15 };
char _M_local_buf[_S_local_capacity + 1];
};

string copy(const string& __str) noexcept {
string result;

if (__str._M_string_length > __str._S_local_capacity)
__builtin_unreachable();

result._M_string_length = __str._M_string_length;
__builtin_memcpy(result._M_local_buf, __str._M_local_buf,
 __str._M_string_length + 1);

return result;
}
```

Right now GCC with -O2 emits a long assembly with ~50 instructions
https://godbolt.org/z/a89bh17hd

However, note that
* the `result._M_local_buf` is uninitialized,
* there are at most 16 bytes to copy to `result._M_local_buf`, which is 16
bytes in size.

So the compiler could optimize the code to always copy 16 bytes. The behavior
change is not observable by the user, as the uninitialized bytes could contain
any data, including the same bytes as `__str._M_local_buf`.

As a result of always copying 16 bytes, the assembly becomes more than 7 times
shorter, conditional jumps go away: https://godbolt.org/z/r5GPYTs4Y
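
The idea can be sketched at the source level as follows. `small_string` is a simplified stand-in for the minimized struct above, and both function names are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-in for the minimized libstdc++ string.
struct small_string {
    unsigned long length;
    char local_buf[16];
};

// Copies exactly length + 1 bytes: the size is only known at run time.
small_string copy_exact(const small_string& src) {
    assert(src.length <= 15);
    small_string result;
    result.length = src.length;
    std::memcpy(result.local_buf, src.local_buf, src.length + 1);
    return result;
}

// Always copies the whole buffer: the size is a compile-time constant,
// so the copy can become a couple of fixed-width moves.
small_string copy_whole(const small_string& src) {
    assert(src.length <= 15);
    small_string result;
    result.length = src.length;
    std::memcpy(result.local_buf, src.local_buf, sizeof(result.local_buf));
    return result;
}
```

The bytes past `length + 1` differ between the two results, but they are never meaningful to a user of the string.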

[Bug libstdc++/112682] New: More efficient std::basic_string move construction

2023-11-23 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112682

Bug ID: 112682
   Summary: More efficient std::basic_string move construction
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

A few places in bits/basic_string.h have the following code:

```
if (__str._M_is_local())
  {
_M_init_local_buf();
traits_type::copy(_M_local_buf, __str._M_local_buf,
  __str.length() + 1);
  }
```

Even though `__str.length()` is known to be not greater than 15, the
compiler emits (and inlines) a memcpy call. That results in quite a large set
of instructions: https://godbolt.org/z/j35MMfxzq

Replacing `__str.length() + 1` with `_S_local_capacity + 1` explicitly forces
the compiler to copy the whole `__str._M_local_buf`. As a result the assembly
becomes almost 5 times shorter and without any function calls or multiple
conditional jumps https://godbolt.org/z/bfq8bxra9


P.S.: I am not sure whether it is allowed to copy uninitialized data via
traits_type::copy, or whether the sanitizers would be happy with such a copy.

[Bug tree-optimization/112584] New: Suboptimal stack usage on third memcpy

2023-11-17 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112584

Bug ID: 112584
   Summary: Suboptimal stack usage on third memcpy
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


struct string_view {
const char* data;
unsigned long size;
};

void AppendToCharArray(char*& data, string_view s1, string_view s2,
                       string_view s3) {
  __builtin_memcpy(data, s1.data, s1.size);
  data += s1.size;

  __builtin_memcpy(data, s2.data, s2.size);
  data += s2.size;

  __builtin_memcpy(data, s3.data, s3.size);
  data += s3.size;
}


With -O2 GCC generates assembly with 6 push and 6 pop instructions. However,
better assembly is possible:

  push r15
  push r14
  push r12
  push rbx
  push rax
  mov rbx, r8
  mov r14, rcx
  mov r15, rdx
  mov r12, rdi
  mov rdi, qword ptr [rdi]
  call memcpy
  add r15, qword ptr [r12]
  mov qword ptr [r12], r15
  mov rdi, r15
  mov rsi, r14
  mov rdx, rbx
  call memcpy
  add rbx, qword ptr [r12]
  mov qword ptr [r12], rbx
  mov rsi, qword ptr [rsp + 48]
  mov r14, qword ptr [rsp + 56]
  mov rdi, rbx
  mov rdx, r14
  call memcpy
  add qword ptr [r12], r14
  add rsp, 8
  pop rbx
  pop r12
  pop r14
  pop r15
  ret

Godbolt playground: https://godbolt.org/z/EY8E1GGPz

[Bug libstdc++/112440] New: Compiler does not grok basic_string::resize and basic_string::reserve if _CharT is char

2023-11-08 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112440

Bug ID: 112440
   Summary: Compiler does not grok basic_string::resize and
basic_string::reserve if _CharT is char
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

#include <string>
void test1(std::size_t summ) {
std::string result;
result.resize(summ);

if (result.size() > summ) {
__builtin_abort();
}
}

The resulting assembly contains `call abort` and code to check the string size:
https://godbolt.org/z/zcj3Pc3G8

Looks like this is due to char* aliasing with string internals, switching to
std::u8string removes the `call abort` related assembly:
https://godbolt.org/z/a6bKaqqn5

I've failed to come up with a generic solution, but looks like adding
__builtin_unreachable() to the end of basic_string::resize and
basic_string::reserve helps: https://godbolt.org/z/vWcjqGK94


P.S.: such hints help to shorten the assembly for reserve+append*n cases
https://godbolt.org/z/nsEGsWdP3 , https://godbolt.org/z/qMf4b7dd8 ,
https://godbolt.org/z/1r6dd6d5M which are quite common
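
The kind of hint described above can be sketched outside the library as a wrapper. `resize_with_hint` and `test1_with_hint` are hypothetical names, this is not the libstdc++ implementation, and whether the hint has the same effect as in the linked experiments is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// After resize(n) the size is n by contract; the other outcome is declared
// unreachable so that the optimizer may drop later checks on size().
void resize_with_hint(std::string& s, std::size_t n) {
    s.resize(n);
    if (s.size() != n) __builtin_unreachable();
}

// Variant of test1 from the report that uses the wrapper and returns the
// resulting size so that the behavior can be checked.
std::size_t test1_with_hint(std::size_t summ) {
    std::string result;
    resize_with_hint(result, summ);
    if (result.size() > summ) {
        __builtin_abort();  // expected to be removable given the hint
    }
    return result.size();
}

void check_resize_hint() {
    assert(test1_with_hint(0) == 0);
    assert(test1_with_hint(100) == 100);
}
```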

[Bug c++/111690] New: Redefinition of operator == not detected with friend <=>

2023-10-04 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111690

Bug ID: 111690
   Summary: Redefinition of operator == not detected with friend
<=>
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: accepts-invalid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

```
#include <compare>

struct Foo { 
  friend auto operator<=>(const Foo&, const Foo&) = default;
};

bool operator==(const Foo& x, const Foo& y) noexcept {
  return true;
}

void Test() {
Foo{} == Foo{};
}
```

If my reading of [class.compare.default] p4 is correct, then an == operator
function is already declared implicitly due to operator<=>.

So there should be an error of redeclaring or redefining the operator==.

Godbolt playground: https://godbolt.org/z/YP5vEMeYs

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-11 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #13 from Antony Polukhin  ---
There's a typo at
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87

It should be `|| !test3() || !test3r()` rather than `|| !test3() || !test4r()`

[Bug target/110457] Unnecessary movsx eax, dil

2023-06-28 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110457

--- Comment #4 from Antony Polukhin  ---
Oh, if there's a disagreement, I'm fine with closing this issue as
invalid/later/won't_fix

[Bug tree-optimization/110459] New: Trivial on stack variable was not optimized away

2023-06-28 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110459

Bug ID: 110459
   Summary: Trivial on stack variable was not optimized away
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

struct array {
char data[4];
};

auto sample2(char c) {
  array buffer = {c, 0, 0, 0};
  return buffer;
}


With GCC-14 and -O2 it produces the following assembly:

sample2(char):
xor eax, eax
mov BYTE PTR [rsp-22], 0
mov WORD PTR [rsp-24], ax
mov eax, DWORD PTR [rsp-24]
sal eax, 8
mov al, dil
ret


It could be further optimized to just:

sample2(char):
movzx   eax, dil
ret


Godbolt playground: https://godbolt.org/z/nxKhvo3ns

[Bug target/110457] Unnecessary movsx eax, dil

2023-06-28 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110457

--- Comment #1 from Antony Polukhin  ---
> However, it could be shortened to just:

sample1(char):
 imul   eax,edi,0x10111
 ret; missed in previous message

[Bug target/110457] New: Unnecessary movsx eax, dil

2023-06-28 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110457

Bug ID: 110457
   Summary: Unnecessary movsx   eax, dil
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

For the following code

int sample1(char c) {
  return (c << 4) + (c << 8) + (c << 16) + c;
}


GCC-14 with -O2 generates the assembly:

sample1(char):
 movsx  eax,dil
 imul   eax,eax,0x10111
 ret


However, it could be shortened to just:

sample1(char):
 imul   eax,edi,0x10111


Godbolt playground: https://godbolt.org/z/7GGdedEY8

[Bug c++/110363] New: New use-after-move warning

2023-06-22 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110363

Bug ID: 110363
   Summary: New use-after-move warning
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

There's a quite common rule: "after an object has been moved from, it is in an
indeterminate state; it should be either destroyed or a new value should be
assigned to it". The C++ Standard Library follows that rule, and many libraries
and companies follow it as well.


Please introduce some '-Wuse-after-move' that warns if the object could be used
after move:


struct resource {
resource(resource&&) noexcept;
~resource();

void kill_it() && noexcept;
void should_warn_use_after_move() const & noexcept;
};

void should_warn_use_after_move(resource& r) noexcept;

void do_something(resource r) {
    static_cast<resource&&>(r).kill_it(); // moved out
    should_warn_use_after_move(r);        // warn
    r.should_warn_use_after_move();       // warn
}


A related request on Stack Overflow:
https://stackoverflow.com/questions/72532377/g-detect-use-after-stdmove

[Bug tree-optimization/110362] New: Range information on lower bytes of __uint128_t

2023-06-22 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110362

Bug ID: 110362
   Summary: Range information on lower bytes of __uint128_t
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following example:

int test (__uint128_t a, __uint128_t b) {
  __uint128_t __a = b | (a << 32);
  return __a & 0x;
}

At the moment GCC-14 with -O2 generates the following assembly:

test(unsigned __int128, unsigned __int128):
mov rsi, rdi
mov rax, rdx
sal rsi, 32
or  rax, rsi
ret


Which could be simplified to just:

test(unsigned __int128, unsigned __int128):
mov rax, rdx
ret

Godbolt playground: https://godbolt.org/z/K9x5vnhxq

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-08 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #2 from Antony Polukhin  ---
-fno-trapping-math had no effect

Some tests with nans seem to produce the same results for both code snippets:
https://godbolt.org/z/GaKM3EhMq

[Bug tree-optimization/110170] New: Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-08 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

Bug ID: 110170
   Summary: Sub-optimal conditional jumps in conditional-swap with
floating point
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Some of the C++ algorithms are written in an attempt to avoid conditional jumps
in tight loops. For example, code close to the following can be seen in libc++:

void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  auto __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
}


GCC-14 with -O2 and -march=x86-64 options generates the following code:

__cond_swap(double*, double*):
movsd   xmm1, QWORD PTR [rdi]
movsd   xmm0, QWORD PTR [rsi]
comisd  xmm0, xmm1
jbe .L2
movq    rax, xmm1
movapd  xmm1, xmm0
movq    xmm0, rax
.L2:
movsd   QWORD PTR [rsi], xmm1
movsd   QWORD PTR [rdi], xmm0
ret


The conditional jump could probably be avoided in the following way:

__cond_swap(double*, double*):
movsd   xmm0, qword ptr [rdi]
movsd   xmm1, qword ptr [rsi]
movapd  xmm2, xmm0
minsd   xmm2, xmm1
maxsd   xmm1, xmm0
movsd   qword ptr [rsi], xmm1
movsd   qword ptr [rdi], xmm2
ret


Playground: https://godbolt.org/z/v3jW67x91
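
To spell out why the `minsd`/`maxsd` sequence is a candidate: for ordinary (non-NaN) inputs the conditional swap leaves the smaller value in `*x` and the larger one in `*y`. The sketch below writes this out as an explicit min/max pair and compares it with the original form; the function names are illustrative.

```cpp
#include <cassert>

// Form from the report (libc++-style conditional swap).
void cond_swap_select(double* x, double* y) {
    const bool r = (*x < *y);
    const double tmp = r ? *x : *y;
    *y = r ? *y : *x;
    *x = tmp;
}

// The same operation written as an explicit min/max pair.
void cond_swap_minmax(double* x, double* y) {
    const double lo = (*x < *y) ? *x : *y;  // minimum of the pair
    const double hi = (*x < *y) ? *y : *x;  // maximum of the pair
    *x = lo;
    *y = hi;
}

// Both forms produce the same ordered pair for non-NaN inputs.
void check_cond_swap(double a, double b) {
    double x1 = a, y1 = b;
    double x2 = a, y2 = b;
    cond_swap_select(&x1, &y1);
    cond_swap_minmax(&x2, &y2);
    assert(x1 == x2 && y1 == y2);
    assert(x1 <= y1);
}
```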

[Bug tree-optimization/109931] Knowledge on literal not used in optimization

2023-05-22 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109931

--- Comment #3 from Antony Polukhin  ---
> But that's because nothing in the function asserts this?  Without fully
> specializing and unrolling on the constant "hello" argument at least.

Yes, I was hoping for that unrolling to happen

Probably a more simplified case:


constexpr bool EqualICase(const char* lowercase, const char* y) noexcept {
for (;;) {
const auto lowercase_c = *lowercase;
if (!lowercase_c) return true;
if (lowercase_c != *y) {
return false;
}
++lowercase;
++y;
}
}
bool test2(const char* y) {
return EqualICase("he", y);
}

With range info for loads from read-only constants I'd expect this to become
just a

test2(char const*):
cmp BYTE PTR [rdi], 104
jne .L3
cmp BYTE PTR [rdi+1], 101
sete al
ret
.L3:
xor eax, eax
ret


rather than an actual loop with checks for \0

Godbolt: https://godbolt.org/z/z6rTYEzWx
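
At the source level, the hoped-for result corresponds to a hand-unrolled comparison against the two characters of the literal (104 is 'h' and 101 is 'e' in ASCII). A sketch, with `EqualHeUnrolled` being a made-up name:

```cpp
#include <cassert>

// Loop from the comment above.
constexpr bool EqualICase(const char* lowercase, const char* y) noexcept {
    for (;;) {
        const auto lowercase_c = *lowercase;
        if (!lowercase_c) return true;
        if (lowercase_c != *y) return false;
        ++lowercase;
        ++y;
    }
}

// Hand-unrolled equivalent for the literal "he": two byte compares,
// no loop and no checks for '\0' inside the literal.
constexpr bool EqualHeUnrolled(const char* y) noexcept {
    return y[0] == 'h' && y[1] == 'e';
}

// Compile-time checks that both forms agree on a few inputs.
static_assert(EqualICase("he", "hello") == EqualHeUnrolled("hello"), "");
static_assert(EqualICase("he", "h") == EqualHeUnrolled("h"), "");
static_assert(EqualICase("he", "") == EqualHeUnrolled(""), "");
static_assert(EqualICase("he", "ha") == EqualHeUnrolled("ha"), "");
```

The short-circuit `&&` keeps the memory accesses identical to the loop: `y[1]` is read only if `y[0]` matched.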

[Bug tree-optimization/109931] New: Knowledge on literal not used in optimization

2023-05-22 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109931

Bug ID: 109931
   Summary: Knowledge on literal not used in optimization
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Function for comparing a lower-cased string with runtime string of known size:


constexpr bool ICaseEqualLowercase(const char* lowercase, const char* y,
   unsigned size) noexcept {
constexpr char kLowerToUpperMask = static_cast<char>(~unsigned{32});
for (unsigned i = 0; i < size; ++i) {
const auto lowercase_c = lowercase[i];
if (lowercase_c != y[i]) {
if (!('a' <= lowercase_c && lowercase_c <= 'z') ||
(lowercase_c & kLowerToUpperMask) != y[i]) {
return false;
}
}
}

return true;
}
bool test2(const char* y) {
return ICaseEqualLowercase("hello", y, 5);
}


With GCC trunk and -O2, the compiler fails to understand that all the
characters of `lowercase` are lowercase ASCII and that the expression `!('a' <=
lowercase_c && lowercase_c <= 'z')` is always `false`.

Because of that, additional instructions in loop are emitted:
lea esi, [rdx-97]
cmp sil, 25
ja  .L6


Godbolt playground: https://godbolt.org/z/xrc1T4oeW

[Bug tree-optimization/109829] New: Optimizing __builtin_signbit(x) ? -x : x or abs for FP

2023-05-12 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109829

Bug ID: 109829
   Summary: Optimizing __builtin_signbit(x) ? -x : x or abs for FP
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following 2 functions:

__float128 abs1(__float128 x) { return __builtin_fabsf128(x); }
__float128 abs2(__float128 x) { return __builtin_signbit(x) ? -x : x; }

They should provide the same results, however the codegen is different:

abs1(__float128):
pand    xmm0, XMMWORD PTR .LC0[rip]
ret
abs2(__float128):
movmskps eax, xmm0
test    al, 8
je  .L4
pxor    xmm0, XMMWORD PTR .LC1[rip]
.L4:
ret


Looks like match.pd misses the __builtin_signbit(x) ? -x : x ->
__builtin_fabs*(x) pattern.

Playground: https://godbolt.org/z/bsxeozGqv
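
The equivalence of the two spellings can be spot-checked bit for bit. The sketch uses `double` with `std::fabs` and `std::signbit` instead of `__float128` and the builtins, purely for portability; that substitution and the helper names are assumptions. NaN inputs are not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

double abs_library(double x) { return std::fabs(x); }
double abs_signbit(double x) { return std::signbit(x) ? -x : x; }

// Bitwise comparison, so that +0.0 and -0.0 are told apart.
bool same_bits(double a, double b) {
    std::uint64_t ua = 0, ub = 0;
    std::memcpy(&ua, &a, sizeof a);
    std::memcpy(&ub, &b, sizeof b);
    return ua == ub;
}

// Both forms clear the sign bit and leave everything else untouched.
void check_abs_forms() {
    const double inf = std::numeric_limits<double>::infinity();
    const double samples[] = {0.0, -0.0, 1.5, -1.5, inf, -inf};
    for (double x : samples) {
        assert(same_bits(abs_library(x), abs_signbit(x)));
    }
}
```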

[Bug middle-end/108465] New: Optimize (a < b) == (b < a) to a == b

2023-01-19 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108465

Bug ID: 108465
   Summary: Optimize (a < b) == (b < a) to a == b
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

For GCC 12 the following code with -O2:

int compare_eq(int a, int b) {
return ((a < b) == (b < a));
}

compiles into the following assembly:

compare_eq(int, int):
cmp edi, esi
setl    dl
setg    al
cmp dl, al
sete    al
movzx   eax, al
ret

Which is suboptimal. More optimal assembly would be:

compare_eq(int, int):
xor eax, eax
cmp edi, esi
sete    al
ret

Godbolt Playground: https://godbolt.org/z/4sfcTjjjb

Motivation: in generic C++ code the comparison is often done via a functor. The
algorithm is only allowed to use that functor:

if (__comp(a, b) == __comp(b, a)) {
return x;
} else if (__comp(b, a)) {
return y;
}

Because of that, with the inlined functor the comparison becomes ((a < b) == (b
< a))
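
The identity behind the request is easy to verify: for integers `a < b` and `b < a` can never both be true, so the two comparisons are equal exactly when both are false, that is, when `a == b`. A minimal sketch with illustrative function names:

```cpp
#include <cassert>

// Form that appears after inlining a less-than functor.
constexpr bool compare_via_less(int a, int b) { return (a < b) == (b < a); }

// Simplified form.
constexpr bool compare_via_eq(int a, int b) { return a == b; }

// Exhaustive check over a small range of values.
void check_compare_forms() {
    for (int a = -3; a <= 3; ++a) {
        for (int b = -3; b <= 3; ++b) {
            assert(compare_via_less(a, b) == compare_via_eq(a, b));
        }
    }
}
```

The same reasoning does not carry over to floating point unchanged, since for a NaN operand both comparisons are false although the operands do not compare equal.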

[Bug libstdc++/71579] type_traits miss checks for type completeness in some traits

2022-09-01 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71579

--- Comment #22 from Antony Polukhin  ---
> Maybe we should consider dropping all the static assertions from traits that 
> are implemented using a compiler built-in.

Sounds like the right thing to do.

> Our type trait and the __has_virtual_destructor built-in both seem to get 
> this wrong, rejecting Incomplete[2], which is not a class type, and so 
> doesn't need to be complete (or maybe the precondition is wrong and there's a 
> library defect?)

The library precondition seems right. As I read it, the trait just checks for
`virtual` on the destructor. If there's no destructor - it is fine, no
`virtual` on it.

[Bug libstdc++/104361] New: Biased Reference Counting for the standard library

2022-02-03 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104361

Bug ID: 104361
   Summary: Biased Reference Counting for the standard library
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

There is a research paper titled "Biased Reference Counting: Minimizing Atomic
Operations in Garbage Collection" that shows how to speed up reference counting
on some platforms by more than 20%:
https://dl.acm.org/doi/pdf/10.1145/3243176.3243195 .

The research does not talk about speedup of C++ but it is based on an
observation that most objects are only accessed by a single thread, which
allows most RC operations to be performed non-atomically. That observation fits
std::shared_ptr usage patterns.

Such a change seems to be an ABI break for shared_ptr, however may be it could
be used for stop_token and other new reference counted types.

[Bug c++/103745] New: Warn on throwing an exception not derived from std::exception

2021-12-16 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103745

Bug ID: 103745
   Summary: Warn on throwing an exception not derived from
std::exception
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Throwing exceptions derived from std::exception is common practice, and cases
where that practice should be skipped are very rare. However, many beginners do
not know this and mistakenly do not derive their exceptions from
std::exception. There are also cases where classes have similar names and
users throw the wrong type because of a typo.

Please add a warning about throwing an exception not derived from
std::exception.

Godbolt playground: https://godbolt.org/z/7Phf3nafW
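
A small sketch of why this matters, using hypothetical types `my_error` and `my_good_error` and a hypothetical helper `classify` (none of them are from the report): a handler for `const std::exception&` silently misses a thrown type that does not derive from std::exception.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical user type that forgot to derive from std::exception.
struct my_error {
    std::string what;
};

// Hypothetical, correctly derived counterpart.
struct my_good_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Reports which handler caught whatever `f` threw.
template <class F>
std::string classify(F f) {
    try {
        f();
    } catch (const std::exception&) {
        return "std::exception";
    } catch (...) {
        return "unknown";
    }
    return "none";
}
```

Here `classify([] { throw my_error{"oops"}; })` ends up in the catch-all handler, which is the situation the requested warning would point out at the throw site.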

[Bug tree-optimization/19661] unnecessary atexit calls emitted for static objects with empty destructors

2021-09-24 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19661

--- Comment #10 from Antony Polukhin  ---
Any progress?

Multiple compilers already eliminate the atexit call. Moreover, some of the
compilers even eliminate the guard variable after that 
https://godbolt.org/z/dbdfMrroa

Note that the atexit elimination would benefit the libstdc++, as the latter now
uses a bunch of constant_init instances that have empty destructor
in libstdc++-v3/src/c++17/memory_resource.cc and
libstdc++-v3/src/c++11/system_error.cc . It would be possible to eliminate the
atexit calls for those cases and speedup startup times
https://godbolt.org/z/MKaWKevzq
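
For illustration only, a sketch of the kind of object in question (the names `empty_dtor` and `get_value` are made up here and this is not the libstdc++ constant_init code): the destructor is user-provided but empty, so the type is not trivially destructible and the compiler may register it for destruction at exit even though there is nothing to do.

```cpp
#include <type_traits>

// Hypothetical type: the destructor body is empty, yet being user-provided
// it makes the type non-trivially destructible.
struct empty_dtor {
    int value = 42;
    ~empty_dtor() {}
};

static_assert(!std::is_trivially_destructible<empty_dtor>::value,
              "a user-provided destructor is never trivial");

// Function-local static: this is where a guard variable and an atexit
// registration may be emitted.
int get_value() {
    static empty_dtor instance;
    return instance.value;
}
```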

[Bug middle-end/101253] New: Optimize i % C1 == C0 || i % C1*C2 == C0 to i % C1 == C0

2021-06-29 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101253

Bug ID: 101253
   Summary: Optimize i % C1 == C0 || i % C1*C2 == C0 to i % C1 ==
C0
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following code

bool test_naive(short i) {
return i % 100 == 0 || i % 400 == 0;
}

It could be optimized into

bool test_optim(short i) {
return i % 100 == 0;
}


Godbolt playground: https://godbolt.org/z/zW49qcs7G

P.S.: Inspired by the manual optimizations in libstdc++
https://github.com/gcc-mirror/gcc/commit/b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
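
The equivalence can be checked exhaustively, since every multiple of 400 is also a multiple of 100. A minimal sketch (the helper `forms_agree` is an illustrative name, not part of the report):

```cpp
#include <limits>

bool test_naive(short i) { return i % 100 == 0 || i % 400 == 0; }
bool test_optim(short i) { return i % 100 == 0; }

// Compares both forms over every value representable in `short`.
bool forms_agree() {
    for (int v = std::numeric_limits<short>::min();
         v <= std::numeric_limits<short>::max(); ++v) {
        const short i = static_cast<short>(v);
        if (test_naive(i) != test_optim(i)) return false;
    }
    return true;
}
```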

[Bug middle-end/101252] New: Optimize (b ? i % C0 : i % C1) into i & (b ? C0-1 : C1-1) for power of 2 C0 and C1

2021-06-29 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101252

Bug ID: 101252
   Summary: Optimize (b ? i % C0 : i % C1) into i & (b ? C0-1 :
C1-1) for power of 2 C0 and C1
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following code

bool test_naive0(bool b, short i) {
return (b ? i % 4 : i % 16)==0;
}

It could be optimized into

bool test_optim0(bool b, short i) {
return (i & (b ? 3 : 15))==0;
}


Godbolt playground: https://godbolt.org/z/8vj999M3c

P.S.: Inspired by the manual optimizations in libstdc++
https://github.com/gcc-mirror/gcc/commit/b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
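
A sketch that checks the transformation exhaustively, taking the masks as C0-1 = 3 and C1-1 = 15 as in the summary (the names `test_masked` and `masks_agree` are illustrative). For the `== 0` comparison the mask form also holds for negative values, because divisibility by a power of two only depends on the low bits:

```cpp
#include <limits>

bool test_naive0(bool b, short i) { return (b ? i % 4 : i % 16) == 0; }

// Mask form: C0 - 1 when b is true, C1 - 1 otherwise.
bool test_masked(bool b, short i) { return (i & (b ? 3 : 15)) == 0; }

// Compares both forms for both values of b and every `short`.
bool masks_agree() {
    for (int v = std::numeric_limits<short>::min();
         v <= std::numeric_limits<short>::max(); ++v) {
        const short i = static_cast<short>(v);
        if (test_naive0(true, i) != test_masked(true, i)) return false;
        if (test_naive0(false, i) != test_masked(false, i)) return false;
    }
    return true;
}
```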

[Bug middle-end/101251] New: Optimize i % (b ? C0 : C1) into i & (b ? C0-1 : C1-1) for power of 2 C0 and C1

2021-06-29 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101251

Bug ID: 101251
   Summary: Optimize i % (b ? C0 : C1) into i & (b ? C0-1 : C1-1)
for power of 2 C0 and C1
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following code

bool test_optim01(bool b, short i) {
return i % (b ? 4 : 16)==0;
}

It could be optimized into

bool test_optim0(bool b, short i) {
return (i & (b ? 3 : 15))==0;
}


Godbolt playground: https://godbolt.org/z/j15br4Kd4

P.S.: Inspired by the manual optimizations in libstdc++
https://github.com/gcc-mirror/gcc/commit/b92d12d3fe3f1aa56d190d960e40c62869a6cfbb

[Bug c++/58487] Missed return value optimization

2021-06-28 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58487

Antony Polukhin  changed:

   What|Removed |Added

 CC||antoshkka at gmail dot com

--- Comment #3 from Antony Polukhin  ---
Minimized example, move constructor should not be called:

struct A {
  A() = default;
  A(A&&);
};

A test() {
  if (true) {
A a;
return a;
  } else {
return A{};
  }
}


Godbolt playground: https://godbolt.org/z/4Pzq83WWY

[Bug c++/58050] No return value optimization when calling static function through unnamed temporary

2021-06-28 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58050

Antony Polukhin  changed:

   What|Removed |Added

 CC||antoshkka at gmail dot com

--- Comment #1 from Antony Polukhin  ---
This was fixed in GCC-10.1 https://godbolt.org/z/b4ohfnK3x

[Bug c++/100746] New: NRVO should not introduce aliasing

2021-05-24 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100746

Bug ID: 100746
   Summary: NRVO should not introduce aliasing
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:

struct NrvoPassed {
NrvoPassed() = default;
NrvoPassed(const NrvoPassed&);
NrvoPassed(NrvoPassed&&);

int i = 0;
};

auto test(int* data) {
NrvoPassed x;
*data = 3;
if (x.i != 0) __builtin_abort();
return x;
}


Resulting assembly contains call to `abort`:

test(int*):
  mov DWORD PTR [rdi], 0
  mov DWORD PTR [rsi], 3
  mov edx, DWORD PTR [rdi]
  test edx, edx
  jne .L3
  mov rax, rdi
  ret
test(int*) [clone .cold]:
.L3:
  push rax
  call abort

The optimizer thinks that the value of `x.i` is aliased by `data`; however, `x`
is a local variable and its address could not leak before the object is
constructed.

Some other compilers already have the proposed optimization:
https://godbolt.org/z/aqdveadnE

Adding `__restrict` to `data` fixes the codegen:

test2(int*):
  mov DWORD PTR [rdi], 0
  mov rax, rdi
  mov DWORD PTR [rsi], 3
  ret

Probably `__restrict` should be always added to the storage address passed for
NRVO.
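
For reference, a sketch of what the `__restrict` variant might look like in source form. The exact `test2` from the playground is not reproduced in this message, so the signature below is an assumption, and the constructor definitions are added only to make the snippet link. `__restrict` and `__builtin_abort` are GCC/Clang extensions.

```cpp
struct NrvoPassed {
    NrvoPassed() = default;
    NrvoPassed(const NrvoPassed&);
    NrvoPassed(NrvoPassed&&);

    int i = 0;
};

// Out-of-line definitions (assumed; the report only declares them).
NrvoPassed::NrvoPassed(const NrvoPassed& other) : i(other.i) {}
NrvoPassed::NrvoPassed(NrvoPassed&& other) : i(other.i) {}

// Assumed shape of test2: `data` is promised not to alias anything else,
// including the return slot that `x` is constructed in.
NrvoPassed test2(int* __restrict data) {
    NrvoPassed x;
    *data = 3;
    if (x.i != 0) __builtin_abort();
    return x;
}
```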

[Bug libstdc++/89120] std::minmax_element 2.5 times slower than hand written loop

2021-05-17 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89120

--- Comment #2 from Antony Polukhin  ---
Long story short: I've found no way to improve the standard library code to
always work faster. I'm in favor of closing this ticket as invalid/won't fix.

Long story:

I've tried to add a specialization of minmax_element algorithm for std::less
comparators and arithmetic types. That specialization was doing more
comparisons but in a more predictable way. On big datasets the performance
increased, but decreased on small datasets.


Then I've tried another approach. If the comparison of __first with __next is
barely predictable, then just avoid branching on it.

Portable solution:

bool __b = __comp(__next, __first);   
_ForwardIterator __pots[3] = {__first, __next, __first};
_ForwardIterator __pot_min = *(__pots + __b);
_ForwardIterator __pot_max = *(__pots + __b + 1);

Special case for random access iterators:

bool __b = __comp(__next, __first);
_ForwardIterator __pot_min = __first, __pot_max = __next;
__pot_min += __b;
__pot_max -= __b;


Unfortunately, both of those approaches add some overhead for small datasets.
Another disadvantage is that they produce opposite results on different
compilers:

GCC-9 performance gets better on big datasets
-------------------------------------------------------------------
Benchmark                           Time           CPU   Iterations
-------------------------------------------------------------------
naive_minmax/2                      3 ns          3 ns    247522237
naive_minmax/8                      7 ns          7 ns    103044422
naive_minmax/262144           1715635 ns    1710406 ns          407
naive_minmax/1048576          6970755 ns    6947034 ns          101

branchless_minmax/2                 8 ns          8 ns     81324904
branchless_minmax/8                30 ns         30 ns     23494608
branchless_minmax/262144       457287 ns     456412 ns         1529
branchless_minmax/1048576     4267914 ns    4219969 ns          363



Clang-9 performance degrades on big datasets
-------------------------------------------------------------------
Benchmark                           Time           CPU   Iterations
-------------------------------------------------------------------
naive_minmax/2                      2 ns          2 ns    380928404
naive_minmax/8                      7 ns          7 ns     92642970
naive_minmax/262144            262921 ns     262288 ns         2630
naive_minmax/1048576          1149407 ns    1147626 ns          618

branchless_minmax/2                 2 ns          2 ns    307146020
branchless_minmax/8                10 ns         10 ns     74417142
branchless_minmax/262144       425880 ns     425241 ns         1637
branchless_minmax/1048576     1747785 ns    1745725 ns          397


Final attempt. Different compilers optimize the algorithm differently. Clang
shows good performance on big datasets with >4k elements, GCC - on medium-sized
datasets with 128-1k elements. Maybe providing more info on probabilities could
help both compilers to produce better code. But it looks like the heuristics
already deduce the probabilities to be close to 0.5:
__builtin_expect_with_probability(__b, true, 0.5) changed nothing in the
assembly https://godbolt.org/z/PqWoaKfhW

[Bug c++/80542] Warn about accidental copying of data in range based for

2021-05-09 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80542

--- Comment #2 from Antony Polukhin  ---
This issue could be closed. GCC 11 has the required -Wrange-loop-construct
warning: https://godbolt.org/z/343M6WMjb
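
A short sketch of the pattern the warning is meant to catch (the function names are illustrative, and the exact diagnostic text is not reproduced here): the first loop declares a const non-reference loop variable and therefore copies every string, the second binds a reference.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Copies each element: the kind of loop -Wrange-loop-construct is about.
std::size_t total_size_copying(const std::vector<std::string>& v) {
    std::size_t n = 0;
    for (const std::string s : v) n += s.size();
    return n;
}

// Same result without the per-element copy.
std::size_t total_size_by_ref(const std::vector<std::string>& v) {
    std::size_t n = 0;
    for (const std::string& s : v) n += s.size();
    return n;
}
```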

[Bug libstdc++/99612] New: Remove "#pragma GCC system_header" from atomic file to warn on incorrect memory order

2021-03-16 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99612

Bug ID: 99612
   Summary: Remove "#pragma GCC system_header" from atomic file to
warn on incorrect memory order
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

GCC has -Winvalid-memory-model that warns if wrong memory model is used with
atomic

  auto ret = a.load(std::memory_order_release); // warning
  a.store(10, std::memory_order_acquire); // warning

Unfortunately, that warning does not work by default, because the <atomic>
header has a "#pragma GCC system_header" in it.

The only way to get the warning is to use -Wsystem-headers that unleashes all
the warnings from all system headers.

Playground: https://godbolt.org/z/Wca5ef
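
For contrast with the two lines above, a sketch of the same operations with memory orders that are valid for them (the function name `publish_and_read` is illustrative): release applies to stores and acquire to loads, not the other way around.

```cpp
#include <atomic>

// Valid pairing: release on the store, acquire on the load.
// Swapping them, as in the report, is what -Winvalid-memory-model is meant
// to diagnose.
int publish_and_read(std::atomic<int>& a) {
    a.store(10, std::memory_order_release);
    return a.load(std::memory_order_acquire);
}
```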

[Bug middle-end/98817] Optimize if (a != b) a = b;

2021-01-25 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98817

--- Comment #5 from Antony Polukhin  ---
Please, close as invalid

[Bug middle-end/98817] Optimize if (a != b) a = b;

2021-01-25 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98817

--- Comment #2 from Antony Polukhin  ---
(In reply to Jakub Jelinek from comment #1)
> I'm not sure about this.  Turning it into an unconditional store would mean
> that the memory the reference points to must be writable, that might not be
> always the case.

Fair point.

How about emitting cmov instead of cmp+je?
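
The concern quoted above can be illustrated with a sketch like the following (the constant `k_constant` and the helper `call_on_readonly` are made up for the example). With the conditional store the call never writes, so it is well-defined even though the referenced object is const and may be placed in read-only memory; an unconditional store would attempt to write to it.

```cpp
void arithmetic(int& result, int value) {
    if (result != value) {
        result = value;
    }
}

// A const object that an implementation may place in read-only storage.
static const int k_constant = 5;

// No store happens here because the values are already equal, so casting
// away const is harmless for this particular call.
bool call_on_readonly() {
    arithmetic(const_cast<int&>(k_constant), 5);
    return k_constant == 5;
}
```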

[Bug middle-end/98817] New: Optimize if (a != b) a = b;

2021-01-25 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98817

Bug ID: 98817
   Summary: Optimize if (a != b) a = b;
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


void arithmetic(int& result, int value) {
if (result != value) {
result = value;
}
}


GCC generates the following assembly:


arithmetic(int&, int):
  cmp DWORD PTR [rdi], esi
  je .L1
  mov DWORD PTR [rdi], esi
.L1:
  ret


The assembly seems suboptimal, because
1) cmov could be used
2) conditional jump could be totally removed, reducing the binary size and
leaving only one mov instruction:

arithmetic(int&, int):
  mov DWORD PTR [rdi], esi
  ret



Godbolt playground https://godbolt.org/z/Pdz7eP with above sample and
std::vector::clear() sample that would also benefit from the above
optimization.

[Bug c++/98814] New: Add fix-it hints for missing asterisk

2021-01-25 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98814

Bug ID: 98814
   Summary: Add fix-it hints for missing asterisk
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Iterators and pointers are quite common in C++ code but newbies tend to forget
to dereference them:


struct my_vector { void push_back(int); };
struct my_iterator { int operator*(); };

void sample(my_vector& vec, my_iterator it) {
vec.push_back(it);
}


A fix-it hint would be helpful for cases when no matching function is found but
the argument has an operator*() that returns a matching type.


More examples via godbolt playground: https://godbolt.org/z/dsrqj8
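
For illustration, a sketch of the code after applying the kind of fix-it the request has in mind. The member definitions and the `data`/`p` fields are added here only to make the snippet runnable; the report declares the types without bodies.

```cpp
#include <vector>

struct my_vector {
    std::vector<int> data;  // storage added for the example
    void push_back(int v) { data.push_back(v); }
};

struct my_iterator {
    const int* p;  // pointee added for the example
    int operator*() { return *p; }
};

void sample(my_vector& vec, my_iterator it) {
    vec.push_back(*it);  // the asterisk the fix-it hint would suggest
}
```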

[Bug c++/98768] New: Improve diagnostics for incorrect result type checking "-> Type" in concepts

2021-01-20 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98768

Bug ID: 98768
   Summary: Improve diagnostics for incorrect result type checking
"-> Type" in concepts
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


template <typename T, typename Function>
concept Callable0Arg = requires(Function func) {
func() -> T;
};


The expression "-> T" is valid only if the "func()" returns pointer to a type
that has member "T". At the same time there is an unused "typename T" in the
concept definition.

For such cases a warning like "Unused `T` in concept definition. Did you mean
`-> std::same_as<T>`" would be really helpful.

[Bug c++/98767] New: Function signature lost in concept diagnostic message

2021-01-20 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98767

Bug ID: 98767
   Summary: Function signature lost in concept diagnostic message
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the example:


template <typename T, typename Function>
concept Callable1Arg = requires(Function func, T value) {
    func(value);
};

// Should fail and fails:
static_assert(Callable1Arg<bool, int (*)(int*)>);


The diagnostic has the following line:
"in requirements with 'Function func', 'T value' [with T = bool; Function = int
(*)()]"

However the type of the Function is "int (*)(int*)" not "int (*)()"


Godbolt playground: https://godbolt.org/z/afKqq5

[Bug tree-optimization/78427] missed optimization of loop condition

2020-09-26 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78427

Antony Polukhin  changed:

   What|Removed |Added

 CC||antoshkka at gmail dot com

--- Comment #4 from Antony Polukhin  ---
Any progress?