[Bug c++/100611] coroutines: destructor called too many times for coroutine lambda stored object

2024-04-04 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100611

--- Comment #9 from Avi Kivity  ---
At least, on 13.2.1. Maybe a backport is required.

[Bug c++/100611] coroutines: destructor called too many times for coroutine lambda stored object

2024-04-04 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100611

--- Comment #8 from Avi Kivity  ---
Congratulations on getting the account!

The bug is fixed though.

[Bug c++/66034] Enhancement request: fiber-local storage

2024-03-17 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66034

--- Comment #3 from Avi Kivity  ---
Yes, this is relevant to user-level threads, not coroutines.

[Bug c/103960] Clang's -Wunknown-attributes is more useful than -Wattributes

2023-05-25 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103960

--- Comment #6 from Avi Kivity  ---
Ah! thanks.

[Bug c/103960] Clang's -Wunknown-attributes is more useful than -Wattributes

2023-05-25 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103960

Avi Kivity  changed:

   What|Removed |Added

 CC||avi at scylladb dot com

--- Comment #4 from Avi Kivity  ---
It would be nice to have -fignore-attribute-namespace=clang, so we can ignore
[[clang::very_useful_attribute(blah)]] on gcc and
[[gnu::another_good_thing(foo)]] on clang, instead of having to wrap everything
with macros.

(complicated a little by clang recognizing some attributes in the gnu
namespace).

[Bug c++/109227] coroutines: ICE in tree check: expected record_type or union_type or qual_union_type, have array_type in build_special_member_call, at cp/call.cc:11067

2023-03-21 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109227

--- Comment #1 from Avi Kivity  ---
Did you forget to attach bad.cc?

[Bug c++/98056] coroutines: ICE tree check: expected record_type or union_type or qual_union_type, have array_type since r11-2183-g0f66b8486cea8668

2023-03-15 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056

--- Comment #22 from Avi Kivity  ---
This is wonderful, thank you.

[Bug tree-optimization/109053] New: [missed optimization] value-range tracking fails in simple case with __builtin_unreachable

2023-03-07 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109053

Bug ID: 109053
   Summary: [missed optimization] value-range tracking fails in
simple case with __builtin_unreachable
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
  Target Milestone: ---

I'm trying to use __builtin_unreachable() to inject assumptions about values of
variables into this code. In this case, the fact that a reference count must be
one or greater.

Consider the code:



struct refcounted {
int* p;
refcounted() : p(new int(1)) {}
~refcounted() {
assume_stuff();
if (!--*p) {
delete p;
}
}
refcounted(const refcounted& x) : p(x.p) {
assume_stuff();
++*p;
assume_stuff();
}
refcounted& operator=(const refcounted& x) {
assume_stuff();
x.assume_stuff();
if (this != ) {
++*x.p;
if (!--*p) {
delete p;
}
p = x.p;
}
assume_stuff();
x.assume_stuff();
return *this;
}

void assume_stuff() const {
if (*p <= 0) {
__builtin_unreachable();
}
}
};

refcounted assign(refcounted& a, refcounted& b) {
auto x = a;
a = b;
return x;
}


In the assign() function, although we assign to `a`, we also return it (as
`x`), so there's never a reason to call operator delete. Yet the code does.

assign(refcounted&, refcounted&):
 mov%rdi,%rax
 mov(%rsi),%rdi
 mov%rdi,(%rax)
 addl   $0x1,(%rdi)

; gcc now knows that (%rdi) is 2 or greater

 cmp%rdx,%rsi
 je 68 
 push   %rbp
 mov%rdx,%rbp
 push   %rbx
 mov%rsi,%rbx
 sub$0x18,%rsp
 mov(%rdx),%rcx
 addl   $0x1,(%rcx)
 mov(%rdi),%edx
 sub$0x1,%edx

; gcc now knows that (%rdi) is 1 or greater

 je 40 


; so how can it be zero? 
; if gcc tracked the ranges correctly, it would have eliminated the branch and
made assign() a leaf function

 mov%edx,(%rdi)
 mov%rcx,(%rbx)
 add$0x18,%rsp
 pop%rbx
 pop%rbp
 ret
 cs nopw 0x0(%rax,%rax,1)
 mov$0x4,%esi
 mov%rax,0x8(%rsp)
 call   4f 
R_X86_64_PLT32 operator delete(void*, unsigned long)-0x4
 mov0x0(%rbp),%rcx
 mov0x8(%rsp),%rax
 mov%rcx,(%rbx)
 add$0x18,%rsp
 pop%rbx
 pop%rbp
 ret
 nopw   0x0(%rax,%rax,1)
 ret


Also on: https://godbolt.org/z/Tnehj86hc

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-12-04 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #17 from Avi Kivity  ---
This was apparently fixed by 58a7b1e354530d8dfe7d8fb859c8b8b5a9140f1f. At
least, I no longer observe copying of __old4 in gimple. Instead, I see
addresses taken, and if I squint I can see that object properly destroyed later
on.

I can't test on my full-scale program due to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056.

I note that one of the duplicates is titled "When constructing object, calling
function and performing co_await in same statement, temporary is erroneously
moved trivially", which is exactly what this PR is about, so it can be safely
marked as a duplicate as well.

Please consider backporting https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99576,
this is a subtle code generation bug.

[Bug middle-end/105204] -Wuse-after-free=1 inconsistency with conditional free

2022-12-01 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105204

--- Comment #3 from Avi Kivity  ---
Does one have to declare all preconditions? I happen to know that shared_ptr
with a reference count of zero is impossible, and I can understand the compiler
not being able to infer it, but just because it's unable to infer it doesn't
mean my code is wrong.

Maybe the warning should be renamed -Wmaybe-use-after-free to indicate the
compiler just thinks there may be a bug, but also acknowledges it doesn't know
everything.

[Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path

2022-11-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

--- Comment #5 from Avi Kivity  ---
It indeed generates better code. However, it requires that I duplicate the
function body, which can be hard at times (consider f == std::transform and "if
(*b != 0) { *b = g(*b); }" as a lambda input.

[Bug rtl-optimization/107772] function prologue generated even though it's only needed in an unlikely path

2022-11-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

--- Comment #2 from Avi Kivity  ---
I expect something like this:


f(int*, int*):
cmp rdi, rsi
je  .L10
.L4:
cmp DWORD PTR [rsi], 0
jne .L14
.L3
add rsi, 4
cmp rsi, rdi
jne .L4
.L10
ret

.section .text.cold

.L14:
pushrsi
pushrdi
mov rax, DWORD PTR [rsi]
callg(int)
pop rdi
pop rsi
mov DWORD PTR [rsi], eax
jmp .L3

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-11-21 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #16 from Avi Kivity  ---
Bug hasn't fixed itself as of ccb9c7b129206209cfc315ab1a0432b5f517bdd9.

[Bug rtl-optimization/107772] New: [missed optimization] function prologue generated even though it's only needed in an unlikely path

2022-11-20 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772

Bug ID: 107772
   Summary: [missed optimization] function prologue generated even
though it's only needed in an unlikely path
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
  Target Milestone: ---

Consider


int g(int);

void f(int* b, int* e) {
while (b != e) {
if (__builtin_expect(*b != 0, false)) [[unlikely]] {
*b = g(*b);
}
++b;
}
}

If we believe the __builtin_expect and/or unlikely annotations (had both for
extra safety), the loop usually does nothing. So we would expect any register
saving and restoring to be pushed to the unlikely section. Yet (-O3):

f(int*, int*):
cmp rdi, rsi
je  .L10
pushrbp
mov rbp, rsi
pushrbx
mov rbx, rdi
sub rsp, 8
.L4:
mov edi, DWORD PTR [rbx]
testedi, edi
jne .L14
.L3:
add rbx, 4
cmp rbp, rbx
jne .L4
add rsp, 8
pop rbx
pop rbp
ret
.L14:
callg(int)
mov DWORD PTR [rbx], eax
jmp .L3
.L10:
ret


I count 8 instructions that could/should have been pushed to .L14.

[Bug c++/106847] deprecated class data member makes the class generate diagnostics even when the member is not used

2022-09-08 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106847

--- Comment #3 from Avi Kivity  ---
But I need it to be copied. It's a struct of options:


struct options {
 bool frob = false;
 [[derecated]] bool quux = false; // will be removed soon
 bool grue = false;
 };


The fact that it's deprecated doesn't mean it can be coerced to false.

Sure, I can create custom constructors and assignment operators, but I lose
designated initializers.

btw, the destructor doesn't trigger the deprecation warning, even though it
does use the deprecated member.


I think it's more reasonable to only emit the warning when the member is
explicitly used.

[Bug c++/106847] deprecated class data member makes the class generate diagnostics even when the member is not used

2022-09-06 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106847

--- Comment #1 from Avi Kivity  ---
Clang equivalent: https://github.com/llvm/llvm-project/issues/55774

[Bug c++/106847] New: deprecated class data member makes the class generate diagnostics even when the member is not used

2022-09-06 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106847

Bug ID: 106847
   Summary: deprecated class data member makes the class generate
diagnostics even when the member is not used
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
  Target Milestone: ---

A class with a deprecated member:

struct foo {
[[deprecated]] bool x;
};

Cannot be used without lots of noise from the compiler:

foo make_a_foo() {
foo f;
return f;
}

: In function 'foo make_a_foo()':
:10:12: warning: 'f' is used uninitialized [-Wuninitialized]
   10 | return f;
  |^
:9:9: note: 'f' declared here
9 | foo f;
  | ^
Compiler returned: 0


Although I didn't touch x, the compiler thinks I did. IMO synthesized
constructors (and assignment operators) should suppress the warning and only
explicit access by the user should trigger the warning.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-05-06 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #15 from Avi Kivity  ---
I see that both sides of the assignment are created in flatten_await_stmt, case
  case TARGET_EXPR.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-05-01 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #14 from Avi Kivity  ---
Do you confirm that my observation about the generated gimple corresponds to
the bug?

If so, is there a central point where this anonymous variables are named?
Perhaps we can breakpoint on D.2741 and see where the copy is generated.

[Bug c++/63164] unnecessary calls to __dynamic_cast

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63164

--- Comment #6 from Avi Kivity  ---
I already owe you many beers, but this one is special.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #11 from Avi Kivity  ---
Created attachment 52899
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52899=edit
simple(r) reproducer

I was able to reduce it to something simple (102 lines), but I was only able to
verify that wrong code is generated on the gimple level (where my understanding
is very limited). Still, to my inexperienced eyes it looks like an object with
a copy constructor is being copied bitwise.

Compile with (gcc 11.3.1)

g++ -Wfatal-errors -w -std=gnu++20  -march=westmere -fvisibility=hidden  -c
-o table.o table.i -fdump-tree-gimple -g 

and observe in table.i.006t.gimple:

struct lw_shared_ptr D.2740 [value-expr: frame_ptr->D.2740_4_5];

and then

frame_ptr->D.2741_4_5.__old4 = frame_ptr->D.2740_4_5;


in other places, variables of the same type are copied using the copy
constructor.

[Bug analyzer/105287] [12/13 Regression] ICE in analyzer get_region_for_local on C++ await cond_var

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105287

Avi Kivity  changed:

   What|Removed |Added

 CC||avi at scylladb dot com

--- Comment #10 from Avi Kivity  ---
This causes a wrong-code regression, PR105426.

[Bug c++/105426] [wrong-code][regression][coroutines] range-for temporaries are not persisted in coroutines

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105426

--- Comment #1 from Avi Kivity  ---
Created attachment 52898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52898=edit
reproducer

[Bug c++/105426] New: [wrong-code][regression][coroutines] range-for temporaries are not persisted in coroutines

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105426

Bug ID: 105426
   Summary: [wrong-code][regression][coroutines] range-for
temporaries are not persisted in coroutines
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
  Target Milestone: ---

The attached reproducer, compiled with:


g++ --std=c++20 coroutine-unpersisted-range-for-temporary.cc -g -O3


generates correct results for gcc before 15a176a833f and for clang, but
incorrect results in 15a176a833f and later.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #10 from Avi Kivity  ---
I have something like this:

│ 2680  future<> system_keyspace_make(distributed&
dist_db, distributed& dist_ss,
sharded& dist_gossiper, db::config& cfg) {   │
...
   
│
│ 2687  for (auto&& table : system_keyspace::all_tables(db_config)) {   

...

│ 2695  co_await db.create_keyspace(ksm,
dist_ss.local().get_erm_factory(), true,
replica::database::system_keyspace::yes);   


And I see a crash as if the std::vector returned by all_tables() is not
persisted. This worked in previous gcc versions and works in clang.

I don't now what the standard says, but it really should persist temporaries in
range for that contains a suspension point, or a lot of buggy code will be
written.

[Bug c++/63164] unnecessary calls to __dynamic_cast

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63164

--- Comment #4 from Avi Kivity  ---
Oops, last comment intended for a separate bug. Curse the bug list thing.

[Bug c++/63164] unnecessary calls to __dynamic_cast

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63164

--- Comment #3 from Avi Kivity  ---
I have something like this:

│ 2680  future<> system_keyspace_make(distributed&
dist_db, distributed& dist_ss,
sharded& dist_gossiper, db::config& cfg) {   │
│ 2681  register_virtual_tables(dist_db, dist_ss, dist_gossiper, cfg);  
...
   
│
│ 2687  for (auto&& table : system_keyspace::all_tables(db_config)) {   

...

│ 2695  co_await db.create_keyspace(ksm,
dist_ss.local().get_erm_factory(), true,
replica::database::system_keyspace::yes);   


And I see a crash as if the std::vector returned by all_tables() is not
persisted. This worked in previous gcc versions and works in clang.

I don't now what the standard says, but it really should persist temporaries in
range for that contains a suspension point, or a lot of buggy code will be
written.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #9 from Avi Kivity  ---
I think it makes it worse, now seeing similar crashes, but earlier.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-28 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #8 from Avi Kivity  ---
Hoping this is a duplicate of PR105287. Checking.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-26 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #7 from Avi Kivity  ---
To give more context:

struct ._anon_973 D.2159793 [value-expr: frame_ptr->D.2159793_4_7];
struct lw_shared_ptr D.2159792 [value-expr: frame_ptr->D.2159792_4_7];

(I determined that _anon_973 is the non-coroutine-lambda that is giving me
grief)

frame_ptr->D.2159793_4_7.__this = _51;
frame_ptr->D.2159793_4_7.__old4 =
frame_ptr->D.2159792_4_7;
_52 = frame_ptr->__closure;
_53 = _52->__newtabs;
frame_ptr->D.2159793_4_7.__newtabs = _53;


Here I suspect copying __old4 is bypassing the copy constructor. The source of
the copy (D.2159792) has lw_shared_ptr type.

I'm not fluent in gimple so maybe I'm misinterpreting it.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-26 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #6 from Avi Kivity  ---
Some more findings: if I replace the lambda (which is not a coroutine, but is
contained in a coroutine lambda) with an equivalent struct, the problem goes
away, both at runtime and in terms of an __old4 copy. 
+struct inner_lambda {
+table* zis;
+lw_shared_ptr old4;
+std::vector& newtabs;
+inner_lambda(table* zis, lw_shared_ptr& old3,
std::vector& newtabs)
+: zis(zis), old4(old3), newtabs(newtabs) {}
+future<> operator()() {
+tlogger.info("updating cache {}",
fmt::ptr(old4.get()));
+return zis->update_cache(old4, newtabs);
+}
+};
+
 tlogger.info("before updating cache {}",
fmt::ptr(old3.get()));
-co_await
with_scheduling_group(_config.memtable_to_cache_scheduling_group, [this, old4 =
old3, ] () -> future<> {
+co_await
with_scheduling_group(_config.memtable_to_cache_scheduling_group, /* [this,
old4 = old3, ] () mutable -> future<> {
 tlogger.info("updating cache {}", fmt::ptr(old4.get()));
 return update_cache(old4, newtabs);
-});
+} */ inner_lambda(this, old3, newtabs));
 tlogger.info("updating cache {} done", fmt::ptr(old3.get()));
 _memtables->erase(old3);
 tlogger.debug("Memtable for {}.{} replaced, into {} sstables",
old3->schema()->ks_name(), old3->schema()->cf_name(), newtabs.size());
 tlogger.info("try_flush_memtable_to_sstable post_flush:end old
{} refcnt {}", fmt::ptr(old3.get()), old3.use_count());
 co_return stop_iteration::yes;

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-26 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #5 from Avi Kivity  ---
Can you confirm my suspicion about the copying of __old4 in gimple? If that's
the culprit I believe I can mechanically reduce the reproducer to something
easily manageable.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-25 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #3 from Avi Kivity  ---
I see this alleged copy in gcc 10.2.1 too.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-25 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #2 from Avi Kivity  ---
I see it in 006t.gimple too:

try
  {
_50 = frame_ptr->__closure;
_51 = _50->__this;
frame_ptr->D.2159793_4_7.__this = _51;
frame_ptr->D.2159793_4_7.__old4 =
frame_ptr->D.2159792_4_7;
_52 = frame_ptr->__closure;
_53 = _52->__newtabs;
frame_ptr->D.2159793_4_7.__newtabs = _53;


newtabs is captured by reference, so the copy from _53 is reasonable. But old4
is captured by value, and so should go through the copy constructor.

Looking at references to old3, it always has its address taken and then passed
to the copy constructor.

[Bug c++/105373] miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-25 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

--- Comment #1 from Avi Kivity  ---
I randomly looked at 023t.ssa (mainly because I recognized the acronym).

  _45 = frame_ptr_182(D)->__closure;
  _46 = _45->__this;
  frame_ptr_182(D)->D.2159984_4_7 =
_46->_config.memtable_to_cache_scheduling_group;
  _47 = _ptr_182(D)->D.2159792_4_7;
  _48 = frame_ptr_182(D)->__closure;
  _49 = &_48->__old3;
  seastar::lw_shared_ptr::lw_shared_ptr (_47, _49);
  _50 = frame_ptr_182(D)->__closure;
  _51 = _50->__this;
  frame_ptr_182(D)->D.2159793_4_7.__this = _51;
  frame_ptr_182(D)->D.2159793_4_7.__old4 = frame_ptr_182(D)->D.2159792_4_7;
  _52 = frame_ptr_182(D)->__closure;
  _53 = _52->__newtabs;
  frame_ptr_182(D)->D.2159793_4_7.__newtabs = _53;
  _54 = _ptr_182(D)->D.2159793_4_7;
  frame_ptr_182(D)->D.2160366_4_7 =
seastar::with_scheduling_group,
sstable_write_permit&&)::_ZN7replica5table29try_flush_memtable_to_sstableEN7seastar13lw_shared_ptrINS_8memtableEEEO20sstable_write_permit.Frame*)operator()(replica::table::try_flush_memtable_to_sstable(replica::table::try_flush_memtable_to_sstable(seastar::lw_shared_ptr,
sstable_write_permit&&)::_ZN7replica5table29try_flush_memtable_to_sstableEN7seastar13lw_shared_ptrINS_8memtableEEEO20sstable_write_permit.Frame*)::
mutable::_ZZN7replica5table29try_flush_memtable_to_sstableEN7seastar13lw_shared_ptrINS_8memtableEEEO20sstable_write_permitENUlvE_clEv.Frame*)operator()(replica::table::try_flush_memtable_to_sstable(replica::table::try_flush_memtable_to_sstable(seastar::lw_shared_ptr,
sstable_write_permit&&)::_ZN7replica5table29try_flush_memtable_to_sstableEN7seastar13lw_shared_ptrINS_8memtableEEEO20sstable_write_permit.Frame*)operator()(replica::table::try_flush_memtable_to_sstable(replica::table::try_flush_memtable_to_sstable(seastar::lw_shared_ptr,
sstable_write_permit&&)::_ZN7replica5table29try_flush_memtable_to_sstableEN7seastar13lw_shared_ptrINS_8memtableEEEO20sstable_write_permit.Frame*)::
mutable::_ZZN7replica5table29try_flush_memtable_to_sstableEN7seastar13lw_shared_ptrINS_8memtableEEEO20sstable_write_permitENUlvE_clEv.Frame*)::
mutable::_ZZZN7replica5table29try_flush_memtable_to_sstableEN7seastar13lw_shared_ptrINS_8memtableEEEO20sstable_write_permitENUlvE_clEvENUlvE1_clEv.Frame*)::
> (frame_ptr_182(D)->D.2159984_4_7, _54); [return slot optimization]
  _55 = _ptr_182(D)->D.2160366_4_7;
  frame_ptr_182(D)->Aw5_4_7 = seastar::operator co_await (_55); [return
slot optimization]


In the line

  frame_ptr_182(D)->D.2159793_4_7.__old4 = frame_ptr_182(D)->D.2159792_4_7;


Is this not a bitwise copy, which should be instead a call to the copy
constructor?

[Bug c++/105373] New: miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor

2022-04-25 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

Bug ID: 105373
   Summary: miscompile involving lambda coroutines and an object
bitwise copied instead of via the copy constructor
   Product: gcc
   Version: 11.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
  Target Milestone: ---

This is a bug in a complex piece of code, so I'll need guidance on what further
information to provide (e.g. intermediate code dumps). It reproduces with
various levels of debug information, and the code works with clang.

At the heart there is a shared pointer, which I've enhanced so that all
pointers that point to the same object keep track of each other (also the
pointee points back to one of the pointers). The pointee keeps an incrementing
generation count. In the end I have two pointer objects that are bitwise equal,
even though they should have different generation counts and different
doubly-linked-list pointers. This proves that a bitwise copy happened. It
doesn't prove a miscompile, since my code could have decided to perform a
bitwise copy, but the fact that it works in clang indicates it's a gcc bug. The
bug happens with asan too, and with different sizeof(the smart pointer) so it's
not some stray write.


This is the snippet that causes the trouble:

0tlogger.info("before updating cache {}",
fmt::ptr(old3.get()));
1co_await
with_scheduling_group(_config.memtable_to_cache_scheduling_group, [this, old4 =
old3, ] () -> future<> {
2tlogger.info("updating cache {}", fmt::ptr(old4.get()));
3return update_cache(old4, newtabs);
4});


old3 and old4 are all copies of the same smart pointer (there are also old and
old1 and old2 elsewhere, but they are correct). In the call to update_cache(),
we attempt to make a copy of old4, and an internal check finds the link list is
corrupted. Inspecting old4 in the debugger (from the printout in line 0) and
the source of the copy in line 3 shows they are the same, but have different
addresses:


0 INFO  2022-04-25 12:04:14,722 [shard 0] table - before updating cache
0x60666c40
1 copying @0x6520bef8 <- @0x652108f0 0x60666c40 refcnt 6 gen 43
2 INFO  2022-04-25 12:04:14,722 [shard 0] table - updating cache 0x60666c40
3 (gdb) p this
$1 = (const seastar::lw_shared_ptr * const) 0x6520bed0
4 (gdb) p *this
5 $2 = {_ptr = 0x60666c40, _next = 0x652108f0, _prev = 0x6597ae48,
_generation = 43}
6 (gdb) x/4gx 0x6520bef8
7 0x6520bef8:   0x60666c40  0x652108f0
8 0x6520bf08:   0x6597ae48  0x002b
9 (gdb) x/4gx this
10 0x6520bed0:  0x60666c40  0x652108f0
11 0x6520bee0:  0x6597ae48  0x002b

in line 3 I dump old4 (from the printout in line 1, where old3 is copied into
old4). But "this" doesn't point to old4., it points to a bitwise copy of old4,
as shown in lines 7-8 and 10-11.


Note that both old4 and the bad copy "this" are members of the coroutine frame.


I realize this isn't enough to analyze the situation, I'm happy to provide more
information if you direct me how.

[Bug sanitizer/105336] truncated address sanitizer stack traces

2022-04-24 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105336

--- Comment #6 from Avi Kivity  ---
(the reproducer was executed by gcc 12 prerelease, not gcc 11)

[Bug sanitizer/105336] truncated address sanitizer stack traces

2022-04-24 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105336

--- Comment #5 from Avi Kivity  ---
I reduced it to a few lines (attached, intentionally triggers use-after-free).
The culprit is -Og.

With

   g++ coroutine-asan.cc -o coroutine-asan --std=c++20 -fsanitize=address -Og

I see

READ of size 8 at 0x60700020 thread T0
#0 0x4018e2 in test() (/home/avi/tests/coroutine-asan+0x4018e2)
#1 0x40192b in main (/home/avi/tests/coroutine-asan+0x40192b)
#2 0x7fb2e0a1d58f in __libc_start_call_main (/lib64/libc.so.6+0x2d58f)
#3 0x7fb2e0a1d648 in __libc_start_main_alias_1 (/lib64/libc.so.6+0x2d648)
#4 0x401144 in _start (/home/avi/tests/coroutine-asan+0x401144)

0x60700020 is located 0 bytes inside of 80-byte region
[0x60700020,0x60700070)
freed by thread T0 here:
#0 0x7fb2e0fdebc8 in operator delete(void*) (/lib64/libasan.so.8+0xbbbc8)
#1 0x40164f in
test_coroutine(test_coroutine(std::__n4861::coroutine_handle&)::_Z14test_coroutineRNSt7__n486116coroutine_handleIvEE.Frame*)
[clone .actor] (/home/avi/tests/coroutine-asan+0x40164f)
#2 0x40200f  (/home/avi/tests/coroutine-asan+0x40200f)

previously allocated by thread T0 here:
#0 0x7fb2e0fde188 in operator new(unsigned long)
(/lib64/libasan.so.8+0xbb188)
#1 0x4016c0 in test_coroutine(std::__n4861::coroutine_handle&)
(/home/avi/tests/coroutine-asan+0x4016c0)


The stack traces are truncated, compared to

g++ coroutine-asan.cc -o coroutine-asan --std=c++20 -fsanitize=address

which yields

READ of size 8 at 0x60700020 thread T0
#0 0x401cef in std::__n4861::coroutine_handle::resume() const
(/home/avi/tests/coroutine-asan+0x401cef)
#1 0x401b0f in test() (/home/avi/tests/coroutine-asan+0x401b0f)
#2 0x401b85 in main (/home/avi/tests/coroutine-asan+0x401b85)
#3 0x7f79b8be958f in __libc_start_call_main (/lib64/libc.so.6+0x2d58f)
#4 0x7f79b8be9648 in __libc_start_main_alias_1 (/lib64/libc.so.6+0x2d648)
#5 0x4011a4 in _start (/home/avi/tests/coroutine-asan+0x4011a4)

0x60700020 is located 0 bytes inside of 80-byte region
[0x60700020,0x60700070)
freed by thread T0 here:
#0 0x7f79b91aabc8 in operator delete(void*) (/lib64/libasan.so.8+0xbbbc8)
#1 0x4019a5 in
test_coroutine(test_coroutine(std::__n4861::coroutine_handle&)::_Z14test_coroutineRNSt7__n486116coroutine_handleIvEE.Frame*)
[clone .actor] (/home/avi/tests/coroutine-asan+0x4019a5)
#2 0x401cf7 in std::__n4861::coroutine_handle::resume() const
(/home/avi/tests/coroutine-asan+0x401cf7)
#3 0x401b0f in test() (/home/avi/tests/coroutine-asan+0x401b0f)
#4 0x401b85 in main (/home/avi/tests/coroutine-asan+0x401b85)
#5 0x7f79b8be958f in __libc_start_call_main (/lib64/libc.so.6+0x2d58f)

previously allocated by thread T0 here:
#0 0x7f79b91aa188 in operator new(unsigned long)
(/lib64/libasan.so.8+0xbb188)
#1 0x401293 in test_coroutine(std::__n4861::coroutine_handle&)
(/home/avi/tests/coroutine-asan+0x401293)
#2 0x401af9 in test() (/home/avi/tests/coroutine-asan+0x401af9)
#3 0x401b85 in main (/home/avi/tests/coroutine-asan+0x401b85)
#4 0x7f79b8be958f in __libc_start_call_main (/lib64/libc.so.6+0x2d58f)


The traces go all the away to main.

[Bug sanitizer/105336] truncated address sanitizer stack traces

2022-04-24 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105336

--- Comment #4 from Avi Kivity  ---
Created attachment 52856
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52856=edit
intentionally buggy reproducer

[Bug sanitizer/105336] truncated address sanitizer stack traces

2022-04-23 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105336

--- Comment #3 from Avi Kivity  ---
I have a multi-gigabyte reproducer. Unfortunately it's part of a huge program
that didn't build with gcc until very recently. It will be quite a task to
reduce it.

[Bug sanitizer/105336] truncated address sanitizer stack traces

2022-04-21 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105336

--- Comment #1 from Avi Kivity  ---
I guess I should mention the programs uses C++20 coroutines, and it could be
the case that debug information for coroutines is generated incorrectly.

[Bug sanitizer/105336] New: truncated address sanitizer stack traces

2022-04-21 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105336

Bug ID: 105336
   Summary: truncated address sanitizer stack traces
   Product: gcc
   Version: 11.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

Trying to debug a program with gcc 11 branch
(d26c3e4f733fcb07d90680491dd1d7a9d08c4705), I get truncated asan stack traces:

 
seastar::internal::repeater(flush_permit&)
const::{lambda()#1}>
=
==313819==ERROR: AddressSanitizer: heap-use-after-free on address
0x6143f848 at pc 0x040627a3 bp 0x7fff62f15fb0 sp 0x7fff62f15fa8
READ of size 8 at 0x6143f848 thread T0
#0 0x40627a2 in seastar::debug_shared_ptr_counter_type::check() const
seastar/include/seastar/core/shared_ptr_debug_helper.hh:63
#1 0x505eab6 in seastar::debug_shared_ptr_counter_type::operator long()
const seastar/include/seastar/core/shared_ptr_debug_helper.hh:40
#2 0x505eab6 in seastar::lw_shared_ptr::use_count()
const seastar/include/seastar/core/shared_ptr.hh:356
#3 0x505eab6 in operator() replica/table.cc:620
#4 0x5061947 in
invoke)>&,
seastar::future > seastar/include/seastar/core/future.hh:2141
#5 0x5061947 in operator() seastar/include/seastar/core/future.hh:1658
#6 0x5061947 in call
seastar/include/seastar/util/noncopyable_function.hh:153
#7 0x45d1383 in seastar::noncopyable_function
(seastar::future&&)>::operator()(seastar::future&&) const
seastar/include/seastar/util/noncopyable_function.hh:209
#8 0x45d1383 in
seastar::future::then_wrapped_nrvo,
seastar::noncopyable_function (seastar::future&&)>
>(seastar::noncopyable_function
(seastar::future&&)>&&)::{lambda(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&)#1}::operator()(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&)
const::{lambda()#1}::operator()() const
seastar/include/seastar/core/future.hh:1674
#9 0x45d1383 in void seastar::futurize
>::satisfy_with_result_of::then_wrapped_nrvo,
seastar::noncopyable_function (seastar::future&&)>
>(seastar::noncopyable_function
(seastar::future&&)>&&)::{lambda(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&)#1}::operator()(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&)
const::{lambda()#1}>(seastar::internal::promise_base_with_type&&,
seastar::future::then_wrapped_nrvo,
seastar::noncopyable_function (seastar::future&&)>
>(seastar::noncopyable_function
(seastar::future&&)>&&)::{lambda(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&)#1}::operator()(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&) const::{lambda()#1}&&)
seastar/include/seastar/core/future.hh:2126
#10 0x45d2191 in
seastar::future::then_wrapped_nrvo,
seastar::noncopyable_function (seastar::future&&)>
>(seastar::noncopyable_function
(seastar::future&&)>&&)::{lambda(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&)#1}::operator()(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&) const
seastar/include/seastar/core/future.hh:1673
#11 0x45d2191 in
seastar::continuation,
seastar::noncopyable_function (seastar::future&&)>,
seastar::future::then_wrapped_nrvo,
seastar::noncopyable_function (seastar::future&&)>
>(seastar::noncopyable_function
(seastar::future&&)>&&)::{lambda(seastar::internal::promise_base_with_type&&,
seastar::noncopyable_function
(seastar::future&&)>&,
seastar::future_state&&)#1},
void>::run_and_dispose() seastar/include/seastar/core/future.hh:773
#12 0x17fc8b74 in
seastar::reactor::run_tasks(seastar::reactor::task_queue&)
seastar/src/core/reactor.cc:2344
#13 0x17fcd0ec in seastar::reactor::run_some_tasks()
seastar/src/core/reactor.cc:2754
#14 0x17fd2b00 in seastar::reactor::do_run()
seastar/src/core/reactor.cc:2923
#15 0x17fceba8 in seastar::reactor::run() seastar/src/core/reactor.cc:2806
#16 0x17d0a3e0 in seastar::app_template::run_deprecated(int, char**,
std::function&&) seastar/src/core/app-template.cc:265
#17 0x17d07eb0 in seastar::app_template::run(int, char**,
std::function ()>&&) seastar/src/core/app-template.cc:156
#18 0x3d67f67 in scylla_main 

[Bug c++/104020] [coroutines] ICE in co_await function call with initializer_list arguments

2022-04-17 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104020

Avi Kivity  changed:

   What|Removed |Added

 CC||avi at scylladb dot com

--- Comment #1 from Avi Kivity  ---
Duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056

[Bug c++/103328] [11 Regression] ICE in remap_gimple_stmt with coroutines since r11-7419-g0f161cc8494cf728

2022-04-07 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103328

--- Comment #27 from Avi Kivity  ---
Wonderful, thank you.

[Bug c++/103328] [11/12 Regression] ICE in remap_gimple_stmt with coroutines since r11-7419-g0f161cc8494cf728

2022-04-04 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103328

--- Comment #24 from Avi Kivity  ---
Many thanks for the patch and the commit. Can we have it backported to 11.x?

[Bug sanitizer/95137] Sanitizers seem to be missing support for coroutines

2022-03-24 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95137

--- Comment #50 from Avi Kivity  ---
Your reproducer does pass in trunk. So perhaps just a missing backport.

[Bug c++/103328] [11/12 Regression] ICE in remap_gimple_stmt with coroutines since r11-7419-g0f161cc8494cf728

2022-03-22 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103328

--- Comment #22 from Avi Kivity  ---
Maintainers, please review the patch posted in [1].


[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591907.html

[Bug c++/103328] [11/12 Regression] ICE in remap_gimple_stmt, at tree-inline.c:1921 since r11-7419-g0f161cc8494cf728

2022-03-15 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103328

--- Comment #21 from Avi Kivity  ---
Benno, many thanks for the fix. Please consider posting it to gcc-patches as
that may speed up review and merging.

[Bug rtl-optimization/104484] -freorder-block-and-partition not splitting into sections with __builin_expect()

2022-02-12 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104484

--- Comment #4 from Avi Kivity  ---
gcc could infer that the check for f2 and the call to very_heavy are cold, and
move that little block to a .cold section, whether or not very_heavy is cold.

[Bug rtl-optimization/104484] -freorder-block-and-partition not splitting into sections with __builin_expect()

2022-02-10 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104484

--- Comment #2 from Avi Kivity  ---
(In reply to Andrew Pinski from comment #1)
> But the assembly output does not make sense to the code you gave either.


Apart from the missing .section directives, I didn't notice anything odd. Maybe
movl/testl could be optimized into cmpl.

Ah, I moved this code from C++ so the function signatures are bad, should be
fun(void). With the signatures adjusted, I get

light:
.LFB2:
.cfi_startproc
subq$8, %rsp
.cfi_def_cfa_offset 16
callfun
movlf1(%rip), %edx
testl   %edx, %edx
jne .L8
.L6:
addq$8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
jmp fun
.p2align 4,,10
.p2align 3
.L8:
.cfi_restore_state
movlf2(%rip), %eax
testl   %eax, %eax
je  .L6
callvery_heavy
jmp .L6
.cfi_endproc


Which still misses the .section directives and the peephole optimization to
merge movl/testl.

[Bug tree-optimization/104484] New: -freorder-block-and-partition not splitting into sections with __builin_expect()

2022-02-10 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104484

Bug ID: 104484
   Summary: -freorder-block-and-partition not splitting into
sections with __builin_expect()
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
  Target Milestone: ---

I expected code guarded by __builtin_expect() to be pushed into .text.cold, but
that's not happening:

int f1, f2;

inline int is() {
return __builtin_expect(f1, 0) && f2;
}

void heavy() {
extern void very_heavy();
if (is()) {
very_heavy();
}
}


void fun();

void light() {
fun();
heavy();
fun();
}


with -O3 -freorder-blocks-and-partition:


light:
.LFB2:
.cfi_startproc
subq$8, %rsp
.cfi_def_cfa_offset 16
xorl%eax, %eax
callfun
movlf1(%rip), %edx
testl   %edx, %edx
jne .L8.p2align 4,,10
.p2align 3
.L8:
.cfi_restore_state
movlf2(%rip), %eax
testl   %eax, %eax
je  .L6
xorl%eax, %eax
callvery_heavy
jmp .L6
.cfi_endproc

.L6:
xorl%eax, %eax
addq$8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
jmp fun


I expected .L8 to be in another section, but it isn't.

[Bug tree-optimization/104292] [missed optimization] boolean addition generates suboptimal code

2022-01-30 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104292

--- Comment #1 from Avi Kivity  ---
btw, I see that the equivalent bool_and generates optimal code.

bool_and(bool, bool):
movl%esi, %eax
andl%edi, %eax
ret

Perhaps bool is written with the expectation that any non-zero value is true?
But aren't non-zero values other than 1 undefined behavior?

[Bug tree-optimization/104292] New: [missed optimization] boolean addition generates suboptimal code

2022-01-30 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104292

Bug ID: 104292
   Summary: [missed optimization] boolean addition generates
suboptimal code
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
  Target Milestone: ---

bool bool_or(bool b1, bool b2) {
return b1 + b2;
}

generates

bool_or(bool, bool):
movzbl  %dil, %edi
movzbl  %sil, %esi
addl%esi, %edi
setne   %al
ret

Whereas it could generate

bool_or(bool, bool):
or   %edi, %esi
mov  %esi, %eax
ret

For a net gain of two instructions (and the final mov can often be elided, so
up to three).

I encountered this while using std::plus<> with boolean types. It could be
optimized as a specialization of std::plus, but I think a magic transformation
in the optimizer would make it apply in more places.

[Bug target/103554] -mavx generates worse code on scalar code

2021-12-06 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #7 from Avi Kivity  ---
Sorry, I was unclear. That's the entire program with a huge number of slp
opportunities. Each iteration in the program is ~40k instructions. It's not
directly related to the test case, which is artificial, and arose from me
exploring the differences between gcc and clang code generation.

It does demonstrate that clang's slp is counterproductive. I promise to repeat
the experiment with gcc once it stops ICEing.

[Bug target/103554] -mavx generates worse code on scalar code

2021-12-06 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #5 from Avi Kivity  ---
Here's some big-picture data. Compiled with clang, which seems to ignore these
STLF issues.

no-slp:

42641.91 tps ( 75.1 allocs/op,  12.1 tasks/op,   44929 insns/op)
42446.41 tps ( 75.1 allocs/op,  12.1 tasks/op,   44870 insns/op)
42495.03 tps ( 75.1 allocs/op,  12.1 tasks/op,   44931 insns/op)
42703.40 tps ( 75.1 allocs/op,  12.1 tasks/op,   44916 insns/op)
42798.98 tps ( 75.1 allocs/op,  12.1 tasks/op,   44963 insns/op)

slp:

41536.46 tps ( 75.1 allocs/op,  12.1 tasks/op,   44828 insns/op)
41482.05 tps ( 75.1 allocs/op,  12.1 tasks/op,   44802 insns/op)
41707.23 tps ( 75.1 allocs/op,  12.1 tasks/op,   44874 insns/op)
41811.10 tps ( 75.1 allocs/op,  12.1 tasks/op,   44847 insns/op)
41764.39 tps ( 75.1 allocs/op,  12.1 tasks/op,   44846 insns/op)

So slp definitely has negative impact on ops/sec, even though it reduces
instructions/op. This is on an older machine (newer ones have ~5X perf, with 3X
higher IPC and the rest due to higher frequency).

[Bug target/103554] -mavx generates worse code on scalar code

2021-12-06 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #3 from Avi Kivity  ---
> _Note_ that likely the suboptimal solution presented here is faster because
> it avoids STLF penalties from the calls stack setup which very likely uses
> scalar or differently aligned vector moves.

Interesting point. Agner says (Icelake):

> A read that is bigger than the write, or a read that covers both written and 
> unwritten bytes,
> fails to forward. The write-to-read latency is 19-20 clock cycles.

However, the same code is generated when `in` is a reference, in which case it
may not be in the store queue at all, so we're paying two extra instructions
for nothing. movhps is also 2 uops, so we're paying 3 uops to load 2 elements.

[Bug target/103554] New: -mavx generates worse code on scalar code

2021-12-04 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

Bug ID: 103554
   Summary: -mavx generates worse code on scalar code
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avi at scylladb dot com
  Target Milestone: ---

Test case:

struct s1 {
long a, b, c, d, e, f, g, h;
};

s1 move(s1 in) {
s1 ret;

ret.a = in.d;
ret.b = in.e;
ret.c = in.a;
ret.d = in.b;
return ret;
}


-O3 generates:

move(s1):
  movq 8(%rsp), %xmm0
  movq 32(%rsp), %xmm1
  movq %rdi, %rax
  movhps 16(%rsp), %xmm0
  movhps 40(%rsp), %xmm1
  movups %xmm1, (%rdi)
  movups %xmm0, 16(%rdi)
  ret


-O3 -mavx generates:

move(s1):
pushq   %rbp
movq%rdi, %rax
movq%rsp, %rbp
vmovq   16(%rbp), %xmm2
vmovq   40(%rbp), %xmm3
vpinsrq $1, 24(%rbp), %xmm2, %xmm1
vpinsrq $1, 48(%rbp), %xmm3, %xmm0
vinsertf128 $0x1, %xmm1, %ymm0, %ymm0
vmovdqu %ymm0, (%rdi)
vzeroupper
popq%rbp
ret

Clang -O3 generates this simple code, with or without -mavx (-mavx does use VEX
instructions):

move(s1): # @move(s1)
  movq %rdi, %rax
  movups 32(%rsp), %xmm0
  movups %xmm0, (%rdi)
  movaps 8(%rsp), %xmm0
  movups %xmm0, 16(%rdi)
  retq

[Bug c++/97848] [missed optimization] tls init function check emitted for consinit thread_local variables (C++20)

2021-11-29 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97848

Avi Kivity  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Avi Kivity  ---
Fixed on trunk (12.0)

foo_good():
  movq x@gottpoff(%rip), %rax
  movl %fs:(%rax), %eax
  ret
set_foo(int):
  movq x@gottpoff(%rip), %rax
  movl %edi, %fs:(%rax)
  ret

[Bug tree-optimization/85516] [missed-optimization] gcc does not convert multiple compares against constants to a shift+bitmask test

2021-11-29 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85516

Avi Kivity  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from Avi Kivity  ---
Marking as fixed, since at least 11.2.

[Bug tree-optimization/85516] [missed-optimization] gcc does not convert multiple compares against constants to a shift+bitmask test

2021-11-29 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85516

--- Comment #6 from Avi Kivity  ---
gcc 11.2 produces optimized code with the original example:

check_mask(E):
  cmpl $9, %edi
  ja .L3
  movl %edi, %ecx
  movl $562, %eax
  shrq %cl, %rax
  andl $1, %eax
  ret
.L3:
  xorl %eax, %eax
  ret

[Bug c++/103319] [coroutines] ICE in is_this_parameter, at cp/semantics.c:10672

2021-11-29 Thread avi at scylladb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103319

--- Comment #2 from Avi Kivity  ---
A gentle ping, can we backport this?