[Bug tree-optimization/59429] Missed optimization opportunity in qsort-style comparison functions

2024-03-22 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59429

--- Comment #16 from Mathias Stearn  ---
Trunk still generates different code for all cases (in some cases subtly so)
for both aarch64 and x86_64: https://www.godbolt.org/z/1s6sxrMWq. For both
arches, it seems like LE and LG generate the best code.

On aarch64, they probably all have the same throughput, but EL and EG have a
size penalty with an extra instruction.

On x86_64, there is much more variety. EL and EG both get end up with a branch
rather than being branchless, which is probably bad since comparison functions
are often called in ways that the result branches are unpredictable. GE and GL
appear to have regressed since this ticket was created. They now do the
comparison twice rather than reusing the flags from the first comparison:

comGL(int, int):
xor eax, eax
cmp edi, esi
mov edx, 1
setlal
neg eax
cmp edi, esi
cmovg   eax, edx
ret
comGE(int, int):
xor eax, eax
cmp edi, esi
mov edx, 1
setne   al
neg eax
cmp edi, esi
cmovg   eax, edx
ret

[Bug target/113723] -freorder-blocks-and-partition emits bogus asm directives on aarch64

2024-02-02 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113723

--- Comment #1 from Mathias Stearn  ---
As a workaround sticking [[gnu::optimize("no-reorder-blocks-and-partition")]]
 on each function that triggers this seems to work. But that is obviously not a
scalable solution.

[Bug target/113723] New: -freorder-blocks-and-partition emits bogus asm directives on aarch64

2024-02-02 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113723

Bug ID: 113723
   Summary: -freorder-blocks-and-partition emits bogus asm
directives on aarch64
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

It tries to subtract labels across the function split which the assembler
rejects. At the very least it does this when generating a switch jump table,
but there may be other cases.

https://www.godbolt.org/z/Ynh1zzY4a

Repro with -O3 -freorder-blocks-and-partition:


[[gnu::cold]]void cold();

template void f();
#define GENERATE_CASE case __COUNTER__: f<__COUNTER__>(); break 

void test(int i) {
switch(i) {
case 1: cold(); break;
GENERATE_CASE;
GENERATE_CASE;
GENERATE_CASE;
GENERATE_CASE;
GENERATE_CASE;
GENERATE_CASE;
GENERATE_CASE;
GENERATE_CASE;
GENERATE_CASE;
GENERATE_CASE;
}
}

test(int):
cmp w0, 18
bls .L17
.L1:
ret
.L17:
adrpx1, .L4
add x1, x1, :lo12:.L4
ldrbw1, [x1,w0,uxtw]
adr x0, .Lrtx4
add x1, x0, w1, sxtb #2
br  x1
.Lrtx4:
.L4:
.byte   (.L14 - .Lrtx4) / 4
.byte   (.L13 - .Lrtx4) / 4 // <<<< THIS IS THE BAD ONE
.byte   (.L12 - .Lrtx4) / 4

...

test(int) [clone .cold]:
.L13:
b   cold()


/tmp/cck05x4u.s: Assembler messages:
/tmp/cck05x4u.s:48: Error: invalid operands (.text.unlikely and .text sections)
for `-'
Compiler returned: 1

I've repro'd on 11,12, and trunk.

[Bug middle-end/113682] Branches in branchless binary search rather than cmov/csel/csinc

2024-02-01 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682

--- Comment #5 from Mathias Stearn  ---
(In reply to Andrew Pinski from comment #2)
> I should note that on x86, 2 cmov in a row might be an issue and worse than
> branches. There is a cost model and the x86 backend rejects that.
> 
> There are some cores where it is worse. I don't know if it applies to recent
> ones though.

Do you know if that applies to any cores that support x86_64? I checked Agner
Fog's tables, and only very very old cores (P4 era) had high reciprocal
throughput, but even then it was less than latency. It looks like all AMD cores
and intel cores newer than ivy bridge (ie everything from the last 10 years)
are able to execute multiple CMOVs per cycle (reciprocal throughput < 1). From
what I can see, it looks like bad CMOV was a particular problem of the Pentium
4 and Prescott cores, and possibly PPro, but I don't see the numbers for it. I
don't think any of those cores should have an impact on the default cost model
in 2024.

[Bug other/113682] New: Branches in branchless binary search rather than cmov/csel/csinc

2024-01-31 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682

Bug ID: 113682
   Summary: Branches in branchless binary search rather than
cmov/csel/csinc
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

I've been trying to eliminate unpredictable branches in a hot function where
perf counters show a high mispredict percentage. Unfortunately, I haven't been
able to find an incantation to get gcc to generate branchless code other than
inline asm, which I'd rather avoid. In this case I've even laid out the
critical lines so that they exactly match the behavior of the csinc and csel
instructions on arm64, but they are still not used.


Somewhat minimized repro:

typedef unsigned long size_t;
struct ITEM {char* addr; size_t len;};
int cmp(ITEM* user, ITEM* tree);

size_t bsearch2(ITEM* user, ITEM** tree, size_t treeSize) {
auto pos = tree;
size_t low = 0;
size_t high = treeSize;
while (low < high) {
size_t i = (low + high) / 2;
int res = cmp(user, tree[i]);

// These should be cmp + csinc + csel on arm
// and lea + test + cmov + cmov on x86.
low = res > 0 ? i + 1 : low; // csinc
high = res < 0 ? i: high; // csel

if (res == 0) return i;
}
return -1;
}


On arm64 that generates a conditional branch on res > 0:
bl  cmp(ITEM*, ITEM*)
cmp w0, 0
bgt .L15 // does low = i + 1 then loops
mov x20, x19
bne .L4 // loop


On x86_64 it does similar:
callcmp(ITEM*, ITEM*)
testeax, eax
jg  .L16 
jne .L17


Note that clang produces the desired codegen for both:

arm:
bl  cmp(ITEM*, ITEM*)
cmp w0, #0
csinc   x23, x23, x22, le
cselx19, x22, x19, lt
cbnzw0, .LBB0_1

x86:
callcmp(ITEM*, ITEM*)@PLT
lea rcx, [r12 + 1]
testeax, eax
cmovg   r13, rcx
cmovs   rbx, r12
jne .LBB0_1

(full output for all 4 available at https://www.godbolt.org/z/aWrKbYPTG. Code
snippets from trunk, but also repos on 13.2)

While ideally gcc would generate the branchless output for the supplied code,
if there is some (reasonable) incantation that would cause it to produce
branchless output, I'd be happy to have that too.

[Bug middle-end/113585] New: Poor codegen turning int compare into -1,0,1

2024-01-24 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113585

Bug ID: 113585
   Summary: Poor codegen turning int compare into -1,0,1
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://www.godbolt.org/z/Y31xG7EeT

These two functions should be equivalent, but comp2() produces better code than
comp1() on both arm64 and x86_64

int comp1(int a, int b) {
return a == b ? 0 : (a < b ? -1 : 1);
}

int comp2(int a, int b) {
return a < b ? -1 : (a > b ? 1 : 0);
}

arm64:
comp1(int, int):
cmp w0, w1
mov w0, -1
csinc   w0, w0, wzr, lt
cselw0, w0, wzr, ne
ret
comp2(int, int):
cmp w0, w1
csetw0, gt
csinv   w0, w0, wzr, ge
ret

x86_64:
comp1(int, int):
xor eax, eax
cmp edi, esi
je  .L1
setge   al
movzx   eax, al
lea eax, [rax-1+rax]
.L1:
ret
comp2(int, int):
xor eax, eax
cmp edi, esi
mov edx, -1
setgal
cmovl   eax, edx
ret

In the arm case, I suspect that the perf will be equivalent, although comp1 has
an extra instruction, so will pay a size penalty. On x86, comp2 is branchless
while comp1 has a branch which could be problematic if not predictable. It
seems like there should be a normalization pass to convert these functions (and
other equivalent ones) into a single normalized representation so that they get
the same codegen.


PS: I tried another equivalent function and it produces even worse codegen on
x86_64, comparing the same registers twice:

int comp3(int a, int b) {
return a > b ? 1 : (a == b ? 0 : -1);
}

comp3(int, int):
xor eax, eax
cmp edi, esi
mov edx, 1
setne   al
neg eax
cmp edi, esi
cmovg   eax, edx
ret

[Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization

2023-10-27 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #5 from Mathias Stearn  ---
Mea culpa. The difference between boost and std was due to the code to
fast-path shared_ptrs that aren't actually shared:
https://github.com/gcc-mirror/gcc/blob/be34a8b538c0f04b11a428bd1a9340eb19dec13f/libstdc%2B%2B-v3/include/bits/shared_ptr_base.h#L324-L362.
I still think that optimization is a good idea, even if it looks bad in this
specific microbenchmark. When that is disabled, they have the same perf, even
with the check for single-threaded.

That said, I'd still love an opt out. For now, I'll just propose that we add a
do-nothing bg thread in our benchmark main() to avoid misleading results.

[Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization

2023-10-27 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #4 from Mathias Stearn  ---
So even though I already knew in the back of my mind about how this can affect
benchmark results, I *still* got burned by it! I was benchmarking a simple
hazard pointer implementation against shared ptrs by having them crab-walk a
list of 1000 pointers. This was an admittedly simple and unrealistic benchmark
that only ever accessed the objects from a single thread and never had any
contention. It was a few hours after getting the initial results that I
remembered this issue and went back to add a bg thread.

> This needs numbers, not opinions

While my biggest concern was misleading benchmarks (which I now feel a bit
validated by my own mistake :), here are the numbers I see. I added
boost::shared_ptr on the assumption (unvalidated) that the primary difference
would be that it unconditionally uses atomics.

---
Benchmark Time CPU   Iterations
---
BM_scanLinks_HazPtr6420 ns 6420 ns   108948
BM_scanLinks_BoostSharedPtr   11223 ns11223 ns62380
BM_scanLinks_StdSharedPtr  3217 ns 3217 ns   217621
BM_scanLinks_AtomicSharedPtr  28920 ns28920 ns24200

And with a bg thread doing nothing but sleeping:

---
Benchmark Time CPU   Iterations
---
BM_scanLinks_HazPtr6887 ns 6887 ns   101445
BM_scanLinks_BoostSharedPtr   11224 ns11224 ns62373
BM_scanLinks_StdSharedPtr 14221 ns14221 ns49221
BM_scanLinks_AtomicSharedPtr  49574 ns49573 ns14126

[Bug other/111976] Large constant zero-valued objects should go in .bss rather than .rodata

2023-10-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111976

--- Comment #1 from Mathias Stearn  ---
Just to be clear, I think we should only do this for "large" objects or
collections of objects. If you did it for small objects in general, there is a
risk that they will spread out mutable data that is written to over more pages,
so that you end up with more runtime anonymous pages, rather than read-only
pages backed by the file cache, so they end up being more expensive. I think a
good rule to prevent this would be to only do it for objects larger than a
page, and possibly add page alignment to them. It may also be possible to
collect enough smaller objects together to make it worth doing this. Not sure
how often that occurs in practice though.

Also at -Os it may make sense to do this for any size object putting since
small objects in .bss will still shrink the binary size. Not sure how users of
-Os would react to tradeoffs involving runtime memory consumption vs binary
size.

[Bug other/111976] New: Large constant zero-valued objects should go in .bss rather than .rodata

2023-10-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111976

Bug ID: 111976
   Summary: Large constant zero-valued objects should go in .bss
rather than .rodata
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Right now constant objects always go in .rodata. This is nice because it will
give a nice loud error if you ever write to it. However, .rodata takes up space
in the binary file and in memory at runtime. If you instead put it in .bss it
takes up no space in the binary file and, at least on linux, it gets backed by
a single zero-filled page of physical memory. Ideally there would be something
like .robss which gave you the best of both worlds, but this is admittedly
niche for all the effort to add a new section like that. I think the best
option is to leave it in .rodata for non-optimized builds to catch bugs, but
when optimizing, especially with -Os, put it in .bss.

Repro https://www.godbolt.org/z/3rWvTrsTv:

constexpr int GB = 1024*1024*1024;

// Goes in .bss - no space in binary or runtime.
char nonconst_zeros[GB] = {};

// Goes in .rodata - expensive in binary size and runtime memory.
extern const char const_zeros[GB]; // force usage
const char const_zeros[GB] = {};


.globl  const_zeros
.section.rodata
.align 32
.type   const_zeros, @object
.size   const_zeros, 1073741824
const_zeros:
.zero   1073741824


.globl  nonconst_zeros
.bss
.align 32
.type   nonconst_zeros, @object
.size   nonconst_zeros, 1073741824
nonconst_zeros:
.zero   1073741824

[Bug target/104611] memcmp/strcmp/strncmp can be optimized when the result is tested for [in]equality with 0 on aarch64

2023-09-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611

Mathias Stearn  changed:

   What|Removed |Added

 CC||redbeard0531 at gmail dot com

--- Comment #4 from Mathias Stearn  ---
clang has already been using the optimized memcmp code since v16, even at -O1:
https://www.godbolt.org/z/qEd768TKr. Older versions (at least since v9) were
still branch-free, but via a less optimal sequence of instructions.

GCC's code gets even more ridiculous at 32 bytes, because it does a branch
after every 8-byte compare, while the clang code is fully branch-free (not that
branch-free is always better, but it seems clearly so in this case).

Judging by the codegen, there seems to be three deficiencies in GCC: 1) an
inability to take advantage of the load-pair instructions to load 16-bytes at a
time, and 2) an inability to use ccmp to combine comparisons. 3) using
branching rather than cset to fill the output register. Ideally these could all
be done in the general case by the low level instruction optimizer, but even
getting them special cased for memcmp (and friends) would be an improvement.

[Bug libstdc++/111589] New: Use relaxed atomic increment (but not decrement!) in shared_ptr

2023-09-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111589

Bug ID: 111589
   Summary: Use relaxed atomic increment (but not decrement!) in
shared_ptr
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

The atomic increment when copying a shared_ptr can be relaxed because it is
never actually used as a synchronization operation. The current thread must
already have sufficient synchronization to access the memory because it can
already deref the pointer. All synchronization is done either via whatever
program-provided code makes the shared_ptr object available to the thread, or
in the atomic decrement (where the decrements to non-zero are releases that
ensure all uses of the object happen before the final decrement to zero
acquires and destroys the object).

As an argument-from-authority, libc++ already is using relaxed for increments
and acq/rel for decements:
https://github.com/llvm/llvm-project/blob/c649fd34e928ad01951cbff298c5c44853dd41dd/libcxx/include/__memory/shared_ptr.h#L101-L121

This will have no impact on x86 where all atomic RMWs are effectively
sequentially consistent, but it will enable the use of ldadd rather than
ldaddal on aarch64, and similar optimizations on other weaker architectures.

[Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization

2023-09-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

Bug ID: 111588
   Summary: Provide opt-out of shared_ptr single-threaded
optimization
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Right now there is a fast-path for single-threaded programs to avoid the
overhead of atomics in shared_ptr, but there is no equivalent for a program the
knows it is multi-threaded to remove the check and branch. If __GTHREADS is not
defined then no atomic code is emitted.

There are two issues with this: 1) for programs that know they are effectively
always multithreaded they pay for a runtime branch and .text segment bloat for
an optimization that never applies. This may have knock-on effects of making
functions that use shared_ptr less likely to be inlined by pushing them
slightly over the complexity threshold. 2) It invalidates singlethreaded
microbenchmarks of code that uses shared_ptr because the performance of the
code may be very different from when run in the real multithreaded program.

I understand the value of making a fast mode for single-threaded code, and I
can even except having the runtime branch by default, rather than as an opt-in,
when it is unknown if the program will be run with multiple threads. But an
opt-out would be nice to have. If it had to be a gcc-build time option rather
than a #define, that would be acceptable for us since we always use our own
build of gcc, but it seems like a worse option for other users.

FWIW, neither llvm libc++
(https://github.com/llvm/llvm-project/blob/0bfaed8c612705cfa8c5382d26d8089a0a26386b/libcxx/include/__memory/shared_ptr.h#L103-L110)
nor MS-STL
(https://github.com/microsoft/STL/blob/main/stl/inc/memory#L1171-L1173) ever
use runtime branching to detect multithreading.

[Bug c++/89997] Garbled expression in error message with -fconcepts

2022-10-05 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89997

--- Comment #6 from Mathias Stearn  ---
> I think this is probably not concepts-specific, and just another variant of 
> PR 49152.

Perhaps the busted pretty printer is a general problem, but at least in this
case I think the fix may be in concepts code. It looks like the error is
generated at
https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/constraint.cc#L3300
(or the similar call 7 lines lower). Given that gcc already prints the source
loc with the invalid expression, I think you can just remove the %qE to improve
the diagnostic output. (I don't know the gcc codebase at all, so I could be
wrong about this, I just grepped for the string "the required expression")

[Bug c++/89997] Garbled expression in error message with -fconcepts

2022-10-05 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89997

--- Comment #3 from Mathias Stearn  ---
Please reopen. It still seems to be broken with -std=c++20 as the only flag:
https://godbolt.org/z/bWMq4s6xb (trunk) https://godbolt.org/z/W3xWjWaGe (12.2)

Output:

: In function 'void test()':
:16:15: error: no matching function for call to 'check()'
   16 | check(); // mangled error
  | ~~^~
:12:6: note: candidate: 'template void check() requires
requires(X x, T val) {x.X::operator<<(const char*)("hello") << val;}'
   12 | void check() requires requires (X x, T val) { x << "hello" << val; } {}
  |  ^
:12:6: note:   template argument deduction/substitution failed:
:12:6: note: constraints not satisfied
: In substitution of 'template void check() requires
requires(X x, T val) {x.X::operator<<(const char*)("hello") << val;} [with T =
int]':
:16:15:   required from here
:12:6:   required by the constraints of 'template void check()
requires requires(X x, T val) {x.X::operator<<(const char*)("hello") << val;}'
:12:23:   in requirements with 'X x', 'T val' [with T = int]
:12:60: note: the required expression '("hello"->x.X::operator<<() <<
val)' is invalid
   12 | void check() requires requires (X x, T val) { x << "hello" << val; } {}
  |   ~^~
cc1plus: note: set '-fconcepts-diagnostics-depth=' to at least 2 for more
detail
Compiler returned: 1


The last line with  still says "the required expression
'("hello"->x.X::operator<<() << val)' is invalid". It should not be trying to
apply -> to a string literal. The following line with carrot and underline is
very helpful and shows what the problem is. But the "note" line seems actively
harmful since it is showing an expression that would never be valid for any
type. It seems like it would be better to remove that line than attempting to
show it if you can't reproduce the correct expression.

[Bug c++/105397] Cannot export module initializer symbols with `-fvisibility=hidden`

2022-06-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105397

Mathias Stearn  changed:

   What|Removed |Added

 CC||redbeard0531 at gmail dot com

--- Comment #1 from Mathias Stearn  ---
Perhaps the best option is to default the visibility of the implicit functions
to the widest visibility of any function or object in module purview exposed by
the TU. The assumption being that if anything is visibile outside the library,
then it is expected to be imported from TUs outside the library and that should
Just Work. Conversely, if everything is defined as internal visibility, then it
is unlikely that this module was intended to be imported from outside of the
library, and so it may be desireable to allow different libs to have their own
module with the same name.

Unfortunately that doesn't give any good indication of what to do for
importable units that have an empty module purview (or where everything inside
it has TU-local internal linkage). While legal, maybe that isn't a case worth
optimizing the Just Works experience for?

[Bug target/105661] New: Comparisons to atomic variables generates less efficient code

2022-05-19 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105661

Bug ID: 105661
   Summary: Comparisons to atomic variables generates less
efficient code
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

With normal variables, gcc will generate a nice cmp/js pair when checking if
the high bit is null. When using an atomic, gcc generates a movzx/test/js
triple, even though it could use the same codegen as for a non-atomic.

https://godbolt.org/z/GorvWfrsh

#include 
#include 

[[gnu::noinline]] void f();
uint8_t plain;
std::atomic atomic;

void plain_test() {
if (plain & 0x80) f();
}

void atomic_test() {
if (atomic.load(std::memory_order_relaxed) & 0x80) f();
}

With both -O2 and -O3 this generates:

plain_test():
cmp BYTE PTR plain[rip], 0
js  .L4
ret
.L4:
jmp f()
atomic_test():
movzx   eax, BYTE PTR atomic[rip]
testal, al
js  .L7
ret
.L7:
jmp f()

ARM64 seems to be hit even harder, but I don't know that platform well enough
to know if the non-atomic codegen is valid there
https://godbolt.org/z/c3h8Y1dan. It seems likely though, at least for a relaxed
load.

[Bug c++/105560] New: Spurious -Wunused-local-typedefs warning on a typedef used on returned type

2022-05-11 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105560

Bug ID: 105560
   Summary: Spurious -Wunused-local-typedefs warning on a typedef
used on returned type
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/zcY11ezev

#include 
#include 

template 
typename F::Out call(F&& fun) {
typename F::Out var = fun(1);
return var;
}

void test() {
// Fill builder with bson-encoded elements from view.
auto decorateLambda = [](auto&& lambda) {
using Lambda = std::remove_reference_t;
struct DecoratedLambda : Lambda {
using Out = bool;
};
return DecoratedLambda{std::forward(lambda)};
};
call(decorateLambda([&](auto&& value) {
(void)(value);
return true;
}));
}

With -O2 -std=c++20 -Wall:

: In lambda function:
:15:19: warning: typedef 'using
test()DecoratedLambda::Out = bool' locally defined but
not used [-Wunused-local-typedefs]
   15 | using Out = bool;
  |   ^~~
Compiler returned: 0


However, DecoratedLambda::Out *is* used by call(), both in the signature and in
the body. Perhaps the issue is that typedefs in local types that are returned
(or are reachable from returned types) shouldn't be considered "local"?

[Bug c++/105534] -Wmaybe-uninitialized cases shouldn't suppress -Wuninitialized warnings

2022-05-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105534

--- Comment #4 from Mathias Stearn  ---
(In reply to Richard Biener from comment #2)
> Note there's
> 
>   _2 = value_1(D) * x_2;
> 
> where _2 might be effectively uninitialized iff x_2 is not zero.  When x_2
> is zero then _2 has a well-defined value.

Not according to the C++ standard. http://eel.is/c++draft/basic.indet seems
pretty clear that unless x_2 is std::byte (which doesn't support
multiplication) or an "unsigned ordinary character type", then that is UB. 

FWIW I still think I'd expect the warning on "unsigned char x, y; y = x * 0;",
but I would definitiely expect the warning for int.

[Bug c++/105534] -Wmaybe-uninitialized cases shouldn't suppress -Wuninitialized warnings

2022-05-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105534

--- Comment #3 from Mathias Stearn  ---
One slightly confusing aspect is that the wording of the flag implies that the
variable may or may not be uninitialzied (because in -Wmaybe-uninitialized
maybe binds directly to uninitialized), while phrasing of the warning message
is about the usage being conditional: "may be used uninitialized". And of
course the documentation (at least the man page) uses a different phrasing:

> For an object with automatic or allocated storage duration, if there exists a 
> path from the function entry to a use of the object that is initialized, but 
> there exist some other paths for which the object is not initialized, the 
> compiler emits a warning if it cannot prove the uninitialized paths are not 
> executed at run time.

For both the initial example with count, and your example with count2, I'd say
that the "there exists a path from the function entry to a use of the object
that is initialized" bit is clearly not satisfied, so if we assume the
documentation is correct, then those cases both lack a "maybe" and the
variables are clearly uninitialized.

This would also match my intuition for -Winitialized which is that it
definitively errors if all paths from declaration to any usage result in the
variable being uninitialized.

PS - This test case is a reduced example from an actual bug that luckily was
found by coverity before release: https://jira.mongodb.org/browse/SERVER-66306.
I dug in to make a repro because I was expecting that we would have gotten a
compiler error on that code before it was even committed. I'm also exploring
whether we can stop passing -Wno-maybe-uninitialized, but it looks like we
still get false positives in third-party headers, so it doesn't seem likely.

[Bug c++/105534] New: -Wmaybe-uninitialized shouldn't suppress -Wuninitialized warnings

2022-05-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105534

Bug ID: 105534
   Summary: -Wmaybe-uninitialized shouldn't suppress
-Wuninitialized warnings
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

The following function emits a -Wuninitialized warning on ++count with -Wall
https://godbolt.org/z/KfaMEETY1:

int test(bool cond) {
int count;
++count;
return count;
}

Making the increment be conditional changes it to a -Wmaybe-uninitialized
warning, which is suppressed with -Wno-maybe-uninitialized.
https://godbolt.org/z/qarMrqW7E

int test(bool cond) {
int count;
if (cond) ++count;
return count;
}

This makes no sense. count is never initialized on any path through the
function, and it is returned on all paths.

We use -Wall with -Wno-maybe-uninitialized on our codebase because we were
getting too many false-positives with -Wmaybe-initialized, in particular from
third-party headers that we didn't want to modify. At the time we decided to do
that, we didn't realize that we would also be missing out on clearly
uninitialized cases like this.

[Bug target/105496] New: Comparison optimizations result in unnecessary cmp instructions

2022-05-05 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105496

Bug ID: 105496
   Summary: Comparison optimizations result in unnecessary cmp
instructions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/1zdYsaqEj

Consider these equivalent functions:

int test1(int x) {
if (x <= 10)
return 123;
if (x == 11)
return 456;
return 789;
}

int test2(int x) {
if (x < 11)
return 123;
if (x == 11)
return 456;
return 789;
}

In test2 it is very clear that you can do a single cmp of x with 11 then use
different flag bits to choose your case. In test1 it is less clear, but because
x<=10 and x<11 are equivalent, you could always transform one to the other.
Clang seems to do this correctly and transforms test1 into test2 and only emits
a single cmp instruction in each. For some reason, not only does gcc miss this
optimization, it seems to go the other direction and transform test2 into
test1, emitting 2 cmp instructions for both!

test1(int):
mov eax, 123
cmp edi, 10
jle .L1
cmp edi, 11
mov eax, 456
mov edx, 789
cmovne  eax, edx
.L1:
ret
test2(int):
mov eax, 123
cmp edi, 10
jle .L6
cmp edi, 11
mov eax, 456
mov edx, 789
cmovne  eax, edx
.L6:
ret

Observed with at least -O2 and -O3. I initially observed this for code where
each if generated an actual branch rather than a cmov, but when I reduced the
example, the cmov was generated.

I'm not sure if this should be a middle-end or target specific optimization,
since ideally it would be smart on all targets that use comparison flags, even
if there are some targets that don't. Is there ever a down side to trying to
make two adjacent comparisons compare the same number?

[Bug libstdc++/104223] New: GCC unable to inline trivial functions passed to views::filter and transform unless lifted into types

2022-01-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104223

Bug ID: 104223
   Summary: GCC unable to inline trivial functions passed to
views::filter and transform unless lifted into types
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

I'm not sure if this is an issue with the optimizer or the way that the library
code is written, or some combination of the two, but the end result seems
unfortunate. with() and without() are logically the same function, the only
difference is that with() lifts the function pointers into types using a
templated lambda variable, while without() just passes the function names
directly to the library. It seems interesting that the optimizer knows that
they are "constant enough" to emit direct rather than indirect calls to t() and
f(), however, it isn't constant enough to inline those calls.


https://godbolt.org/z/EqWzzh916
Flags: -std=c++2b -O3
Reproduces on at least 11.2 and trunk

#include 

namespace views = std::views;

void trace() noexcept;
inline int f(int i) {
trace();
return i;
}

inline bool t(int) { return true; }

// for some reason gcc needs this in order to inline f() and t()
template 
auto typify = [] (int i) { return f(i); };

void with() {
for (auto&& x : views::single(1) | views::transform(typify) |
views::filter(typify)) {}
}

void without() {
for (auto&& x : views::single(1) | views::transform(f) | views::filter(t))
{}
}

with():
sub rsp, 8
calltrace()
add rsp, 8
jmp trace()
without():
sub rsp, 8
mov edi, 1
callf(int)
mov edi, eax
callt(int)
testal, al
jne .L10
add rsp, 8
ret
.L10:
mov edi, 1
add rsp, 8
jmp f(int)

[Bug c++/102876] GCC fails to use constant initialization even when it knows the value to initialize

2021-10-26 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102876

--- Comment #12 from Mathias Stearn  ---
(In reply to Jakub Jelinek from comment #10)
> So we'd just punt at optimizing that, we don't know if b is read or written
> by foo (and, note, it doesn't have to be just the case of explicitly being
> passed address of some var, it can get the address through other means).
> On the other side, we can't optimize b to b: .long 2, because bar can use
> the variable and/or modify it, so by using 2 as static initializer bar would
> be miscompiled.

I'm pretty sure that that is explicitly allowed by
https://eel.is/c++draft/basic.start.static#3, so it should *not* be considered
a miscompilation. The example only shows reading from another static-duration
variable's initializer, but I believe writing would also be covered.

I took a look at how other compilers handle this, and it is somewhat
interesting: https://godbolt.org/z/9YvcbEeax

int foo(int&);
inline int bar() { return 7; }
extern int b;
int z = bar();
int a = foo(b); // comment this out and watch clang change...
int b = bar();
int c = bar();

GCC: always does dynamic init for everything.
MSVC: always does static init for z, b, and c. always dynamic init for a.
Clang: it seems like if it does any dynamic init in the TU, it doesn't promote
any dynamic init to static. So with that code all four variables are
dynamicially initialized, but if you comment out a, the remaining 3 become
static. If you add an unrelated variable that requires dynamic init, those 3
become dynamically initialized again.

I don't understand why clang does what it does. I don't think it is required to
do that by the standard, and it clearly seems suboptimal. So I would rather GCC
behave like MSVC in this case than like clang.

Also note what happens if we provide a definition for foo like `inline int
foo(int& x) { return x += 6; }`: https://godbolt.org/z/sWd6chsnP. Now both MSVC
and Clang will static initialize z, b, and c to 7 and *static* initialize a to
6. GCC gets the same result dynamically, but for some reason tries to load b
prior to adding 6, even though it has to be 0 (barring a UB write to b from
another TU).

[Bug c++/102876] GCC fails to use constant initialization even when it knows the value to initialize

2021-10-21 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102876

--- Comment #3 from Mathias Stearn  ---
> Why not just make the function constexpr though?

That isn't always possible. Sometimes the initializer may call a third-party
function that is inline, but not yet marked constexpr (it may need to support
older language versions where it couldn't be constexpr). Other times the
initializer may call a function that is out of line (so can't be constexpr at
all), but defined in the same TU. MSVC and clang handle this somewhat more
realistic example nicely, gcc doesn't: https://godbolt.org/z/jYKx8359T

The original example using chrono was just something that when reading I
thought "any optimizer worth its salt should be able to do this even without
explicit constexpr annotation". I was disappointed to learn that gcc couldn't,
so I filed a bug in the hope that it can be improved.

[Bug c++/102876] New: GCC fails to use constant initialization even when it knows the value to initialize

2021-10-21 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102876

Bug ID: 102876
   Summary: GCC fails to use constant initialization even when it
knows the value to initialize
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

See: https://godbolt.org/z/KnKv9j8b9

#include 
using namespace std::literals;
/*not constexpr*/ inline std::chrono::sys_days compute_days() {
return 1776y/7/4;
}
std::chrono::sys_days BURN_IN_TO_BINARY = compute_days();

Clang and MSVC both correctly burn BURN_IN_TO_BINARY into the binary image with
the correct value. Even with -O3, gcc zero-initializes it then uses a dynamic
initializer to complete the initialization. Both are valid implementation
strategies according to https://eel.is/c++draft/basic.start.static#3, however,
I think the strategy used by clang and MSVC is clearly superior QoI here.

[Bug ipa/102528] Unused out-of-line functions emitted for trivial coroutines

2021-10-05 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102528

--- Comment #8 from Mathias Stearn  ---
Sorry again about the confusion caused by my typo. I am not able to edit the
comment to make it clear that the comment#0 should be ignored. If that happens
again, would it be better for me to close the ticket and create a new one?

[Bug c++/102528] Unused out-of-line functions emitted for trivial coroutines

2021-09-29 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102528

--- Comment #1 from Mathias Stearn  ---
Sorry, there was a typo in the initial code. I forgot the trivial
implementation of await_resume(). (D'oh!)

Now I can see that test() is just a ret instruction, but there is still a lot
of dead code emitted for the coroutine functions. While it isn't too much for
the trivial functions, for a real use case it would be. And it seems like a bug
anyway  that unused code makes it to the obj file in -O3:
https://godbolt.org/z/bTncofrzd

#include 

struct simple {
struct promise_type {
void return_void() {}
std::suspend_never initial_suspend() { return {}; }
std::suspend_never final_suspend() noexcept { return {}; }
void unhandled_exception() { throw; }
simple get_return_object() { return {}; }
};

std::true_type await_ready() {return {}; }
void await_suspend(std::coroutine_handle<>) {}
void await_resume() {}
};

inline simple test1() {
co_return;
}

inline simple test2() {
co_await test1();
co_return;
}

void test() {
test2();
}

test1(test1()::_Z5test1v.frame*) [clone .actor]:
movzx   eax, WORD PTR [rdi+18]
testal, 1
je  .L2
cmp ax, 5
ja  .L3
mov edx, 42
bt  rdx, rax
jnc .L3
.L4:
cmp BYTE PTR [rdi+17], 0
jne .L14
.L1:
ret
.L2:
cmp ax, 2
je  .L5
cmp ax, 4
je  .L4
testax, ax
jne .L3
mov QWORD PTR [rdi+24], rdi
.L5:
cmp BYTE PTR [rdi+17], 0
mov BYTE PTR [rdi+32], 1
mov QWORD PTR [rdi], 0
je  .L1
.L14:
jmp operator delete(void*)
test1(test1()::_Z5test1v.frame*) [clone .actor] [clone .cold]:
.L3:
ud2
test2(test2()::_Z5test2v.frame*) [clone .actor]:
movzx   eax, WORD PTR [rdi+18]
testal, 1
je  .L16
cmp ax, 7
ja  .L17
mov edx, 170
bt  rdx, rax
jnc .L17
.L18:
cmp BYTE PTR [rdi+17], 0
jne .L33
.L15:
ret
.L16:
cmp ax, 4
je  .L19
ja  .L20
testax, ax
jne .L34
mov QWORD PTR [rdi+24], rdi
.L22:
mov BYTE PTR [rdi+32], 1
.L19:
cmp BYTE PTR [rdi+17], 0
mov QWORD PTR [rdi], 0
je  .L15
.L33:
jmp operator delete(void*)
.L34:
cmp ax, 2
je  .L22
jmp .L17
.L20:
cmp ax, 6
je  .L18
jmp .L17
test2(test2()::_Z5test2v.frame*) [clone .actor] [clone .cold]:
.L17:
ud2
test1(test1()::_Z5test1v.frame*) [clone .destroy]:
movzx   eax, WORD PTR [rdi+18]
or  eax, 1
mov WORD PTR [rdi+18], ax
cmp ax, 5
ja  .L36
mov edx, 42
bt  rdx, rax
jnc .L36
cmp BYTE PTR [rdi+17], 0
jne .L39
ret
.L39:
jmp operator delete(void*)
test1(test1()::_Z5test1v.frame*) [clone .destroy] [clone .cold]:
.L36:
ud2
test2(test2()::_Z5test2v.frame*) [clone .destroy]:
movzx   eax, WORD PTR [rdi+18]
or  eax, 1
mov WORD PTR [rdi+18], ax
cmp ax, 7
ja  .L41
mov edx, 170
bt  rdx, rax
jnc .L41
cmp BYTE PTR [rdi+17], 0
jne .L44
ret
.L44:
jmp operator delete(void*)
test2(test2()::_Z5test2v.frame*) [clone .destroy] [clone .cold]:
.L41:
ud2
test():
ret

[Bug c++/102528] New: Unable to inline even trivial coroutines

2021-09-29 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102528

Bug ID: 102528
   Summary: Unable to inline even trivial coroutines
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/aoab9W4xG

This should all compile away, and test() should just be a single ret
instruction. That is not what happens now, even with -O3.

#include 

struct simple {
struct promise_type {
void return_void() {}
std::suspend_never initial_suspend() { return {}; }
std::suspend_never final_suspend() noexcept { return {}; }
void unhandled_exception() { throw; }
simple get_return_object() { return {}; }
};

std::true_type await_ready() { return {}; }
void await_suspend(std::coroutine_handle<>) {}
void await_resume();
};

inline simple test1() {
co_return;
}

inline simple test2() {
co_await test1();
co_return;
}

void test() {
test2();
}

[Bug c++/100493] Lambda default copy capture that captures "this" cannot be used in both C++17 and C++20 modes

2021-09-22 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100493

--- Comment #3 from Mathias Stearn  ---
When reading https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82611, I noticed that
C++17 actually requires the warning on [=, this] from a conforming
implementation:
https://timsong-cpp.github.io/cppwp/n4659/expr.prim.lambda.capture#2. However,
given that this requirement was lifted in C++20, would it be possible to only
warn about that in -std=c++17 (or lower) with -pedantic? It seems
counterproductive to warn about it without -pedantic.

[Bug c++/100493] Lambda default copy capture that captures "this" cannot be used in both C++17 and C++20 modes

2021-09-22 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100493

Mathias Stearn  changed:

   What|Removed |Added

 CC||redbeard0531 at gmail dot com

--- Comment #2 from Mathias Stearn  ---
This is making it harder for us to transition to -std=c++20, since we can't
have the same code be warning-clean in both versions. I really don't think the
warning for [=, this] is helpful given that it is the pattern that is now
recommended.

[Bug web/102365] New: Function attribute docs should have an anchor or id on each attribute.

2021-09-16 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102365

Bug ID: 102365
   Summary: Function attribute docs should have an anchor or id on
each attribute.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: web
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html

This would make it easier to link to the docs for a specific attribute. There
is currently an anchor on the  tag (eg
https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html#index-g_t_0040code_007bartificial_007d-function-attribute-2500
for artificial), but that will scroll down so that the name of the attribute
isn't shown. Ideally the anchor should be on the  tag instead. A nice
enhancement would be to make the attribute name be a self-link to make it easy
to get the link without needing to poke in the browser's inspect facility.

[Bug c++/102363] New: source_location in await_transform has function_name referring to internal coroutine funclet rather than source-level function

2021-09-16 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102363

Bug ID: 102363
   Summary: source_location in await_transform has function_name
referring to internal coroutine funclet rather than
source-level function
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

When a promise_type::await_transform() has a defaulted argument of
source_location::current(), it refers to the location of the co_await that
triggered the await_transform. This is really useful for building runtime
diagnostics and tracing. The file/line/column fields all seem to be filled out
correctly, however, the function_name field refers to an internal funclet that
the coroutine function is split into. I think it would be more useful if it had
the original corountine function name since that is what users will be
expecting.

Example: https://godbolt.org/z/sbMsPG8Eq

Output:
  good: /app/example.cpp:20:43: example func()
  bad: /app/example.cpp:25:14: void func(func()::_Z4funcv.frame*)

Source (compiled with just -std=c++20):
#include 
#include 
#include 
struct example {
struct promise_type {
std::suspend_never initial_suspend() { return {}; }
std::suspend_never final_suspend() noexcept { return {}; }
void unhandled_exception() { throw; }
example get_return_object() { return {}; }
void return_void() {}

auto await_transform(const char* dummy, std::source_location l =
std::source_location::current()) {
printf("bad: %s:%u:%u: %s\n", l.file_name(), l.line(), l.column(),
l.function_name());
return std::suspend_never();
}
};
};

example func() {
auto l = std::source_location::current();
printf("good: %s:%u:%u: %s\n", l.file_name(), l.line(), l.column(),
l.function_name());

// bad location points here:
//   v
co_await "pretend this is an awaitable";
}

int main() {
func();
}

[Bug c++/100876] New: -Wmismatched-new-delete should either look through or ignore placement new

2021-06-02 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100876

Bug ID: 100876
   Summary: -Wmismatched-new-delete should either look through or
ignore placement new
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/KTTMrEGns

Example code:
free(new (malloc(4)) int()); // Warns but shouldn't
delete new (malloc(4)) int(); // Doesn't warn but should

output:

:5:9: warning: 'void free(void*)' called on pointer returned from a
mismatched allocation function [-Wmismatched-new-delete]
5 | free(new (malloc(4)) int()); // Warns but shouldn't
  | ^~~
:5:30: note: returned from 'void* operator new(std::size_t, void*)'
5 | free(new (malloc(4)) int()); // Warns but shouldn't
  |  ^

While it would be nice to have a warning on the second line, not warning on the
first seems more important. And hopefully is a backportable fix.

Here is some Real World Code exhibiting this pattern that g++ currently warns
about when compiling:
https://github.com/facebook/hermes/blob/dfef1abd6d20b196e24c591e225a7003e6337a94/unittests/VMRuntime/StringPrimitiveTest.cpp#L221-L235.
There is also an example using calloc() lower in that file.

[Bug libstdc++/99979] New: condition_variable_any has wrong behavior if Lock::lock() throws

2021-04-08 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99979

Bug ID: 99979
   Summary: condition_variable_any has wrong behavior if
Lock::lock() throws
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://eel.is/c++draft/thread.condvarany.wait says "Postconditions: lock is
locked by the calling thread. […] Remarks: If the function fails to meet the
postcondition, terminate() is invoked […] This can happen if the re-locking of
the mutex throws an exception." This wording was changed in C++14 for
https://cplusplus.github.io/LWG/issue2135, but given that it is borderline
impossible to use a condition_variable[_any] correctly if it doesn't guarantee
relocking, it seems best to do the same in C++11 mode as well.

Unfortunately, std::condition_variable_any::__Unlock::~__Unlock() either
ignores[1] or propagates any exception thrown by lock(), which means that the
postcondition of wait() may not be fulfilled. I think the correct behavior
would be to remove the noexcept(false), allowing the destructor's default
noexecpt to apply, and just unconditionally call _M_lock.lock() without any
try/catch.

[1] It ignores if std::uncaught_exception() returns true. I'm not 100% sure why
that dispatch is there. It seems like it may be trying to detect if
_M_cond.wait() threw, but a) that is noexcept and b) that doesn't work
correctly if some thread waits on a condvar inside a destructor that is
triggered during exception unwinding (this is why std::uncaught_exceptions()
was introduced).

[Bug rtl-optimization/99470] Convert fixed index addition to array address offset

2021-03-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99470

Mathias Stearn  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #4 from Mathias Stearn  ---
Yes, but I believe any case where they would access different addresses would
be UB overflow in f(), making it valid to turn f() into g(), especially if you
used an internal lowering which sign extended index to pointer width and had
defined wrapping semantics. I'll note that clang already generates identical
code for f() and g() https://gcc.godbolt.org/z/j897sh, although I think gcc has
better codegen at least for g().

Also, my example was perhaps oversimplified. My indexes were actually int8_t
(which is why I'm indexing into a 256-element array), so due to int promotion,
overflow is actually impossible. However, with int8_t arguments, gcc generates
even worse code for f(), doing the sign-extension twice for some reason (8 ->
32 -> 64): https://gcc.godbolt.org/z/5r9h89


(I hope it isn't a faux pas to reopen the ticket, but I think I've provided
enough new information that this warrants another look)

[Bug c++/99470] New: Convert fixed index addition to array address offset

2021-03-08 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99470

Bug ID: 99470
   Summary: Convert fixed index addition to array address offset
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

These two functions do the same thing but f() is the cleaner source code
(especially when arr is a std::array) while g() generates better code:

https://gcc.godbolt.org/z/vTT399

#include 

inline int8_t arr[256];

bool f(int a, int b) {
return arr[a+128] == arr[b+128];
}

bool g(int a, int b) {
return (arr+128)[a] == (arr+128)[b];
}

f(int, int):
sub edi, -128
sub esi, -128
lea rax, arr[rip]
movsx   rdi, edi
movsx   rsi, esi
movzx   edx, BYTE PTR [rax+rsi]
cmp BYTE PTR [rax+rdi], dl
seteal
ret
g(int, int):
lea rax, arr[rip+128]
movsx   rdi, edi
movsx   rsi, esi
movzx   edx, BYTE PTR [rax+rsi]
cmp BYTE PTR [rax+rdi], dl
seteal
ret

In addition to only doing the +128 once, it also ends up being completely free
in g() because the assembler (or linker?) folds the addition into the address
calculation by adjusting the offset of the rip-relative address. In the godbolt
link, you can see that when compiled to binary, the LEA instruction uses the
same form in both f() and g(), so the addition really is free in g().

[Bug middle-end/99339] Poor codegen with simple varargs

2021-03-02 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99339

--- Comment #6 from Mathias Stearn  ---
> The question is how common in the wild it is and if it is worth the work.

I'm not sure how common it is, but this is my use case. The code in the bug
report is a slightly simplified example from some Real World Code I am working
on:
https://source.wiredtiger.com/develop/struct_w_t___c_u_r_s_o_r.html#af19f6f9d9c7fc248ab38879032620b2f.
Essentially WT_CURSOR is a structure of function pointers that all take a
WT_CURSOR pointer as the first argument, similar to C++ virtual functions. The
get_key() and get_value() "methods" both take (WT_CURSOR*, ...) arguments, and
unpack the arguments based on the format string that was set up when you opened
the cursor. The expectation is that they will be called many times for each
cursor object. Since we know the format string when creating the cursor I was
experimenting with creating specialized functions for common formats rather
than dispatching through the generic format-inspecting logic every time. I
noticed that I was able to get even more performance by declaring the functions
as taking the arguments they actually take rather than dealing with va_args,
then casting the function pointers to the generic (WT_CURSOR, ...) type when
assigning into the method slot. I assume this is UB in C (I don't know the C
standard nearly as well as C++) but all ABIs I know of are designed to make
this kind of thing work.

I'd rather not have to do that kind of questionable shenanigans in order to get
the same performance.

[Bug libstdc++/95079] unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.

2021-03-02 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

--- Comment #5 from Mathias Stearn  ---
@François Dumont: Sorry I didn't see your question earlier. The reason that
unordered_map perf hurts on 64-bit platforms is because it is designed to do a
size_t modulus-by-prime on every lookup, and on most platforms that is *very*
expensive (up to 100 cycles for 64 bits vs 20ish for 32 bits). Some very modern
CPUs have made improvements here, but it is still much more expensive than just
using power-of-2 buckets and masking, even if you need to hash the hash if you
don't trust the low order bits to have enough entropy. Unfortunately, fixing
this is a pretty big ABI break, so it isn't going to change any time soon.

[Bug middle-end/99339] Poor codegen with simple varargs

2021-03-02 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99339

--- Comment #5 from Mathias Stearn  ---
I filed a related bug with clang right after this one if anyone want to follow
along https://bugs.llvm.org/show_bug.cgi?id=49395.

Just because clang does worse doesn't mean gcc shouldn't do better ;)

[Bug c/99339] New: Poor codegen with simple varargs

2021-03-02 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99339

Bug ID: 99339
   Summary: Poor codegen with simple varargs
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

These two functions should generate the same code, but the varargs version is
much worse. And it is even worser still when you enable
-fstack-protector-strong.

https://godbolt.org/z/noEYoh

#include 

int test_va(int x, ...) {
int i;
va_list va;
va_start(va, x);
i = va_arg(va, int);
va_end(va);
return i + x;
}

int test_args(int x, int i) {
return i + x;
}

# explicit args with or without stack protection
test_args:
lea eax, [rsi+rdi]
ret

# without stack protection (why aren't dead stores to the stack being
eliminated?)
test_va:
lea rax, [rsp+8]
mov QWORD PTR [rsp-40], rsi
mov QWORD PTR [rsp-64], rax
lea rax, [rsp-48]
mov QWORD PTR [rsp-56], rax
mov eax, DWORD PTR [rsp-40]
mov DWORD PTR [rsp-72], 8
add eax, edi
ret

# with stack protection (yikes!)
test_va:
sub rsp, 88
mov QWORD PTR [rsp+40], rsi
mov rax, QWORD PTR fs:40
mov QWORD PTR [rsp+24], rax
xor eax, eax
lea rax, [rsp+96]
mov DWORD PTR [rsp], 8
mov QWORD PTR [rsp+8], rax
lea rax, [rsp+32]
mov QWORD PTR [rsp+16], rax
mov eax, DWORD PTR [rsp+40]
add eax, edi
mov rdx, QWORD PTR [rsp+24]
sub rdx, QWORD PTR fs:40
jne .L7
add rsp, 88
ret
.L7:
call__stack_chk_fail

[Bug c++/99047] New: ICE on simple task coroutine example

2021-02-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99047

Bug ID: 99047
   Summary: ICE on simple task coroutine example
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/ecxEbb Code compiles on MSVC and clang, and at least on
clang produces the expected output.

#include 
#include 

template 
struct [[nodiscard]] task {
struct promise_type  {
std::suspend_always initial_suspend() {
return {};
}
auto final_suspend() noexcept {
struct awaiter {
std::false_type await_ready() noexcept {
return {};
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<>)
noexcept {
return next;
}
void await_resume() noexcept {
}
std::coroutine_handle<> next;
};
return awaiter{next};
}

void unhandled_exception() noexcept {
std::terminate();
}
auto get_return_object() {
return task(this);
}
auto coro() {
return std::coroutine_handle::from_promise(*this);
}
void return_value(T val) {
result.emplace(std::move(val));
}

std::coroutine_handle<> next;
std::optional result;
};

task(task&& source) : p(std::exchange(source.p, nullptr)) {}
explicit task(promise_type* p) : p(p) {}
~task() {
if (p)
p->coro().destroy();
}

bool await_ready() noexcept {
return p->coro().done();
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<> next)
noexcept {
p->next = next;
return p->coro();
}
const T& await_resume() const& noexcept {
return *p->result;
}

promise_type* p;
};

task five() {
co_return 5;
}
task six() {
co_return (co_await five()) + 1;
}


int main() {
auto task = six();
task.p->next = std::noop_coroutine();
task.p->coro().resume();
return *task.p->result;
}

: In function 'void _Z4fivev.actor(five()::_Z4fivev.frame*)':
:92:11: internal compiler error: in fold_convert_loc, at
fold-const.c:2430
   92 | task five() {
  |   ^~~~
0x1ce6f09 internal_error(char const*, ...)
???:0
0x6b6f43 fancy_abort(char const*, int, char const*)
???:0
0xc92cd4 fold_convert_loc(unsigned int, tree_node*, tree_node*)
???:0
0xd126ae gimple_boolify(tree_node*)
???:0
0xd1b720 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
???:0
0xd1bb4a gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)

[snipping many more frames]

[Bug libstdc++/99021] coroutine_handle<_Promise>::from_address() should be noexcept

2021-02-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99021

--- Comment #3 from Mathias Stearn  ---
Thanks for the quick fix!

Any chance of a backport of this and
https://github.com/gcc-mirror/gcc/commit/f1b6e46c417224887c2f21baa6d4c538a25fe9fb#diff-ed08df78eba81189642b4e8d670a0adb4b377db6846aecb090b4dce52a9251fa
to the v10 branch? It will improve the experience of anyone using clang-based
editor tooling when using libstdc++ rather than libc++.

[Bug libstdc++/99021] coroutine_handle<_Promise>::from_address() should be noexcept

2021-02-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99021

--- Comment #1 from Mathias Stearn  ---
clang bug for reference https://bugs.llvm.org/show_bug.cgi?id=49109

[Bug libstdc++/99021] New: coroutine_handle<_Promise>::from_address() should be noexcept

2021-02-09 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99021

Bug ID: 99021
   Summary: coroutine_handle<_Promise>::from_address() should be
noexcept
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

While it isn't required by the standard, it seems odd that 
coroutine_handle::from_address() is noexcept while the non-specialized
version isn't when they are the same code. This would also help with
clang/libstdc++ compatibility because they currently check for noexcept of
from_address() as part of validating
https://eel.is/c++draft/dcl.fct.def.coroutine#15 (I'll be filing a bug with
them about that shortly since that is leaking an internal compiler
implementation detail).

[Bug c++/98639] GCC accepts cast from Base to Derived in C++20 mode

2021-01-12 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98639

Mathias Stearn  changed:

   What|Removed |Added

 CC||redbeard0531 at gmail dot com

--- Comment #4 from Mathias Stearn  ---
This really seems like it *is* supposed to be allowed based on simply following
the normal language rules for aggregates. It would probably require a special
case rule to prevent it from working. Is there something that leads you to
think that such a special case rule was forgotten, rather than the natural
outcome being expected?

[Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.

2020-05-12 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

Bug ID: 95079
   Summary: unorderd_map::insert_or_assign and try_emplace should
only hash and mod once unless there is a rehash.
   Product: gcc
   Version: 10.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Currently insert_or_assign() and try_emplace() call find(key) and fall back to
emplace(...) if that fails to find the key. The computed hash (and more
importantly in general) the modded bucket index computed by find() is thrown
away and recomputed by emplace(). Instead they should compute the hash once,
and unless there is a rehash, also only do the modulus once. This optimization
is already performed for operator[].

https://godbolt.org/z/cw82RC shows that the hasher is invoked once for
operator[] and twice for insert_or_assign().
http://quick-bench.com/ge8Suq7PcdRKm6IBQbjvwuXhW6Y shows that there is a
significant performance difference (20% in this test).

(I know std::unordered_map is always going to be less than fast on 64-bit
platforms, but it doesn't need to be slower than it needs to be...)

[Bug libstdc++/94854] New: Comment in basic_string.tcc incorrectly says std::string doesn't have explicit instantiation in C++17

2020-04-29 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94854

Bug ID: 94854
   Summary: Comment in basic_string.tcc incorrectly says
std::string doesn't have explicit instantiation in
C++17
   Product: gcc
   Version: 9.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://github.com/gcc-mirror/gcc/blob/0c8217b16f307c3eedce8f22354714938613f701/libstdc%2B%2B-v3/include/bits/basic_string.tcc#L1612-L1617

Looks like the code changed in this commit
https://github.com/gcc-mirror/gcc/commit/1a289fa36294627c252492e4c18d7877a7c80dc1#diff-56639139bdefbe09b8f41c465ebf1ab5,
but the comment wasn't adjusted to match. While I recognize this isn't an issue
for 99.9% of users, hopefully if this is fixed it will save someone
else that minute or two of staring at the code trying to reconcile it with the
comment.

[Bug libstdc++/90295] Please define ~exception_ptr inline

2019-11-08 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90295

Mathias Stearn  changed:

   What|Removed |Added

 CC||redbeard0531 at gmail dot com

--- Comment #2 from Mathias Stearn  ---
This will become more of a problem with C++ coroutines where in many cases the
compiler can prove that no exception will be thrown, but the generator or task
type will still have overhead from an exception_ptr member that the compiler
can't eliminate at the moment.

A handful of additional methods should probably either be completely inline or
have a nullptr fast-path: https://godbolt.org/z/zP6DSJ

default ctor
copy ctor
copy assign
swap
operator==
operator!=

Example showing significant codegen at -O3 which should all optimize away:

#include 
bool test() {
std::exception_ptr p1;
std::exception_ptr p2 (p1);
p1 = p2;
swap(p1, p2);
return p1 == nullptr && nullptr == p2 && p1 == p2;
}

[Bug c++/91759] New: g++ accepts ill-formed deduction guides in wrong scope

2019-09-12 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91759

Bug ID: 91759
   Summary: g++ accepts ill-formed deduction guides in wrong scope
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

namespace N {
template 
struct X{ X(int); };
}

using N::X;

X(int) -> X;

This is supposed to not compile due to
http://eel.is/c++draft/temp.deduct.guide#3.sentence-4: A deduction-guide shall
be declared in the same scope as the corresponding class template and, for a
member class template, with the same access. 

clang, icc, and msvc all correctly consider this to be ill-formed:
https://godbolt.org/z/UH3Y8W.

[Bug c++/90287] New: [concepts] bogus error on overload failure inside requires-expression

2019-04-29 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90287

Bug ID: 90287
   Summary: [concepts] bogus error on overload failure inside
requires-expression
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/30Cf6s

#include 

template 
constexpr inline bool isAddable = requires(const T& lhs, const U& rhs) {
 lhs + rhs;
};

auto x = isAddable;

: In instantiation of 'constexpr const bool isAddable >':
:13:10:   required from here
:10:10: error: no match for 'operator+' (operand types are 'const int'
and 'const std::__cxx11::basic_string')
   10 |  lhs + rhs ;
  |  ^

[snip rest of wall-o-errors]

If I make isAddable a concept, it correctly evaluates to false, but I would
expect requires-expressions to also work when assigned to a constexpr bool.

[Bug c++/90033] New: [concepts] ICE segfault evaluating a requires clause that transitively depends on itself

2019-04-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90033

Bug ID: 90033
   Summary: [concepts] ICE segfault evaluating a requires clause
that transitively depends on itself
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/SEfFol

This is a creduce'd example that tripped a segfault in our Real World Code
implementation of unique_function. The RWC version includes a requirement that
X != Y so this can never be a copy or move constructor, but that was removed in
the reduction. FWIW, The clang concepts fork compiles this successfully.

template 
struct bool_constant { static constexpr bool value = B; };
template 
struct is_constructible : bool_constant<__is_constructible(T, Args...)> {};
template 
T&& move(T&);

struct X {
  template 
  requires(is_constructible::value)
  X(OtherFunc &&);

  X() = default;
};

X source;
X dest = move(source);

-

: In substitution of 'template  requires 
is_constructible::value X::X(OtherFunc&&) [with OtherFunc
= X]':
:16:21:   required from here
:16:21: internal compiler error: Segmentation fault
   16 | X dest = move(source);
  | ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.


I know concepts are still experimental, but if the fix turns out to be simple,
we'd appreciate a backport to gcc8.

[Bug c++/90031] New: Bogus parse error trying to explicitly specialize a template variable inside class scope

2019-04-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90031

Bug ID: 90031
   Summary: Bogus parse error trying to explicitly specialize a
template variable inside class scope
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/y5GQZd

struct Struct {
template 
constexpr static bool use_cond = false;
template 
constexpr static bool use_cond = true;
};

:5:27: error: explicit template argument list not allowed
5 | constexpr static bool use_cond = true;
  |   ^

[Bug c++/90019] New: [8 regression] Bogus ambiguous overload error for NTTP pack of disjoint enable_ifs unless there is an unsupplied default argument

2019-04-08 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90019

Bug ID: 90019
   Summary: [8 regression] Bogus ambiguous overload error for NTTP
pack of disjoint enable_ifs unless there is an
unsupplied default argument
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

This code compiles fine with 7 and trunk but fails with gcc8:
https://godbolt.org/z/v1HN8B

#include 

// gcc8 thinks these are ambiguous for <0>
template ...> void foo(){}
template ...> void foo(){}

// but somehow these arn't for <0>, but are for <0,0> !?
template ...> void bar(){}
template ...> void bar(){}

void test() {
bar<0>(); // works
bar<0,0>(); // boom
foo<0>(); // boom
}

[Bug c++/89997] New: Garbled expression in error message with -fconcepts

2019-04-06 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89997

Bug ID: 89997
   Summary: Garbled expression in error message with -fconcepts
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Usually -fconcepts delivers excellent error messages, but this one is pretty
bad. It looks like this goes back to 6.2, when it first started to show the
expression.

https://godbolt.org/z/m9DlOZ

struct Y;

struct X {
Y operator<< (const char*);
};

struct Y {
X operator<< (void*);
};

template 
void check() requires requires (X x, T val) { x << "hello" << val; } {}

void test() {
check(); // no error
check(); // mangled error
}

-

: In function 'void test()':
:16:16: error: cannot call function 'void check() requires (> [with T =
int]'
   16 | check(); // mangled error
  |^
:12:6: note:   constraints not satisfied
   12 | void check() requires requires (X x, T val) { x << "hello" << val; } {}
  |  ^
:12:6: note: with 'X x'
:12:6: note: with 'int val'
:12:6: note: the required expression '("hello"->x.X::operator<<() <<
val)' would be ill-formed


What is that expression? How did it end up applying -> to a string literal!?

[Bug c++/89781] Misleading error messages when initializing a static data member in-class

2019-03-22 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89781

--- Comment #3 from Mathias Stearn  ---
Yeah, my point wasn't that this code should be accepted, it was that the error
messages should be improved. Ideally there would even be fixits suggesting
adding constexpr where it would be valid, otherwise suggesting inline.

Sorry if that wasn't clear.

[Bug c++/89781] New: Misleading error messages when initializing a static data member in-class

2019-03-20 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89781

Bug ID: 89781
   Summary: Misleading error messages when initializing a static
data member in-class
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

The error messages make sense in a pre-c++17 world before inline variables
existed. Now they are lies!

struct X{};
struct Y { static X x = {}; };

:2:21: error: 'constexpr' needed for in-class initialization of static
data member 'X Y::x' of non-integral type [-fpermissive]
 struct Y { static X x = {}; };
 ^

The error is even worse when X can't be a constexpr variable:


struct X{ X(); };
struct Y { static X x = {}; };

:2:21: error: in-class initialization of static data member 'X Y::x' of
non-literal type
 struct Y { static X x = {}; };
 ^
:2:26: error: temporary of non-literal type 'X' in a constant
expression
 struct Y { static X x = {}; };
  ^
:1:8: note: 'X' is not literal because:
 struct X{ X(); };
^
:1:8: note:   'X' is not an aggregate, does not have a trivial default
constructor, and has no 'constexpr' constructor that is not a copy or move
constructor



This compiles just fine:

struct X{ X(); };
struct Y { static inline X x = {}; };

All examples were passing -std=c++17.

[Bug c++/89780] New: -Wpessimizing-move is too agressive with templates and recommends pessimization

2019-03-20 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89780

Bug ID: 89780
   Summary: -Wpessimizing-move is too agressive with templates and
recommends pessimization
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/JA-0Gq

#include 

struct Dest {
Dest() = default;
Dest(Dest&&);
Dest(const Dest&);
};
struct Source : Dest {};

template 
Dest withMove() {
T x;
return std::move(x);
}


template 
Dest noMove() {
T x;
return x;
}


template Dest withMove();
template Dest withMove();

template Dest noMove();
template Dest noMove();

> g++ -O3 -Wall -std=c++17

: In instantiation of 'Dest withMove() [with T = Dest]':
:24:30:   required from here
:13:23: warning: moving a local object in a return statement prevents
copy elision [-Wpessimizing-move]
   13 | return std::move(x);
  |   ^
:13:23: note: remove 'std::move' call


Basically, gcc9 recommends changing withMove, where both Source and Dest are
moved from, in to noMove, where Dest is copy-elided but Source is copied. While
this is a minor optimization for the Dest instantiation, it is a potentially
significant pesimization for the Source one.

Amusingly, this code trips a warning in clang that recommends doing the
opposite, adding the std::move to turn noMove into withMove:
https://godbolt.org/z/WONlMN

 :20:12: warning: local variable 'x' will be copied despite being
returned by name [-Wreturn-std-move]
return x;
   ^
:28:15: note: in instantiation of function template specialization
'noMove' requested here
template Dest noMove();
  ^
:20:12: note: call 'std::move' explicitly to avoid copying
return x;
   ^
   std::move(x)

There is just no winning!

[Bug middle-end/89739] New: pessimizing vectorization at -O3 to load two u64 objects

2019-03-16 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89739

Bug ID: 89739
   Summary: pessimizing vectorization at -O3 to load two u64
objects
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/8vIGZ3

using u64 = unsigned long long;
struct u128 {u64 a, b;};

inline u64 load8(void* ptr) {
u64 out;
__builtin_memcpy(, ptr, 8);
return out;
}

u128 load(char* basep, u64 n) {
return {load8(basep), load8(basep+n-8)};
}

At -O2 this emits ideal asm:
mov rax, QWORD PTR [rdi]
mov rdx, QWORD PTR [rdi-8+rsi]
ret


At -O3 it is comical:
movqxmm0, QWORD PTR [rdi]
movhps  xmm0, QWORD PTR [rdi-8+rsi]
movaps  XMMWORD PTR [rsp-24], xmm0
mov rax, QWORD PTR [rsp-24]
mov rdx, QWORD PTR [rsp-16]
ret

This seems to have been introduced in gcc7

[Bug c++/89706] New: -Wstringop-truncation strncpy message is confusing and has psuedo-false-positives

2019-03-13 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89706

Bug ID: 89706
   Summary: -Wstringop-truncation strncpy message is confusing and
has psuedo-false-positives
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/BAeXQB

"strncpy output truncated copying between 0 and 8 bytes from a string of length
8" doesn't make it obvious what the problem with the is or how to solve it.
Once I decoded it[1], I realized it was warning about something the code was
already set up to handle correctly. Specifically, the last line of func
re-establishes the nul byte termination regardless of how many bytes are
copied. (Yes I know that this code is a dumb use of strncpy and we should
probably just be using memcpy, but that isn't what this warning is warning
about.)


[1] I was only able to decode it by tweaking the code until I got the "output
truncated before terminating nul copying 8 bytes from a string of the same
length" form of the warning. This happens if you comment out the if-block in
this code.


#include 

extern size_t len;
extern char *buf;

inline void func(const char* s) {
size_t sz = strlen(s);
if (sz > len - 1)
sz = len - 1;
strncpy(buf, s, sz);
buf[sz] = '\0';
}

void test() {
const char* p = "Progress";
func(p);
}

> g++ -Wstringop-truncation -O2

In function 'void func(const char*)',
inlined from 'void test()' at :16:9:
:10:12: warning: 'char* strncpy(char*, const char*, size_t)' output
truncated copying between 0 and 8 bytes from a string of length 8
[-Wstringop-truncation]
   10 | strncpy(buf, s, sz);
  | ~~~^~~~

[Bug c++/89688] New: -Wstringop-overflow confused by 2D array of char

2019-03-12 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89688

Bug ID: 89688
   Summary: -Wstringop-overflow confused by 2D array of char
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Also, for some reason it repeats the warning 3 times.

https://godbolt.org/z/SStUsl

// Fine
extern const char a[2] = {'1'};
auto z = __builtin_strlen(a);

// Warns
extern const char aa[1][2] = {{'2'}};
auto zz = __builtin_strlen(aa[0]);



> ~/opensource/gcc/prefix/bin/g++ -Wall str.cpp  -fsyntax-only
str.cpp:7:27: warning: ‘strlen’ argument missing terminating nul
[-Wstringop-overflow=]
7 | auto zz = __builtin_strlen(aa[0]);
  |   ^~~
str.cpp:6:19: note: referenced argument declared here
6 | extern const char aa[1][2] = {{'2'}};
  |   ^~
str.cpp:7:27: warning: ‘strlen’ argument missing terminating nul
[-Wstringop-overflow=]
7 | auto zz = __builtin_strlen(aa[0]);
  |   ^~~
str.cpp:6:19: note: referenced argument declared here
6 | extern const char aa[1][2] = {{'2'}};
  |   ^~
str.cpp:7:32: warning: ‘strlen’ argument missing terminating nul
[-Wstringop-overflow=]
7 | auto zz = __builtin_strlen(aa[0]);
  |^
str.cpp:6:19: note: referenced argument declared here
6 | extern const char aa[1][2] = {{'2'}};
  |   ^~


Reduced example from real code at:
https://github.com/boostorg/date_time/blob/b0437e2999a65668dc4178dbb817a89a382ece94/include/boost/date_time/special_values_formatter.hpp#L89-L92
+
https://github.com/boostorg/date_time/blob/b0437e2999a65668dc4178dbb817a89a382ece94/include/boost/date_time/special_values_formatter.hpp#L43-L45

[Bug c++/89682] New: g++9 incorrectly disallows using private static method as default arg to ctor of template type

2019-03-12 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89682

Bug ID: 89682
   Summary: g++9 incorrectly disallows using private static method
as default arg to ctor of template type
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

New in GCC9. No other compiler errors on this:

https://godbolt.org/z/480KfZ

template 
class C {
class TagType {};
public:
C(int, TagType = makeTag());
private:
static TagType makeTag();
};

void test() {
C(1);
}


> g++ -fsyntax-only./creduce_repro/test.ii
./creduce_repro/test.ii: In function ‘void test()’:
./creduce_repro/test.ii:5:29: error: ‘static C::TagType C::makeTag()
[with T = int]’ is private within this context
5 | C(int, TagType = makeTag());
  |  ~~~^~
./creduce_repro/test.ii:7:20: note: declared private here
7 | static TagType makeTag();
  |^~~

[Bug c++/89640] [9 Regression] g++ chokes on lambda with __attribute__

2019-03-11 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89640

--- Comment #4 from Mathias Stearn  ---
@Jakub, This code doesn't have either mutable or noexcept, so the "wrong place
in the grammer" issue doesn't apply. That part of the fix seems correct and
useful.

While it seems correct to fix what the c++11-syle [[attribute]] appertains to
to match the standard, it would be unfortunate to do the same to
__attribute__(()) style attributes which are already outside of the standard.
Until the standard provides some way to have an attribute that appertains to
the lambda function, this part of the "fix" is breaking useful functionality
that has existed since GCC-5 without offering any replacement.

[Bug c++/89640] [9 Regression] g++ chokes on lambda with __attribute__

2019-03-11 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89640

--- Comment #2 from Mathias Stearn  ---
Unfortunately the c++ attributes syntax applies to the lambda type rather than
the function, so the warning is correct. The old style __attribute__ syntax
seems to be the only way to annotate the lambda function, which is why we use
it here. We use something like this in a macro around code that builds error
messages on our error paths, which is why it needs to be on a lambda. It made a
notable shrink to the size of our primary .text section moving that stuff out
to the .text.cold section.

[Bug c++/89640] New: g++ chokes on lambda with __attribute__

2019-03-08 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89640

Bug ID: 89640
   Summary: g++ chokes on lambda with __attribute__
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Regression in gcc 9 vs 8.

https://godbolt.org/z/MGL0H4

void test() {
[]() __attribute__((noinline,cold)) {
asm volatile("");
}();
}

: In function 'void test()':
:2:41: error: expected identifier before '{' token
2 | []() __attribute__((noinline,cold)) {
  | ^
:2:41: error: type-specifier invalid in lambda
Compiler returned: 1

[Bug c++/88865] New: [[no_unique_address]] leads to sizeof(T) == 0, which cannot be

2019-01-15 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88865

Bug ID: 88865
   Summary: [[no_unique_address]] leads to sizeof(T) == 0, which
cannot be
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/rJT5X9

struct B {};
struct A {
[[no_unique_address]] B a;
[[no_unique_address]] B b;
[[no_unique_address]] B c;
[[no_unique_address]] B d;
};

int f() {
return sizeof(A);
}

f():
pushrbp
mov rbp, rsp
mov eax, 0
pop rbp
ret

In addition to the major issue that sizeof(A) must not be 0, it also must not
be 1 either. It must be (at least) 4.
http://eel.is/c++draft/intro.object#9.sentence-2 is very clear that
[[no_unique_address]] (which clauses 7 and 8 define to mean a "subobject of
zero size") only allows members of *different types* to overlap. a,b,c, and d
are all distinct objects of the same type B, and therefore must have distinct
addresses.

> Two objects with overlapping lifetimes that are not bit-fields may have the 
> same address if one is nested within the other, or if at least one is a 
> subobject of zero size and they are of different types; otherwise, they have 
> distinct addresses and occupy disjoint bytes of storage.


https://godbolt.org/z/160XGN shows that some parts of gcc seem to understand
this rule, some something very strange must be going on.

[Bug c++/87312] New: statics in lambdas should be weak not local symbols

2018-09-14 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87312

Bug ID: 87312
   Summary: statics in lambdas should be weak not local symbols
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/oSuiQO

// IN HEADER:
inline auto lambda = [] () -> int* {
static int foo;
return 
};

inline int* func() {
static int foo;
return 
};

// NOT IN HEADER:
int* lambda_addr() {
return lambda();
}

int* func_addr() {
return func();
}

Both of the "foo" objects should have exactly one address in the whole program.
The "foo" in func() will work correctly, but the "foo" in the lambda will
incorrectly have one address in each TU where it is used.

[Bug c++/67012] decltype(auto) with trailing return type

2018-08-14 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67012

Mathias Stearn  changed:

   What|Removed |Added

 CC||redbeard0531 at gmail dot com

--- Comment #1 from Mathias Stearn  ---
Still repros with latest trunk builds, even with -pedantic:
https://godbolt.org/g/vMDcwg

[Bug c++/86276] New: Poor codegen when returning a std::vector

2018-06-21 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86276

Bug ID: 86276
   Summary: Poor codegen when returning a std::vector
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/g/aCiuAy

While I'm a bit sad that good() isn't just "ret", I think that the current
rules for allocation elision require something like that codegen.

I'd expect bad() to codegen to roughly the same as good(), but then rather than
calling operator delete, it should store [rax, rax + 1, rax + 22] to the three
qwords starting at the out pointer. Instead, it does things like checking if
the vector pointer it just zeroed is still zero to see if it needs to be
deleted. It also seems to register an exception landing pad (I'm guessing to
cover the operator new call) to handle freeing the vector's memory, even though
it should know it isn't holding any. 

If I had to guess, I'd say the problem is that it thinks the hidden return out
pointer has escaped when it hasn't really until a successful return. I'm pretty
sure nothing in the language allows any way to access the return value before a
function returns like that, but I'm now really curious if I'm wrong. If there
is, is there any way to tell gcc that I promise I'm not doing anything quite
that stupid?

PS- is it helpful to include the code and asm here in addition to the godbolt
links?

#include 
#include 

auto good() {
std::vector something;
something.reserve(22);
something = {0x02};
//return something; Only difference from bad
}


auto bad() {
std::vector something;
something.reserve(22);
something = {0x02};
return something;
}

good():
sub rsp, 8
mov edi, 22
calloperator new(unsigned long)
mov BYTE PTR [rax], 2
mov rdi, rax
add rsp, 8
jmp operator delete(void*)
bad():
pushrbp
pxorxmm0, xmm0
pushrbx
mov rbx, rdi
sub rsp, 24
mov QWORD PTR [rdi+16], 0
movups  XMMWORD PTR [rdi], xmm0
mov edi, 22
calloperator new(unsigned long)
mov rdi, QWORD PTR [rbx]
testrdi, rdi
je  .L5
mov QWORD PTR [rsp+8], rax
calloperator delete(void*)
mov rax, QWORD PTR [rsp+8]
.L5:
mov BYTE PTR [rax], 2
lea rdx, [rax+22]
mov QWORD PTR [rbx], rax
add rax, 1
mov QWORD PTR [rbx+8], rax
mov rax, rbx
mov QWORD PTR [rbx+16], rdx
add rsp, 24
pop rbx
pop rbp
ret
mov rbp, rax
jmp .L6
bad() [clone .cold.25]:
.L6:
mov rdi, QWORD PTR [rbx]
testrdi, rdi
je  .L7
calloperator delete(void*)
.L7:
mov rdi, rbp
call_Unwind_Resume

[Bug middle-end/86140] New: constprop clones with identical bodies

2018-06-13 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86140

Bug ID: 86140
   Summary: constprop clones with identical bodies
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

This was tweaked example code to demonstrate something totally unrelated, but I
noticed it was doing the constprop clone thing for functions with identical
instructions, which seemed odd. While this exact case doesn't matter, it seemed
like it might be a sign of a deeper bug.

https://godbolt.org/g/Zwpi7J

[[gnu::noinline]] void f(const int*){
asm volatile("");
}

void good() {
constexpr static int arr[1000] = {};
f(arr);
}

void bad() {
constexpr  int arr[1000] = {};
f(arr);
}


f(int const*) [clone .constprop.0]:
  ret
f(int const*) [clone .constprop.1]:
  ret
f(int const*):
  ret
good():
  jmp f(int const*) [clone .constprop.0]
bad():
  jmp f(int const*) [clone .constprop.1]

[Bug c++/86072] New: Poor codegen with atomics

2018-06-06 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86072

Bug ID: 86072
   Summary: Poor codegen with atomics
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/g/aEFhq8

#include 

std::atomic progress{-1};

void combine_writes1() {
// These should be reduced to a single store(0,release),
// At least as long as release-sequnce includes same-thread
// relaxed stores. If that is removed, this should just be
// a single relaxed store.
progress.store(0, std::memory_order_relaxed);
progress.store(0, std::memory_order_relaxed);
progress.store(0, std::memory_order_release);
progress.store(0, std::memory_order_release);
progress.store(0, std::memory_order_relaxed);
progress.store(0, std::memory_order_relaxed);
}

void combine_writes2() {
// Ditto above, but should store 5.
progress.store(0, std::memory_order_relaxed);
progress.store(1, std::memory_order_relaxed);
progress.store(2, std::memory_order_release);
progress.store(3, std::memory_order_release);
progress.store(4, std::memory_order_relaxed);
progress.store(5, std::memory_order_relaxed);
}

void combine_loads() {
// These should be reduced to either a single acquire-load
// or an acquire fence. 
progress.load(std::memory_order_relaxed);
progress.load(std::memory_order_relaxed);
progress.load(std::memory_order_acquire);
progress.load(std::memory_order_acquire);
progress.load(std::memory_order_relaxed);
progress.load(std::memory_order_relaxed);
}

combine_writes1():
  xor eax, eax
  mov DWORD PTR progress[rip], eax
  mov DWORD PTR progress[rip], eax
  mov DWORD PTR progress[rip], eax
  mov DWORD PTR progress[rip], eax
  mov DWORD PTR progress[rip], eax
  mov DWORD PTR progress[rip], eax
  ret
combine_writes2():
  mov DWORD PTR progress[rip], 0
  mov DWORD PTR progress[rip], 1
  mov DWORD PTR progress[rip], 2
  mov DWORD PTR progress[rip], 3
  mov DWORD PTR progress[rip], 4
  mov DWORD PTR progress[rip], 5
  ret
combine_loads():
  mov eax, DWORD PTR progress[rip]
  mov eax, DWORD PTR progress[rip]
  mov eax, DWORD PTR progress[rip]
  mov eax, DWORD PTR progress[rip]
  mov eax, DWORD PTR progress[rip]
  mov eax, DWORD PTR progress[rip]
  ret
progress:
  .long -1

This example seems to ICE segfaulting gcc trunk: https://godbolt.org/g/M4ZQGS


Possibly related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86056

[Bug middle-end/86056] Codegen can result in multiple sequential mfence instructions

2018-06-05 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86056

--- Comment #2 from Mathias Stearn  ---
Oh, I agree that this optimization must be permitted. I was using this example
to prove this to someone else who didn't believe that. I only mentioned that to
explain how that weird source code came to be.

My point of this bug was that the code gen can be further optimized because
there is never a good reason to have multiple mfence instructions back to back
like that.

[Bug target/86056] New: Codegen can result in multiple sequential mfence instructions

2018-06-05 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86056

Bug ID: 86056
   Summary: Codegen can result in multiple sequential mfence
instructions
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

This is from an example to prove that atomic_thread_fence does not prevent the
compiler from optimizing non-escaped memory.

https://godbolt.org/g/288mTC

#include 
#include 
struct Type {
Type(Type&&)=default;
int i;
};

Type func(Type t) {
auto out = Type(Type(std::move(t)));
std::atomic_thread_fence(std::memory_order_seq_cst);
return out;
}

auto func2(Type t) { return func(func(func(func(std::move(t); }

func(Type):
  movl (%rsi), %edx
  movq %rdi, %rax
  movl %edx, (%rdi)
  mfence
  ret
func2(Type):
  movl (%rsi), %edx
  movq %rdi, %rax
  mfence
  mfence
  mfence
  movl %edx, (%rdi)
  mfence
  ret

[Bug libstdc++/85813] make_exception_ptr should support __cxa_init_primary_exception path even with -fno-exceptions

2018-05-17 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85813

--- Comment #7 from Mathias Stearn  ---
> Simply returning an empty exception_ptr is what happened before the PR 64241
> change, so what we do now retains that behaviour. That might be the main
> reason for it.

Silently dropping errors always skeeves me out. I'm not sure if anyone is well
served by the current behavior. If it were up to me (and I know it isn't) I'd
make that case call std::terminate() or similar rather than returning the "no
error" value. That seems consistent with the behavior of the __throw*
functions, but it is a breaking change. Or even better, since gcc seems fine
throwing through functions marked noexcept in -fno-exceptions TUs, maybe in the
(very rare) case where copying/moving an exception throws inside an
-fno-exceptions TU, just let it bubble out.

> ::new (__e) _Ex(std::forward<_Ex>(__ex));

Should that be std::move(__ex)? I don't know why, but make_exception_ptr takes
the exception by value rather than forwarding ref. I guess forward<_Ex> is fine
either way, and will stay correct if that defect gets fixed. It just seemed odd
to forward a value so I thought I'd mention it.

> Mix-and-match works if the function gets inlined.

Do you think this case warrants a [[gnu::always_inline]]? Given the sensitive
nature of error handling, it seems pretty bad if a correct usage of
make_exception_ptr() could be silently transformed to return the "no error"
value just by linking in a bad library. Even if that library never dynamically
executes make_exception_ptr(). I know I'd hate to be called in to debug that
issue...

I'm going to go make sure no code we link with uses -fno-exceptions!

[Bug libstdc++/85813] make_exception_ptr should support __cxa_init_primary_exception path even with -fno-exceptions

2018-05-17 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85813

--- Comment #3 from Mathias Stearn  ---
My assumption was that if E(...) throws and it can't be caught, it should be
treated as any other case when an -fno-exceptions TU calls a throwing function.
In this case that would mean calling terminate() due to the noexcept, which
seems better than returning a null exception_ptr.

However, while testing the behavior of mixing -fno-exceptions TUs with normal
ones, I realized there may be a bigger problem due to ODR violations. In
particular, if you link these TUs without optimizations, one of the asserts
will trip depending on the link order:

// g++ -fno-exceptions -c
#include 
#include 
void no_exceptions() {
assert(!std::make_exception_ptr(1));
}

// g++ -fexceptions
#include 
#include 
void no_exceptions();
int main() {
assert(std::make_exception_ptr(1));
no_exceptions();
}

Is that just accepted, implying the the whole program must be compiled with
either -fexceptions or -fno-exeptions, rather than allowing mix-and-match? If
so, I guess this whole point is moot.

[Bug libstdc++/85813] New: make_exception_ptr should support __cxa_init_primary_exception path even with -fno-exceptions

2018-05-16 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85813

Bug ID: 85813
   Summary: make_exception_ptr should support
__cxa_init_primary_exception path even with
-fno-exceptions
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

It looks like nothing in that path actually needs exception support. This would
allow make_exception_ptr to produce a valid exception_ptr even in TUs that
don't support exceptions. While that may not sound valuable, the exception_ptr
could be handed to TUs that do support exceptions where it can be rethrown (a
misnomer in this case...) and caught.

Also, if my paper http://wg21.link/p1066 is accepted, that would provide full
support for handling the exception without ever throwing it. And if this ticket
is impossible to implement for some reason, I'd love to know that before
Rapperswil ;)

The nullptr return was added in response to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64241, but it looks like that was
done before the __cxa_init_primary_exception path existed. When that was added
in
https://github.com/gcc-mirror/gcc/commit/cce8e59224e18858749a2324bce583bcfd160d6c#diff-e1b5856b8cc940de58c12ef252d63c34R188,
I can't tell if it was put in the #if __cpp_exceptions block for a good reason
or just by default.

[Bug libstdc++/85812] New: make_exception_ptr can leak the allocated exception if construction throws

2018-05-16 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85812

Bug ID: 85812
   Summary: make_exception_ptr can leak the allocated exception if
construction throws
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

I think if this line throws the exception allocated on line 185 will leak:
https://github.com/gcc-mirror/gcc/blob/3474beffb1fd23da44a752763e648d5d1ffa4d0f/libstdc%2B%2B-v3/libsupc%2B%2B/exception_ptr.h#L189.

[Bug c++/85799] __builin_expect doesn't propagate through inlined functions

2018-05-15 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85799

--- Comment #1 from Mathias Stearn  ---
LLVM bug: https://bugs.llvm.org/show_bug.cgi?id=37476

[Bug c++/85799] New: __builin_expect doesn't propagate through inlined functions

2018-05-15 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85799

Bug ID: 85799
   Summary: __builin_expect doesn't propagate through inlined
functions
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

It seems like __builtin_expect doesn't propagate to conditions when inlined.
This is unfortunate because in some cases it would be nice to be able to put
the expectation into methods so that every user doesn't need to do their own
hinting. Currently we need to use macros to achieve this.

See https://godbolt.org/g/MuPM77 for full example

inline bool expect_false(bool b) {
return __builtin_expect(b, false);
}

void inline_func_hint(bool b) {
if (expect_false(b)) {
unlikely(); // treated as more likely!!!
} else {
likely();
}
}


inline_func_hint(bool):
testdil, dil
je  .L11
jmp unlikely()
.L11:
jmp likely()

[Bug c++/83400] g++ -O1 doesn't execute any code in destructors with a throw statement if it sees another throw

2018-05-15 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83400

--- Comment #3 from Mathias Stearn  ---
This is related to https://wg21.link/CWG2219

[Bug c++/85736] New: Support warn_unused or warn_unused_result on specific constructors

2018-05-10 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85736

Bug ID: 85736
   Summary: Support warn_unused or warn_unused_result on specific
constructors
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

It would be nice to get the benefits of those attributes on a per-constructor
basis, rather than requiring them on the whole type. The particular use case I
have in mind is for unique_lock's default constructor (or at least on our
wrapper around it). I recently did a code review where someone typed:

std::unique_lock myMutex;

where they meant to use:

std::unique_lock lk(myMutex);

There is currently no warning for this at -Wall -Wextra, although thankfully it
is at least caught when myMutex has parentheses around it, which is the more
common mistake. Clearly, it wouldn't make sense to put warn_unused on the whole
unique_lock since the second line is fine.

It would probably make sense on almost all default constructors actually, since
with the exception of a few specific types that alter global or thread-local
state, why are you declaring a default constructed variable then not using it
at all? But on a few types like unique_lock it seems actively dangerous rather
than just simply wasteful.

[Bug middle-end/85721] bad codegen for looped copy of primitives at -O2 and -O3 (differently bad)

2018-05-10 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85721

--- Comment #4 from Mathias Stearn  ---
Marc Glisse pointed out at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720#c3 that my I missed an
aliasing case when I created this ticket. memmove isn't a valid replacement if
out is in the range (in, in + n). I did some benchmarking to see what the best
solution is and how much this matters. This seems to do the best on
sandybridge, haswell, and an Opteron 6344 Piledriver:

[[gnu::noinline, gnu::optimize("s")]] void copy0(char* out, const char* in,
size_t n) {
if (n >= 8 &&__builtin_expect(out >= in + n || out + n <= in, 1)) {
memcpy(out, in, n);
return;
}
for (size_t i = 0; i < n; i++){
out[i] = in[i];
}
}

copy0(char*, char const*, unsigned long):
cmp rdx, 7
jbe .L7
lea rax, [rsi+rdx]
cmp rdi, rax
jnb .L3
lea rax, [rdi+rdx]
cmp rsi, rax
jb  .L7
.L3:
jmp memcpy
.L7:
xor eax, eax
.L5:
cmp rax, rdx
je  .L1
mov cl, BYTE PTR [rsi+rax]
mov BYTE PTR [rdi+rax], cl
inc rax
jmp .L5
.L1:
ret

With char, it is substantially faster than the current codegen for the orignal
loop at -O2  and moderately faster than -O3, while being about 10% the size.
With a TriviallyCopiable type with a non-trivial default ctor, even -O3 does
byte-by-byte, so it is a substantial win there as well.

Let me know if you'd like me to post the benchmark I was using.

[Bug middle-end/85720] bad codegen for looped assignment of primitives at -O2

2018-05-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720

--- Comment #4 from Mathias Stearn  ---
(In reply to Marc Glisse from comment #3)
> Again, you are ignoring aliasing issues (just like in your other PR the
> function copy isn't equivalent to memmove). Does adding __restrict change
> the result? Also, B[i]=B[i]+1 doesn't look like a memset...

Sorry, I typoed. It was supposed to be B[i] = A[i] + 1. That still does
basically the same thing though: https://godbolt.org/g/dtmU5t. Good point about
aliasing though. I guess the right code gen in that case would actually be
something that detected the overlap and did the right calls to memset to only
set each byte once. Or just do the simple thing:

if (b > a && b < a + n) {
  memset(b, 1, n);
  memset(a, 0, n);
} else {
  memset(a, 0, n);
  memset(b, 1, n);
}

Yes, __restrict helps, but that isn't part of standard c++, and it seems like
it never will be.

[Bug middle-end/85721] bad codegen for looped copy of primitives at -O2 and -O3 (differently bad)

2018-05-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85721

--- Comment #3 from Mathias Stearn  ---
@Jonathan Wakely, that is because std::copy cheats and calls memmove directly.
A slight modification of the type that shouldn't matter defeats that
optimization and causes both forms to degrade to byte-by-byte:
https://godbolt.org/g/Z4fWNT. 

I actually consider that the fact that the explicit optimization in std::copy
is necessary should be considered an optimizer bug. Why isn't the optimizer
noticing that the simple implementation is the same as a memmove and do the
transformation for you? It generally seems unfortunate if similar code that
clearly means the same thing results in very different performance. Ideally,
even providing user-defined inline copy operations should result in calling
memmove if they are equivalent since the compiler should be able to "see
through" the non-triviality and optimize it all away.

I'm filing these tickets based on what Martin Sebor said: "As an aside, using
assignment and copy constructors should be at least as efficient as calling
memset and memcpy, and ideally more, and they should always be preferred over
the raw memory functions.  If/where they aren't as efficient please open bugs
for missing optimizations." Do you disagree with that statement?

[Bug middle-end/85720] bad codegen for looped assignment of primitives at -O2

2018-05-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720

--- Comment #2 from Mathias Stearn  ---
Hmm. Taking the example from the -ftree-loop-distribute-patterns documentation,
it still seems to generate poor code, this time at both -O2 and -O3:
https://godbolt.org/g/EsQDj8

Why isn't that transformed to memset(A, 0, N); memset(B, 1, N); ? This feels
similar to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85721. Should I make a
new ticket with this example?

[Bug c++/85721] New: bad codegen for looped copy of primitives at -O2 and -O3 (differently bad)

2018-05-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85721

Bug ID: 85721
   Summary: bad codegen for looped copy of primitives at -O2 and
-O3 (differently bad)
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/g/Gg9fFt

Related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720, but filed
separately because this also affects -O3. Similarly, while this affects types
other than char, char is most egregious.

using SIZE_T = decltype(sizeof(0));
void copy(char* out, const char* in, SIZE_T n) {
for (SIZE_T i = 0; i < n; i++){
out[i] = in[i];
}
}

This should probably just be compiled to check size then jmp memmove. At O2 it
copies byte-by-byte:

copy(char*, char const*, unsigned long):
testrdx, rdx
je  .L1
xor eax, eax
.L3:
movzx   ecx, BYTE PTR [rsi+rax]
mov BYTE PTR [rdi+rax], cl
add rax, 1
cmp rdx, rax
jne .L3
.L1:
ret


At O3 it generates a TON of code:

copy(char*, char const*, unsigned long):
  test rdx, rdx
  je .L1
  lea rax, [rsi+16]
  cmp rdi, rax
  lea rax, [rdi+16]
  setnb cl
  cmp rsi, rax
  setnb al
  or cl, al
  je .L7
  lea rax, [rdx-1]
  cmp rax, 14
  jbe .L7
  mov rcx, rdx
  xor eax, eax
  and rcx, -16
.L4:
  movdqu xmm0, XMMWORD PTR [rsi+rax]
  movups XMMWORD PTR [rdi+rax], xmm0
  add rax, 16
  cmp rax, rcx
  jne .L4
  mov rax, rdx
  and rax, -16
  cmp rdx, rax
  je .L1
  movzx ecx, BYTE PTR [rsi+rax]
  mov BYTE PTR [rdi+rax], cl
  lea rcx, [rax+1]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+1+rax]
  mov BYTE PTR [rdi+1+rax], cl
  lea rcx, [rax+2]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+2+rax]
  mov BYTE PTR [rdi+2+rax], cl
  lea rcx, [rax+3]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+3+rax]
  mov BYTE PTR [rdi+3+rax], cl
  lea rcx, [rax+4]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+4+rax]
  mov BYTE PTR [rdi+4+rax], cl
  lea rcx, [rax+5]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+5+rax]
  mov BYTE PTR [rdi+5+rax], cl
  lea rcx, [rax+6]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+6+rax]
  mov BYTE PTR [rdi+6+rax], cl
  lea rcx, [rax+7]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+7+rax]
  mov BYTE PTR [rdi+7+rax], cl
  lea rcx, [rax+8]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+8+rax]
  mov BYTE PTR [rdi+8+rax], cl
  lea rcx, [rax+9]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+9+rax]
  mov BYTE PTR [rdi+9+rax], cl
  lea rcx, [rax+10]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+10+rax]
  mov BYTE PTR [rdi+10+rax], cl
  lea rcx, [rax+11]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+11+rax]
  mov BYTE PTR [rdi+11+rax], cl
  lea rcx, [rax+12]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+12+rax]
  mov BYTE PTR [rdi+12+rax], cl
  lea rcx, [rax+13]
  cmp rdx, rcx
  jbe .L1
  movzx ecx, BYTE PTR [rsi+13+rax]
  mov BYTE PTR [rdi+13+rax], cl
  lea rcx, [rax+14]
  cmp rdx, rcx
  jbe .L1
  movzx edx, BYTE PTR [rsi+14+rax]
  mov BYTE PTR [rdi+14+rax], dl
  ret
.L7:
  xor eax, eax
.L3:
  movzx ecx, BYTE PTR [rsi+rax]
  mov BYTE PTR [rdi+rax], cl
  add rax, 1
  cmp rdx, rax
  jne .L3
.L1:
  ret

A) This should probably just call memmove which has a tuned implementation for
many architectures and uses ifunc dispatch to choose the right one based on the
runtime CPU rather than the compile-time settings. Also, all functions like
this for all types would all just jump to a single function, there should be I$
advantages.

B) If you really want to emit code for this rather than calling into libc, it
is probably best to use their technique of overlapped reads and writes for the
last vector rather than going into an unrolled byte-by-byte loop:
https://github.molgen.mpg.de/git-mirror/glibc/blob/20003c49884422da7ffbc459cdeee768a6fee07b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S#L331-L335

[Bug c++/85720] New: bad codegen for looped assignment of primitives at -O2

2018-05-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720

Bug ID: 85720
   Summary: bad codegen for looped assignment of primitives at -O2
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/g/qp19Cv

using SIZE_T = decltype(sizeof(0));
void fill(char* p, SIZE_T n) {
for (SIZE_T i = 0; i < n; i++){
p[i] = -1;
}
}

fill(char*, unsigned long):
testrsi, rsi
je  .L1
add rsi, rdi
.L3:
mov BYTE PTR [rdi], -1
add rdi, 1
cmp rdi, rsi
jne .L3
.L1:
ret

At -O3 it basically just tail-calls memset.

Also applies to other types than char, but char is most egregious.

This ticket is spun out of from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85707

[Bug c++/85707] -Wclass-memaccess should excempt safe usage inside of a class and its friends

2018-05-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85707

--- Comment #8 from Mathias Stearn  ---
>  If the constructor had other side-effects (e.g., count
>  the number of objects of the class) bypassing it could
>  violate the invariant.

I thought one of the points of friendship was to allow friends to violate a
class's invariants temporarily as long as they promise that the class is left
in a valid state in the end. Since presumably a class and its friends are
maintained by the same people, they are aware of what the actual requirements
of the class are, even if they can't be stated precisely in the language today.

[Bug c++/85707] -Wclass-memaccess should excempt safe usage inside of a class and its friends

2018-05-08 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85707

--- Comment #2 from Mathias Stearn  ---
Here is a boiled down example of some of our code that trips this warning:
https://godbolt.org/g/ChLrch. It also shows why we do this, since the codegen
is *substantially* better for init_table_memset than init_table_assignment, at
least at -O2. Even if you improve the codegen for that case tomorrow, we'd
still need to keep using memset for a while until we stop supporting older
compilers.

This is reduced from
https://github.com/mongodb/mongo/blob/11a3d5ccb1216da0e84d941fd48e486f72455ba4/src/mongo/db/pipeline/document_internal.h.
The actual key type is stored as a variable-lenth string stored directly in the
buffer and the Key type in the interface is our pre-17 string_view equivalent.
The value is actually a type called Value, that holds an internal 16-byte type
called ValueStorage as its only member. ValueStorage also triggers the warning
in its lifetime methods:
https://github.com/mongodb/mongo/blob/11a3d5ccb1216da0e84d941fd48e486f72455ba4/src/mongo/db/pipeline/value_internal.h#L165-L221
(the DEV macro expands to "if (DEBUG_BUILD)" so you can ignore anything on
those lines). If necessary I can try to boil down that type too, but it is much
more complex, so I'm not sure how small I can make it.

This is all to implement what is essentially a dynamically-typed JSON object
which is something we need to be *REALLY* fast. A lot of effort went into
micro-optimizing these types so that the business logic code doesn't need to
worry about any of this and can write very natural looking, modern c++ code.
All of this memory-munging is hidden in internal types in _internal.h files.
The user facing types are not supposed to expose any of this, except to the
implementations which are all friendly which each other.

The third_party code that is tripping this is in S2. It tries to memcpy
https://github.com/mongodb/mongo/blob/11a3d5ccb1216da0e84d941fd48e486f72455ba4/src/mongo/db/pipeline/value_internal.h#L165-L221
an array of S2Points (typedef for Vector3)
https://github.com/mongodb/mongo/blob/11a3d5ccb1216da0e84d941fd48e486f72455ba4/src/third_party/s2/util/math/vector3.h#L30.
This doesn't meet the self-or-friend condition described in the ticket, so I'm
not sure what the solution there is, but it is an example of code that is
correct (I think, I haven't reviewed it *too* closely) but still triggers this
warning.

[Bug c++/85707] New: -Wclass-memaccess should excempt safe usage inside of a class and its friends

2018-05-08 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85707

Bug ID: 85707
   Summary: -Wclass-memaccess should excempt safe usage inside of
a class and its friends
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

I get the point of the warning and would like to be able to turn it on. However
we have some types that have been specifically designed to be memset and
memcpy/memmove-able as long as certain rules are followed. This is an
implementation detail of the type so non-friend users are expected to use the
default/copy/move constructors, rather than directly manipulating the bytes.
However classes (and their friends) are always allowed to violate their own
invariants, even if external users aren't. That is why I think the warning
should be suppressed* inside of contexts that have access to internals.

*Ideally it would still trip if the class's members were not mem-accessible by
it, but it seems more important (to me) to avoid the false-positives than to
avoid these kinds of false negatives if only one is possible.

I already know about the void*-cast to suppress the warning. I tried doing that
in our code base and it was required in too many places, all of which were
correct (as in no actual misuse was found). Additionally, this trips in
third-party code that we'd rather not alter unless there is an actual bug.

As a potential alternative, adding an attribute like [[gnu::memaccessable]]
that can be put on a type to suppress the warning for uses of that type might
also work.

[Bug c++/85680] Missed optimization for value-init of variable-sized allocation

2018-05-07 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85680

--- Comment #3 from Mathias Stearn  ---
MSVC and ICC both also handle this poorly: https://godbolt.org/g/i4wMYa

https://developercommunity.visualstudio.com/content/problem/246786/poor-codegen-for-value-init-followed-by-explicit-i.html

[Bug c++/85680] Missed optimization for value-init of variable-sized allocation

2018-05-07 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85680

--- Comment #2 from Mathias Stearn  ---
FYI, I don't think this is a signed/unsigned thing since it also repros with
unsigned long https://godbolt.org/g/LTmrpi

My initial repo actually used size_t, but I (incorrectly) changed it to long
rather than unsigned long based on past experience that compiler authors prefer
repros that don't have any #includes and the (IMO) sad fact that size_t still
isn't actually part of the language.

[Bug c++/85680] New: Missed optimization for value-init of variable-sized allocation

2018-05-07 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85680

Bug ID: 85680
   Summary: Missed optimization for value-init of variable-sized
allocation
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/g/6yZEKf (compiled with -O3)


char* variableSize(long n) {
auto p = new char[n]();
for (int i = 0; i < n; i++) {
p[i] = 0xff;
}
return p;
}

char* small() {
return variableSize(8);
}

char* large() {
return variableSize(10'000);
}


The variableSize() case generates two calls two memset with the both being
conditional on n > 0 if I'm reading this right), but the two checks are done in
different ways. That may be a second optimizer bug that it missed a
jump-threading opportunity.

variableSize(long):
pushrbx
mov rbx, rdi
calloperator new[](unsigned long)
mov rcx, rax
mov rax, rbx
sub rax, 1
js  .L5
lea rax, [rbx-2]
mov edx, 1
mov rdi, rcx
cmp rax, -1
cmovge  rdx, rbx
xor esi, esi
callmemset
mov rcx, rax
.L5:
testrbx, rbx
jle .L1
mov rdi, rcx
mov rdx, rbx
mov esi, 255
callmemset
mov rcx, rax
.L1:
mov rax, rcx
pop rbx
ret

Note that when g++ can see the allocation size it *does* elide the initial
memset:


small(): # @small()
  push rax
  mov edi, 8
  call operator new[](unsigned long)
  mov qword ptr [rax], -1
  pop rcx
  ret
large(): # @large()
  push rbx
  mov edi, 1
  call operator new[](unsigned long)
  mov rbx, rax
  mov esi, 255
  mov edx, 1
  mov rdi, rax
  call memset
  mov rax, rbx
  pop rbx
  ret



Related clang bug: https://bugs.llvm.org/show_bug.cgi?id=37351

[Bug c++/83400] New: g++ -O1 doesn't execute any code in destructors with a throw statement if it sees another throw

2017-12-12 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83400

Bug ID: 83400
   Summary: g++ -O1 doesn't execute any code in destructors with a
throw statement if it sees another throw
   Product: gcc
   Version: 7.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Repro code below doesn't print the "in dtor" line when compiled with -O1 or
higher. It does with g++ -O0 and with clang++ at any optimization level. Note
that even with the raise() call enabled, which would prevent the second
exception from firing, it still won't print "in dtor".

I've reproed with g++ 4.8.2, 5.4.0, 7.2.0, and 7.2.1, all on linux. No warnings
are output even with -Wall -Wextra.


#include 
#include 
#include 
#include 

struct ThrowInDtor{
~ThrowInDtor() noexcept(false) {
std::cout << "in dtor"  << std::endl; // Doesn't print.
// raise(SIGINT); // Still repros with this commented in.
throw std::exception();
}
};

int main() {
try {
ThrowInDtor throws;
throw std::exception();
} catch (std::exception&) {
// unreachable
}
}

[Bug tree-optimization/77943] [5/6 Regression] Optimization incorrectly commons noexcept calls with non-noexcept calls

2016-10-17 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77943

--- Comment #12 from Mathias Stearn  ---
We were hoping to replace many places in our code that does the following with
noexcept attributes on functions:

> try {doStuffThatShouldntThrow();} catch(...) {std::terminate();}

We wanted to take advantage of noexcept preventing exceptions from propagating
in a way that causes the terminate handler to be invoked at the throw site
rather than the catch site. We've avoided doing this so far because of the bugs
that we've found, leaving us with low confidence that we can rely on it. Can
you think of any similar cases that are likely to cause issues? Where do you
think it makes sense to focus our testing?

The scariest issues we've seen were a combination of failures to enforce
noexcept (like this bug) combined with noexcept causing try/catch blocks to be
optimized out. That lead to an exception thrown from a noexcept function
bypassing the surrounding catch block and escaping two layers of protection and
reaching code that really should never see them. Are there any compiler flags
we can use to tell it not to eliminate the catches based on noexcept
annotations?

We've also noticed that 'throw()' annotations have fewer issues, but have
avoided them because they are deprecated. As compiler writers, would you trust
'throw()' more or suggest we stick with noexcept?

[Bug c++/77943] Optimization incorrectly commons noexcept calls with non-noexcept calls

2016-10-11 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77943

--- Comment #3 from Mathias Stearn  ---
> Not being a C++ expect, but following spec:
> http://en.cppreference.com/w/cpp/language/noexcept_spec
> 
> "If a search for a matching exception handler leaves a function marked
> noexcept or noexcept(true), std::terminate is called immediately."
> 
> Which is probably what happens in fatal() function

The problem is that it merges the calls to fatal() and notFatal() into a single
call with no branch, so there is no way to only call std::terminate() if the
call to throws() was through fatal(). (In reply to Martin Liška from comment
#2)

PS - I realized I uploaded an old version of the repro that shows the alternate
incorrect behavior of always calling std::terminate even when it should go
through nonFatal(). Sorry about that. If you reverse the branches of the
if/else you'll get the behavior I describe in the initial report.

[Bug c++/77943] Optimization incorrectly commons noexcept calls with non-noexcept calls

2016-10-11 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77943

--- Comment #1 from Mathias Stearn  ---
Created attachment 39787
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39787=edit
Reproducer

[Bug c++/77943] New: Optimization incorrectly commons noexcept calls with non-noexcept calls

2016-10-11 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77943

Bug ID: 77943
   Summary: Optimization incorrectly commons noexcept calls with
non-noexcept calls
   Product: gcc
   Version: 6.2.1
Status: UNCONFIRMED
  Severity: critical
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

When compiled with -O0 and -01 the program behaves correctly and only calls
std::terminate() when passed an argument. With -O2 and -O3 it doesn't call
std::terminate() which means that it allowed an exception to escape from a
noexcept function in violation of the standard. 

> g++ -O0 noexcept_test.cpp && ./a.out ; ./a.out 1
should die: 0
should die: 1
terminate called after throwing an instance of 'int'
Aborted (core dumped)

> g++ -O1 noexcept_test.cpp && ./a.out ; ./a.out 1
should die: 0
should die: 1
terminate called after throwing an instance of 'int'
Aborted (core dumped)

> g++ -O2 noexcept_test.cpp && ./a.out ; ./a.out 1
should die: 0
should die: 1

> g++ -O3 noexcept_test.cpp && ./a.out ; ./a.out 1
should die: 0
should die: 1

The actual behavior with -O2 and -O3 depends on which way the if block is
written. The attached version is the scariest, but if you swap the if and else
blocks and remove the ! from the condition, the executable will always call
std::terminate() even when it shouldn't.

[Bug c++/69300] New: g++ segfault on silly noexcept case

2016-01-15 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69300

Bug ID: 69300
   Summary: g++ segfault on silly noexcept case
   Product: gcc
   Version: 5.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Created attachment 37359
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37359=edit
Preprocessed test case

This is a reduced example from a ridiculous code contest. It compiles with
clang++ but segfaults during compilation on g++. It isn't something that I
actually need to work, but I thought I'd report it since it probably shouldn't
be crashing g++.

Code:

template
struct F {
template
void f() && noexcept(::template f) {}
};

int main (int argc, char ** argv) {
F().f();
}

g++ output:

> g++ -v -save-temps -std=c++14 bug.cpp
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-5.2.0/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --enable-libmpx --with-system-zlib --with-isl
--enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu
--disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object
--enable-linker-build-id --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu
--enable-gnu-indirect-function --disable-multilib --disable-werror
--enable-checking=release --with-default-libstdcxx-abi=gcc4-compatible
Thread model: posix
gcc version 5.2.0 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-std=c++14' '-shared-libgcc'
'-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/cc1plus -E -quiet -v -D_GNU_SOURCE
bug.cpp -mtune=generic -march=x86-64 -std=c++14 -fpch-preprocess -o bug.ii
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../x86_64-unknown-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../include/c++/5.2.0

/usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../include/c++/5.2.0/x86_64-unknown-linux-gnu

/usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../include/c++/5.2.0/backward
 /usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/include
 /usr/local/include
 /usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-std=c++14' '-shared-libgcc'
'-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/cc1plus -fpreprocessed bug.ii
-quiet -dumpbase bug.cpp -mtune=generic -march=x86-64 -auxbase bug -std=c++14
-version -o bug.s
GNU C++14 (GCC) version 5.2.0 (x86_64-unknown-linux-gnu)
compiled by GNU C version 5.2.0, GMP version 6.0.0, MPFR version
3.1.3-p4, MPC version 1.0.3
warning: GMP header version 6.0.0 differs from library version 6.1.0.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++14 (GCC) version 5.2.0 (x86_64-unknown-linux-gnu)
compiled by GNU C version 5.2.0, GMP version 6.0.0, MPFR version
3.1.3-p4, MPC version 1.0.3
warning: GMP header version 6.0.0 differs from library version 6.1.0.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 9ed1d81099b98de89457560501a9ea0c
g++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.archlinux.org/> for instructions.