from:"gabravier at gmail dot com via Gcc-bugs"

[Bug ipa/114408] New: Crash when invoking strcmp multiple times with -fsanitize=undefined -O1 -fanalyzer -flto

2024-03-20 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114408

Bug ID: 114408
   Summary: Crash when invoking strcmp multiple times with
-fsanitize=undefined -O1 -fanalyzer -flto
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int main(){}

int HMAP_unset_copy(const char *key) {
return __builtin_strcmp("a", key) + __builtin_strcmp("a", key);
}

Compiling this program with `-fsanitize=undefined -O1 -fanalyzer -flto` results
in the following:

: In function 'HMAP_unset_copy':
:4:41: warning: check of 'key_4(D)' for NULL after already
dereferencing it [-Wanalyzer-deref-before-check]
4 | return __builtin_strcmp("a", key) + __builtin_strcmp("a", key);
  | ^
  'HMAP_unset_copy': events 1-2
|
|4 | return __builtin_strcmp("a", key) + __builtin_strcmp("a",
key);
|  |^~
|  |||
|  ||(2) pointer 'key_4(D)' is
checked for NULL here but it was already dereferenced at (1)
|  |(1) pointer 'key_4(D)' is dereferenced here
|
during IPA pass: whole-program
At top level:
lto1: internal compiler error: in release_function_body, at cgraph.cc:1813
0x221519c internal_error(char const*, ...)
???:0
0x926a67 fancy_abort(char const*, int, char const*)
???:0
0xa1a687 cgraph_node::release_body(bool)
???:0
0xa1c2d2 cgraph_node::remove()
???:0
0xcea661 symbol_table::remove_unreachable_nodes(_IO_FILE*)
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
lto-wrapper: fatal error: /opt/compiler-explorer/gcc-snapshot/bin/gcc returned
1 exit status
compilation terminated.
/opt/compiler-explorer/gcc-trunk-20240320/bin/../lib/gcc/x86_64-linux-gnu/14.0.1/../../../../x86_64-linux-gnu/bin/ld:
error: lto-wrapper failed
collect2: error: ld returned 1 exit status
Compiler returned: 1

[Bug rtl-optimization/114176] New: Failure to optimize out useless stack spill when array is present in union

2024-02-29 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114176

Bug ID: 114176
   Summary: Failure to optimize out useless stack spill when array
is present in union
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

struct Vec3
{
union {
float v[3];
float x, y, z;
};
};

float Vec3Dot(struct Vec3 vec1, struct Vec3 vec2)
{
return vec1.x;
}

On x86-64 with -O3, GCC outputs:

Vec3Dot:
  movq QWORD PTR [rsp-16], xmm0
  movss xmm0, DWORD PTR [rsp-16]
  ret

LLVM instead outputs this:

Vec3Dot:
  ret

Stack spilling here seems to occur for no reason when the `float v[3];`
declaration is present.

This looks like an RTL issue to me as the final optimized tree pass looks the
same with or without the `float v[3];` declaration being present, and the issue
seems to be present on multiple targets, though to different degrees: only on
x86-64 have I seen it actually result in a stack spill (on e.g. AArch64 and
RISC-V, it only makes GCC output unnecessary instructions to adjust `sp` but
does not actually spill the value).

[Bug c++/113812] Comma expression parsed as declaration when ambiguous type name cast is present

2024-02-07 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113812

--- Comment #2 from Gabriel Ravier  ---
Also I guess this is a simpler minimal example:

void f(int x)
{
int(x), 0;
}

[Bug c++/113812] New: Comma expression parsed as declaration when ambiguous type name cast is present

2024-02-07 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113812

Bug ID: 113812
   Summary: Comma expression parsed as declaration when ambiguous
type name cast is present
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

void f()
{
int(x), a, b, c, d, e, f, g, h, etc;
int(x), a, b, c, d, e, f, g, h, etc, (new int);
}

Clang parses this fine, but GCC errors out, complaining that:

: In function 'void f()':
:4:9: error: redeclaration of 'int x'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  | ^
:3:9: note: 'int x' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  | ^
:4:13: error: redeclaration of 'int a'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  | ^
:3:13: note: 'int a' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  | ^
:4:16: error: redeclaration of 'int b'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  |^
:3:16: note: 'int b' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  |^
:4:19: error: redeclaration of 'int c'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  |   ^
:3:19: note: 'int c' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  |   ^
:4:22: error: redeclaration of 'int d'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  |  ^
:3:22: note: 'int d' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  |  ^
:4:25: error: redeclaration of 'int e'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  | ^
:3:25: note: 'int e' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  | ^
:4:28: error: redeclaration of 'int f'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  |^
:3:28: note: 'int f' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  |^
:4:31: error: redeclaration of 'int g'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  |   ^
:3:31: note: 'int g' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  |   ^
:4:34: error: redeclaration of 'int h'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  |  ^
:3:34: note: 'int h' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  |  ^
:4:37: error: redeclaration of 'int etc'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  | ^~~
:3:37: note: 'int etc' previously declared here
3 | int(x), a, b, c, d, e, f, g, h, etc;
  | ^~~
:4:43: error: expected unqualified-id before 'new'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  |   ^~~
:4:43: error: expected ')' before 'new'
4 | int(x), a, b, c, d, e, f, g, h, etc, (new int);
  |  ~^~~
  |   )

It appears that GCC is treating the second statement in the function as a
declaration rather than as a comma expression, which is how Clang parses it.

[Bug c/113650] New: __builtin_nonlocal_goto ICEs when passed 0 as arguments

2024-01-29 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113650

Bug ID: 113650
   Summary: __builtin_nonlocal_goto ICEs when passed 0 as
arguments
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

void f() {
__builtin_nonlocal_goto(0, 0);
}

This crashes GCC with the following error:

during RTL pass: expand
: In function 'f':
:2:9: internal compiler error: in int_mode_for_mode, at
stor-layout.cc:407
2 | __builtin_nonlocal_goto(0, 0);
  | ^
0x23382dc internal_error(char const*, ...)
???:0
0x96bd77 fancy_abort(char const*, int, char const*)
???:0
0xc569ae emit_move_insn_1(rtx_def*, rtx_def*)
???:0
0xc56d40 emit_move_insn(rtx_def*, rtx_def*)
???:0
0xc2c376 copy_to_reg(rtx_def*)
???:0
0xaf9911 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
???:0
0xc53d5c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug middle-end/107845] __builtin_init_trampoline ICEs on invalid arguments

2024-01-28 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107845

--- Comment #2 from Gabriel Ravier  ---
I'll add that the new `__builtin_init_heap_trampoline` builtin also ICEs when
given the same arguments, presumably for the same reasons (thus, an extra bug
report doesn't seem very useful)

[Bug c/113647] New: __builtin_eh_return_data_regno ICEs when passed -1 as argument

2024-01-28 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113647

Bug ID: 113647
   Summary: __builtin_eh_return_data_regno ICEs when passed -1 as
argument
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int f()
{
return __builtin_eh_return_data_regno(-1);
}

This crashes GCC with the following error:

during RTL pass: expand
: In function 'int f()':
:440:42: internal compiler error: in tree_to_uhwi, at tree.cc:6472
  440 | return __builtin_eh_return_data_regno(-1);
  |~~^~~~
0x2647edc internal_error(char const*, ...)
???:0
0xa51cb7 fancy_abort(char const*, int, char const*)
???:0
0xf34aa6 expand_builtin_eh_return_data_regno(tree_node*)
???:0
0xf6381c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
???:0
0xf6ef5e store_expr(tree_node*, rtx_def*, int, bool, bool)
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
Compiler returned: 1

This seems to be due to the following code:

```
rtx
expand_builtin_eh_return_data_regno (tree exp)
{
  tree which = CALL_EXPR_ARG (exp, 0);
  unsigned HOST_WIDE_INT iwhich;

  if (TREE_CODE (which) != INTEGER_CST)
{
  error ("argument of %<__builtin_eh_return_regno%> must be constant");
  return constm1_rtx;
}

  iwhich = tree_to_uhwi (which); // <-- THIS
```

wherein it looks like `tree_to_uhwi` asserts that its argument is non-negative,
meaning that -1 (and every other negative number) trips the assertion.
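
The failure mode described above, and one conventional way to guard against it,
can be modelled outside GCC. The sketch below is only an illustration:
`model_cst`, `model_fits_unsigned`, `model_to_unsigned` and
`model_checked_regno` are hypothetical names, not GCC internals, and the report
does not say how the bug should actually be fixed. (GCC does provide a
`tree_fits_uhwi_p` predicate that plays the role `model_fits_unsigned` plays
here.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an integer constant operand; NOT GCC's `tree`. */
typedef long long model_cst;

/* Model predicate: does the constant fit in an unsigned host integer? */
static bool model_fits_unsigned(model_cst c)
{
    return c >= 0;
}

/* Model of an asserting conversion: like the crash in the report, it aborts
   instead of diagnosing when handed a negative value. */
static unsigned long long model_to_unsigned(model_cst c)
{
    assert(model_fits_unsigned(c));
    return (unsigned long long)c;
}

/* Guarded variant: report the bad argument and return -1 rather than
   reaching the assertion. */
static long long model_checked_regno(model_cst which)
{
    if (!model_fits_unsigned(which)) {
        fprintf(stderr, "error: argument must be a non-negative constant\n");
        return -1;
    }
    return (long long)model_to_unsigned(which);
}
```

Calling `model_checked_regno(-1)` takes the diagnostic path, whereas calling
`model_to_unsigned(-1)` directly would hit the assertion, which is the analogue
of the ICE.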

[Bug c/29970] mixing ({...}) with VLA leads to massive breakage

2024-01-25 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29970

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #19 from Gabriel Ravier  ---
Can also confirm this myself as I've also encountered this ICE in this code:

#include <stdio.h>

#define each(item, array) \
(typeof(*(array)) *foreach_p = (array), *foreach_p2 = foreach_p, (item) = {}; \
foreach_p < &((foreach_p2)[sizeof(array)/sizeof(*array)]); \
++foreach_p)if((__builtin_memcpy(&(item), foreach_p, sizeof((item))), 0)){}else

#define range1(_stop) (({ \
typeof(_stop) stop = _stop; \
struct{typeof((stop)) array[stop];}p = {}; \
if(stop < 0){ \
for(size_t i = 0; i > stop; --i) \
p.array[-i] = i; \
}else{ \
for(size_t i = 0; i < stop; ++i) \
p.array[i] = i; \
} \
p; \
}).array)

int main(){
char group[][4] = {
"egg",
"one",
"two",
"moo",
};
for each(x, group){
puts(x);
}
return sizeof(range1(6));
}

which I was able to minify to:

void f()
{
  (void)({
int x = 1;
struct {
  int array[x];
} p;
p;
  });
}

which roughly matches what testcase 2 does.

[Bug middle-end/111378] Missed optimization for comparing with exact_log2 constants

2024-01-12 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111378

--- Comment #5 from Gabriel Ravier  ---
It does seem as though this transformation is not particularly favorable on
most platforms. In fact, it seems as though the opposite transformation (which
Clang does on many targets, along with MSVC) would be useful on most targets,
with some exceptions, including:

- PowerPC, on which llvm-mca appears to consider `srdi.` to be faster than
`cmplwi`

- MIPS16, though I am unsure of this - GCC code generation is messy on there
and I have trouble getting llvm-mca to parse GCC's output, but it seems to
consider loading the constant from memory to be far slower than even doing the
shift in two steps (which MIPS16 apparently requires, given GCC emits two `srl
$4, $4, 8` instructions to do the shift)

- LoongArch, which seems to give code for `x < 0x10000` that I would have a
hard time imagining being faster than a single shift given that it outputs
this:
  lu12i.w $r12,61440>>12 # 0xf000
  ori $r5,$r12,4095
  sltu $r4,$r5,$r4
  xori $r6,$r4,1
  andi $r4,$r6,1
whereas a shift outputs this:
  bstrpick.d $r4,$r4,31,16
  sltui $r4,$r4,1
(note: I am not too certain for some of these, but it also seems like Alpha,
C6x, FR-V, RISC-V 64 and Sparc emit much smaller code sequences (i.e. 2-3 times
smaller) that look faster at first glance for the shifting version as compared
to the comparing version)


(PS: Given I do not have a server farm containing every single target GCC
supports for the purposes of benchmarking this, I'm mostly assuming this from
manually peeking at assembly output to try and guess which would be better, and
from looking at what llvm-mca considers to be the faster instruction sequence
on the targets it supports, so potentially llvm-mca and I could just be wrong,
though I would hope LLVM correctly models the performance of the chips it
targets...)
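
For reference, the two source-level forms being compared can be written side by
side. This is a minimal sketch that assumes the power-of-two constant is
0x10000, as the 16-bit shifts in the listings above suggest; the function names
are made up for illustration, and the sketch says nothing about which form is
faster on a given target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Comparison against a power-of-two constant (assumed here to be 1 << 16). */
static bool below_by_compare(uint32_t x)
{
    return x < UINT32_C(0x10000);
}

/* The same predicate expressed as a shift followed by a test against zero. */
static bool below_by_shift(uint32_t x)
{
    return (x >> 16) == 0;
}

/* Counts sample inputs on which the two forms disagree (expected: none). */
static unsigned count_compare_shift_mismatches(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0xFFFFu, 0x10000u, 0x10001u,
        0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu
    };
    unsigned bad = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        if (below_by_compare(samples[i]) != below_by_shift(samples[i]))
            ++bad;
    return bad;
}
```

Compiling both functions for a given target and comparing the assembly is the
kind of manual inspection described above.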

[Bug c/113262] New: ICE when using [[gnu::copy("")]] attribute

2024-01-07 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113262

Bug ID: 113262
   Summary: ICE when using [[gnu::copy("")]] attribute
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int[[gnu::copy("")]]a;

This crashes trunk GCC with the following error:

:1:1: internal compiler error: tree check: expected tree that contains
'decl minimal' structure, have 'integer_type' in handle_copy_attribute, at
c-family/c-attribs.cc:3150
1 | int[[gnu::copy("")]]a;
  | ^~~
0x232d4fc internal_error(char const*, ...)
???:0
0x87efb9 tree_contains_struct_check_failed(tree_node const*,
tree_node_structure_enum, char const*, int, char const*)
???:0
0x983af3 decl_attributes(tree_node**, tree_node*, int, tree_node*)
???:0
0x996328 finish_declspecs(c_declspecs*)
???:0
0xa1c31b c_parse_file()
???:0
0xa93da9 c_common_parse_file()
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
Compiler returned: 1

[Bug middle-end/109986] missing fold (~a | b) ^ a => ~(a & b)

2023-08-01 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109986

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #6 from Gabriel Ravier  ---
Seems to be fixed on trunk, except that I've noticed that the f0 example does
some weird operations on BPF, but that seems like a separate issue.

[Bug tree-optimization/94911] Failure to optimize comparisons of VLA sizes

2023-05-13 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94911

--- Comment #5 from Gabriel Ravier  ---
Also, as an extra note, w.r.t.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94911#c3, I've just noticed that I
had indeed made a separate bug report at https://gcc.gnu.org/PR94912 (which
ended up being closed as a duplicate of https://gcc.gnu.org/PR68531) - just
wanted to clarify that so nobody ends up filing more duplicates like I almost
just did

[Bug target/104375] [x86] Failure to recognize bzhi pattern when shr is present

2023-02-18 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104375

--- Comment #4 from Gabriel Ravier  ---
So should the bug be marked as fixed or... ?

[Bug tree-optimization/98966] Failure to optimize conditional or with 1 based on boolean condition to direct or

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98966

--- Comment #3 from Gabriel Ravier  ---
Appears to be fixed on trunk.

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #11 from Gabriel Ravier  ---
It appears like this is fixed on trunk, I think ?

[Bug rtl-optimization/96692] Failure to optimize xor+or+xor to andnot+xor

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96692

--- Comment #3 from Gabriel Ravier  ---
This seems to be fixed on trunk now, I think ?

[Bug target/95427] Failure to avoid emitting rbp initialization when doing 256-bit memory store

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95427

--- Comment #2 from Gabriel Ravier  ---
Still appears to be fixed on trunk.

[Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #3 from Gabriel Ravier  ---
Looks like this gives much better output now.

[Bug tree-optimization/94899] Failure to optimize out add before compare with INT_MIN

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94899

--- Comment #7 from Gabriel Ravier  ---
I don't know if I've missed something obvious but this still appears to be
fixed.

[Bug tree-optimization/94782] Simple multiplication-related arithmetic not optimized to direct multiplication

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94782

--- Comment #2 from Gabriel Ravier  ---
Appears to be fixed on trunk.

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #19 from Gabriel Ravier  ---
(In reply to Jakub Jelinek from comment #14)
> The patch does:
> +  bool zero_ok = CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), ctzval)
> == 2;
> +
> +  /* Skip if there is no value defined at zero, or if we can't easily
> +return the correct value for zero.  */
> +  if (!zero_ok)
> +   return false;
> +  if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
> +   return false;
> For CTZ_DEFINED_VALUE_AT_ZERO == 1 we could support it the same way but we'd
> need
> to emit into the IL an equivalent of val == 0 ? zero_val : .CTZ (val) (with
> GIMPLE_COND and a separate bb - not sure if anything in forwprop creates new
> basic blocks right now), where there is a high chance that RTL opts would
> turn it back into unconditional
> ctz.
> That still wouldn't help non--mbmi x86, because CTZ_DEFINED_VALUE_AT_ZERO is
> 0 there.
> We could handle even that case by doing the branches around, but those would
> stay there
> in the generated code, at which point I wonder whether it would be a win. 
> The original
> code is branchless...

If the original code being branchless makes it faster, wouldn't that imply that
we should use the table-based implementation when generating code for
`__builtin_ctz` ?
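
For readers unfamiliar with the idiom under discussion, a table-based ctz
typically looks like the sketch below. It uses the widely circulated de Bruijn
multiplication trick; the table and the constant are the well-known public ones
and are not taken from the patch, and the names `debruijn_ctz_table` and
`table_ctz32` are made up here. Note that this particular table yields 0 for an
input of 0, which is the kind of `zero_val` the quoted patch has to reconcile
with the target's `CTZ_DEFINED_VALUE_AT_ZERO`.

```c
#include <assert.h>
#include <stdint.h>

/* Lookup table indexed by the top five bits of the de Bruijn product. */
static const unsigned char debruijn_ctz_table[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9
};

/* Branchless count-trailing-zeros: isolate the lowest set bit, multiply by
   a de Bruijn constant, and use the top five bits as a table index. */
static unsigned table_ctz32(uint32_t v)
{
    uint32_t lowest = v & (UINT32_C(0) - v);
    return debruijn_ctz_table[(uint32_t)(lowest * UINT32_C(0x077CB531)) >> 27];
}

/* Compares the table version against __builtin_ctz on nonzero inputs. */
static unsigned count_ctz_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        uint32_t v = UINT32_C(1) << bit;
        if (table_ctz32(v) != (unsigned)__builtin_ctz(v))
            ++bad;
        /* Higher set bits must not affect the result. */
        if (table_ctz32(v | UINT32_C(0x80000000)) != bit)
            ++bad;
    }
    return bad;
}
```

The function contains no branches, which is the property the quoted comment
refers to when it calls the original code branchless.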

[Bug tree-optimization/90838] Detect table-based ctz implementation

2023-02-16 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #12 from Gabriel Ravier  ---
It appears this new optimization is non-functional on trunk on x86-64, and
specifically on x86-64: on AArch64 it works just fine. So does that mean this
bug should be re-opened, or should a new bug be opened for that?

[Bug tree-optimization/92342] [10/11/12/13 Regression] a small missed transformation into x?b:0

2023-01-14 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92342

--- Comment #29 from Gabriel Ravier  ---
Looks like the patch fixes this bug, unless I'm missing something.

[Bug middle-end/107115] Wrong codegen from TBAA under stores that change effective type?

2022-12-27 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115

--- Comment #14 from Gabriel Ravier  ---
Actually, I think there are some aliasing violations in the C++ code w.r.t. the
reuse of `p4` after another object has been created in its place, so I think
this code would be more correct:

void test1(long *p1)
{
p1 = (long *)new ((char*)p1) char[sizeof(long)];
p1[0] = 1;
}

long test2(long long *p2, int index1, int index2)
{
p2 = (long long *)new ((char*)p2) char[sizeof(long long)];
p2[index1] = 2;
return p2[index2];
}

long test3(long *p3, int index2, long value)
{
p3 = (long *)new ((char*)p3) char[sizeof(long)];
p3[index2] = 3;
p3[index2] = value;
return p3[0];
}

long test4(void *p4, int index1, int index2)
{
test1((long *)p4);
long temp = test2((long long *)std::launder((long *)p4), index1, index2);
return test3((long *)std::launder((long long *)p4), index2, temp);
}

long (*volatile vtest)(void *, int, int) = test4;

int main(void)
{
void *pp = malloc(sizeof(long long));
if (!pp) abort();
long result = vtest(pp, 0, 0);
printf("%lu/%lu\n", *std::launder((long *)pp), result);
}

[Bug middle-end/107115] Wrong codegen from TBAA under stores that change effective type?

2022-12-27 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #13 from Gabriel Ravier  ---
Idk if it qualifies as the same bug or if this will prove to be particularly
useful, but I'd like to make sure that the fix for this issue also fixes the
corresponding issue in C++. It is triggered by the slightly altered code below
(altered so that it hopefully respects the stricter rules C++ has w.r.t.
implicit object creation), which also fails to execute correctly on GCC:

#include <stdio.h>
#include <stdlib.h>
#include <new>

void test1(long *p1)
{
p1 = (long *)new ((char*)p1) char[sizeof(long)];
p1[0] = 1;
}

long test2(long long *p2, int index1, int index2)
{
p2 = (long long *)new ((char*)p2) char[sizeof(long long)];
p2[index1] = 2;
return p2[index2];
}

long test3(long *p3, int index2, long value)
{
p3 = (long *)new ((char*)p3) char[sizeof(long)];
p3[index2] = 3;
p3[index2] = value;
return p3[0];
}

long test4(void *p4, int index1, int index2)
{
test1((long *)p4);
long temp = test2((long long *)p4, index1, index2);
return test3((long *)p4, index2, temp);
}

long (*volatile vtest)(void *, int, int) = test4;

int main(void)
{
void *pp = malloc(sizeof(long long));
if (!pp) abort();
long result = vtest(pp, 0, 0);
printf("%lu/%lu\n", *std::launder((long *)pp), result);
}

[Bug c/107845] New: __builtin_init_trampoline ICEs on invalid arguments

2022-11-23 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107845

Bug ID: 107845
   Summary: __builtin_init_trampoline ICEs on invalid arguments
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

void f()
{
__builtin_init_trampoline(0, 0, 0);
}

This crashes GCC with the following error:

during RTL pass: expand
: In function 'f':
:3:5: internal compiler error: in expand_builtin_init_trampoline, at
builtins.cc:5683
3 | __builtin_init_trampoline(0, 0, 0);
  | ^~
0x2008dee internal_error(char const*, ...)
???:0
0x95c468 fancy_abort(char const*, int, char const*)
???:0
0xac9c55 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
???:0
0xc1031c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
Compiler returned: 1

It looks like `expand_builtin_init_trampoline` just uses `gcc_assert` to check
its arguments instead of reporting a proper error as other builtins do.

[Bug c/107840] New: ICE when compiling cursed setjmp/longjmp that uses __builtin_call_with_static_chain

2022-11-23 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107840

Bug ID: 107840
   Summary: ICE when compiling cursed setjmp/longjmp that uses
__builtin_call_with_static_chain
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

typedef __UINT16_TYPE__ uint16_t;
typedef __UINT32_TYPE__ uint32_t;
typedef __INTPTR_TYPE__ intptr_t;
#define unreachable __builtin_unreachable

typedef struct{
const uint16_t mov1;
const uint32_t addr;
const uint16_t mov2;
const void * const chain;
} __attribute__((packed)) thunk_struct;

#define NESTED_CHAIN(p) ({\
thunk_struct *__t = (void*)p; \
__t->chain;   \
})

#define NESTED_ADDR(p) ({ \
auto __p = (p);   \
thunk_struct *__t = (void*)__p;   \
(typeof(__p))(intptr_t)__t->addr; \
})

#define NESTED_UPGRADE(self, ptr, args) ({\
if(self != ptr)   \
__builtin_call_with_static_chain( \
NESTED_ADDR((typeof(self)*)ptr) args, \
NESTED_CHAIN(ptr) \
);\
})

typedef struct{
// can't apply standard [[noreturn]] to function pointers
[[gnu::noreturn]] void(*fun)(void*, int);
}xjmp_buf[1];

#define xsetjmp(env) ({ \
__label__ trgt; \
int __xsetjmp_ret = 0;  \
[[noreturn]] void __jmp(void *self, int r){ \
NESTED_UPGRADE(__jmp, self, (self, r)); \
__xsetjmp_ret = r ?: 1; \
goto trgt;  \
}   \
env[0].fun = __jmp; \
trgt:;  \
int tmp = __xsetjmp_ret;\
__xsetjmp_ret = 0;  \
tmp;\
})

[[noreturn, gnu::always_inline]] inline void xlongjmp(xjmp_buf env, int r){
((void(*)(void*, int))NESTED_ADDR(env[0].fun))(env[0].fun, r);
unreachable();
}

int main(){
int a = 0;
xjmp_buf test;
void foo(xjmp_buf ctx){
if(!xsetjmp(ctx)){
(volatile void)0;
}
}

foo(test);
xlongjmp(test, ++a);
}

Compiling this code with `-std=c2x` results in the following error:

: In function 'foo':
:60:14: error: label '({anonymous})' has incorrect context in bb 4
   60 | void foo(xjmp_buf ctx){
  |  ^~~
during GIMPLE pass: cfg
dump file: /app/output.c.015t.cfg
:60:14: internal compiler error: verify_flow_info failed
0x2008dee internal_error(char const*, ...)
???:0
0xaf90d7 verify_flow_info()
???:0
0x10447c7 cleanup_tree_cfg(unsigned int)
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
Compiler returned: 1

(PS: I cannot seem to get more of the necessary information from Godbolt,
although the bug seems simple enough to reproduce without it. Still, this link
to the setup I got the bug in might help: https://godbolt.org/z/cd7f4Mdzd)

[Bug c/106535] GCC doesn't reject non-constant initializer if -pedantic is specified but does so in any other circumstances

2022-08-05 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106535

--- Comment #3 from Gabriel Ravier  ---
Considering the comment appears to be from 1993 (see commit
d9fc6069c69564ce7fecd9ca0ce1bbe0b3c130ef), it having become wrong since then
doesn't seem particularly surprising :p

[Bug c/106535] GCC doesn't reject non-constant initializer if -pedantic is specified but does so in any other circumstances

2022-08-05 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106535

Gabriel Ravier  changed:

   What|Removed |Added

   Keywords||accepts-invalid,
   ||rejects-valid

--- Comment #1 from Gabriel Ravier  ---
(PS: I'm not sure whether it's intended that it should be rejected under
-pedantic or accepted without any options, so I've used both of the keywords)

[Bug c/106535] New: GCC doesn't reject non-constant initializer if -pedantic is specified but does so in any other circumstances

2022-08-05 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106535

Bug ID: 106535
   Summary: GCC doesn't reject non-constant initializer if
-pedantic is specified but does so in any other
circumstances
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int f = (0, 0);

Compiled without any options:

:1:9: error: initializer element is not constant
1 | int f = (0, 0);
  | ^

Compiled with -pedantic:

:1:9: warning: initializer element is not constant [-Wpedantic]
1 | int f = (0, 0);
  | ^

It seems rather odd that adding -pedantic transforms an error into a warning...

[Bug tree-optimization/94920] Failure to optimize abs pattern from arithmetic with selected operands based on comparisons with 0

2022-07-27 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94920

--- Comment #4 from Gabriel Ravier  ---
So, is this fully fixed, or did I miss something ?

[Bug tree-optimization/106245] New: Failure to optimize (u8)(a << 7) >> 7 pattern to a & 1

2022-07-10 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106245

Bug ID: 106245
   Summary: Failure to optimize (u8)(a << 7) >> 7 pattern to a & 1
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

int8_t f(int8_t a)
{
return (uint8_t)(a << 7) >> 7;
}

This can be optimized to `return a & 1;`. This transformation is done by LLVM,
but not by GCC.
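
The claimed identity is easy to confirm by brute force over every `int8_t`
value. The following is a minimal sketch with made-up function names; one
deliberate difference from the testcase is that the left shift is performed on
an unsigned value, so that negative inputs do not involve left-shifting a
negative `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Shift form from the report, with the shift done on an unsigned value. */
static int8_t low_bit_by_shifts(int8_t a)
{
    return (int8_t)((uint8_t)((unsigned)a << 7) >> 7);
}

/* Proposed replacement: keep only bit 0. */
static int8_t low_bit_by_mask(int8_t a)
{
    return (int8_t)(a & 1);
}

/* Counts int8_t inputs on which the two forms disagree (expected: none). */
static unsigned count_low_bit_mismatches(void)
{
    unsigned bad = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; ++a)
        if (low_bit_by_shifts((int8_t)a) != low_bit_by_mask((int8_t)a))
            ++bad;
    return bad;
}
```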

[Bug tree-optimization/106244] New: Failure to optimize (1 << x) & 1 to `x == 0` if separated into multiple statements

2022-07-10 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106244

Bug ID: 106244
   Summary: Failure to optimize (1 << x) & 1 to `x == 0` if
separated into multiple statements
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

int8_t f(int8_t x)
{
int8_t sh = 1 << x;
return sh & 1;
}

This can be optimized to `return x == 0;`. This transformation is done by LLVM,
but not by GCC.

PS: For some reason GCC manages to do this optimization if I replace `f` with
`return (1 << x) & 1;` instead of having it spelled out in 2 statements.
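
As a sanity check of the equivalence, the two-statement form can be compared
against `x == 0` for every shift count that is well defined for a 32-bit `int`.
This is a sketch under stated assumptions: the function names are made up, the
range is limited to 0..30 to stay clear of undefined shifts, and the narrowing
of `1 << x` to `int8_t` relies on the usual wrap-around conversion that GCC
documents.

```c
#include <assert.h>
#include <stdint.h>

/* The two-statement form from the report. */
static int8_t bit0_two_statements(int8_t x)
{
    int8_t sh = (int8_t)(1 << x);
    return (int8_t)(sh & 1);
}

/* Proposed replacement. */
static int8_t is_zero(int8_t x)
{
    return x == 0;
}

/* Counts shift amounts in 0..30 on which the forms disagree (expected: none). */
static unsigned count_shift_mask_mismatches(void)
{
    unsigned bad = 0;
    for (int x = 0; x <= 30; ++x)
        if (bit0_two_statements((int8_t)x) != is_zero((int8_t)x))
            ++bad;
    return bad;
}
```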

[Bug tree-optimization/106243] New: Failure to optimize (0 - x) & 1 on vector type

2022-07-10 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106243

Bug ID: 106243
   Summary: Failure to optimize (0 - x) & 1 on vector type
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int64_t v2i64 __attribute__((vector_size(16)));

v2i64 f(v2i64 x)
{
return (0 - x) & 1;
}

This can be optimized to `return x & 1;`. LLVM does this transformation, but
GCC does not.
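
The identity holds because negation never changes the lowest bit of a two's
complement value. A minimal lane-by-lane check using the same vector extension
might look like this; the helper names and the sample values are made up, and
`INT64_MIN` is left out on purpose because negating it overflows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t v2i64 __attribute__((vector_size(16)));

/* Form from the report: negate each lane, then keep bit 0. */
static v2i64 negate_then_mask(v2i64 x)
{
    return (0 - x) & 1;
}

/* Proposed replacement: keep bit 0 directly. */
static v2i64 mask_only(v2i64 x)
{
    return x & 1;
}

/* Counts sample vectors on which any lane differs (expected: none). */
static unsigned count_vector_mismatches(void)
{
    static const int64_t samples[] = {
        0, 1, -1, 2, -2, 12345, -12345, INT64_MAX, -INT64_MAX
    };
    const size_t n = sizeof samples / sizeof samples[0];
    unsigned bad = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            v2i64 x = {samples[i], samples[j]};
            v2i64 lhs = negate_then_mask(x);
            v2i64 rhs = mask_only(x);
            if (lhs[0] != rhs[0] || lhs[1] != rhs[1])
                ++bad;
        }
    return bad;
}
```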

[Bug tree-optimization/94899] Failure to optimize out add before compare with INT_MIN

2022-06-22 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94899

--- Comment #6 from Gabriel Ravier  ---
Can confirm that this appears to be fixed.

[Bug tree-optimization/105983] New: Failure to optimize (b != 0) && (a >= b) as well as the same pattern with binary and

2022-06-14 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105983

Bug ID: 105983
   Summary: Failure to optimize (b != 0) && (a >= b) as well as
the same pattern with binary and
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

bool f(unsigned a, unsigned b)
{
return (b != 0) && (a >= b);
}

This can be optimized to `return (b != 0) & (a >= b);`, which can itself be
optimized to `return (b - 1) < a;`. GCC outputs code equivalent to `return (b
!= 0) & (a >= b);` (at least on x86), even though it does emit the equivalent
of `return (b - 1) < a;` when given the `&` form directly. LLVM has no trouble
directly outputting the optimal code.
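
For unsigned operands, the single comparison equivalent to
`(b != 0) && (a >= b)` is `(b - 1) < a`: when `b` is 0 the subtraction wraps
to `UINT_MAX`, which is never below `a`. The following sketch (names made up
here, sample values chosen around the wrap-around points) compares the two
forms.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

static bool f_logical(unsigned a, unsigned b)
{
    return (b != 0) && (a >= b);
}

static bool f_single_compare(unsigned a, unsigned b)
{
    /* b == 0 wraps to UINT_MAX, which is never below a. */
    return (b - 1) < a;
}

/* Compares both forms on every pair drawn from a few boundary values. */
static int count_mismatches(void)
{
    static const unsigned samples[] = {
        0u, 1u, 2u, 3u, 255u, 256u, UINT_MAX - 1u, UINT_MAX
    };
    const size_t n = sizeof samples / sizeof samples[0];
    int mismatches = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (f_logical(samples[i], samples[j]) !=
                f_single_compare(samples[i], samples[j]))
                mismatches++;
    return mismatches;
}
```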

[Bug tree-optimization/105777] New: Failure to optimize __builtin_mul_overflow with constant operand to add+cmp check

2022-05-30 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105777

Bug ID: 105777
   Summary: Failure to optimize __builtin_mul_overflow with
constant operand to add+cmp check
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int f17(unsigned x)
{
    int z;
    return __builtin_mul_overflow((int)x, 35, &z);
}

This can be optimized to `return (x + 0xFC57C57C) < 0xF8AF8AF9;` (and I'd
assume the same pattern with other constants than 35 should be optimizable in
the same way). LLVM does this transformation, but GCC does not.
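
The constants can be sanity-checked at the edges of the overflow range. This
is only a sketch: it assumes a 32-bit `int`, relies on the unsigned-to-int
conversion wrapping (as on GCC and Clang), and the helper names are made up.
With a multiplier of 35, the product fits as long as the signed value of `x`
lies within plus or minus `INT_MAX / 35`.

```c
#include <limits.h>
#include <stddef.h>

/* Form from the report: does (int)x * 35 overflow an int? */
static int f17_builtin(unsigned x)
{
    int z;
    return __builtin_mul_overflow((int)x, 35, &z);
}

/* Range-check form quoted in the report. */
static int f17_range_check(unsigned x)
{
    return (x + 0xFC57C57Cu) < 0xF8AF8AF9u;
}

/* Compares both forms on values around the two overflow boundaries. */
static int count_mismatches(void)
{
    const unsigned limit = (unsigned)INT_MAX / 35u;   /* 61356675 */
    const unsigned samples[] = {
        0u, 1u, limit, limit + 1u, (unsigned)INT_MAX, (unsigned)INT_MAX + 1u,
        0u - (limit + 1u), 0u - limit, UINT_MAX
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (f17_builtin(samples[i]) != f17_range_check(samples[i]))
            mismatches++;
    return mismatches;
}
```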

[Bug tree-optimization/105776] New: Failure to recognize __builtin_mul_overflow pattern

2022-05-30 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105776

Bug ID: 105776
   Summary: Failure to recognize __builtin_mul_overflow pattern
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int f4(unsigned x, unsigned y)
{
if (x == 0)
return 1;
return ((int)(x * y) / (int)x) == y;
}

can be optimized to

int f4(unsigned x, unsigned y)
{
    int z;
    return !__builtin_mul_overflow((int)x, (int)y, &z);
}

This transformation is done by LLVM, but not by GCC.

Note that this derives from another function written as such:

int
f3 (unsigned x, unsigned y)
{
  unsigned int r = x * y;
  return !x || ((int) r / (int) x) == (int) y;
}

which does optimize correctly on x86 but not on aarch64 (where it generates
tree-optimized GIMPLE corresponding to the code above)
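
To illustrate the equivalence between the division-based check and the
builtin, here is a sketch that compares both on a few hand-picked pairs. The
names `f4_division`, `f4_builtin` and `count_mismatches` are hypothetical, a
32-bit `int` is assumed, and the pair (x = -1, y = INT_MIN) is deliberately
left out because the signed division itself would overflow there.

```c
#include <limits.h>
#include <stddef.h>

/* Division-based check from the report. */
static int f4_division(unsigned x, unsigned y)
{
    if (x == 0)
        return 1;
    return ((int)(x * y) / (int)x) == y;
}

/* Builtin-based form the report suggests as the optimized result. */
static int f4_builtin(unsigned x, unsigned y)
{
    int z;
    return !__builtin_mul_overflow((int)x, (int)y, &z);
}

/* Compares both forms on pairs on either side of the overflow boundary. */
static int count_mismatches(void)
{
    static const unsigned pairs[][2] = {
        {0u, 123u}, {3u, 5u}, {46340u, 46340u}, {46341u, 46341u},
        {65536u, 65536u}, {2u, 0x40000000u}, {UINT_MAX, 7u}
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
        if (f4_division(pairs[i][0], pairs[i][1]) !=
            f4_builtin(pairs[i][0], pairs[i][1]))
            mismatches++;
    return mismatches;
}
```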

[Bug target/105773] New: [Aarch64] Failure to optimize and+cmp to tst

2022-05-30 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105773

Bug ID: 105773
   Summary: [Aarch64] Failure to optimize and+cmp to tst
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int
baz (unsigned long x, unsigned long y)
{
  return (int) (x & y) > 0;
}

With -O3, AArch64 GCC outputs this:

baz(unsigned long, unsigned long):
and w0, w0, w1
cmp w0, 0
cset w0, gt
ret

whereas LLVM outputs this:

baz(unsigned long, unsigned long):
tst w1, w0
cset w0, gt
ret

It seems to me as though using tst should be faster (unless Aarch64 processors
are extremely weird).

[Bug tree-optimization/102583] [x86] Failure to optimize 32-byte integer vector conversion to 16-byte float vector properly when converting upper part with -mavx2

2022-05-16 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102583

--- Comment #7 from Gabriel Ravier  ---
Can confirm it is indeed fixed

[Bug target/105328] New: [x86] Failure to optimize out test instruction after add

2022-04-21 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105328

Bug ID: 105328
   Summary: [x86] Failure to optimize out test instruction after
add
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

void f1();
void f2();
void f3();

void g(int b, int c)
{
int a = b + c;
if (a > 0)
f1();
else if (a < 0)
f2();
else
f3();
}

With -O3, GCC outputs this:

g(int, int):
  add edi, esi
  test edi, edi
  jg .L5
  je .L3
  jmp f2()
.L3:
  jmp f3()
.L5:
  jmp f1()

LLVM instead outputs this:

g(int, int):
  add edi, esi
  jle .LBB0_1
  jmp f1()@PLT # TAILCALL
.LBB0_1:
  js .LBB0_4
  jmp f3()@PLT # TAILCALL
.LBB0_4:
  jmp f2()@PLT # TAILCALL

It appears like the `test` instruction can be removed (I assume without having
to change anything about which functions get called on branchless paths and
things like that). I'm not completely sure about this, considering how complex
x86 performance can sometimes be, but I'd think removing that instruction
should be beneficial everywhere.

[Bug target/104412] New: [Aarch64] Failure to optimize vector initialization from int64s

2022-02-06 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104412

Bug ID: 104412
   Summary: [Aarch64] Failure to optimize vector initialization
from int64s
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int64_t v2i64 __attribute__((vector_size(16)));

v2i64 _mm_set_epi64x(int64_t i1, int64_t i2)
{
union {
int64_t data[2];
v2i64 v;
} d = {.data = {i2, i1}};
return d.v;
}

With -O3, AArch64 GCC outputs this:

_mm_set_epi64x(long, long):
  sub sp, sp, #16
  stp x1, x0, [sp]
  ldr q0, [sp]
  add sp, sp, 16
  ret

LLVM instead outputs this:

_mm_set_epi64x(long, long):
  fmov d0, x1
  mov v0.d[1], x0
  ret

[Bug target/104409] New: [Aarch64] Crash when compiling source code of any significant size with -march=armv8.7-a

2022-02-06 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104409

Bug ID: 104409
   Summary: [Aarch64] Crash when compiling source code of any
significant size with -march=armv8.7-a
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

Created attachment 52361
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52361&action=edit
Preprocessed minimal code that triggers the crash

It appears that using `-march=armv8.7-a` on code of any significant size
quickly results in a crash. Compiling the attached code with that option
results in a crash on Godbolt's trunk build (updated as of sometime today) of
Aarch64 GCC (here's a Godbolt link if you want to see it reproduced there, btw:
https://godbolt.org/z/GWh1r6MWj)

The complete command line is this: `aarch64-linux-gnu-g++ -g -o
/tmp/compiler-explorer-compiler202216-6419-6qrckn.k4c16/output.s -S
-fdiagnostics-color=always -march=armv8.7-a -v
/tmp/compiler-explorer-compiler202216-6419-6qrckn.k4c16/example.cpp`

The compiler output is this:

:2280:5: internal compiler error: Segmentation fault
 2280 | }
  | ^
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
See  for instructions.

The information from gcc -v is this:

Using built-in specs.
COLLECT_GCC=/opt/compiler-explorer/arm64/gcc-trunk/aarch64-unknown-linux-gnu/bin/aarch64-unknown-linux-gnu-g++
Target: aarch64-unknown-linux-gnu
Configured with: /opt/.build/aarch64-unknown-linux-gnu/src/gcc/configure
--build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu
--target=aarch64-unknown-linux-gnu
--prefix=/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu
--exec_prefix=/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu
--with-sysroot=/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/sysroot
--enable-languages=c,c++,fortran --enable-__cxa_atexit --disable-libmudflap
--enable-libgomp --enable-libssp --enable-libquadmath
--enable-libquadmath-support --enable-libsanitizer --disable-libmpx
--with-gmp=/opt/.build/aarch64-unknown-linux-gnu/buildtools
--with-mpfr=/opt/.build/aarch64-unknown-linux-gnu/buildtools
--with-mpc=/opt/.build/aarch64-unknown-linux-gnu/buildtools
--with-isl=/opt/.build/aarch64-unknown-linux-gnu/buildtools --disable-lto
--without-zstd --enable-threads=posix --enable-target-optspace --disable-plugin
--disable-nls --disable-multilib
--with-local-prefix=/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/sysroot
--enable-long-long
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.1 20220206 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s' '-S'
'-march=armv8.7-a' '-v' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'
'-dumpdir' '/app/'

/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/libexec/gcc/aarch64-unknown-linux-gnu/12.0.1/cc1plus
-quiet -v -D_GNU_SOURCE  -quiet -dumpdir /app/ -dumpbase output.cpp
-dumpbase-ext .cpp -march=armv8.7-a -mlittle-endian -mabi=lp64 -g -version
-fdiagnostics-color=always -o /app/output.s
GNU C++17 (GCC) version 12.0.1 20220206 (experimental)
(aarch64-unknown-linux-gnu)
compiled by GNU C version 7.5.0, GMP version 6.1.2, MPFR version 4.0.2,
MPC version 1.1.0, isl version isl-0.19-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory
"/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/sysroot/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/sysroot/include"
#include "..." search starts here:
#include <...> search starts here:

/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/lib/gcc/aarch64-unknown-linux-gnu/12.0.1/../../../../aarch64-unknown-linux-gnu/include/c++/12.0.1

/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/lib/gcc/aarch64-unknown-linux-gnu/12.0.1/../../../../aarch64-unknown-linux-gnu/include/c++/12.0.1/aarch64-unknown-linux-gnu

/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/lib/gcc/aarch64-unknown-linux-gnu/12.0.1/../../../../aarch64-unknown-linux-gnu/include/c++/12.0.1/backward

/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/lib/gcc/aarch64-unknown-linux-gnu/12.0.1/include

/opt/compiler-explorer/arm64/gcc-trunk-20220206/aarch64-unknown-linux-gnu/lib/gcc/aarch64-unknown-linux-gnu/12.0.1/include-fixed

[Bug target/104401] New: [x86] Failure to recognize min/max pattern using pcmp+pblendv

2022-02-05 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104401

Bug ID: 104401
   Summary: [x86] Failure to recognize min/max pattern using
pcmp+pblendv
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <immintrin.h>

__m128i min32(__m128i value, __m128i input)
{
  return _mm_blendv_epi8(input, value, _mm_cmplt_epi32(value, input));
}

With -O3 -msse4.1, GCC outputs this:

min32(long long __vector(2), long long __vector(2)):
  movdqa xmm2, xmm0
  movdqa xmm0, xmm1
  movdqa xmm3, xmm1
  pcmpgtd xmm0, xmm2
  pblendvb xmm3, xmm2, xmm0
  movdqa xmm0, xmm3
  ret

LLVM instead outputs this:

min32(long long __vector(2), long long __vector(2)):
  pminsd xmm0, xmm1
  ret

The equivalent code with cmpgt used instead of cmplt can be optimized to
pmaxsd.

[Bug tree-optimization/104394] New: Failure to optimize vector pattern for x < 0

2022-02-04 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104394

Bug ID: 104394
   Summary: Failure to optimize vector pattern for x < 0
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int32_t v4i32 __attribute__((vector_size(16)));

v4i32 get_cmpmask(v4i32 mask)
{
    v4i32 signmask{(int32_t)0x80000000, (int32_t)0x80000000,
                   (int32_t)0x80000000, (int32_t)0x80000000};
    return ((signmask & mask) == signmask);
}

This can be optimized to `return mask < 0;`. This transformation is done by
LLVM, but not by GCC.
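
A scalar model of one lane shows why the two forms agree, assuming the sign
mask is 0x80000000 (the top bit of each 32-bit lane). This is only a sketch
with made-up names; vector comparisons produce -1 for true and 0 for false,
which the helpers mimic.

```c
#include <stddef.h>
#include <stdint.h>

/* One lane of the mask test from the report. */
static int32_t lane_signmask_test(int32_t lane)
{
    const uint32_t signmask = UINT32_C(0x80000000);
    return (((uint32_t)lane & signmask) == signmask) ? -1 : 0;
}

/* One lane of the simplified comparison. */
static int32_t lane_less_than_zero(int32_t lane)
{
    return (lane < 0) ? -1 : 0;
}

/* Compares both lane models on a few boundary values. */
static int count_mismatches(void)
{
    static const int32_t samples[] = {
        INT32_MIN, INT32_MIN + 1, -1, 0, 1, 0x7FFF8000, INT32_MAX
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (lane_signmask_test(samples[i]) != lane_less_than_zero(samples[i]))
            mismatches++;
    return mismatches;
}
```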

[Bug target/104371] [x86] Failure to optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

2022-02-04 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #2 from Gabriel Ravier  ---
Although I agree the pattern doesn't seem that useful at first, I've seen it
crop up in several places, such as:

- in pixman: https://github.com/servo/pixman/blob/master/pixman/pixman-sse2.c
on line 181
- in a SIMD mandelbrot implementation:
https://github.com/huonw/mandel-simd/blob/master/mandel_sse2.c on line 47
- in this article:
http://0x80.pl/notesen/2021-02-02-all-bytes-in-reg-are-equal.html
- in boost::uuid (although this one will detect if compiling on a platform with
SSE4.1):
https://github.com/boostorg/uuid/blob/develop/include/boost/uuid/detail/uuid_x86.ipp
- in this other article:
https://mischasan.wordpress.com/2011/11/09/the-generic-sse2-loop/
- in a research paper's accompanying github repo:
https://github.com/GameTechDev/MaskedOcclusionCulling/blob/master/MaskedOcclusionCulling.cpp
on line 333
- in ClickHouse:
https://clickhouse.com/codebrowser/html_report/ClickHouse/src/Common/memcmpSmall.h.html
on line 241

And this is just what I found in a few minutes, so I would personally think
there are many more occurrences of that pattern.

[Bug tree-optimization/104376] New: Failure to optimize clz equivalent to clz

2022-02-03 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

Bug ID: 104376
   Summary: Failure to optimize clz equivalent to clz
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

uint32_t countLeadingZeros32(uint32_t x)
{
if (x == 0)
return 32;
return (31 - __builtin_clz(x)) ^ 31;
}

On x86, with `-mlzcnt`, GCC outputs this:

countLeadingZeros32(unsigned int):
  mov eax, 32
  test edi, edi
  je .L1
  mov eax, 31
  lzcnt edi, edi
  sub eax, edi
  xor eax, 31
.L1:
  ret

LLVM instead outputs this:

countLeadingZeros32(unsigned int):
  lzcnt eax, edi
  ret
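
The reason the whole function collapses to a single `lzcnt` is that
`(31 - c) ^ 31` equals `c` for any `c` in 0..31, and `lzcnt` already returns
32 for a zero input. The sketch below compares the function from the report
with a portable bit-by-bit reference; `clz_reference` and `count_mismatches`
are names made up here, and a 32-bit `unsigned int` is assumed.

```c
#include <stdint.h>

/* Function from the report. */
static uint32_t clz_report_form(uint32_t x)
{
    if (x == 0)
        return 32;
    return (31 - __builtin_clz(x)) ^ 31;
}

/* Portable reference: counts leading zero bits one at a time, 32 for zero. */
static uint32_t clz_reference(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && (x & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/* Compares both on zero and on values with each possible top bit. */
static int count_mismatches(void)
{
    int mismatches = 0;
    if (clz_report_form(0) != clz_reference(0))
        mismatches++;
    for (int i = 0; i < 32; i++) {
        uint32_t single_bit = UINT32_C(1) << i;
        uint32_t with_low_bit = single_bit | 1u;
        if (clz_report_form(single_bit) != clz_reference(single_bit))
            mismatches++;
        if (clz_report_form(with_low_bit) != clz_reference(with_low_bit))
            mismatches++;
    }
    return mismatches;
}
```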

[Bug target/104375] New: [x86] Failure to recognize bzhi pattern when shr is present

2022-02-03 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104375

Bug ID: 104375
   Summary: [x86] Failure to recognize bzhi pattern when shr is
present
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

uint64_t bextr_u64(uint64_t w, unsigned off, unsigned int len)
{
return (w >> off) & ((1U << len) - 1U);
}

With -mbmi2, this can be optimized to using shrx followed by bzhi. This
transformation is done by LLVM, but not by GCC.


PS: Even in the case where the shr is removed and thus the bzhi pattern is
recognized (e.g. `return w & ((1U << len) - 1U);`), it is still not compiled
optimally as it for some reason decides to put the result of the bzhi in an
intermediary register before moving it to eax.
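
To make the shrx + bzhi mapping concrete, here is a sketch using software
models of the two instructions for 64-bit operands. `shrx_model` and
`bzhi_model` are hypothetical helpers written for this note, not intrinsics:
shrx takes its count modulo 64, and bzhi clears every bit at position n and
above, where n is the low 8 bits of the index (leaving the value unchanged
when n is 64 or more). Only `len` below 32 is checked, since `1U << len` is
undefined beyond that.

```c
#include <stddef.h>
#include <stdint.h>

/* Function from the report. */
static uint64_t bextr_u64(uint64_t w, unsigned off, unsigned int len)
{
    return (w >> off) & ((1U << len) - 1U);
}

/* Software model of SHRX (64-bit): logical right shift, count modulo 64. */
static uint64_t shrx_model(uint64_t src, unsigned count)
{
    return src >> (count & 63);
}

/* Software model of BZHI (64-bit): zero the bits at positions >= n. */
static uint64_t bzhi_model(uint64_t src, unsigned index)
{
    unsigned n = index & 0xFF;
    return (n < 64) ? (src & ((UINT64_C(1) << n) - 1)) : src;
}

/* Compares the C expression with shrx followed by bzhi. */
static int count_mismatches(void)
{
    static const uint64_t words[] = {
        0, 1, UINT64_C(0x0123456789ABCDEF), UINT64_C(0x8000000000000000),
        UINT64_MAX
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof words / sizeof words[0]; i++)
        for (unsigned off = 0; off < 64; off++)
            for (unsigned len = 0; len < 32; len++)
                if (bextr_u64(words[i], off, len) !=
                    bzhi_model(shrx_model(words[i], off), len))
                    mismatches++;
    return mismatches;
}
```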

[Bug target/104371] New: [x86] Failure to optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

2022-02-03 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

Bug ID: 104371
   Summary: [x86] Failure to optimize
pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

bool is_zero(__m128i x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xFFFF;
}

This can be optimized to `return _mm_testz_si128(x, x);`. This optimization is
done by LLVM, but not by GCC.

[Bug tree-optimization/104360] New: Failure to optimize abs pattern on vector types

2022-02-02 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104360

Bug ID: 104360
   Summary: Failure to optimize abs pattern on vector types
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int16_t v8i16 __attribute__((vector_size(16)));

v8i16 abs_i16(v8i16 x)
{
auto isN = x < v8i16{};

x ^= isN;
return x - isN;
}

This (although I think v8i16 could be replaced with any integer vector type and
it still would work) can be optimized to using an abs instruction where
possible (such as `pabsw` on x86-64, or `abs` on aarch64)

PS: This doesn't even necessarily require an abs instruction. On standard
x86-64 with -O3, GCC manages just this:

abs_i16(short __vector(8)):
  pxor xmm1, xmm1
  pcmpgtw xmm1, xmm0
  pxor xmm0, xmm1
  psubw xmm0, xmm1
  ret

whereas LLVM outputs this:

abs_i16(short __vector(8)):
  pxor xmm1, xmm1
  psubw xmm1, xmm0
  pmaxsw xmm0, xmm1
  ret

which I'm pretty sure is better.
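
Both sequences can be modelled one 16-bit lane at a time. The sketch below
(hypothetical names, not from the report) checks that the xor/subtract
pattern and the max(x, 0 - x) form that the psubw + pmaxsw sequence computes
both give the absolute value. INT16_MIN is left out because its negation does
not fit in 16 bits and the lanes simply wrap there.

```c
#include <stdint.h>

/* One lane of the xor/subtract pattern from the report. */
static int16_t lane_abs_xor_sub(int16_t x)
{
    int16_t is_negative = (x < 0) ? -1 : 0;   /* what a vector compare yields */
    x ^= is_negative;
    return x - is_negative;
}

/* One lane of max(x, 0 - x), as computed by psubw followed by pmaxsw. */
static int16_t lane_abs_max(int16_t x)
{
    int16_t negated = (int16_t)(0 - x);
    return (x > negated) ? x : negated;
}

/* Compares both lane models against a plain absolute value. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int x = INT16_MIN + 1; x <= INT16_MAX; x++) {
        int expected = (x < 0) ? -x : x;
        if (lane_abs_xor_sub((int16_t)x) != expected)
            mismatches++;
        if (lane_abs_max((int16_t)x) != expected)
            mismatches++;
    }
    return mismatches;
}
```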

[Bug target/104357] New: [Aarch64] Failure to use csinv instead of mvn+csel where possible

2022-02-02 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104357

Bug ID: 104357
   Summary: [Aarch64] Failure to use csinv instead of mvn+csel
where possible
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

unsigned char stbi__clamp(int x)
{
   if ((unsigned)x > 255) {
  if (x < 0) return 0;
  if (x > 255) return 255;
   }
   return x;
}

With -O3, GCC outputs this (on aarch64):

stbi__clamp(int):
  mvn w1, w0
  cmp w0, 256
  and w0, w0, 255
  asr w1, w1, 31
  and w1, w1, 255
  csel w0, w0, w1, cc
  ret

LLVM instead outputs this:

stbi__clamp(int):
  asr w8, w0, #31
  cmp w0, #255
  csinv w0, w0, w8, ls
  ret

I don't know if the `and`s are there because of ABI differences, but it seems
to me like the `mvn` can definitely be replaced by using `csinv` instead of
`csel`.
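
For reference, `csinv` selects its first source operand when the condition
holds and the bitwise inverse of the second otherwise, so the three-instruction
LLVM sequence can be modelled in C roughly as below. This is a sketch under
assumptions: `clamp_csinv_model` is a made-up name, `int` is assumed to be 32
bits, and `>>` on a negative `int` is assumed to be an arithmetic shift (as
it is on GCC and Clang).

```c
#include <limits.h>

/* Function from the report. */
static unsigned char stbi__clamp(int x)
{
    if ((unsigned)x > 255) {
        if (x < 0) return 0;
        if (x > 255) return 255;
    }
    return x;
}

/* C model of asr + cmp + csinv: keep x when it is in 0..255, otherwise use
   the inverse of the sign mask (0 for negative x, all ones for large x). */
static unsigned char clamp_csinv_model(int x)
{
    int sign = x >> 31;   /* assumed arithmetic: -1 for negative x, else 0 */
    return (unsigned char)(((unsigned)x <= 255u) ? x : ~sign);
}

/* Compares both on a range around 0..255 and on the int extremes. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int x = -1000; x <= 1000; x++)
        if (stbi__clamp(x) != clamp_csinv_model(x))
            mismatches++;
    if (stbi__clamp(INT_MIN) != clamp_csinv_model(INT_MIN))
        mismatches++;
    if (stbi__clamp(INT_MAX) != clamp_csinv_model(INT_MAX))
        mismatches++;
    return mismatches;
}
```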

[Bug target/104315] [AArch64] Failure to optimize 8-bit bitreverse pattern

2022-01-31 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104315

--- Comment #1 from Gabriel Ravier  ---
PS: I've just stumbled upon the more generic case, which would be this code:

unsigned int stb_bitreverse(unsigned int n)
{
  n = ((n & 0xAAAAAAAA) >>  1) | ((n & 0x55555555) << 1);
  n = ((n & 0xCCCCCCCC) >>  2) | ((n & 0x33333333) << 2);
  n = ((n & 0xF0F0F0F0) >>  4) | ((n & 0x0F0F0F0F) << 4);
  n = ((n & 0xFF00FF00) >>  8) | ((n & 0x00FF00FF) << 8);
  return (n >> 16) | (n << 16);
}

which GCC optimizes to this:

stb_bitreverse(unsigned int):
  lsl w2, w0, 1
  lsr w1, w0, 1
  and w0, w1, 1431655765
  and w1, w2, -1431655766
  orr w0, w0, w1
  lsr w1, w0, 2
  lsl w0, w0, 2
  and w0, w0, -858993460
  and w1, w1, 858993459
  orr w1, w1, w0
  lsr w0, w1, 4
  lsl w1, w1, 4
  and w1, w1, -252645136
  and w0, w0, 252645135
  orr w0, w0, w1
  rev w0, w0
  ret

and LLVM to this:

stb_bitreverse(unsigned int):
  rbit w0, w0
  ret

[Bug target/104315] New: [AArch64] Failure to optimize 8-bit bitreverse pattern

2022-01-31 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104315

Bug ID: 104315
   Summary: [AArch64] Failure to optimize 8-bit bitreverse pattern
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

unsigned int stb_bitreverse8(unsigned char n)
{
   n = ((n & 0xAA) >> 1) + ((n & 0x55) << 1);
   n = ((n & 0xCC) >> 2) + ((n & 0x33) << 2);
   return (unsigned char) ((n >> 4) + (n << 4));
}

On AArch64, with -O3, GCC currently outputs this:

stb_bitreverse8(unsigned char):
  mov w2, 170
  mov w1, 85
  and w1, w1, w0, lsr 1
  and w0, w2, w0, lsl 1
  orr w0, w1, w0
  mov w1, -52
  mov w2, 51
  and w1, w1, w0, lsl 2
  and w0, w2, w0, lsr 2
  and w1, w1, 255
  orr w0, w0, w1
  lsr w1, w0, 4
  orr w0, w1, w0, lsl 4
  and w0, w0, 255
  ret

LLVM instead outputs this:

stb_bitreverse8(unsigned char):
  rbit w8, w0
  lsr w0, w8, #24
  ret

This optimization should be faster and quite useful, especially as there does
not seem to be any way to use `rbit` manually with intrinsics in GCC.
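
The LLVM output amounts to reversing all 32 bits and then shifting the result
down by 24. That can be checked against the original function for all 256
inputs with a software model of `rbit`; the sketch below uses made-up helper
names (`rbit32_model`, `bitreverse8_rbit`) and is not taken from either
compiler.

```c
#include <stdint.h>

/* Function from the report. */
static unsigned int stb_bitreverse8(unsigned char n)
{
    n = ((n & 0xAA) >> 1) + ((n & 0x55) << 1);
    n = ((n & 0xCC) >> 2) + ((n & 0x33) << 2);
    return (unsigned char) ((n >> 4) + (n << 4));
}

/* Software model of RBIT on a 32-bit register: reverse all 32 bits. */
static uint32_t rbit32_model(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* rbit followed by lsr #24, mirroring the LLVM output. */
static unsigned int bitreverse8_rbit(unsigned char n)
{
    return rbit32_model(n) >> 24;
}

/* Compares both implementations on every 8-bit input. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (unsigned n = 0; n < 256; n++)
        if (stb_bitreverse8((unsigned char)n) !=
            bitreverse8_rbit((unsigned char)n))
            mismatches++;
    return mismatches;
}
```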

[Bug middle-end/96159] atomic creates incorrect code for possible misaligned struct

2021-11-26 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96159

Gabriel Ravier  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gabravier at gmail dot com

--- Comment #8 from Gabriel Ravier  ---
I do agree that either: 
- GCC's behavior should be aligned with Clang's and that it should provide some
kind of "known-aligned load" function (along with the corresponding ones for
store/exchange/compare_exchange)
- or there should at least be some kind of "safe load" function (along with the
corresponding ones) for the cases where the alignment is unknown.

I do prefer the first solution, personally, unless the Clang devs can be
convinced to change their builtin, as I think it would be better to have the
same behavior on both compilers.

In any case, though, I do think there is a documentation bug here, and I would
also say it would be quite nice to have a warning when using the built-in in a
way that makes it much slower/invalid, like what Clang does:

:240:5: warning: misaligned atomic operation may incur significant
performance penalty; the expected alignment (8 bytes) exceeds the actual
alignment (4 bytes) [-Watomic-alignment]
__atomic_load(x, , __ATOMIC_SEQ_CST);
^

[Bug c/103343] Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer

2021-11-22 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103343

--- Comment #3 from Gabriel Ravier  ---
Well the code does not invoke undefined behavior here, it just so happens that
`p == (x + 1)` because `y` happens to be laid out in memory after `x` (note:
this isn't a guarantee, of course, but GCC can't prove this isn't the case as
it's defined in another TU and it's quite easy to make this happen). The
comparison doesn't imply the pointers have the same provenance, and the
standard has a specific provision for this exact comparison:

"If one pointer represents the address of a complete object, and another
pointer represents the address one past the last element of a different
complete object, the result of the comparison is unspecified."
- [expr.eq] (https://eel.is/c++draft/expr.eq#3.1)

Also, `y` isn't accessed through a pointer to `x`: I've already said the case
where the function is incorrect is when `f` is called with `&y` as the first
argument. If doing `p == (x + 1)` implied they derived from the same object,
then that would imply after doing `&y == (x + 1)` doing `*&y` would invoke
undefined behavior which is obviously ridiculous.

Although there is a case to be made that this code is stupid and deserves a
warning, though... I won't argue with that, this code is just something I wrote
to test things after a 3 hour long conversation about DR 260
() and a lot of
standardese lawyering, so it's not intended to be real life code. I'd say,
though, that the warning is quite inaccurate in the details of what it's
saying, as `p` isn't actually equivalent to `(x + 1)` just because `p == (x +
1)`.

[Bug c/103343] New: Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer

2021-11-20 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103343

Bug ID: 103343
   Summary: Invalid codegen when comparing pointer to one past the
end and then dereferencing that pointer
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

extern int x[1], y;

int f(int *p, int *q) {
*q = y;
if (p == (x + 1)) {
*p = 2;
return y;
}
return 0;
}

GCC trunk currently outputs the following code with -O3:

f:
mov eax, DWORD PTR y[rip]
mov DWORD PTR [rsi], eax
cmp rdi, OFFSET FLAT:x+4
je  .L5
xor eax, eax
ret
.L5:
mov DWORD PTR x[rip+4], 2
ret

Which is incorrect because `p` could point to `y`, for example if `f` was
called as such:

int whatever;
f(&y, &whatever);

and `y` could happen to be located in memory right after `x`.

Also, although the comparison invokes unspecified behavior, this still means
only two results are possible according to the standard:
- if `p == (x + 1)` results in `false`, then the result of `f` is 0
- if `p == (x + 1)` results in `true`, then the result of `f` is 2 since we do
`*p = 2` and `p` points to `y`.

GCC's optimization makes it so the result can also be the previous value of
`y`, which could be something else than 0 or 2.

It seems that GCC assumes that because `p == (x + 1)` it can replace all
occurrences of `p` with `x + 1` without any regard to provenance, and doing
that change manually would indeed mean the `return y;` could be optimized to
use the previous store (and the store to `x + 1` would be UB, too...), but this
isn't the case here: `p` could simultaneously validly point to `y` and be equal
to `x + 1`.

PS: This also results in plenty of invalid warnings when compiling with -Wall:

: In function 'f':
:6:9: warning: array subscript 1 is outside array bounds of 'int[1]'
[-Warray-bounds]
6 | *p = 2;
  | ^~
:1:12: note: at offset 4 into object 'x' of size 4
1 | extern int x[1], y;
  |^

[Bug c/102939] Ridiculously long compilation times on (admittedly itself ridiculous) pointer declaration

2021-11-01 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102939

--- Comment #4 from Gabriel Ravier  ---
(In reply to Hans-Peter Nilsson from comment #3)
> (In reply to Gabriel Ravier from comment #0)
> ...
> > #define PTR4 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3
> > #define PTR5 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4
> > #define PTR6 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5
> > 
> > int PTR4 q3_var = 0;
> ...
> 
> Is the use of PTR4 instead of PTR6 or PTR5, intended to provoke comments
> such as this one, or are there untold additional related observations?

It's just a leftover I forgot to remove from when I was first testing this
(with the bigger macros, which just had worse results).

[Bug c/102939] New: Ridiculously long compilation times on (admittedly ridiculous) pointer declaration

2021-10-25 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102939

Bug ID: 102939
   Summary: Ridiculously long compilation times on (admittedly
ridiculous) pointer declaration
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#define PTR1 * * * * * * * * * *
#define PTR2 PTR1 PTR1 PTR1 PTR1 PTR1 PTR1 PTR1 PTR1 PTR1 PTR1
#define PTR3 PTR2 PTR2 PTR2 PTR2 PTR2 PTR2 PTR2 PTR2 PTR2 PTR2
#define PTR4 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3 PTR3
#define PTR5 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4 PTR4
#define PTR6 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5 PTR5

int PTR4 q3_var = 0;

This takes an entire second to compile with GCC, which is absolutely
ridiculous compared to, for example, tcc and ack, which both compile this code
in under .01 seconds.

I've investigated a bit into what's going on myself and it looks like while
parsing, there's some `variably_modified_type_p` algorithm that's going haywire
and taking forever on this declaration. Is it some kind of O(n²) recursive
algorithm or something ? It seems rather quite odd, to be honest...

For comparison, compiling this as C++ takes just .04 seconds.

[Bug tree-optimization/102927] Failure to optimize series of if-else to use array when possible

2021-10-25 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102927

--- Comment #5 from Gabriel Ravier  ---
Um, what ? How is this invalid, exactly ? Are you saying foo is faster than baz
(in which case it seems the opposite optimization should be implemented for baz
and bar), or that this optimization just won't ever be implemented (which seems
like more of a WONTFIX), or something else ? This seems kind of odd...

[Bug tree-optimization/102927] New: Failure to optimize series of if-else to use array when possible

2021-10-25 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102927

Bug ID: 102927
   Summary: Failure to optimize series of if-else to use array
when possible
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int foo(int i) {
  if (i == 0)
return 52;
  else if (i == 1)
return 77;
  else if (i == 2)
return 91;
  else if (i == 3)
return 10;
  else
return 42;
}

int bar(int i) {
  switch (i) {
  case 0:
return 52;
  case 1:
return 77;
  case 2:
return 91;
  case 3:
return 10;
  default:
return 42;
  }
}

int baz(int i)
{
static const int results[] = {52, 77, 91, 10};
if (__builtin_expect_with_probability((unsigned)i < 4, 1, 0.5))
return results[(unsigned)i];
return 42;
}

foo can be optimized to be equivalent to baz (like bar is). This optimization
is done by LLVM, but not by GCC.

PS: I've observed that making the if-else chain longer triggers the
optimization. Is GCC considering the if-else chain to be faster than an array
access ? Because in that case, it seems like baz should be optimized to an
if-else chain (perhaps along with bar).

[Bug c++/31573] -Wall-all to enable all warnings

2021-10-19 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31573

--- Comment #11 from Gabriel Ravier  ---
Well, that does help, although it is still a significant annoyance that would
take more than its fair share of time to handle.

(Also, is it still really that much of a concern anymore that users would think
-Weverything is a normal flag to set in compilations ? I've basically never
seen this happen with Clang's flag, so it seems like an unreasonable concern,
especially considering the amount of warning flags that have been added since
2007+the amount of warning flags that are rather specialized and obviously
result in an extremely large amount of false positives on a lot of code)

[Bug c++/31573] -Wall-all to enable all warnings

2021-10-19 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31573

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #9 from Gabriel Ravier  ---
I would also quite like to note myself that numerous people I know have found
Clang's `-Weverything` very useful, especially for finding new warnings.

The only way I've been able to find some of GCC's most useful flags was by
manually making a full list off GCC's manual (which took a long while !), and I
have to say this is a particularly annoying process to go through, especially
when such a list has to be updated on every new GCC version.

[Bug c++/102820] New: Failure to compile void{}

2021-10-18 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102820

Bug ID: 102820
   Summary: Failure to compile void{}
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

void f()
{
void{};
}

This has been considered valid since
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#2351 was accepted
as a DR in 2018, but GCC fails to compile it, with this error:

: In function 'void f()':
:3:10: error: compound literal of non-object type 'void'
3 | void{};
  |  ^

[Bug rtl-optimization/15792] missed subreg optimization

2021-10-14 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #11 from Gabriel Ravier  ---
Seems like the issue is present again, except it's test1 that gets the better
asm now. Perhaps this should be re-opened ?

[Bug target/102758] New: [x86] Failure to use registers optimally when swapping between (identically represented) vector types

2021-10-14 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102758

Bug ID: 102758
   Summary: [x86] Failure to use registers optimally when swapping
between (identically represented) vector types
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int64_t v2i64 __attribute__((vector_size(16)));
typedef uint16_t v8u16 __attribute__((vector_size(16)));

v2i64 f(v8u16 make_b_xxm1, v2i64 b)
{
return (v2i64)((v8u16)b + (v8u16){1});
}

With -O3, GCC outputs this:

f(unsigned short __vector(8), long __vector(2)):
movdqa  xmm2, XMMWORD PTR .LC0[rip]
paddw   xmm2, xmm1
movdqa  xmm0, xmm2
ret

LLVM outputs this:

f(unsigned short __vector(8), long __vector(2)):
movdqa  xmm0, xmm1
paddw   xmm0, xmmword ptr [rip + .LCPI0_0]
ret

It should be possible to optimize out the last `movdqa`. This seems to be
directly related to the usage of differing types here (even though the
conversion is cost-free) as replacing all usage of `v2i64` with `v8u16` makes
this be better optimized.

[Bug target/95740] Failure to avoid using the stack when interpreting a float as an integer when it is modified afterwards

2021-10-13 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95740

--- Comment #3 from Gabriel Ravier  ---
I've also encountered what looks like a duplicate of this bug. I'm not
certain it is one, but it seems likely:

int foo(float f)
{
  union
  {
float f;
int i;
  } z = { .f = f };

  return z.i - 1;
}

Which outputs roughly the same assembly code as the initial test case.

[Bug tree-optimization/102738] New: Failure to optimize right shift of 128-bit value after it's already been shifted by 127

2021-10-13 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102738

Bug ID: 102738
   Summary: Failure to optimize right shift of 128-bit value after
it's already been shifted by 127
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int a(__int128 f, int g)
{
return (f >> 127) >> g;
}

This can be optimized to `return f >> 127;`. This optimization is done by LLVM,
but not by GCC.
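
The reasoning is that `f >> 127` leaves only copies of the sign bit, so the value is either 0 or -1, and an arithmetic right shift of either leaves it unchanged. A minimal sketch of the two forms follows, assuming a target with `__int128`, arithmetic right shift of negative values (which is what GCC implements) and `g` within [0, 127]. The function names are illustrative only.

```c
/* Sketch, assuming __int128 support and arithmetic right shift. */

/* Form from the report: shift by 127, then by a variable amount. */
int shift_twice(__int128 f, int g) { return (f >> 127) >> g; }

/* Simplified form: the first shift already yields 0 or -1. */
int shift_once(__int128 f) { return f >> 127; }
```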

[Bug target/102737] New: [x86] Failure to optimize out bad register usage involving int->double conversion

2021-10-13 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102737

Bug ID: 102737
   Summary: [x86] Failure to optimize out bad register usage
involving int->double conversion
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

double foo(int s, double a)
{
  return s + a;
}

With -O3, GCC outputs this on AMD64:

foo(int, double):
movapd  xmm1, xmm0
pxor    xmm0, xmm0
cvtsi2sd        xmm0, edi
addsd   xmm0, xmm1
ret

LLVM outputs this:

foo(int, double):
cvtsi2sd        xmm1, edi
addsd   xmm0, xmm1
ret

The movapd in GCC's version can be optimized out.

[Bug tree-optimization/102679] New: Failure to optimize out 64-bit multiplication to 32-bit multiplication when possible in circumstances involving modifying a 64-bit variable that gets converted to 3

2021-10-10 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102679

Bug ID: 102679
   Summary: Failure to optimize out 64-bit multiplication to
32-bit multiplication when possible in circumstances
involving modifying a 64-bit variable that gets
converted to 32-bit
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

int32_t mac(int32_t *b, int64_t sqr)
{
sqr += (int64_t)*b * *b;
return sqr;
}

This can be optimized to remove the `(int64_t)` cast. This optimization is done
by LLVM, but not by GCC.
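
To illustrate why the narrowing is valid: only the low 32 bits of the sum survive the conversion to `int32_t`, and those bits depend only on the low 32 bits of each operand. Below is a rough sketch, assuming GCC's modulo behaviour for out-of-range conversions to a signed type. `mac_wide` and `mac_narrow` are made-up names, and the narrow form uses unsigned arithmetic so that the wrapping multiply is well defined at the source level.

```c
#include <stdint.h>

/* As in the report: the product is formed in 64 bits, then the sum is
   truncated on return. */
int32_t mac_wide(const int32_t *b, int64_t sqr) {
  sqr += (int64_t)*b * *b;
  return sqr;
}

/* Hypothetical narrow form: a wrapping 32-bit multiply-add produces the
   same low 32 bits as the 64-bit computation above. */
int32_t mac_narrow(const int32_t *b, int64_t sqr) {
  uint32_t lo = (uint32_t)sqr + (uint32_t)*b * (uint32_t)*b;
  return (int32_t)lo;
}
```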

[Bug tree-optimization/102676] Failure to optimize out malloc/nothrow allocation that's only used for bool checking

2021-10-10 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102676

--- Comment #2 from Gabriel Ravier  ---
Well, I think the assumption LLVM is making is that the allocation, being
unused, can just be eliminated and considered to have always succeeded. I don't
see how that would contradict the standard, although I suppose some would
consider it a bad thing to do for the compiler (although in that case you might
as well rule out all optimizations that elide allocations).

[Bug tree-optimization/102676] New: Failure to optimize out malloc/nothrow allocation that's only used for bool checking

2021-10-09 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102676

Bug ID: 102676
   Summary: Failure to optimize out malloc/nothrow allocation
that's only used for bool checking
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <cstdlib>
#include <new>

bool f()
{
  return new(std::nothrow) int;
}

bool g()
{
return malloc(1);
}

Both these functions can be optimized to `return true;`. This optimization is
done by LLVM, but not by GCC.

[Bug target/102672] New: [AArch64] Failure to optimize to using stp instead of 2 strs when possible

2021-10-09 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102672

Bug ID: 102672
   Summary: [AArch64] Failure to optimize to using stp instead of
2 strs when possible
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

struct X {
int i;
void *p;
};

void foo(struct X *q, void *p)
{
struct X b{};
b.p = p;
*q = b;
}

With -O3, GCC outputs this:

foo(X*, void*):
str wzr, [x0]
str x1, [x0, 8]
ret

LLVM instead outputs this:

foo(X*, void*):
stp xzr, x1, [x0]
ret

[Bug c++/102623] New: Failure to detect destructed scalar objects in consteval function

2021-10-05 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102623

Bug ID: 102623
   Summary: Failure to detect destructed scalar objects in
consteval function
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

using T = int;
consteval bool f()
{
  T t = 42;
  t.~T();
  return (t == 42);
}

bool x = f();

This code should not compile, as `f` invokes undefined behavior during constant
evaluation, as Clang diagnoses:

:9:10: error: call to consteval function 'f' is not a constant
expression
bool x = f();
 ^
:6:11: note: read of object outside its lifetime is not allowed in a
constant expression
  return (t == 42);
  ^

The specific usage with `int` here was not UB until C++20 (although
pseudo-destructors weren't allowed in constant expressions until C++20 anyway),
but as of
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html#pseudo-destructor-calls
pseudo-destructors end the lifetime of the operand.

[Bug target/102591] Failure to optimize search for value in vector-sized area to use SIMD

2021-10-05 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102591

--- Comment #2 from Gabriel Ravier  ---
memcpy can fail on unaligned memory ??? I used it specifically to avoid this
problem !

(also, LLVM's code, I am pretty sure, does not have any issue with alignment,
as it uses either AVX instructions which care not for it, or specifically does
a movdqu (i.e. unaligned load) of the memory)

[Bug target/102591] New: Failure to optimize search for value in vector-sized area to use SIMD

2021-10-04 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102591

Bug ID: 102591
   Summary: Failure to optimize search for value in vector-sized
area to use SIMD
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

bool match8(char *tpl) 
{
int found = 0;
for (int at = 0; at < 16; at++)
if (tpl[at] == 0)
found = 1;
return found;
}

This function can be greatly optimized by using SIMD. It can be optimized to
something like this:

typedef char v16i8 __attribute__((vector_size(16)));

bool match8v2(char *tpl)
{
v16i8 values;
__builtin_memcpy(&values, tpl, 16);
v16i8 compared = (values == 0);
return _mm_movemask_epi8((__m128i)compared) != 0;
}

This optimization is done by LLVM, but not by GCC.

PS: I've marked this as an x86 bug, but only because I could not find a
portable way of expressing `_mm_movemask_epi8((__m128i)compared)`, I would
assume other architectures have similar ways of expressing the same thing
cheaply.

(For example, Altivec should be able to implement that operation with a
`vec_extract(vec_vbpermq((__vector unsigned char)compared, perm), 1)` with
`perm` looking like this: `{120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32,
24, 16, 8, 0}` and the 1 replaced with 14 on big-endian)
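
For the specific `!= 0` test used here (not for a general movemask), one possible target-independent spelling is to reinterpret the comparison result as two 64-bit lanes and OR them together. This is only a sketch relying on the GCC/Clang generic vector extensions. The `v2i64` typedef and the name `match16_generic` are assumptions made for the example, and whether it produces good code on a given target is a separate question.

```c
#include <stdbool.h>

typedef signed char v16i8 __attribute__((vector_size(16)));
typedef long long v2i64 __attribute__((vector_size(16)));

/* Sketch: is any of the 16 bytes at tpl zero? No x86 intrinsics needed. */
bool match16_generic(const char *tpl) {
  v16i8 values;
  __builtin_memcpy(&values, tpl, 16);
  /* Each lane of the comparison is -1 (all bits set) on a match, else 0. */
  v2i64 lanes = (v2i64)(values == 0);
  /* Any set bit in either half means at least one byte matched. */
  return (lanes[0] | lanes[1]) != 0;
}
```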

[Bug target/85730] complex code for modifying lowest byte in a 4-byte vector

2021-10-04 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

--- Comment #4 from Gabriel Ravier  ---
That's a bit odd, really - I'm just using the latest released sub-versions of
each of these (except for GCC 6 since I only have access to it through Godbolt
which doesn't have GCC 6.5), i.e. GCC 6.4, 7.5, 8.5, 9.4, 10.3, 11.2 and trunk,
I wouldn't expect it to impact this stuff that much.

Though I do realize now I had messed up my comment slightly: when saying "GCC 7
also changed bar and baz's code generation" I meant "foo and baz's code
generation".

[Bug target/85730] complex code for modifying lowest byte in a 4-byte vector

2021-10-03 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #2 from Gabriel Ravier  ---
Seems like they've all got identical code generation over here since GCC 7, and
the GCC 6 code generation is just very bad for bar (although GCC 7 also changed
bar and baz's code generation, which previously was as bar's is in the
description)

[Bug target/102583] New: [x86] Failure to optimize 32-byte integer vector conversion to 16-byte float vector properly when converting upper part with -mavx2

2021-10-03 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102583

Bug ID: 102583
   Summary: [x86] Failure to optimize 32-byte integer vector
conversion to 16-byte float vector properly when
converting upper part with -mavx2
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

typedef int v8si __attribute__((vector_size(32)));
typedef float v4sf __attribute__((vector_size(16)));

v4sf high (v8si *srcp)
{
  v8si src = *srcp;
  return (v4sf) { (float)src[4], (float)src[5], (float)src[6], (float)src[7] };
}

With -O3 -mavx2, GCC outputs this:

high(int __vector(8)*):
vmovdqa ymm0, YMMWORD PTR [rdi]
vperm2i128  ymm0, ymm0, ymm0, 17
vcvtdq2ps   xmm0, xmm0
vzeroupper
ret

LLVM instead outputs this:

high(int __vector(8)*):
vcvtdq2ps   xmm0, xmmword ptr [rdi + 16]
ret

And GCC outputs the equivalent code if -mavx2 is removed:

high(int __vector(8)*):
cvtdq2ps        xmm0, XMMWORD PTR [rdi+16]
ret

[Bug tree-optimization/102580] New: Failure to optimize signed division to unsigned division when dividend can't be negative

2021-10-03 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102580

Bug ID: 102580
   Summary: Failure to optimize signed division to unsigned
division when dividend can't be negative
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

int f(int x) {
  if (x < 0) 
__builtin_abort();
  return x/3;
}

The `return` statement can be optimized to `return (unsigned)x/3;`. This
optimization is done by LLVM, but not by GCC.
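
A small sketch of the two forms, to show that they agree for every non-negative `x` and differ only for negative inputs, which the abort check excludes. The function names are illustrative.

```c
/* Signed division, as written in the report. */
int div3_signed(int x) { return x / 3; }

/* Unsigned division: same quotient whenever x >= 0, and typically cheaper
   to expand into a multiply-and-shift sequence. */
int div3_unsigned(int x) { return (int)((unsigned)x / 3); }
```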

[Bug tree-optimization/102579] New: Failure to optimize out allocation if volatile read is present in the middle

2021-10-03 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102579

Bug ID: 102579
   Summary: Failure to optimize out allocation if volatile read is
present in the middle
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

void test_unused() {
  volatile int d;
  int *p = new int;
  d;
  delete p;
}

This can be optimized to just reading `d` once. LLVM does this optimization,
but GCC does not.

[Bug target/102575] New: Failure to optimize double _Complex stores to use largest loads/stores possible

2021-10-03 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102575

Bug ID: 102575
   Summary: Failure to optimize double _Complex stores to use
largest loads/stores possible
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

void test(double _Complex *a)
{
a[0] = 1;
a[1] = 1;
}

With -O3, on AMD64 GCC outputs this:

test(double _Complex*):
movsd   xmm1, QWORD PTR .LC0[rip]
movsd   xmm0, QWORD PTR .LC0[rip+8]
movsd   QWORD PTR [rdi], xmm1
movsd   QWORD PTR [rdi+8], xmm0
movsd   QWORD PTR [rdi+16], xmm1
movsd   QWORD PTR [rdi+24], xmm0
ret

Clang instead outputs this:

test(double _Complex*):
movsd   xmm0, qword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero
movups  xmmword ptr [rdi], xmm0
movups  xmmword ptr [rdi + 16], xmm0
ret

It seems to me like the second output should always be faster.

PS: The difference is even larger with `-mavx2`.

[Bug tree-optimization/102494] New: Failure to optimize out vector reduction properly especially when using OpenMP

2021-09-26 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494

Bug ID: 102494
   Summary: Failure to optimize out vector reduction properly
especially when using OpenMP
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stddef.h>
#include <stdint.h>

typedef int8_t simde_int8x8_t __attribute__((__vector_size__(8)));

int16_t
simde_vaddlv_s8(simde_int8x8_t a) {
int16_t r = 0;

#pragma omp simd reduction(+:r)
for (size_t i = 0 ; i < (sizeof(a) / sizeof(a[0])) ; i++) {
  r += a[i];
}

return r;
}

Compiled with -O3 -fopenmp-simd, this is the output on AMD64:

simde_vaddlv_s8(signed char __vector(8)):
pxor    xmm1, xmm1
movdqa  xmm2, xmm0
pcmpgtb xmm1, xmm0
punpcklbw   xmm0, xmm1
punpcklbw   xmm2, xmm1
pshufd  xmm0, xmm0, 78
movq    QWORD PTR [rsp-24], xmm2
movq    QWORD PTR [rsp-16], xmm0
movdqa  xmm0, XMMWORD PTR [rsp-24]
psrldq  xmm0, 8
paddw   xmm0, XMMWORD PTR [rsp-24]
movdqa  xmm1, xmm0
psrldq  xmm1, 4
paddw   xmm0, xmm1
movdqa  xmm1, xmm0
psrldq  xmm1, 2
paddw   xmm0, xmm1
pextrw  eax, xmm0, 0
ret

This is what Clang manages:

simde_vaddlv_s8(signed char __vector(8)):
punpcklbw   xmm0, xmm0  # xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
psraw   xmm0, 8
pshufd  xmm1, xmm0, 238 # xmm1 = xmm0[2,3,2,3]
paddw   xmm1, xmm0
pshufd  xmm0, xmm1, 85  # xmm0 = xmm1[1,1,1,1]
paddw   xmm0, xmm1
movdqa  xmm1, xmm0
psrld   xmm1, 16
paddw   xmm1, xmm0
movd    eax, xmm1
ret

Weirdly enough, removing the `#pragma omp simd reduction(+:r)` slightly improves
GCC's output to this:

simde_vaddlv_s8(signed char __vector(8)):
pxor    xmm1, xmm1
movdqa  xmm2, xmm0
pcmpgtb xmm1, xmm0
punpcklbw   xmm0, xmm1
punpcklbw   xmm2, xmm1
pshufd  xmm0, xmm0, 78
paddw   xmm0, xmm2
pextrw  edx, xmm0, 1
pextrw  eax, xmm0, 0
add eax, edx
pextrw  edx, xmm0, 2
add eax, edx
pextrw  edx, xmm0, 3
add eax, edx
ret

[Bug target/101543] extra zeroing of empty struct argument/return value

2021-09-22 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101543

--- Comment #4 from Gabriel Ravier  ---
Nevermind, didn't see this was an aarch64 bug

[Bug target/101543] extra zeroing of empty struct argument/return value

2021-09-22 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101543

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #3 from Gabriel Ravier  ---
Seems to be fixed on trunk.

[Bug rtl-optimization/7061] Access of bytes in struct parameters

2021-09-22 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #7 from Gabriel Ravier  ---
Compiling this under ia64 seems to now be optimized perfectly as of at least
GCC 10, though the other ones look like they're still badly handled.

[Bug target/102438] New: [x86-64] Failure to optimize out random extra store+load in vector code when memcpy is used

2021-09-21 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102438

Bug ID: 102438
   Summary: [x86-64] Failure to optimize out random extra
store+load in vector code when memcpy is used
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include 

typedef double simde_float64x1_t __attribute__((__vector_size__(8)));

simde_float64x1_t simde_vabs_f64(simde_float64x1_t a) {
simde_float64x1_t r;
r[0] = -a[0];
return (simde_float64x1_t)r;
}

On AMD64 with -O3, this is outputted:

simde_vabs_f64(double __vector(1)):
movsd   xmm0, QWORD PTR [rsp+8]
xorpd   xmm0, XMMWORD PTR .LC0[rip]
mov rax, rdi
movsd   QWORD PTR [rsp-24], xmm0
mov rdx, QWORD PTR [rsp-24]
mov QWORD PTR [rdi], rdx
ret

If we instead just return `r` (without the cast) this is instead outputted:

simde_vabs_f64(double __vector(1)):
movsd   xmm0, QWORD PTR [rsp+8]
xorpd   xmm0, XMMWORD PTR .LC0[rip]
mov rax, rdi
movsd   QWORD PTR [rdi], xmm0
ret

It seems as though the presence of a cast (to the same type, no less) confuses
GCC into spilling the result into memory.

The GIMPLE optimized output is different for the two, so I don't know how much
this is target-specific to x86, but I haven't been able to reproduce it
anywhere else, so ¯\_(ツ)_/¯.

PS: The same bug can also be reproduced with -m32

[Bug c/54192] -fno-trapping-math by default?

2021-09-20 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54192

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #5 from Gabriel Ravier  ---
Also of note should be the fact that Clang's current default is
`-fno-trapping-math`.

I'm myself kind of curious about how exactly `-ftrapping-math` is interpreted.
It certainly doesn't seem to remove every single kind of non-trapping
math-based optimization: GCC will remove such statements as `(void)1/x;` even
with `-ftrapping-math`, even though that could fault with `x == 0`, and will
optimize things like `float x = 3412897421;` to not do a conversion even though
that conversion could raise an exception (as 3412897421 cannot be exactly
represented as a float), whereas Clang won't do that kind of optimization and
will keep those operations as-is.

[Bug target/48297] Suboptimal optimization of boolean expression addition

2021-09-18 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48297

Gabriel Ravier  changed:

   What|Removed |Added

 CC||gabravier at gmail dot com

--- Comment #3 from Gabriel Ravier  ---
We should... ? 

Also, the code generation seems to be slightly better now, though I don't think
it's ideal yet (I'm not sure about that).

[Bug target/102402] New: Seemingly suboptimal optimization of jmp/cmovcc for conditionally loading constants

2021-09-18 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102402

Bug ID: 102402
   Summary: Seemingly suboptimal optimization of jmp/cmovcc for
conditionally loading constants
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

struct MusicPlayerTrack
{
uint8_t flags;
uint8_t modT;
};

void ClearModM(struct MusicPlayerTrack *track, uint8_t modT)
{
if (track->modT == 0)
track->flags |= 3;
else
track->flags |= 12;
}

This is optimized weirdly by GCC. Leaving it as-is gives this AMD64 assembly:

ClearModM:
  movzx edx, BYTE PTR [rdi]
  mov eax, edx
  or eax, 12
  cmp BYTE PTR [rdi+1], 0
  jne .L3
  mov eax, edx
  or eax, 3
.L3:
  mov BYTE PTR [rdi], al
  ret

Whereas changing the `if` to `if (modT == 0)` gives this:

ClearModM:
  movzx eax, BYTE PTR [rdi]
  mov edx, eax
  or eax, 12
  or edx, 3
  test sil, sil
  cmove eax, edx
  mov BYTE PTR [rdi], al
  ret

It seems to me that this should be better than the first output, though it
could be the other way around considering how finicky cmovcc seems to be.
Either way, one of the two should be preferred consistently over the other.

Note that this also occurs on IA-32, so the issue seems unrelated to whether
modT is in a register or in memory. Perhaps it's about whether it's a function
argument ?

[Bug tree-optimization/102393] Failure to optimize 2 8-bit stores into a single 16-bit store

2021-09-18 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102393

--- Comment #3 from Gabriel Ravier  ---
It seems odd that the equivalent 32-bit pattern, i.e. this:

void HeaderWriteU32LE(int offset, uint32_t value, uint8_t *RomHeader)
{
RomHeader[offset] = value;
RomHeader[offset + 1] = value >> 8;
RomHeader[offset + 2] = value >> 16;
RomHeader[offset + 3] = value >> 24;
}

is optimized to a single store, even though the 32-bit pattern from PR
102391 isn't optimized to a single load. It's why I made this a separate bug
report, as I thought it indicated a likely difference in the cause of the bug.

[Bug tree-optimization/102393] New: Failure to optimize 2 8-bit stores into a single 16-bit store

2021-09-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102393

Bug ID: 102393
   Summary: Failure to optimize 2 8-bit stores into a single
16-bit store
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

void HeaderWriteU16LE(int offset, uint16_t value, uint8_t *RomHeader)
{
RomHeader[offset] = value;
RomHeader[offset + 1] = value >> 8;
}

Notwithstanding aliasing, this can be optimized to `*(uint16_t *)(RomHeader +
offset) = value`. This transformation is done by LLVM, but not by GCC.

Sample AMD64 output for this from GCC:

HeaderWriteU16LE:
  movsx rdi, edi
  mov eax, esi
  mov BYTE PTR [rdx+rdi], sil
  mov BYTE PTR [rdx+1+rdi], ah
  ret

And from LLVM:

HeaderWriteU16LE:
  movsxd rax, edi
  mov word ptr [rdx + rax], si
  ret

PS: The equivalent pattern for 4 8-bit stores gets optimized into a single
32-bit store.
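
For comparison at the source level, here is a rough sketch of the byte-wise store next to a single 16-bit store written with `memcpy`, which sidesteps the aliasing and alignment caveats of the pointer cast. The two only produce the same bytes on a little-endian host. All function names below are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise little-endian store, same shape as the function in the report. */
void store_u16le_bytes(int offset, uint16_t value, uint8_t *buf) {
  buf[offset] = value;
  buf[offset + 1] = value >> 8;
}

/* Single 16-bit store in host byte order; matches the byte-wise form only
   on little-endian hosts. */
void store_u16_native(int offset, uint16_t value, uint8_t *buf) {
  memcpy(buf + offset, &value, sizeof value);
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void) {
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first == 1;
}
```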

[Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load

2021-09-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Gabriel Ravier  changed:

   What|Removed |Added

Summary|Failure to optimize 2 8-bit |Failure to optimize
   |loads into a single 16-bit  |adjacent 8-bit loads into a
   |load|single bigger load

--- Comment #1 from Gabriel Ravier  ---
Note: this also equivalently works on bigger sizes:

uint32_t HeaderReadU32LE(int offset, uint8_t *RomHeader)
{
return RomHeader[offset] |
(RomHeader[offset + 1] << 8) |
(RomHeader[offset + 2] << 16) |
(RomHeader[offset + 3] << 24);
}

On AMD64, GCC outputs this:

HeaderReadU32LE:
  movsx rdi, edi
  movzx eax, BYTE PTR [rsi+1+rdi]
  movzx edx, BYTE PTR [rsi+2+rdi]
  sal eax, 8
  sal edx, 16
  or eax, edx
  movzx edx, BYTE PTR [rsi+rdi]
  or eax, edx
  movzx edx, BYTE PTR [rsi+3+rdi]
  sal edx, 24
  or eax, edx
  ret

LLVM manages this:

HeaderReadU32LE:
  movsxd rax, edi
  mov eax, dword ptr [rsi + rax]
  ret

[Bug tree-optimization/102392] New: Failure to optimize out sign extension when input is non-negative

2021-09-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102392

Bug ID: 102392
   Summary: Failure to optimize out sign extension when input is
non-negative
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

void f(int64_t x);

void g(int32_t x)
{
if (x < 0)
__builtin_unreachable();
f(x);
}

This can be optimized to avoid the sign extension since x can't be under 0.
This optimization is done by LLVM, but not by GCC.

Sample resulting assembly from GCC:

g:
  movsx rdi, edi
  jmp f

from LLVM:

g:
  mov edi, edi
  jmp f

(PS: I originally found this while looking at the code that led me to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391: an error check earlier in
the code (not in the example cited there) wound up making this assumption
possible, and slightly changed the assembly code emitted by LLVM there to be
even more efficient)
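
To make the claim concrete: for a non-negative 32-bit value the upper 32 bits are zero under both sign extension and zero extension, so the cheaper zero-extending move is enough. A minimal sketch with illustrative names:

```c
#include <stdint.h>

/* Sign extension: what a plain int32_t -> int64_t conversion does. */
int64_t widen_signed(int32_t x) { return (int64_t)x; }

/* Zero extension: same result whenever x >= 0. */
int64_t widen_unsigned(int32_t x) { return (int64_t)(uint32_t)x; }
```

The two only diverge for negative inputs, which the `__builtin_unreachable()` path rules out.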

[Bug tree-optimization/102391] New: Failure to optimize 2 8-bit loads into a single 16-bit load

2021-09-17 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Bug ID: 102391
   Summary: Failure to optimize 2 8-bit loads into a single 16-bit
load
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

uint16_t HeaderReadU16LE(int offset, uint8_t *RomHeader)
{
return RomHeader[offset] |
(RomHeader[offset + 1] << 8);
}

This can be optimized into a single 16-bit load. On -O3, this optimization is
done by LLVM, but not by GCC.

This winds up affecting the resulting assembly quite a bit:

AMD64 GCC:

HeaderReadU16LE:
  movsx rdi, edi
  movzx edx, BYTE PTR [rsi+1+rdi]
  movzx eax, BYTE PTR [rsi+rdi]
  sal edx, 8
  or eax, edx
  ret

AMD64 LLVM:

HeaderReadU16LE:
  movsxd rax, edi
  movzx eax, word ptr [rsi + rax]
  ret
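
A rough source-level picture of the transformation, assuming a little-endian host (where the combined bytes are exactly the in-memory 16-bit value). The `memcpy` form stands in for the single load, and all names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Two byte loads combined, same shape as the function in the report. */
uint16_t load_u16le_bytes(int offset, const uint8_t *buf) {
  return buf[offset] | (buf[offset + 1] << 8);
}

/* Single 16-bit load in host byte order; equal to the byte-wise form only
   on little-endian hosts. */
uint16_t load_u16_native(int offset, const uint8_t *buf) {
  uint16_t value;
  memcpy(&value, buf + offset, sizeof value);
  return value;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int is_le_host(void) {
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first == 1;
}
```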

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)`

2021-09-06 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

--- Comment #7 from Gabriel Ravier  ---
Also, `-ffast-math` seems to "fix" this, since in that case the code is
recognized as an ABS_EXPR pattern and as such results in the same code being
emitted without the xor. Is there any reason this isn't the case without fast
math ? I'd assume the xor wouldn't do anything w.r.t. conformance, personally.

[Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)`

2021-09-06 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

Gabriel Ravier  changed:

   What|Removed |Added

Summary|[12 regession] wrong code   |[9/10/11/12 regession]
   |for `x * copysign(1.0, x)`  |wrong code for `x *
   ||copysign(1.0, x)`

--- Comment #6 from Gabriel Ravier  ---
Actually, I've only gotten a snapshot from the 5th, which does not appear to
include HJ's patch from the 4th (which seems rather odd). Does it happen to fix
this ? I'd assume it does not, since that patch just seems to care about not
destructing the source and not about the emission of the xor that breaks this,
but I can't know for sure rn.

[Bug target/102224] [12 regession] wrong code for `x * copysign(1.0, x)`

2021-09-06 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

--- Comment #5 from Gabriel Ravier  ---
Actually it seems to me like this is a GCC 9 regression, ever since this
pattern exists: GCC 9, 10 and 11 emit the exact same faulty code.

[Bug tree-optimization/102224] Incorrect compile on `x * copysign(1.0, x)`

2021-09-06 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

--- Comment #3 from Gabriel Ravier  ---
Also seems like this might be unique to x86, as this compiles fine on AArch64
(though while it doesn't try to do anything stupid like xoring the result with
itself, it still does not optimize the XOR_SIGN to an ABS_EXPR at the GIMPLE
level).

[Bug tree-optimization/102224] Incorrect compile on `x * copysign(1.0, x)`

2021-09-06 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

--- Comment #2 from Gabriel Ravier  ---
(PS: by "x and y" I mean "the two arguments". If they're the same, GCC should
obviously just optimize this to an abs as that's what it ends up being)

[Bug tree-optimization/102224] Incorrect compile on `x * copysign(1.0, x)`

2021-09-06 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102224

--- Comment #1 from Gabriel Ravier  ---
(Note: this is a miscompile because it compiles as equivalent to `return 0;` as
that's what `xorps xmm0, xmm0` will do)
