[Bug c++/103273] [12 Regression] internal compiler error: in cp_parser_type_id_1, at cp/parser.c:24010

2021-11-16 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103273

Steinar H. Gunderson  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Steinar H. Gunderson  ---
Never mind, fixed in a newer version!

[Bug c++/103273] New: [12 Regression] internal compiler error: in cp_parser_type_id_1, at cp/parser.c:24010

2021-11-16 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103273

Bug ID: 103273
   Summary: [12 Regression] internal compiler error: in
cp_parser_type_id_1, at cp/parser.c:24010
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

Found while minimizing another regression:

gcc version 12.0.0 20210918 (experimental) [master r12-3644-g7afcb534239]
(Debian 20210918-1) 

bigscreen:~/creduce> cat fts0opt.i
template  struct b;
b < b struct {

bigscreen:~/creduce> /usr/lib/gcc-snapshot/bin/g++ -c fts0opt.i
fts0opt.i:2:14: error: types may not be defined in template arguments
2 | b < b struct {
  |  ^
fts0opt.i:2:15: error: expected '}' at end of input
2 | b < b struct {
  |  ~^
fts0opt.i:2:15: internal compiler error: in cp_parser_type_id_1, at
cp/parser.c:24010
0x6fa6a2 cp_parser_type_id_1
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:24010
0xf578c3 cp_parser_template_type_arg
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:24105
0xf579ef cp_parser_template_argument
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:18660
0xf579ef cp_parser_template_argument_list
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:18571
0xf579ef cp_parser_enclosed_template_argument_list
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:31853
0xf58e76 cp_parser_template_id
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:18108
0xf5969b cp_parser_class_name
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:25535
0xf5035a cp_parser_qualifying_entity
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:7048
0xf5035a cp_parser_nested_name_specifier_opt
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:6730
0xf45f4d cp_parser_constructor_declarator_p
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:30712
0xf45f4d cp_parser_decl_specifier_seq
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:15743
0xf469b4 cp_parser_simple_declaration
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:15006
0xf76f25 cp_parser_declaration
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:14819
0xf778fe cp_parser_toplevel_declaration
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:14840
0xf778fe cp_parser_translation_unit
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:4978
0xf778fe c_parse_file()
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:47653
0x10a4a4d c_common_parse_file()
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/c-family/c-opts.c:1236
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug middle-end/103071] Missed optimization for symmetric subset: (a & b) == a || (a & b) == b

2021-11-04 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103071

--- Comment #2 from Steinar H. Gunderson  ---
EitherIsSubset() in the example calls foo or bar (but with a redundant test
that I can't easily get rid of). I agree that if you just return 0/1, the
cmp+sete+or variant is probably as good, but that's not what you get if you
branch on it.

[Bug rtl-optimization/103071] New: Missed optimization for symmetric subset: (a & b) == a || (a & b) == b

2021-11-03 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103071

Bug ID: 103071
   Summary: Missed optimization for symmetric subset: (a & b) == a
|| (a & b) == b
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

This is a bit of a long shot, but I'll file it anyway :-)

I have this function in a hot path (of course, in the real project, it's
inlined):

#include <stdint.h>

void foo();
void bar();

void EitherIsSubset(uint64_t v0, uint64_t v1) {
  if ((v0 & v1) == v0 || (v0 & v1) == v1) {
    foo();
  } else {
    bar();
  }
}

It is intended to treat v0 and v1 as bit sets, and then test whether either of
them is a subset of the other (which includes the case where they are equal).
(An equivalent formulation is obtained by replacing & with |.)
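
The equivalence of the two formulations can be checked by brute force. The following is a minimal sketch, not code from the report: the function names and the 8-bit test range are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The test as written in the report: true if v0 is a subset of v1 or vice versa.
bool either_is_subset_and(uint64_t v0, uint64_t v1) {
  return (v0 & v1) == v0 || (v0 & v1) == v1;
}

// The same test with | in place of &.
bool either_is_subset_or(uint64_t v0, uint64_t v1) {
  return (v0 | v1) == v0 || (v0 | v1) == v1;
}

// Hypothetical self-check: compares both forms on every pair of 8-bit sets
// and aborts on the first mismatch.
void check_forms_agree() {
  for (uint64_t v0 = 0; v0 < 256; ++v0) {
    for (uint64_t v1 = 0; v1 < 256; ++v1) {
      assert(either_is_subset_and(v0, v1) == either_is_subset_or(v0, v1));
    }
  }
}
```

The forms agree because (v0 & v1) == v0 holds exactly when (v0 | v1) == v1 holds, and likewise with the roles of v0 and v1 swapped.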

GCC compiles (with -O2, x86-64) this to:

EitherIsSubset:
        movq    %rdi, %rax
        andq    %rsi, %rax
        cmpq    %rsi, %rax
        je      .L4
        cmpq    %rdi, %rax
        je      .L4
        xorl    %eax, %eax
        jmp     bar@PLT
.L4:
        xorl    %eax, %eax
        jmp     foo@PLT

This is pretty straightforward, but feels like it's using two (relatively
hard-to-predict) branches where it should be possible to make do with one. And
indeed, GNU superopt (!) found this amazing sequence instead, with v0 in eax
and v1 in edx (this is, of course, trivially portable to 64-bit):

14: mov %edx,%ecx
or  %eax,%edx
cmp %edx,%eax
sbb %ebx,%ebx
sbb %ecx,%edx
adc $1,%ebx

I can't claim to understand fully what it does, but after this, ebx contains
either 0 or 1 with the right answer, and one would assume that after this, the
zero flag is also usable to branch on (leaving us with one branch instead of
two, in all).
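
One way to see what the sequence does is to emulate it step by step, modelling the carry flag as a bool. This is only a sketch under that model, with hypothetical function names. It relies on the usual x86 semantics: cmp and sbb set CF on an unsigned borrow, and adc adds CF to the sum.

```cpp
#include <cassert>
#include <cstdint>

// Reference formulation from the report, on 32-bit values.
bool either_is_subset_ref(uint32_t v0, uint32_t v1) {
  return (v0 & v1) == v0 || (v0 & v1) == v1;
}

// Emulation of the superopt sequence; cf models the carry flag.
uint32_t superopt_emulated(uint32_t eax, uint32_t edx) {
  uint32_t ecx = edx;                      // mov %edx,%ecx   ecx = v1
  edx |= eax;                              // or  %eax,%edx   edx = v0 | v1
  bool cf = eax < edx;                     // cmp %edx,%eax   CF = (v0 != v0 | v1)
  uint32_t ebx = cf ? 0xFFFFFFFFu : 0u;    // sbb %ebx,%ebx   ebx = -CF, CF kept
  // sbb %ecx,%edx: only the borrow matters; it is set when (v0 | v1) - v1 - CF
  // underflows, which (given CF = 1) means v0 | v1 == v1.
  cf = static_cast<uint64_t>(edx) <
       static_cast<uint64_t>(ecx) + (cf ? 1u : 0u);
  ebx = ebx + 1u + (cf ? 1u : 0u);         // adc $1,%ebx
  return ebx;
}

// Hypothetical self-check over all pairs of 8-bit inputs.
void check_superopt_sequence() {
  for (uint32_t v0 = 0; v0 < 256; ++v0) {
    for (uint32_t v1 = 0; v1 < 256; ++v1) {
      assert(superopt_emulated(v0, v1) ==
             (either_is_subset_ref(v0, v1) ? 1u : 0u));
    }
  }
}
```

Under this model, if v1 is a subset of v0 the first borrow is clear and ebx ends up as 0 + 1 + 0 = 1. Otherwise ebx starts at -1 and the final adc leaves exactly the second borrow, which is 1 only when v0 is a subset of v1.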

Is it possible to teach GCC this sequence? I tried using it as inline
assembler, and while it works, the result is suboptimal and slower, because I
can't return a condition code from the asm (so I get a redundant test):

inline bool EitherIsSubsetAsm(uint64_t v0, uint64_t v1) {
  uint64_t tmp = v0 | v1;
  bool result;
  /* result is written before v1 is read, hence the early clobber (&) */
  asm("cmp %1, %2 ; sbb %0, %0 ; sbb %3, %1 ; adc $1, %0"
      : "=&r"(result), "+r"(tmp)
      : "r"(v0), "r"(v1)
      : "cc");
  return result;
}

void EitherIsSubset(uint64_t v0, uint64_t v1) {
  if (EitherIsSubsetAsm(v0, v1)) {
    foo();
  } else {
    bar();
  }
}

[Bug tree-optimization/101139] Unable to remove double byteswap in fast path

2021-06-26 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

--- Comment #4 from Steinar H. Gunderson  ---
Yes, the integer promotion actually costs some performance. It happens on both
x86 and Arm. Should I file that as a separate bug?

[Bug target/101200] Unneeded AND after shift

2021-06-25 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

--- Comment #6 from Steinar H. Gunderson  ---
You're right, I don't know why the shrq happened. When I run now, I get shrb.
Doesn't matter for the bug, though.

[Bug tree-optimization/101200] New: Unneeded AND after shift

2021-06-24 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

Bug ID: 101200
   Summary: Unneeded AND after shift
   Product: gcc
   Version: 11.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

The code after reduction is:

struct {
  int b[6];
} c;
unsigned char d;
void e() {
  unsigned char a = d >> 4, f = d & 15;
  c.b[a] = c.b[f];
}

with g++-11 -O2, this produces

        movzbl  d(%rip), %eax
        movq    %rax, %rdx
        shrq    $4, %rax
        andl    $15, %edx
        andl    $15, %eax
        movl    c(,%rdx,4), %edx
        movl    %edx, c(,%rax,4)
        ret

The second AND with 15 is unneeded and should have been optimized away by VRP
as I understand it. I can't reproduce it with ARM, though, so maybe there's
something x86-specific?
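
The reason the mask is redundant is that an 8-bit value shifted right by 4 can never exceed 15. A minimal sketch of that range argument follows, assuming 8-bit unsigned char as on x86-64. The function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical self-check: for every unsigned char value, d >> 4 is already
// in [0, 15], so a subsequent "& 15" cannot change it.
void check_mask_is_redundant() {
  for (int v = 0; v < 256; ++v) {
    unsigned char d = static_cast<unsigned char>(v);
    unsigned char a = d >> 4;
    assert(a <= 15);
    assert((a & 15) == a);
  }
}
```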

Compiler is

  gcc version 11.1.0 (Debian 11.1.0-3) 

The same code is generated back to at least 4.9. Also present in

  gcc version 12.0.0 20210527 (experimental) [master revision
262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] (Debian
20210527-1)

[Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero

2021-06-20 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

--- Comment #7 from Steinar H. Gunderson  ---
To wrap this up, confirming that GCC 11 does well on my benchmark:

BM_Chain20    54529 iterations    18781 ns/iter    GCC 10, asm bsfq
BM_Chain20    44584 iterations    22509 ns/iter    GCC 10, ffsll()
BM_Chain20    49753 iterations    20216 ns/iter    GCC 11, asm bsfq
BM_Chain20    53346 iterations    18816 ns/iter    GCC 11, ffsll()
BM_Chain20    64926 iterations    15747 ns/iter    Clang 12, asm bsfq
BM_Chain20    71208 iterations    14374 ns/iter    Clang 12, ffsll()

So basically for 11+, the ffsll() statement does better than the bsfq
statement, whereas it used to do markedly worse.

Clang does even better, but I can live with that. :-)

[Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path

2021-06-20 Thread steinar+gcc at gunderson dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

Bug ID: 101139
   Summary: Unable to remove double byteswap in fast path
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

The following code is reduced from a real interpreter:

extern void (*a[])();
int d, e, h, l;
typedef struct {
  char ab;
} f;
f g;
short i();
short m68ki_read_imm_16() {
  short j, k;
  int b = d;
  f f = g;
  if (b < h)
    return __builtin_bswap16(()[0]);
  k = i();
  short c = k;
  j = __builtin_bswap16(c);
  return j;
}
int b() {
  short m;
  do {
    m = m68ki_read_imm_16();
    short c = m;
    l = __builtin_bswap16(c);
    a[l]();
  } while (e);
  return e;
}

Compiling with arm-linux-gnueabihf-gcc-10 -O2 yields this interesting sequence
in the function:

        b       .L11
.L15:
        ldrb    r3, [r5, #8]    @ zero_extendqisi2
        rev16   r3, r3
        uxth    r3, r3
.L10:
        rev16   r3, r3
        uxth    r3, r3

The original intention was to have a reusable function that returns its value
in big-endian order, while a specific caller that only uses the value for a
table lookup could ignore endianness, removing the double swap entirely. GCC
can normally do that, but it seems that the branch in m68ki_read_imm_16()
somehow gets in the way. Just to be clear, I expect zero rev16 instructions
altogether in b() when m68ki_read_imm_16() is inlined.
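
The expectation rests on a 16-bit byteswap being its own inverse, so a swap in the callee followed by a swap in the caller should cancel on every path. A minimal sketch of that identity follows, using the same GCC/Clang builtin as the reduced test case. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical self-check: swapping the bytes of a 16-bit value twice
// gives back the original value, for all 65536 inputs.
void check_double_swap_is_identity() {
  for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
    uint16_t x = static_cast<uint16_t>(v);
    assert(__builtin_bswap16(__builtin_bswap16(x)) == x);
  }
}
```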

The problem is not ARM-specific; x86 shows a similar problematic sequence:

        leaq    a(%rip), %rbx
        jmp     .L11
        .p2align 4,,10
        .p2align 3
.L15:
        movsbw  g(%rip), %ax
        rolw    $8, %ax
.L10:
        rolw    $8, %ax
        movzwl  %ax, %edx

Also verified with

gcc version 12.0.0 20210527 (experimental) [master revision
262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] (Debian
20210527-1)