[Bug c++/103273] [12 Regression] internal compiler error: in cp_parser_type_id_1, at cp/parser.c:24010
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103273

Steinar H. Gunderson changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Steinar H. Gunderson ---
Nevermind, fixed in a newer version!
[Bug c++/103273] New: [12 Regression] internal compiler error: in cp_parser_type_id_1, at cp/parser.c:24010
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103273

            Bug ID: 103273
           Summary: [12 Regression] internal compiler error: in cp_parser_type_id_1, at cp/parser.c:24010
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

Found while minimizing another regression:

gcc version 12.0.0 20210918 (experimental) [master r12-3644-g7afcb534239] (Debian 20210918-1)

bigscreen:~/creduce> cat fts0opt.i
template struct b;
b < b struct {
bigscreen:~/creduce> /usr/lib/gcc-snapshot/bin/g++ -c fts0opt.i
fts0opt.i:2:14: error: types may not be defined in template arguments
    2 | b < b struct {
      |              ^
fts0opt.i:2:15: error: expected '}' at end of input
    2 | b < b struct {
      |              ~^
fts0opt.i:2:15: internal compiler error: in cp_parser_type_id_1, at cp/parser.c:24010
0x6fa6a2 cp_parser_type_id_1
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:24010
0xf578c3 cp_parser_template_type_arg
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:24105
0xf579ef cp_parser_template_argument
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:18660
0xf579ef cp_parser_template_argument_list
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:18571
0xf579ef cp_parser_enclosed_template_argument_list
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:31853
0xf58e76 cp_parser_template_id
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:18108
0xf5969b cp_parser_class_name
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:25535
0xf5035a cp_parser_qualifying_entity
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:7048
0xf5035a cp_parser_nested_name_specifier_opt
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:6730
0xf45f4d cp_parser_constructor_declarator_p
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:30712
0xf45f4d cp_parser_decl_specifier_seq
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:15743
0xf469b4 cp_parser_simple_declaration
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:15006
0xf76f25 cp_parser_declaration
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:14819
0xf778fe cp_parser_toplevel_declaration
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:14840
0xf778fe cp_parser_translation_unit
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:4978
0xf778fe c_parse_file()
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/cp/parser.c:47653
0x10a4a4d c_common_parse_file()
        /build/gcc-snapshot-FKpLPc/gcc-snapshot-20210918/src/gcc/c-family/c-opts.c:1236
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See for instructions.
[Bug middle-end/103071] Missed optimization for symmetric subset: (a & b) == a || (a & b) == b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103071

--- Comment #2 from Steinar H. Gunderson ---
EitherIsSubset() in the example calls foo or bar (but with a redundant test
that I can't easily get rid of). I agree that if you just return 0/1, the
cmp+sete+or variant is probably as good, but that's not what you get if you
branch on it.
[Bug rtl-optimization/103071] New: Missed optimization for symmetric subset: (a & b) == a || (a & b) == b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103071

            Bug ID: 103071
           Summary: Missed optimization for symmetric subset: (a & b) == a || (a & b) == b
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

This is a bit of a long shot, but I'll file it anyway :-) I have this function
in a hot path (of course, in the real project, it's inlined):

#include <stdint.h>
#include

void foo();
void bar();

void EitherIsSubset(uint64_t v0, uint64_t v1) {
  if ((v0 & v1) == v0 || (v0 & v1) == v1) {
    foo();
  } else {
    bar();
  }
}

It is intended to treat v0 and v1 as bit sets, and then test whether either of
them is a subset of the other (or that they are equal). (An equivalent
formulation happens to be replacing & with |.)

GCC compiles (with -O2, x86-64) this to:

EitherIsSubset:
        movq    %rdi, %rax
        andq    %rsi, %rax
        cmpq    %rsi, %rax
        je      .L4
        cmpq    %rdi, %rax
        je      .L4
        xorl    %eax, %eax
        jmp     bar@PLT
.L4:
        xorl    %eax, %eax
        jmp     foo@PLT

This is pretty straightforward, but feels like it's using two (relatively
hard-to-predict) branches where it should be possible to deal with one. And
indeed, GNU superopt (!) found this amazing sequence instead, with v0 in eax
and v1 in edx (this is, of course, trivially portable to 64-bit):

14:     mov %edx,%ecx
        or  %eax,%edx
        cmp %edx,%eax
        sbb %ebx,%ebx
        sbb %ecx,%edx
        adc $1,%ebx

I can't claim to understand fully what it does, but after this, ebx contains
either 0 or 1 with the right answer, and one would assume that after this, the
zero flag is also usable to branch on (leaving us with one branch instead of
two, in all).

Is it possible to teach GCC this sequence? I tried using it as inline
assembler, and while it works, it seems it becomes suboptimal and slower,
because I can't return a condition code (so I get a redundant test):

inline bool EitherIsSubsetAsm(uint64_t v0, uint64_t v1) {
  uint64_t tmp = v0 | v1;
  bool result;
  asm("cmp %1, %2 ; sbb %0, %0 ; sbb %3, %1 ; adc $1, %0"
      : "=r"(result), "+r"(tmp)
      : "r"(v0), "r"(v1)
      : "cc");
  return result;
}

void EitherIsSubset(uint64_t v0, uint64_t v1) {
  if (EitherIsSubsetAsm(v0, v1)) {
    foo();
  } else {
    bar();
  }
}
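
A minimal sketch of why the six-instruction sequence quoted above gives the right answer, assuming 32-bit operands as in the superopt listing. It emulates the carry flag in plain C rather than using any compiler feature. The names either_is_subset_ref, either_is_subset_or, either_is_subset_carry and count_mismatches are hypothetical and do not appear in the report. The first borrow (cf1) is set exactly when v1 is not a subset of v0. The second borrow (cf2) can then only be set when v0 | v1 equals v1, that is, when v0 is a subset of v1.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the test from the report, written directly. */
int either_is_subset_ref(uint32_t v0, uint32_t v1)
{
    return (v0 & v1) == v0 || (v0 & v1) == v1;
}

/* The formulation with | instead of &, mentioned in the report. */
int either_is_subset_or(uint32_t v0, uint32_t v1)
{
    return (v0 | v1) == v0 || (v0 | v1) == v1;
}

/* Step-by-step emulation of the sequence, with eax = v0 and edx = v1. */
int either_is_subset_carry(uint32_t v0, uint32_t v1)
{
    uint32_t ecx = v1;            /* mov %edx,%ecx */
    uint32_t edx = v0 | v1;       /* or  %eax,%edx */
    uint32_t cf1 = v0 < edx;      /* cmp %edx,%eax: borrow of v0 - (v0|v1) */
    uint32_t ebx = 0u - cf1;      /* sbb %ebx,%ebx: ebx = -CF, CF kept */
    /* sbb %ecx,%edx: borrow out of (v0|v1) - v1 - CF */
    uint32_t cf2 = (uint64_t)edx < (uint64_t)ecx + cf1;
    /* Without a first borrow, (v0|v1) - v1 cannot borrow either. */
    assert(!(cf1 == 0 && cf2 == 1));
    ebx = ebx + 1u + cf2;         /* adc $1,%ebx: -cf1 + 1 + cf2 */
    return (int)ebx;
}

/* Counts disagreements with the reference for all pairs below limit. */
unsigned count_mismatches(uint32_t limit)
{
    unsigned bad = 0;
    for (uint32_t a = 0; a < limit; a++) {
        for (uint32_t b = 0; b < limit; b++) {
            int ref = either_is_subset_ref(a, b);
            bad += either_is_subset_carry(a, b) != ref;
            bad += either_is_subset_or(a, b) != ref;
        }
    }
    return bad;
}
```

Calling count_mismatches(256) compares all pairs of 8-bit values. This checks the arithmetic only and says nothing about how GCC would schedule or select the instructions.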
[Bug tree-optimization/101139] Unable to remove double byteswap in fast path
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

--- Comment #4 from Steinar H. Gunderson ---
Yes, the integer promotion actually costs some performance. It happens on both
x86 and Arm. Should I file that as a separate bug?
[Bug target/101200] Unneeded AND after shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

--- Comment #6 from Steinar H. Gunderson ---
You're right, I don't know why the shrq happened. When I run now, I get shrb.
Doesn't matter for the bug, though.
[Bug tree-optimization/101200] New: Unneeded AND after shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

            Bug ID: 101200
           Summary: Unneeded AND after shift
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

The code after reduction is:

struct {
  int b[6];
} c;
unsigned char d;
void e() {
  unsigned char a = d >> 4, f = d & 15;
  c.b[a] = c.b[f];
}

with g++-11 -O2, this produces

        movzbl  d(%rip), %eax
        movq    %rax, %rdx
        shrq    $4, %rax
        andl    $15, %edx
        andl    $15, %eax
        movl    c(,%rdx,4), %edx
        movl    %edx, c(,%rax,4)
        ret

The second AND with 15 is unneeded and should have been optimized away by VRP
as I understand it. I can't reproduce it with ARM, though, so maybe there's
something x86-specific?

Compiler is gcc version 11.1.0 (Debian 11.1.0-3)

The same code is generated back to at least 4.9. Also present in

gcc version 12.0.0 20210527 (experimental) [master revision 262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] (Debian 20210527-1)
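
The value-range argument behind this report can be checked exhaustively. The sketch below assumes an 8-bit unsigned char (true on x86 and ARM) and uses a hypothetical function name, high_nibble_mask_is_redundant, that is not part of the report. It shows that d >> 4 always lies in [0, 15], so masking it with 15 cannot change it.

```c
#include <assert.h>

/* Returns 1 if masking (d >> 4) with 15 never changes the value.
   Assumes unsigned char is 8 bits wide. */
int high_nibble_mask_is_redundant(void)
{
    for (unsigned v = 0; v <= 255; v++) {
        unsigned char d = (unsigned char)v;
        unsigned char a = d >> 4;   /* only the high nibble survives */
        assert(a <= 15);            /* the range a range analysis could derive */
        if ((a & 15) != a)
            return 0;
    }
    return 1;
}
```

This only demonstrates that the mask is mathematically redundant. Whether a given GCC pass is expected to remove it is the question the report raises.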
[Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

--- Comment #7 from Steinar H. Gunderson ---
To wrap this up, confirming that GCC 11 does well on my benchmark:

BM_Chain2054529 iterations 18781 ns/iter   GCC 10, asm bsfq
BM_Chain2044584 iterations 22509 ns/iter   GCC 10, ffsll()
BM_Chain2049753 iterations 20216 ns/iter   GCC 11, asm bsfq
BM_Chain2053346 iterations 18816 ns/iter   GCC 11, ffsll()
BM_Chain2064926 iterations 15747 ns/iter   Clang 12, asm bsfq
BM_Chain2071208 iterations 14374 ns/iter   Clang 12, ffsll()

So basically for 11+, the ffsll() statement does better than the bsfq
statement, whereas it used to do markedly worse. Clang does even better, but I
can live with that. :-)
[Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

            Bug ID: 101139
           Summary: Unable to remove double byteswap in fast path
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

The following code is reduced from a real interpreter:

extern void (*a[])();
int d, e, h, l;
typedef struct {
  char ab;
} f;
f g;
short i();
short m68ki_read_imm_16() {
  short j, k;
  int b = d;
  f f = g;
  if (b < h)
    return __builtin_bswap16(()[0]);
  k = i();
  short c = k;
  j = __builtin_bswap16(c);
  return j;
}
int b() {
  short m;
  do {
    m = m68ki_read_imm_16();
    short c = m;
    l = __builtin_bswap16(c);
    a[l]();
  } while (e);
  return e;
}

Compiling with arm-linux-gnueabihf-gcc-10 -O2 yields this interesting sequence
in the function:

        b       .L11
.L15:
        ldrb    r3, [r5, #8]    @ zero_extendqisi2
        rev16   r3, r3
        uxth    r3, r3
.L10:
        rev16   r3, r3
        uxth    r3, r3

The original intent was to have a reusable function that returns its value in
big-endian order, while a specific caller that only uses the value for a table
lookup could ignore endianness, removing the double swap entirely. GCC can
normally do that, but it seems that the branch in m68ki_read_imm_16() somehow
gets in the way.

Just to be clear, I expect zero rev16 instructions altogether in b() when
m68ki_read_imm_16() is inlined.

The problem is not ARM-specific; x86 shows a similar problematic sequence:

        leaq    a(%rip), %rbx
        jmp     .L11
        .p2align 4,,10
        .p2align 3
.L15:
        movsbw  g(%rip), %ax
        rolw    $8, %ax
.L10:
        rolw    $8, %ax
        movzwl  %ax, %edx

Also verified with

gcc version 12.0.0 20210527 (experimental) [master revision 262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] (Debian 20210527-1)
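
A small sketch of the property the report relies on: two consecutive 16-bit byte swaps cancel out, so a caller that swaps the result of a swapping helper ends up with the original value. The code uses a portable stand-in (swap16) instead of __builtin_bswap16. The names swap16, read_be16, table_index and count_double_swap_mismatches are hypothetical and only mirror the shape of the reduced test case.

```c
#include <stdint.h>

/* Portable stand-in for __builtin_bswap16. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical reusable helper: hands back the value byte-swapped. */
uint16_t read_be16(uint16_t native)
{
    return swap16(native);
}

/* Hypothetical caller: swaps again to get a native-order table index. */
uint16_t table_index(uint16_t native)
{
    return swap16(read_be16(native));
}

/* Counts 16-bit inputs for which the double swap is not the identity. */
unsigned count_double_swap_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        bad += table_index((uint16_t)v) != (uint16_t)v;
    return bad;
}
```

Since table_index() is the identity on every input, an optimizer that sees both swaps on the same path can drop them. The report is about the case where a branch inside the helper keeps that from happening.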