[Bug c++/115091] New: Support value speculation in frontend
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115091

            Bug ID: 115091
           Summary: Support value speculation in frontend
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

This blog post describes an interesting optimization technique for memory access:
https://mazzo.li/posts/value-speculation.html

A linked list walk is often limited by the latency of the L1 cache. When the program can guess the next address (e.g. because the nodes are often allocated sequentially in memory) it is possible to use a construct like

    if (node->next == node + 1)
        node++;
    else
        node = node->next;

and rely on the CPU speculating the fast case.

However this often runs into problems with the compiler, e.g.

    next = node->next;
    node++;
    if (node != next)
        node = next;

is often optimized away, because the compiler sees that node always ends up equal to next.

While this can be worked around with some code restructuring, this may not always work for more complex cases.

I wonder if it makes sense to formally support this technique with a "nocse" or similar variable attribute that is honored by optimization passes.
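For illustration only, here is a minimal sketch of one way to keep the guess alive today, assuming a GNU-compatible compiler. The empty asm statement with a "+r" constraint is a common idiom for hiding a value from the optimizer; it is not the proposed "nocse" attribute. The names Node and sum_list_speculative are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type, used only for this illustration. */
typedef struct Node {
    uint64_t value;
    struct Node *next;
} Node;

/* Sums a list while guessing that the next node sits directly after
   the current one in memory. */
uint64_t sum_list_speculative(Node *node)
{
    uint64_t sum = 0;
    while (node != NULL) {
        sum += node->value;
        Node *predicted = node + 1;   /* the guess: sequential allocation */
        Node *next = node->next;      /* the real answer, needs a load    */
        /* Assumption: the empty asm makes 'predicted' opaque, so the
           compiler cannot fold the comparison below into 'node = next'. */
        __asm__ volatile("" : "+r"(predicted));
        if (next == predicted)
            node = predicted;         /* fast path the CPU can speculate  */
        else
            node = next;              /* fallback keeps the walk correct  */
    }
    return sum;
}
```

The result is the same whether or not the guess is right; only the dependency chain seen by the CPU changes. Whether this actually helps depends on the allocation pattern and the microarchitecture.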
[Bug gcov-profile/113765] ICE: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

--- Comment #3 from Andi Kleen ---
-O1 fixes it, so an easy patch would be

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 63d0c3dc36df..180ed7a8260f 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1758,7 +1758,7 @@ public:
   bool
   gate (function *) final override
   {
-    return flag_auto_profile;
+    return flag_auto_profile && optimize > 0;
   }
   unsigned int
   execute (function *) final override
[Bug gcov-profile/113765] autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

--- Comment #1 from Andi Kleen ---
Seems to be a regression. I tested the same setup on gcc 13 and the test passes there:

55:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation, -fprofile-generate -D_PROFILE_GENERATE
59:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution, -fprofile-generate -D_PROFILE_GENERATE
62:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation, -fprofile-use -D_PROFILE_USE
66:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution, -fprofile-use -D_PROFILE_USE
76:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation, -g -DFOR_AUTOFDO_TESTING
108:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution, -g -DFOR_AUTOFDO_TESTING
111:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation, -fauto-profile -DFOR_AUTOFDO_TESTING -fearly-inlining
115:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution, -fauto-profile -DFOR_AUTOFDO_TESTING -fearly-inlining
[Bug gcov-profile/113765] New: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

            Bug ID: 113765
           Summary: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

With recent trunk (019dc63819be), when running the test suite on an Intel system with autofdo installed:

Executing on host: /home/ak/gcc/obj-full/gcc/xgcc -B/home/ak/gcc/obj-full/gcc/ /home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c -fdiagnostics-plain-output -O0 -pthread -fprofile-update=atomic -fauto-profile=/home/ak/gcc/obj-full/gcc/testsuite/gcc20/afdo.val-profiler-threads-1.gcda -DFOR_AUTOFDO_TESTING -fearly-inlining -dumpbase-ext .x02 -lm -o /home/ak/gcc/obj-full/gcc/testsuite/gcc20/val-profiler-threads-1.x02 (timeout = 300)
spawn -ignore SIGHUP /home/ak/gcc/obj-full/gcc/xgcc -B/home/ak/gcc/obj-full/gcc/ /home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c -fdiagnostics-plain-output -O0 -pthread -fprofile-update=atomic -fauto-profile=/home/ak/gcc/obj-full/gcc/testsuite/gcc20/afdo.val-profiler-threads-1.gcda -DFOR_AUTOFDO_TESTING -fearly-inlining -dumpbase-ext .x02 -lm -o /home/ak/gcc/obj-full/gcc/testsuite/gcc20/val-profiler-threads-1.x02
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c: In function 'copy_memory':
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7: error: probability of edge from entry block not initialized
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7: error: probability of edge 2->4 not initialized
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7: error: probability of edge 5->1 not initialized
during GIMPLE pass: fixup_cfg
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7: internal compiler error: verify_flow_info failed
0xafb91e verify_flow_info()
        ../../gcc/gcc/cfghooks.cc:287
0xf0e8a7 execute_function_todo
        ../../gcc/gcc/passes.cc:2100
0xf0edde execute_todo
        ../../gcc/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1

I'm not attaching the source because it also needs the autofdo gcov file to reproduce and the test case is already in tree.
[Bug lto/107779] Support implicit references from inline assembler to compiler symbols
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779

--- Comment #4 from Andi Kleen ---
This whole manual annotation idea (which is equivalent to marking the symbols global and visible, and that is what a large part of the kernel LTO patchkit does) is dead on arrival because the kernel people already rejected it. Their argument is that they don't need it for LLVM, so why should they be forced to do it for GCC?

In LLVM it is just done by the assembler, and it works without any extra program changes.

Since gcc is not the only game in town anymore they have a point.

It's either heuristics or integrating the assembler.
[Bug middle-end/111743] shifts in bit field accesses don't combine with other shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

--- Comment #5 from Andi Kleen ---
    config/i386/i386.h:#define SLOW_BYTE_ACCESS 0

You mean it doesn't define it?
[Bug middle-end/111743] shifts in bit field accesses don't combine with other shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

--- Comment #2 from Andi Kleen ---
Okay, then it doesn't understand that SHL_signed and SHR_unsigned can be combined when one of the values came from a shorter unsigned.
[Bug middle-end/111743] New: shifts in bit field accesses don't combine with other shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

            Bug ID: 111743
           Summary: shifts in bit field accesses don't combine with other shifts
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

(not sure it's the middle-end, picked arbitrarily)

The following code

    struct bf {
            unsigned a : 10, b : 20, c : 10;
    };

    unsigned fbc(struct bf bf) { return bf.b | (bf.c << 20); }

generates:

        movq    %rdi, %rax
        shrq    $10, %rdi
        shrq    $32, %rax
        andl    $1048575, %edi
        andl    $1023, %eax
        sall    $20, %eax
        orl     %edi, %eax
        ret

It doesn't understand that the shift right can be combined with the shift left. Also not sure why the shift left is arithmetic (this should be all unsigned).

clang does the simplification, which ends up one instruction shorter:

        movl    %edi, %eax
        shrl    $10, %eax
        andl    $1048575, %eax          # imm = 0xFFFFF
        shrq    $12, %rdi
        andl    $1072693248, %edi       # imm = 0x3FF00000
        orl     %edi, %eax
        retq
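As a rough illustration of the combination clang performs, the sketch below writes the merged shift by hand. It assumes the x86-64 System V bit-field layout implied by the assembler above (a in bits 0..9, b in bits 10..29, c in bits 32..41 of a little-endian 64-bit image); fbc_combined is a hypothetical name and the layout is an assumption, not something the C standard guarantees.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct bf {
    unsigned a : 10, b : 20, c : 10;
};

/* The function from the report. */
unsigned fbc(struct bf bf) { return bf.b | (bf.c << 20); }

/* Assumed layout only: the struct must be exactly one 64-bit word. */
static_assert(sizeof(struct bf) == sizeof(uint64_t),
              "sketch assumes an 8-byte struct bf");

/* Hand-combined version: c needs ">> 32" to extract and "<< 20" to place,
   which merges into a single ">> 12" followed by one mask. */
unsigned fbc_combined(struct bf bf)
{
    uint64_t raw = 0;
    memcpy(&raw, &bf, sizeof bf);
    unsigned b_part = (unsigned)((raw >> 10) & 0xFFFFFu);
    unsigned c_part = (unsigned)((raw >> 12) & 0x3FF00000u);
    return b_part | c_part;
}
```

On the side question in the report: a 10-bit unsigned bit-field is promoted to int before the shift, which would explain the signed shift in the output; on x86 SAL and SHL are the same instruction, so this costs nothing.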
[Bug lto/107779] New: Support implicit references from inline assembler to compiler symbols
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779

            Bug ID: 107779
           Summary: Support implicit references from inline assembler to compiler symbols
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
                CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org, mliska at suse dot cz
  Target Milestone: ---

Created attachment 53933
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53933&action=edit
prototype patch

So I looked into the problem the kernel people complained about: a lot of assembler statements reference C symbols, which need externally_visible and global for gcc LTO, otherwise they can end up in the wrong asm file and cause missing symbols.

I came up with the attached (hackish) patch that tries to solve the problem very partially: it parses the assembler strings and looks for anything that could be an identifier, and then tries to mark it externally_visible.

It has the following open issues:

- The parsing is very approximate and doesn't handle some obscure cases. With the approximation it's also impossible to give error messages, but hopefully the linker takes care of that. It also gives false positives with some assembler syntax, but in the worst case would just lose some optimization from unnecessary references.

- It doesn't handle the case (which happens in the kernel) that the C declaration is after the asm statement. This could be fixed with some more effort.

- It doesn't work for static, which can get mangled (that's a lot of the kernel cases).

static is a difficult problem because there could be conflicting names, so we cannot just put it all in partition zero. This would need some special handling in the LTO partitioning code to create new partitions just for having unique name spaces, and then avoid mangling. A related problem is PR50676.

It's likely possible to create situations where it's impossible to solve; there could be circular dependencies etc. But I assume in this case the non-LTO case would fail too. Or maybe do something with redefining symbols at the assembler level.

This one is somewhat difficult and I don't have a simple solution currently. Unfortunately, solving the kernel issue would need a solution for static.
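To make the heuristic concrete, here is a small stand-alone sketch of an approximate identifier scan over an assembler string. It is not the attached patch; scan_asm_identifiers and ASM_IDENT_MAX are hypothetical names, and the rules (skip tokens after '%', after '.', or starting with a digit) are assumptions chosen for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define ASM_IDENT_MAX 64   /* hypothetical limit for this sketch */

/* Advance past a run of identifier characters. */
static const char *skip_word(const char *p)
{
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    return p;
}

/* Collect anything in asm_text that could be a C identifier.
   Registers (%rax), directives (.globl) and numbers are skipped;
   mnemonics are not, so false positives are expected.
   Returns the number of tokens stored in out. */
size_t scan_asm_identifiers(const char *asm_text,
                            char out[][ASM_IDENT_MAX], size_t max)
{
    assert(asm_text != NULL);
    size_t n = 0;
    const char *p = asm_text;
    while (*p != '\0' && n < max) {
        unsigned char c = (unsigned char)*p;
        if (c == '%' || c == '.' || isdigit(c)) {
            p = skip_word(p + 1);          /* not a candidate symbol */
        } else if (isalpha(c) || c == '_') {
            const char *start = p;
            p = skip_word(p);
            size_t len = (size_t)(p - start);
            if (len < ASM_IDENT_MAX) {     /* overlong tokens are dropped */
                memcpy(out[n], start, len);
                out[n][len] = '\0';
                n++;
            }
        } else {
            p++;
        }
    }
    return n;
}
```

A caller would look each token up in the symbol table and mark matches as externally visible; tokens such as "call" or "mov" simply find no match, which mirrors the false-positive caveat above.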
[Bug lto/107014] flatten+lto fails the kernel build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107014

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #9 from Andi Kleen ---
I suspect what happens is that it hits some kernel initialization function. If they don't use initcall, the LTO build can inline them all into each other (because they are only called once), creating a single big initialization function. With flatten that will create an extremely large function that takes a long time to process.

I suspect any use of flatten would be better off using always_inline, since that affects only a single function.

Should probably be fixed upstream in the kernel.
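The difference between the two attributes can be sketched as follows. The function names are hypothetical and the example is deliberately tiny; it only shows where each attribute is placed, assuming a GNU-compatible compiler.

```c
/* Hypothetical helpers standing in for once-called init code. */
static int step_a(int x) { return x + 1; }
static int step_b(int x) { return step_a(x) * 2; }

/* flatten is placed on the caller: every call inside it, transitively,
   is inlined where possible. With LTO that can pull in a very large
   call tree. */
__attribute__((flatten)) int init_flattened(int x)
{
    return step_b(x) + step_a(x);
}

/* always_inline is placed on the callee: only this one function is
   forced inline at its call sites, nothing else is affected. */
static inline __attribute__((always_inline)) int hot_helper(int x)
{
    return x * 3;
}

int init_selective(int x)
{
    return hot_helper(x) + step_b(x);
}
```

Both variants compute the same kind of result; the point is that always_inline bounds the amount of forced inlining to functions the programmer has picked explicitly.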
[Bug preprocessor/45227] libcpp Makefile does not enable instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45227 --- Comment #5 from Andi Kleen --- I think it was the method from the info file. But I can't quite remember. If you cannot reproduce it I guess it's ok to close. Maybe I made some mistake.
[Bug middle-end/99578] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #12 from Andi Kleen ---
It looks to me like separate bugs are mixed together here.

For example I looked at the preallocate_pmd warning again and I don't think there is any union there. Also I noticed that when I replace the *foo[N] with **foo it disappears. So I think that is something different.

So there seem to be instances where such warnings happen without union members. Perhaps that one (and perhaps some others) needs to be reanalyzed.

I also looked at intel_pm.c and I think that one is a real kernel bug, where the field accessed is really too small. I'll submit a patch for that.
[Bug lto/99828] inlining failed in call to ‘always_inline’ ‘memcpy’: --param max-inline-insns-auto limit reached
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99828 --- Comment #3 from Andi Kleen --- So what do you want to fix in the kernel? Use a wrapper for taking the address of the memcpy? (I hope nothing in gcc would remove such a wrapper)
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #12 from Andi Kleen --- Okay. I only compared gcc-7 (working) vs gcc-9 (broken), but always with LTO. Looking at the kernel link it also uses --whole-archive. Perhaps that makes a difference? I'll redo the test case with --whole-archive (will need some fixes)
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #9 from Andi Kleen --- I think the STB_SECONDARY stuff is only needed if ld -r is used, but not for ar
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #8 from Andi Kleen --- It works fine without LTO. Otherwise the Linux kernel wouldn't work. It relies on this behavior for its syscalls. The test case is extracted from there.
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #5 from Andi Kleen --- It doesn't seem to be the plugin itself, I compiled trunk with the gcc-7 lto-plugin.c and it fails too.
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #4 from Andi Kleen --- Reproduced on trunk too 11.0-200626 e74c281bf4955eea7fdc5f21b43e29fa0235a5b0
[Bug bootstrap/95934] New: bootstrap fails in compiler assert in sanitizer_platform_limits_posix.cpp:1136
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95934

            Bug ID: 95934
           Summary: bootstrap fails in compiler assert in sanitizer_platform_limits_posix.cpp:1136
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: bootstrap
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

commit e74c281bf4955eea7fdc5f21b43e29fa0235a5b0 (HEAD -> trunk, origin/trunk, origin/master, origin/HEAD)

make bootstrap fails with

../../../../gcc/libsanitizer/sanitizer_common/sanitizer_internal_defs.h:336:30: note: in expansion of macro 'IMPL_COMPILER_ASSERT'
  336 | #define COMPILER_CHECK(pred) IMPL_COMPILER_ASSERT(pred, __LINE__)
      |                              ^~~~
../../../../gcc/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h:1442:3: note: in expansion of macro 'COMPILER_CHECK'
 1442 |   COMPILER_CHECK(sizeof(((__sanitizer_##CLASS *)NULL)->MEMBER) == \
      |   ^~
../../../../gcc/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:1136:1: note: in expansion of macro 'CHECK_SIZE_AND_OFFSET'
 1136 | CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
      | ^

Works fine when I comment out the assert. There already is an ifdef checking for lots of cases; it seems it doesn't work on my system either.

This is with a recent glibc-2.31-5.9.x86_64 (opensuse glibc).

Perhaps the assert should just be disabled like this? (patch is likely white space damaged)

diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
index b4f8f67b664..bb6377b70cb 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
@@ -1133,7 +1133,7 @@ CHECK_SIZE_AND_OFFSET(ipc_perm, cgid);
 /* On aarch64 glibc 2.20 and earlier provided incorrect mode field. */
 /* On Arm newer glibc provide a different mode field, it's hard to detect
    so just disable the check. */
-CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
+//CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
 #endif
 
 CHECK_TYPE_SIZE(shmid_ds);
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #3 from Andi Kleen ---
Versions reproduced:

gcc version 10.1.1 20200507 [revision dd38686d9c810cecbaa80bb82ed91caaa58ad635] (SUSE Linux)
gcc-9 (SUSE Linux) 9.3.1 20200406 [revision 6db837a5288ee3ca5ec504fbd5a765817e556ac2]

Version which worked correctly:

gcc version 7.5.0 (SUSE Linux)

binutils:

GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.34.0.20200325-1
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #1 from Andi Kleen ---
Created attachment 48792
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48792&action=edit
sys_ni.i
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #2 from Andi Kleen ---
Created attachment 48793
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48793&action=edit
capability.i
[Bug lto/95928] New: LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

            Bug ID: 95928
           Summary: LTO through ar breaks weak function resolution
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-linux

Created attachment 48791
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48791&action=edit
dummy.c

With the attached test case (extracted from the Linux kernel) the expected behavior is that the strong version of __x64_sys_capget overrides the weak version in sys_ni.i.

This works with LTO when the object files are linked directly, but doesn't work (the weak version of the function is output) when the linking is through a .a file.

Works:

    gcc -flto -c sys_ni.i
    gcc -flto -c capability.i
    gcc -O2 -flto dummy.c sys_ni.o capability.o
    # sys_ni_syscall doesn't appear, so the strong version is chosen
    objdump --disassemble=__x64_sys_capget | grep sys_ni_syscall

Breaks:

    gcc -flto -c sys_ni.i
    gcc -flto -c capability.i
    rm -f x.a
    gcc-ar q x.a sys_ni.o capability.o
    gcc -O2 -flto dummy.c x.a
    # sys_ni_syscall appears, so the weak version is incorrectly chosen
    objdump --disassemble=__x64_sys_capget | grep sys_ni_syscall

This seems to be a regression: it worked on gcc-7, but breaks on gcc 9/10. I don't have any intermediate versions to test.
[Bug target/93346] gcc not generate BZHI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93346

--- Comment #1 from Andi Kleen ---
    typedef unsigned u;

    u bzhi(u src, u inx)
    {
            return src & ((1 << inx) - 1);
    }

with -O2 -march=skylake generates

        movl    %esi, %r8d
        movl    $1, %esi
        shlx    %r8d, %esi, %esi
        leal    -1(%rsi), %eax
        andl    %edi, %eax
        ret

clang generates the expected

        bzhil   %esi, %edi, %eax
        retq
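For reference, a portable sketch of what the BZHI instruction computes for 32-bit operands, which is the pattern the compiler is expected to recognize. bzhi_ref is a hypothetical name; unlike the expression in the test case, it also defines the result for index values of 32 and above, following the instruction's documented behavior (only the low 8 bits of the index are used, and an index at or beyond the operand size clears nothing).

```c
#include <stdint.h>

/* Reference model: keep the low 'inx' bits of src, zero the rest. */
uint32_t bzhi_ref(uint32_t src, uint32_t inx)
{
    inx &= 0xFFu;                 /* the instruction reads index bits 7:0 */
    if (inx >= 32)
        return src;               /* nothing to clear */
    return src & ((UINT32_C(1) << inx) - 1);
}
```

Note that the original expression shifts a signed 1, so it is only well defined for index values below 31; an unsigned constant avoids that corner case.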
[Bug target/93346] New: gcc not generate BZHI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93346

            Bug ID: 93346
           Summary: gcc not generate BZHI
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---
            Target: x86_64
[Bug inline-asm/89839] New: section not reset to text for top level asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89839

            Bug ID: 89839
           Summary: section not reset to text for top level asm
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: inline-asm
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

The ELF section from the previous function doesn't get reset before top level asm statements, e.g.

    __attribute__((section("foo"))) void func(void)
    {
    }

    asm("foo:\n");

gcc -S gives

        .section        foo,"ax",@progbits      <- sets the section
        .globl  func
        .type   func, @function
func:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        nop
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   func, .-func
                                                <--- no section reset before the asm
#APP
        foo:

The problem is that if foo is some section with special behavior (for example initcall sections in the Linux kernel) this can cause crashes. I've seen such problems with LTO on the Linux kernel.

gcc should always reset the section to .text before emitting top level asm.

Seen with 8.x, but also trunk.
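Until the compiler resets the section itself, a source-level workaround is to make the top-level asm select its own section. The sketch below assumes an ELF target and the GNU assembler; after_func_label is a hypothetical label name used only for this illustration.

```c
/* Function placed in a custom section, as in the report. */
__attribute__((section("foo"))) void func(void)
{
}

/* The asm picks its section explicitly instead of inheriting whatever
   section the previous function left active. .pushsection/.popsection
   restore the earlier state afterwards. */
__asm__(".pushsection .text\n"
        "after_func_label:\n"
        ".popsection\n");

/* Declared so C code can refer to the label's address. */
extern char after_func_label[];
```

With this form the label lands in .text regardless of what the compiler emitted just before the asm statement.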
[Bug testsuite/86404] UNRESOLVED/UNSUPPORTED gcov test results due to Permission error mapping pages
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86404

--- Comment #4 from Andi Kleen ---
Does something like this help? (untested, cut-n-pasted, possibly with other values)

diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 5da5c63cd845..8744b9f091df 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -67,4 +67,4 @@ model*:\ 53) E="cpu/event=0x88,umask=0x41/p$FLAGS" ;;
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
        exit 1 ;;
 esac
-exec perf record -e $E -b "$@"
+exec perf record -m 256K -e $E -b "$@"
[Bug target/88622] ICE when changing -mpreferred-stack-boundary for different files with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88622 --- Comment #4 from Andi Kleen --- Ok that means that this code you pasted in ix86_option_override_internal somehow doesn't get executed correctly for LTO switching between different options. Adding Honza.
[Bug target/88622] ICE when changing -mpreferred-stack-boundary for different files with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88622

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #3 from Andi Kleen ---
Ok, that means that this code you pasted in ix86_option_override_internal somehow doesn't get executed correctly for LTO switching between different options. Adding Honza.
[Bug target/88622] ICE when changing -mpreferred-stack-boundary for different files with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88622

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-linux
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Andi Kleen ---
Don't have a small reproducer, but on a large LTO build where a few files are built with -mpreferred-stack-boundary=4 with the others using the default I hit the following ICE in ix86_minimum_alignment:

29610     /* Don't do dynamic stack realignment for long long objects with
29611        -mpreferred-stack-boundary=2.  */
29612     if ((mode == DImode || (type && TYPE_MODE (type) == DImode))
29613         && (!type || !TYPE_USER_ALIGN (type))
29614         && (!decl || !DECL_USER_ALIGN (decl)))
29615       {
29616         gcc_checking_assert (!TARGET_STV);
29617         return 32;
29618       }

I suspect the right fix is to just remove the assert?

Adding Jakub who added it originally in:

commit 1f1475a7e758328a59db17aef5d1ccd81232ea95
Author: jakub
Date:   Thu Feb 4 09:02:01 2016 +

    PR target/69454
    * config/i386/i386.c (convert_scalars_to_vector): Remove stack
    alignment fixes.
    (ix86_option_override_internal): Disable TARGET_STV if stack
    might not be aligned enough.
    (ix86_minimum_alignment): Assert that TARGET_STV is false.

    * gcc.target/i386/pr69454-1.c: New test.
    * gcc.target/i386/pr69454-2.c: New test.
[Bug target/88622] New: ICE when changing -mpreferred-stack-boundary for different files with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88622

            Bug ID: 88622
           Summary: ICE when changing -mpreferred-stack-boundary for different files with LTO
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---
[Bug c/88583] New: -Wpacked-not-aligned shouldn't be in -Wall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88583

            Bug ID: 88583
           Summary: -Wpacked-not-aligned shouldn't be in -Wall
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

gcc 9 added -Wpacked-not-aligned to -Wall. In Linux kernel builds this warning is very noisy. There's a Linux kernel patch now to disable it. But I suspect other software using packed will be affected too.

It's especially pointless on x86, where unaligned access only matters in some special cases (with vectors).

When the programmer specified packed they should know what they are doing.
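For readers unfamiliar with the warning, the sketch below shows the kind of declaration that triggers it: a member type with an explicit alignment placed inside a packed struct. The type names are hypothetical; compiling with -Wno-packed-not-aligned silences the diagnostic without changing the layout.

```c
#include <stddef.h>

/* A type that asks for 8-byte alignment. */
struct __attribute__((aligned(8))) S8 {
    char a[8];
};

/* Packing the container drops the member's alignment to 1, which is
   what -Wpacked-not-aligned reports. */
struct __attribute__((packed)) S {
    struct S8 s8;
};

/* Typical wire-format style use: the payload starts at offset 1. */
struct __attribute__((packed)) hdr {
    char tag;
    struct S8 payload;
};
```

The layout is exactly what the programmer asked for with packed, which is the argument made above for not enabling the warning by default.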
[Bug middle-end/88573] 9 regression: error: type mismatch in component reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88573

--- Comment #1 from Andi Kleen ---
Created attachment 45281
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45281&action=edit
test case (unminimized)

gcc-9 -O2 -S -flto arch/x86/events/intel/pt.i

/home/ak/lsrc/linux/arch/x86/events/intel/pt.c: In function 'pt_buffer_reset_offsets':
/home/ak/lsrc/linux/arch/x86/events/intel/pt.c:1539:1: error: type mismatch in component reference
 1539 | arch_initcall(pt_init);
      | ^~
struct topa_entry *[0:]
struct topa_entry *[0:]
_13 = buf->topa_index[pg];
/home/ak/lsrc/linux/arch/x86/events/intel/pt.c:1539:1: error: type mismatch in component reference
struct topa_entry *[0:]
struct topa_entry *[0:]
_17 = buf->topa_index[pg];
during IPA pass: *free_lang_data
[Bug middle-end/88573] New: 9 regression: error: type mismatch in component reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88573

            Bug ID: 88573
           Summary: 9 regression: error: type mismatch in component reference
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Don't have a small test case currently; this happens during a large LTO Linux kernel build.

With gcc trunk (20181222) with checking enabled I get

/home/ak/lsrc/linux/kernel/events/callchain.c: In function 'get_callchain_entry':
/home/ak/lsrc/linux/kernel/events/callchain.c:260:1: error: type mismatch in component reference
  260 | }
      | ^
struct perf_callchain_entry *[0:]
struct perf_callchain_entry *[0:]
_3 = entries->cpu_entries[cpu];
during IPA pass: *free_lang_data
/home/ak/lsrc/linux/kernel/events/callchain.c:260:1: internal compiler error: verify_gimple failed

gcc 8 doesn't show this problem.
[Bug sanitizer/88277] ASAN stack poisoning is using unaligned stores on e.g. x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88277

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #2 from Andi Kleen ---
FWIW modern x86 CPUs are fairly good at unaligned accesses, so it might not be worth it for performance.
[Bug ipa/88231] aligned functions laid down inefficiently
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88231

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #4 from Andi Kleen ---
I'm not sure it's a good idea to do this. Often the goal is not to get the absolute smallest code, but to get code that minimizes cache line usage. This is important for "frontend bound" code, as gcc itself often is.

It would be better to use an algorithm like Pettis-Hansen or the one in hfsort (see https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf) to lay out the code based on expected call order to minimize footprint.

For best results this would need profile feedback of course, but it might already do a reasonable job based on static call frequencies.
[Bug target/88096] wrong inline AVX512F optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88096

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #1 from Andi Kleen ---
Can you please attach a pre-processed test case of a file that shows the bug? It's ok if it doesn't run, as long as the problem is clearly identified in the assembler. Then the test case could likely be minimized.
[Bug target/88195] New: misleading error message for unsupported builtin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88195

            Bug ID: 88195
           Summary: misleading error message for unsupported builtin
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---
            Target: x86_64

On x86, when using a builtin that is not supported by the target configuration, e.g.

    gcc -c -m32 -mptwrite t.c

with t.c being

    void f(void)
    {
            __builtin_ia32_ptwrite64 (1);
    }

I get

    t.c:4:2: error: '__builtin_ia32_ptwrite64' needs isa option -mx32 -mptwrite

While technically correct (-mx32 would enable the 64bit builtin), I suspect nearly all users would want to use -m64, or better, not specify -m32 at all.

So it should mention that it is incompatible with -m32.
[Bug tree-optimization/42587] bswap not recognized for memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42587 --- Comment #11 from Andi Kleen --- Only when the first test case is fixed too
[Bug c/61727] #pragma simd is undocumented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61727

--- Comment #4 from Andi Kleen ---
This was originally about the #pragma simd in Cilk+, which has been removed. But it lives on in #pragma omp simd.
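As a brief illustration of the surviving form, here is a loop annotated with #pragma omp simd. The function name saxpy is just an example; the pragma takes effect when compiling with -fopenmp or -fopenmp-simd and is otherwise ignored, so the code stays valid plain C.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i], with a hint that iterations are independent
   and may be vectorized. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The caller is responsible for x and y not overlapping in a way that creates a loop-carried dependency, since the pragma asserts that vectorization is safe.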
[Bug lto/83375] partitioner partitions static arrays with label references
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375 --- Comment #6 from Andi Kleen --- This breaks Linux kernel LTO builds. I currently have a workaround (disabling LTO for that file), but I don't think your "is not common" argument is valid.
[Bug other/50639] -flto=jobserver broken on large LTO build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50639

--- Comment #4 from Andi Kleen ---
I doubt it's fixed. It's a race, so it can be unstable. Especially since, judging from the growing cc list, other people keep seeing it.

It may not be something that gcc can fix; if anything it's more likely in make or in Linux.
[Bug other/50639] -flto=jobserver broken on large LTO build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50639 --- Comment #2 from Andi Kleen --- FWIW the problem disappeared for me at some point (could have been newer kernel or different make). I don't see it anymore. I think it was some problems with the pipes used by the job server losing a token
[Bug c/83397] void f() { } has zero arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83397

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #4 from Andi Kleen ---
If you want to skip the rax setup you can use -mskip-rax-setup.

But in general it's dangerous, because old gcc compiled code can jump to random locations if a real varargs function gets called with undefined rax.
[Bug ipa/83346] inliner crash with attribute always_inline/flatten on a destructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346 --- Comment #3 from Andi Kleen --- Fixed by https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00764.html
[Bug lto/83388] New: reference statement index not found error with -fsanitize=null
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83388

            Bug ID: 83388
           Summary: reference statement index not found error with -fsanitize=null
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42844
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42844&action=edit
test case

With the attached test case

    gcc8 -m32 -O2 -flto -fsanitize=null -c core.i
    gcc8 -r -nostdlib core.o

gives

    In function 'i':
    lto1: fatal error: Reference statement index not found
    compilation terminated.

Happens with gcc 7 and trunk.
[Bug lto/83376] ICE in LTO streamer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83376

Andi Kleen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #1 from Andi Kleen ---
Looks like it was a case of an incompatible LTO object file from a different gcc build. With a clean build it doesn't happen anymore.
[Bug lto/83380] New: disk full while writing LTO files leads to ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83380 Bug ID: 83380 Summary: disk full while writing LTO files leads to ICE Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org CC: marxin at gcc dot gnu.org Target Milestone: ---
lto1: fatal error: error writing to vmlinux.ltrans15.s: No space left on device
gcc: internal compiler error: Aborted signal terminated program lto1
Please submit a full bug report, with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
The compiler should just exit in this case instead of reporting an ICE.
[Bug gcov-profile/83355] autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355 --- Comment #3 from Andi Kleen --- patch checked in
[Bug lto/83376] New: ICE in LTO streamer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83376 Bug ID: 83376 Summary: ICE in LTO streamer Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org CC: marxin at gcc dot gnu.org Target Milestone: ---
Don't have a small test case right now, but will bisect.
When building Linux kernel LTO with gcc 8 I currently get an ICE. Doesn't happen on 7 and I think it's also recent on 8. In this case data_in->current_decl_data is NULL while reading a reference.
0xa58fe7 crash_signal ../../gcc/gcc/toplev.c:325
0x957a39 lto_file_decl_data_get_var_decl ../../gcc/gcc/lto-streamer.h:1210
0x957a39 lto_input_tree_ref(lto_input_block*, data_in*, function*, LTO_tags) ../../gcc/gcc/lto-streamer-in.c:366
0x957c1d lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int) ../../gcc/gcc/lto-streamer-in.c:1475
0x6bdc8c lto_read_decls ../../gcc/gcc/lto/lto.c:1791
0x6bdc8c lto_file_finalize ../../gcc/gcc/lto/lto.c:2055
0x6bdc8c lto_create_files_from_ids ../../gcc/gcc/lto/lto.c:2065
0x6bdc8c lto_file_read ../../gcc/gcc/lto/lto.c:2106
0x6bdc8c read_cgraph_and_symbols ../../gcc/gcc/lto/lto.c:2818
0x6bfdb1 lto_main() ../../gcc/gcc/lto/lto.c:3323
[Bug lto/83375] partitioner partitions static arrays with label references
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375 --- Comment #1 from Andi Kleen --- Actually -flto-partition=max
[Bug lto/83375] New: partitioner partitions static arrays with label references
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375 Bug ID: 83375 Summary: partitioner partitions static arrays with label references Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org CC: marxin at gcc dot gnu.org Target Milestone: --- Created attachment 42842 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42842&action=edit test case I thought there was already a bug for this, but can't find it right now. When &&label addresses are put into static arrays the LTO partitioner can put the static into a different partition, which causes an assembler error because the code labels are local. This breaks Linux kernel LTO builds. See attached test case. I think ipa-comdats should put the function and the static into the same partition, but for some reason it doesn't work. Attached test case shows the problem with -flto-partition=1to1 -flto -O2
[Bug gcov-profile/83355] New: autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355 Bug ID: 83355 Summary: autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org CC: marxin at gcc dot gnu.org Target Milestone: ---
Running in gdb shows that there is a very deep recursion in get_index_by_decl until it overflows the stack. This patch seems to fix it (but not sure why the abstract origin would point to itself)
diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 5134a795331..403709bad6b 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -477,7 +477,7 @@ string_table::get_index_by_decl (tree decl) const
   ret = get_index (lang_hooks.dwarf_name (decl, 0));
   if (ret != -1)
     return ret;
-  if (DECL_ABSTRACT_ORIGIN (decl))
+  if (DECL_ABSTRACT_ORIGIN (decl) && DECL_ABSTRACT_ORIGIN (decl) != decl)
     return get_index_by_decl (DECL_ABSTRACT_ORIGIN (decl));
 
   return -1;
Backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x016c4ab2 in pp_emit_prefix (pp=0x229b1a0 ) at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1485
1485 {
(gdb) up
#1 0x016c4c90 in pp_append_text(pretty_printer*, char const*, char const*) () at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1556
1556 pp_emit_prefix (pp);
(gdb) bt
#0 0x016c4ab2 in pp_emit_prefix (pp=0x229b1a0 ) at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1485
#1 0x016c4c90 in pp_append_text(pretty_printer*, char const*, char const*) () at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1556
#2 0x00b12c83 in pp_c_identifier (pp=0x229b1a0 , id=) at /home/andi/gcc/git/gcc/gcc/c-family/c-pretty-print.c:1203
#3 0x00992b46 in dump_decl (flags=0, t=0x76d2ce40, pp=0x229b1a0 ) at /home/andi/gcc/git/gcc/gcc/tree.h:3226
#4 dump_function_name(cxx_pretty_printer*, tree_node*, int) () at /home/andi/gcc/git/gcc/gcc/cp/error.c:1852
#5 0x009940a4 in lang_decl_name(tree_node*, int, bool) () at /home/andi/gcc/git/gcc/gcc/cp/error.c:3005
#6 0x00994133 in lang_decl_dwarf_name (decl=, v=, translate=) at /home/andi/gcc/git/gcc/gcc/cp/error.c:2977
#7 0x0156762a in autofdo::string_table::get_index_by_decl(tree_node*) const () at /home/andi/gcc/git/gcc/gcc/auto-profile.c:477
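The guard in the patch above can be illustrated outside of GCC with a toy structure. This is only a sketch: `struct decl`, its fields, and this `get_index_by_decl` are hypothetical stand-ins, not GCC's tree type or the real string_table code. It shows why an abstract origin that points back at the same node recurses without bound unless the lookup checks for the self-reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a declaration node; not GCC's tree type. */
struct decl {
    int index;                          /* -1 when the name is not in the table */
    const struct decl *abstract_origin; /* NULL, another node, or the node itself */
};

/* Look up the index of D, falling back to its abstract origin.
   The "!= d" test mirrors the check added by the patch: without it a
   node whose origin is itself would recurse until the stack overflows. */
static int get_index_by_decl(const struct decl *d)
{
    if (d->index != -1)
        return d->index;
    if (d->abstract_origin != NULL && d->abstract_origin != d)
        return get_index_by_decl(d->abstract_origin);
    return -1;
}
```

The guard only catches a direct self-reference; a longer cycle (A pointing to B pointing back to A) would still recurse, which is one reason the report asks why the origin points to itself in the first place.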
[Bug ipa/83346] inliner crash with always inline and templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346 --- Comment #1 from Andi Kleen --- This fixes it. Don't know why that node has no decl. Will submit after a test cycle.
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 7846e93d119..dcd8a3de1ac 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2391,7 +2391,8 @@ ipa_inline (void)
          entry of cycles, possibly cloning that entry point and
          try to flatten itself turning it into a self-recursive
          function. */
-      if (lookup_attribute ("flatten",
+      if (node->decl
+          && lookup_attribute ("flatten",
                             DECL_ATTRIBUTES (node->decl)) != NULL)
         {
           if (dump_file)
[Bug ipa/83346] New: inliner crash with always inline and templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346 Bug ID: 83346 Summary: inliner crash with always inline and templates Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org CC: marxin at gcc dot gnu.org Target Milestone: --- Created attachment 42820 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42820&action=edit test case
Attached test case segfaults with -O2 on gcc 7 and 8 trunk
g++ -O2 -S ch-crash.i
ch-crash.i:30:1: internal compiler error: Segmentation fault
 }
 ^
0xc030f7 crash_signal ../../gcc/gcc/toplev.c:325
0x125b189 ipa_inline ../../gcc/gcc/ipa-inline.c:2388
0x125b189 execute ../../gcc/gcc/ipa-inline.c:2807
[Bug target/83052] [8 Regression] ICE in extract_insn, at recog.c:2305 starting from r254560
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83052 --- Comment #1 from Andi Kleen --- I'm not sure why you call it a regression? You must be running the test suite manually with the new option. I haven't tested, but likely it will fail if you run that test with -mcmodel=large too. The -mforce-indirect-call patch is really only a subset of -mcmodel=large. Then it would be more a latent bug.
[Bug tree-optimization/82854] New: more missing simplifcations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82854 Bug ID: 82854 Summary: more missing simplifcations Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: ---
These all come from a paper "Optgen: A Generator for Local Optimizations" (Buchwald et.al.). https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald15cc.pdf
These were found by a SAT solver. I wrote them in partial pseudo match.pd syntax (untested, likely buggy). I'm not sure how useful they are really for real programs, but with the auto generated matchers scaling well to more rules they wouldn't hurt I suppose.
/* x + (x & 0x8000) -> x & 0x7fff */
(simplify (plus:c @0 (bit_and @0 integer_msb_onlyp@1)) (bit_and @0 { @1 - 1; } ))
/* (x | 0x8000) + 0x8000 -> x & 0x7FFF */
(simplify (plus:c (bit_ior @0 integer_msb_onlyp) msb_setp) (bit_and @0 { msb_minus_one_val(type); } ))
/* x & (x + 0x8000) -> x & 0x7FFF */
(simplify (bit_and:c (plus @0 msb_setp) @0) (bit_and @0 { msb_minus_one_val(type); } ))
/* x & (0x7FFF - x) -> x & 0x8000 */
(simplify (bit_and:c @0 (minus msb_minus_onep @0)) (bit_and @0 { msb_val(type); } ))
/* is_power_of_2(c1) && c0 & (2 * c1 - 1) == c1 - 1 -> (c0 - x) & c1 -> x & c1 */
/* x | (x + 0x8000) -> x | 0x8000 */
(simplify (bit_ior:c @0 (plus @0 msb_onlyp)) (bit_ior @0 { msb_val(type); } ))
/* x | (0x7FFF - x) -> x | 0x7FFF */
(simplify (bit_ior:c @0 (minus 0x7FFF @0)) (bit_ior @0 0x7FFF))
/* x | (x ^ y) -> x | y */
(simplify (bit_ior:c @0 (bit_xor:c @0 @1)) (bit_ior @0 @1))
/* ((c0 | -c0) & ~c1) == 0 AND (x + c0) | c1 -> x | c1 */
/* is_power_of_2(~c1) && c0 & (2 * ~c1 - 1) == ~c1 - 1 AND (c0 - x) | c1 -> x | c1 */
/* -x | 0xFFFE -> x | 0xFFFE */
(simplify (bit_or (negate @0) 0xFFFE) (bit_or @0 0xFFFE))
/* 0 - (x & 0x8000) -> x & 0x8000 */
(simplify (minus 0 (bit_and:c @0 0x8000)) (bit_and @0 0x8000))
/* 0x7FFF - (x & 0x8000) -> x | 0x7FFF */
(simplify (minus 0x7FFF (bit_and @0 0x8000)) (bit_ior @0 0x7FFF))
/* 0x7FFF - (x | 0x7FFF) -> x & 0x8000 */
(simplify (minus 0x7FFF (bit_ior:c @0 0x7FFF)) (bit_and @0 0x8000))
/* 0xFFFE - (x | 0x7FFF) -> x | 0x7FFF */
(simplify (minus 0xFFFE (bit_ior:c @0 0x7FFF)) (bit_ior @0 0x7FFF))
/* (x & 0x7FFF) - x -> x & 0x8000 */
(simplify (minus (bit_and:c @0 0x7FFF) @0) (bit_and @0 0x8000))
/* x ^ (x + 0x8000) -> 0x8000 */
(simplify (bit_xor:c (plus:c @0 0x8000)) 0x8000)
/* x ^ (0x7FFF - x) -> 0x7FFF */
(simplify (bit_xor:c @0 (minus 0x7FFF @0)) 0x7FFF)
/* (x + 0x7FFF) ^ 0x7FFF -> -x */
(simplify (bit_xor:c (plus:c @0 0x7FFF) 0x7FFF) (negate @0))
/* -x ^ 0x8000 -> 0x8000 - x */
(simplify (bit_xor:c (negate @0) 0x8000) (minus 0x8000 @0))
/* (0x7FFF - x) ^ 0x7FFF -> x */
(simplify (bit_xor:c (minus 0x7FFF @0) 0x7FFF) @0)
/* ~(x + c) -> ~c - x */
(simplify (bit_not (plus:c @0 CONSTANT_CLASS_P@1)) (minus (bit_not c) @0))
/* -x ^ 0x7FFF -> x + 0x7FFF */
(simplify (bit_xor (negate @0) 0x7FFF) (plus @0 0x7FFF))
/* (x | c) - c -> x & ~c */
(simplify (minus (bit_ior @0 CONSTANT_CLASS_P@1) @1) (bit_and @0 (bit_not @1)))
/* ~(c - x) -> x + ~c */
(simplify (bit_not (minus CONSTANT_CLASS_P@0 @1)) (plus @1 (bit_not @0)))
/* -c0 == c1 AND (x | c0) + c1 -> x & ~c1 */
(simplify (plus (bit_or @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2) (if (...) (bit_and @0 (bit_not @2))
/* (c0 & ~c1) == 0 AND (x ^ c0) | c1 -> x | c1 */
/* 0x7FFF - (x ^ c) -> x ^ (0x7FFF - c) */
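Since the rules are marked as untested, a brute-force check of a few of the underlying identities may be useful. The sketch below takes the constants exactly as printed, which means treating x as a 16-bit unsigned value with wrapping arithmetic; that width is an assumption here, and the function name and the sample y values are made up for the example. It covers only a handful of the rules, not the match.pd patterns themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the selected identities hold for every 16-bit x
   (and a few sample y values), 0 as soon as one fails. */
static int check_identities_16bit(void)
{
    static const uint16_t ys[] = { 0x0000, 0x0001, 0x00FF, 0x1234, 0x8000, 0xFFFF };

    for (uint32_t i = 0; i <= 0xFFFF; i++) {
        uint16_t x = (uint16_t) i;
        uint16_t neg_x = (uint16_t) (0u - x);       /* -x modulo 2^16 */
        uint16_t c_minus_x = (uint16_t) (0x7FFFu - x);

        /* x + (x & 0x8000) -> x & 0x7fff */
        if ((uint16_t) (x + (x & 0x8000)) != (x & 0x7FFF)) return 0;
        /* (x | 0x8000) + 0x8000 -> x & 0x7FFF */
        if ((uint16_t) ((x | 0x8000) + 0x8000) != (x & 0x7FFF)) return 0;
        /* x ^ (x + 0x8000) -> 0x8000 */
        if ((uint16_t) (x ^ (uint16_t) (x + 0x8000)) != 0x8000) return 0;
        /* -x | 0xFFFE -> x | 0xFFFE */
        if ((neg_x | 0xFFFE) != (x | 0xFFFE)) return 0;
        /* 0 - (x & 0x8000) -> x & 0x8000 */
        if ((uint16_t) (0u - (x & 0x8000)) != (x & 0x8000)) return 0;
        /* x ^ (0x7FFF - x) -> 0x7FFF */
        if ((uint16_t) (x ^ c_minus_x) != 0x7FFF) return 0;
        /* (0x7FFF - x) ^ 0x7FFF -> x */
        if ((uint16_t) (c_minus_x ^ 0x7FFF) != x) return 0;

        /* x | (x ^ y) -> x | y */
        for (unsigned j = 0; j < sizeof ys / sizeof ys[0]; j++)
            if ((x | (x ^ ys[j])) != (x | ys[j])) return 0;
    }
    return 1;
}
```

All intermediate results are cast back to uint16_t because C promotes 16-bit operands to int, which would otherwise hide the wraparound the identities rely on.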
[Bug tree-optimization/82854] more missing simplifcations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82854 --- Comment #1 from Andi Kleen --- Also I suppose a lot of them could be generalized to 8/16/64bit.
[Bug middle-end/82853] Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 --- Comment #8 from Andi Kleen --- I'm not sure if it works with other numbers too (need to dig through Hacker's Delight & Matters Computational to see if they have anything on it). But it could be extended to other word lengths at least. BTW there are some other cases, will file a bug shortly on those too.
[Bug middle-end/82853] Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 --- Comment #5 from Andi Kleen --- Also I'm not sure why you would want it in the middle end. It should all work at the tree level
[Bug middle-end/82853] Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 --- Comment #4 from Andi Kleen --- Right it's about special casing the complete expression
[Bug tree-optimization/82853] New: Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 Bug ID: 82853 Summary: Optimize x % 3 == 0 without modulo Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- Ralph Levien pointed out as part of FizzBuzz optimization: Turns out you can compute x%3 == 0 with even fewer steps, it's (x*0xaaaaaaab) < 0x55555556 (assuming wrapping unsigned 32 bit arithmetic). gcc currently generates the full modulo and then checks. Could be done in match.pd I suppose.
Test case
unsigned mod3(unsigned a) { return 0==(a%3); }
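For reference, the usual 32-bit form of this trick multiplies by 0xAAAAAAAB, the multiplicative inverse of 3 modulo 2^32, and compares the wrapped product against 0x55555556 (one more than UINT32_MAX / 3). The sketch below compares that form against the plain modulo from the test case; the function names and the sampling stride are choices made for this example, not anything from the report.

```c
#include <assert.h>
#include <stdint.h>

/* What the test case in the report computes. */
static unsigned mod3_reference(uint32_t a)
{
    return 0 == (a % 3);
}

/* 3 * 0xAAAAAAAB == 1 (mod 2^32), so for a multiple of 3 the wrapped
   product is a / 3, which is at most 0x55555555.  Non-multiples land
   above that bound. */
static unsigned mod3_multiply(uint32_t a)
{
    return (uint32_t) (a * UINT32_C(0xAAAAAAAB)) < UINT32_C(0x55555556);
}

/* Returns 1 if both versions agree on a sample of the 32-bit range
   plus a few boundary values, 0 otherwise. */
static int mod3_versions_agree(void)
{
    static const uint32_t edges[] = {
        0, 1, 2, 3, UINT32_C(0x55555555), UINT32_C(0x55555556),
        UINT32_C(0xFFFFFFFD), UINT32_C(0xFFFFFFFE), UINT32_C(0xFFFFFFFF)
    };
    uint32_t a = 0;

    for (uint32_t i = 0; i < (UINT32_C(1) << 20); i++) {
        if (mod3_reference(a) != mod3_multiply(a))
            return 0;
        a += 4099; /* odd stride; the counter wraps around near the end */
    }
    for (unsigned j = 0; j < sizeof edges / sizeof edges[0]; j++)
        if (mod3_reference(edges[j]) != mod3_multiply(edges[j]))
            return 0;
    return 1;
}
```

The multiply-and-compare version replaces the division (or the multiply-shift-subtract sequence a compiler emits for `% 3`) with a single multiplication and an unsigned comparison.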
[Bug other/82784] Remove semicolon after "do {} while (0)" macros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82784 Andi Kleen changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #5 from Andi Kleen --- Sounds like a good candidate for a new warning
[Bug c/82013] New: better error message for missing semicolon in prototype
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82013 Bug ID: 82013 Summary: better error message for missing semicolon in prototype Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- gcc gives quite poor error messages when forgetting a semicolon after a prototype (common mistake when cut'n'pasting a function definition into a header) It's especially confusing when the prototype is the last in the include file, because then the errors appear in another file. As a minimum it should warn about a missing semicolon at the end of a file. Possibly this could be also used for fix-it, but that's likely more complicated.
[Bug target/80742] New: attribute target no- does not work
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80742 Bug ID: 80742 Summary: attribute target no- does not work Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: ---
Disabling ISAs with attribute target doesn't seem to work on x86_64, e.g.
typedef float __m128 __attribute__ ((vector_size (16)));
__attribute__((target("no-sse2"))) __m128 func (__m128 x, __m128 y)
{
  __m128 xmm0 = x, xmm1 = y, xmm2;
  xmm0 = __builtin_ia32_xorps (xmm1, xmm1);
  return xmm0;
}
does not error out.
[Bug testsuite/79067] gcc.dg/tree-prof/cold_partition_label.c runs a million times longer than it used to and times out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79067 --- Comment #3 from Andi Kleen --- sandra, does this patch fix it?
diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 6214e3629f2..924a270e1bd 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -2,6 +2,7 @@
    gets a label. */
 /* { dg-require-effective-target freorder } */
 /* { dg-options "-O2 -freorder-blocks-and-partition -save-temps" } */
+/* { dg-require-profiling "-fprofile-generate" } */
 
 #define SIZE 1
[Bug testsuite/79067] gcc.dg/tree-prof/cold_partition_label.c runs a million times longer than it used to and times out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79067 --- Comment #2 from Andi Kleen --- There's a separate fix for the random failures (or, as a workaround, increase /proc/sys/kernel/perf_event_mlock_kb), see PR 77684. Not running the test on systems without FDO seems best. I don't think it does anything useful there anyways.
[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684 --- Comment #5 from Andi Kleen --- Created attachment 41337 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41337&action=edit limit perf buffer size This patch allows parallelism up to 16 with the default setting. Currently testing.
[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684 --- Comment #4 from Andi Kleen --- Thanks for tracing that down. So perf runs out of memory for the locked trace buffers. Increasing the limit is a good workaround. ulimit -l may also work, but also needs root. We could just pass a smaller -m value to perf. Does it work when you change the last line in config/i386/gcc-auto-profile to add -m 128k (or possibly other values, they have to be a power of two)?
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 --- Comment #8 from Andi Kleen --- __builtin_constant_p does not cover variable range information, which is what we're looking for here to prevent security bugs. Also in my experience these explicit expressions tend to be somewhat fragile and are not well specified. They have to assume that the optimizer does specific operations which are nowhere guaranteed. An explicit builtin could be much more tightly defined.
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 --- Comment #6 from Andi Kleen --- In the kernel there is also an upper limit on allocations. Perhaps just a generic assert builtin that:
- uses value range information
- uses constant propagation
- is a nop when the compiler doesn't have either of these available
- otherwise warns at build time
__builtin_compile_assert(size >= 0 && size < MAX_ALLOC_SIZE);
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 --- Comment #4 from Andi Kleen --- I tested it now and the inline trick doesn't work. Here's a test case
extern void *do_alloc(int a, int b);
static inline __attribute__((alloc_size(1))) void check_alloc_size(int size) { }
static inline void *alloc(int a, int b)
{
  check_alloc_size(a + b);
  return do_alloc(a, b);
}
void func(void)
{
  alloc(-1, 0);
}
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 --- Comment #3 from Andi Kleen --- Hmm, that trick may work for the shift too. Let me try.
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 --- Comment #1 from Andi Kleen --- Small correction: argument 4 would need to be a constant for shifted by.
[Bug lto/80379] New: Redundant note: code may be misoptimized unless -fno-strict-aliasing is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80379 Bug ID: 80379 Summary: Redundant note: code may be misoptimized unless -fno-strict-aliasing is used Product: gcc Version: 6.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- I get an extra "note: code may be misoptimized unless -fno-strict-aliasing is used" for type mismatches in LTO builds. But -fno-strict-aliasing is already set. In this case the extra note is pointless and should be suppressed.
[Bug c/80378] New: Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 Bug ID: 80378 Summary: Extend alloc_size attribute for better Linux kernel checking Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- I've been adding alloc_size attributes to the Linux kernel allocators. However there are some allocator patterns that currently cannot be described correctly. It would be nice if the attribute could be extended with more parameters to handle this. One is void *alloc(int size_a, int size_b) where the allocation size is size_a + size_b. The other is void *alloc_order(int order) where the allocation size is constant << order. This could be handled by two extra parameters to alloc_size, one to give a sum argument and another to give a shifted-by argument. The arguments 2,3 would also need to support an "ignore" parameter (e.g. -1)
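To make the gap concrete, here is a small sketch of what the existing attribute can express next to the two patterns from the report. The function names, the use of size_t instead of int, the malloc/calloc backing, and EXAMPLE_PAGE_SIZE are all assumptions for illustration; they are not kernel code. Only the one-argument and two-argument (product) forms of alloc_size are used, since those are the forms GCC provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Size is argument 1: expressible with alloc_size today. */
__attribute__((alloc_size(1)))
static void *alloc_bytes(size_t size)
{
    return malloc(size);
}

/* Size is argument 1 * argument 2: also expressible today. */
__attribute__((alloc_size(1, 2)))
static void *alloc_array(size_t n, size_t elem_size)
{
    return calloc(n, elem_size);
}

/* Size is size_a + size_b: alloc_size has no sum form, so the
   function has to stay unannotated. */
static void *alloc_sum(size_t size_a, size_t size_b)
{
    return malloc(size_a + size_b);
}

/* Size is a constant shifted by the argument: no shifted-by form
   either.  EXAMPLE_PAGE_SIZE is a made-up constant. */
#define EXAMPLE_PAGE_SIZE ((size_t) 4096)
static void *alloc_order(unsigned order)
{
    return malloc(EXAMPLE_PAGE_SIZE << order);
}
```

With the annotated functions the compiler can reason about the object size (for example through __builtin_object_size); for alloc_sum and alloc_order it has nothing to go on unless the call is inlined.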
[Bug lto/60016] gcc-nm does not report static symbols
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60016 --- Comment #2 from Andi Kleen --- This is needed for example to generate backtraces, if the symbol table should be built in instead of read from the binary. The Linux kernel cannot read its own binary, so the symbol table has to be built in.
[Bug gcov-profile/71672] New: inlining indirect calls does not work with autofdo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71672 Bug ID: 71672 Summary: inlining indirect calls does not work with autofdo Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- The current mainline version of autofdo doesn't inline indirect calls based on profiling data. I instrumented a bootstrap and it never triggers. gcc.dg/tree-prof/indir-call-prof.c also fails (needs the patch kit in https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01786.html applied first). I did some debugging and it seems to give up in update_inlined_ind_target() here
  /* Program behavior changed, original promoted (and inlined) target is not
     hot any more. Will avoid promote the original target.

     To check if original promoted target is still hot, we check the total
     count of the unpromoted targets (stored in old_info). If it is no less
     than half of the callsite count (stored in INFO), the original promoted
     target is considered not hot any more. */
  if (total >= info->count / 2)
but even with the test commented out it doesn't work.
[Bug target/71659] New: _xgetbv intrinsic missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71659 Bug ID: 71659 Summary: _xgetbv intrinsic missing Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- icc and microsoft have a _xgetbv intrinsic for the XGETBV instruction, which is needed to check if AVX or MPX are supported by the kernel. gcc is missing an intrinsic for that, so everyone has to write inline assembler. Should add one.
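The inline assembler that callers end up writing looks roughly like the sketch below. The helper names `read_xcr` and `os_supports_avx` are made up for this example; the CPUID bit positions (OSXSAVE is bit 27 and AVX is bit 28 of ECX in leaf 1) and the XCR0 mask 0x6 (SSE and AVX state enabled) are the standard ones, and `__get_cpuid` comes from GCC's cpuid.h. The non-x86 fallback that simply returns 0 is a choice made here so the file compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read extended control register INDEX with XGETBV.  This faults
   unless CPUID reports OSXSAVE, so check that bit first. */
static uint64_t read_xcr(uint32_t index)
{
    uint32_t eax, edx;

    __asm__ volatile ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (index));
    return ((uint64_t) edx << 32) | eax;
}

/* Returns 1 if the CPU has AVX and the OS saves/restores the AVX
   register state, 0 otherwise. */
static int os_supports_avx(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))    /* OSXSAVE: XGETBV usable */
        return 0;
    if (!(ecx & (1u << 28)))    /* AVX present in hardware */
        return 0;
    /* XCR0 bits 1 and 2: SSE and AVX state enabled by the OS. */
    return (read_xcr(0) & 0x6) == 0x6;
}
#else
/* Not x86: nothing to query. */
static int os_supports_avx(void)
{
    return 0;
}
#endif
```

An intrinsic would replace only `read_xcr`; the CPUID checks around it are needed either way.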
[Bug c/70618] New: better error messages for missing/too many arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70618 Bug ID: 70618 Summary: better error messages for missing/too many arguments Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- When doing API refactorings it is reasonably common to have too many or not enough arguments in function calls. The existing errors in gcc/g++ are not very good for that; I get at least two consecutive ones and they are not very clear. Since that is common it would be much better if the compiler could compute the minimum edit distance to the real prototype (or the nearest for C++) and then directly suggest what arguments are missing or which are too many.
void foo(int *xp, float *yp, double *zp) { }
int x; float y; double z; short k;
void f2(void)
{
  foo(, );/* forgot x */
  foo(, );/* forgot y */
  foo(, );/* forgot z */
  foo();/* forgot y and z */
  foo();/* forgot x and y*/
  foo(, , , );/* x too many at end */
  foo(, , , );/* x too many at start */
  foo(, , , );/* y too much in the middle */
  foo(, , , );/* different y in middle */
  foo(, , , );/* different x at start */
  foo(, , , );/* different x at end */
}
gcc/tsrc/tmissing.c: In function ‘f2’:
gcc/tsrc/tmissing.c:14:6: warning: passing argument 1 of ‘foo’ from incompatible pointer type [-Wincompatible-pointer-types]
 foo(, ); /* forgot x */
 ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘int *’ but argument is of type ‘float *’
 void foo(int *xp, float *yp, double *zp)
 ^
gcc/tsrc/tmissing.c:14:10: warning: passing argument 2 of ‘foo’ from incompatible pointer type [-Wincompatible-pointer-types]
 foo(, ); /* forgot x */
 ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘float *’ but argument is of type ‘double *’
 void foo(int *xp, float *yp, double *zp)
 ^
gcc/tsrc/tmissing.c:14:2: error: too few arguments to function ‘foo’
 foo(, ); /* forgot x */
 ^
gcc/tsrc/tmissing.c:3:6: note: declared here
 void foo(int *xp, float *yp, double *zp)
 ^
gcc/tsrc/tmissing.c:15:10: warning: passing argument 2 of ‘foo’ from incompatible pointer type [-Wincompatible-pointer-types]
 foo(, ); /* forgot y */
 ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘float *’ but argument is of type ‘double *’
 void foo(int *xp, float *yp, double *zp)
 ^
gcc/tsrc/tmissing.c:15:2: error: too few arguments to function ‘foo’
 foo(, ); /* forgot y */
[Bug tree-optimization/70427] autofdo bootstrap generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427 --- Comment #3 from Andi Kleen --- Analyzing the code more it looks like the compiler generates it correctly, the edge returned should not be 0 here.
[Bug tree-optimization/70427] autofdo bootstrap generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427 --- Comment #2 from Andi Kleen --- Created attachment 38110 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38110&action=edit somewhat reduced input file, only single function
[Bug tree-optimization/70427] autofdo bootstrap generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427 --- Comment #1 from Andi Kleen --- Created attachment 38109 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38109&action=edit ipa-profile input
Here's the source of the miscompiled file from the compiler
cc1plus -O2 ipa-profile.i -S
Unfortunately you have to inspect the assembler to see the miscompilation: look for ipa_generate_profile_summary, then look for get_edge
call _ZN11cgraph_node8get_edgeEP6gimple
testq %rax, %rax
movq %rax, %r15
je .L836 <-- jump if rax/r15 is 0
testb $2, 96(%rax)
je .L837
.L836: <--- it can be here
movq 16(%r12), %rax
movq 64(%r15), %rsi <-- BAD
Same miscompilation here (just with another register). r15 is referenced after being tested for NULL.
[Bug tree-optimization/70427] New: autofdo bootstrap generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427 Bug ID: 70427 Summary: autofdo bootstrap generates wrong code Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- I've been working on building gcc with an autofdo bootstrap. Currently I always run into a crash while rebuilding tree.c with the stage2 compiler and the autofdo information. Looking at the code it is clearly miscompiled in ipa_profile_generate_summary:
struct cgraph_edge * e = node->get_edge (stmt);
if (e && !e->indirect_unknown_callee)
  continue;
0x0093bb16 <+326>: callq 0x7be530 <_ZN11cgraph_node8get_edgeEP6gimple>
0x0093bb1b <+331>: test %rax,%rax # check for NULL
0x0093bb1e <+334>: mov %rax,%r8
0x0093bb21 <+337>: je 0x93bb2d <_ZL28ipa_profile_generate_summaryv+349>
0x0093bb23 <+339>: testb $0x2,0x60(%rax)
0x0093bb27 <+343>: je 0x93baa7 <_ZL28ipa_profile_generate_summaryv+215>
0x0093bb2d <+349>: mov 0x10(%r13),%rax # go here because of NULL
=> 0x0093bb31 <+353>: mov 0x40(%r8),%rsi # but we still reference!
(gdb) p $r8
$4 = 0
The crash is on bb31 because r8 is NULL. The code checked the return value of the call, but then references it afterwards before doing the continue.
Command line option: cc1plus -fauto-profile=cc1plus.fda -g -O2 tree.i
cc1plus.fda is at http://halobates.de/cc1plus.fda (too big to attach)
[Bug c/28901] -Wunused-variable ignores unused const initialised variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28901 --- Comment #17 from Andi Kleen --- There were a few false or useless ones (e.g. related to macros and specific build configs). I didn't look through them all, but various were semi legitimate, but also very minor (small) so fixing it won't help much. I think one or two of the ones I looked at may have been real bugs. I still think the warning should not be in -Wall. thousand+ warnings in real projects is just not acceptable.
[Bug c/28901] -Wunused-variable ignores unused const initialised variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28901 Andi Kleen changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #14 from Andi Kleen --- I'm building a current Linux kernel with allyesconfig, and this new warning causes 1383(!) new warnings in the build. I think this should be revisited and the warning be turned off again.
[Bug target/68602] New: i386: -mtune/arch options not all output by -v --help
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68602 Bug ID: 68602 Summary: i386: -mtune/arch options not all output by -v --help Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- gcc -v --help does not output all the possible options for -mtune=/-march=. For example corei7-avx is missing for arch, which is Sandy Bridge. tune is also missing all cpu names.
-march=CPU[,+EXTENSION...] generate code for CPU and EXTENSION, CPU is one of: generic32, generic64, i386, i486, i586, i686, pentium, pentiumpro, pentiumii, pentiumiii, pentium4, prescott, nocona, core, core2, corei7, l1om, k1om, k6, k6_2, athlon, opteron, k8, amdfam10, bdver1, bdver2, bdver3, bdver4, btver1, btver2 EXTENSION is combination of: 8087, 287, 387, no87, mmx, nommx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, sse4, nosse, avx, avx2, avx512f, avx512cd, avx512er, avx512pf, avx512dq, avx512bw, avx512vl, noavx, vmx, vmfunc, smx, xsave, xsaveopt, xsavec, xsaves, aes, pclmul, fsgsbase, rdrnd, f16c, bmi2, fma, fma4, xop, lwp, movbe, cx16, ept, lzcnt, hle, rtm, invpcid, clflush, nop, syscall, rdtscp, 3dnow, 3dnowa, padlock, svme, sse4a, abm, bmi, tbm, adx, rdseed, prfchw, smap, mpx, sha, clflushopt, prefetchwt1, se1, clwb, pcommit, avx512ifma, avx512vbmi
-mtune=CPU optimize for CPU, CPU is one of: generic32, generic64, i8086, i186, i286, i386, i486, i586, i686, pentium, pentiumpro, pentiumii, pentiumiii, pentium4, prescott, nocona, core, core2, corei7, l1om, k1om, k6, k6_2, athlon, opteron, k8, amdfam10, bdver1, bdver2, bdver3, bdver4, btver1, btver2
[Bug lto/66229] LTO fails with -fauto-profile on mcf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66229 --- Comment #2 from Andi Kleen --- Some analysis of the problem: At the time cc1 is streaming out profile_data it is not set to anything in autofdo. So the LTO files contain all 0 profile data, which later causes the ICE here. Seems to be some kind of ordering problem. Strangely the autofdo pass gets executed in the frontend run, but for unknown reasons the profile data doesn't survive until the LTO data is written.
[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946 Andi Kleen changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #10 from Andi Kleen --- Turned out to be a binutils issue with an old binutils
[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946 --- Comment #9 from Andi Kleen --- Created attachment 36391 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36391&action=edit workaround This workaround fixes it. Disable --gc-sections for libstdc++. It seems like a linker bug. I opened a binutils bug report https://sourceware.org/bugzilla/show_bug.cgi?id=19008
[Bug lto/50676] Partitioning may fail with presence of static variables referring to function labels
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50676 --- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org --- The patch doesn't seem to be checked in yet. Is there a reason for that?
[Bug rtl-optimization/66890] function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 --- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org --- Created attachment 36008 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36008&action=edit Updated patch with documentation and param I updated the patch with proper documentation and a param for the cutoff. In some tests it appears to do the right thing when building a Linux kernel.
[Bug rtl-optimization/66890] function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 --- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org --- I suspect the patch may be too simple, because it could get stuck on unlikely but high-frequency edges in the cold area. It may need to adapt more of the code from the non-partitioning reordering.
[Bug rtl-optimization/66890] function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 --- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org --- Created attachment 35993 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35993&action=edit Potential patch This patch fixes the problem for my simple test case. It adds a fallback path to the partition check: if no profile information is available, only edges are checked, and everything whose incoming edges have a frequency of 20% or less is considered cold. The 20% is fairly arbitrary, likely needs tuning, and should be a param, but it seems to work for the test case. Comments?
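The fallback described in this comment can be sketched roughly as follows. This is not the attached patch and does not use GCC's internal types: `struct in_edge`, `COLD_CUTOFF_PCT`, and `block_is_cold_without_profile` are hypothetical names made up for illustration. It also assumes one particular reading of the comment, namely that a block counts as cold when every one of its incoming edges has an estimated frequency at or below the cutoff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an incoming CFG edge; this is NOT GCC's edge type. */
struct in_edge {
    int freq_pct; /* statically estimated frequency of the edge, 0..100 */
};

/* Cutoff taken from the comment: fairly arbitrary, meant to become a param. */
enum { COLD_CUTOFF_PCT = 20 };

/*
 * Fallback for the case where no profile data is available.
 * Assumption: a block is cold when it has at least one incoming edge
 * and every incoming edge is at or below the cutoff.
 */
static bool block_is_cold_without_profile(const struct in_edge *preds,
                                          size_t n_preds)
{
    assert(preds != NULL || n_preds == 0);

    if (n_preds == 0)
        return false; /* e.g. the entry block: never treat it as cold */

    for (size_t i = 0; i < n_preds; i++) {
        if (preds[i].freq_pct > COLD_CUTOFF_PCT)
            return false; /* reachable through a reasonably hot edge */
    }
    return true;
}
```

Under these assumptions, a block reached only through edges estimated at 5% and 20% would be treated as cold, while a block with even one 80% incoming edge would stay in the hot partition.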
[Bug rtl-optimization/66890] function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 --- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org --- The problem seems to be that bb-reorder.c:find_rarely_executed_basic_blocks_and_crossing_edges returns no edges without profile feedback, which prevents generation of a section split note.
[Bug rtl-optimization/66890] New: function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 Bug ID: 66890 Summary: function splitting only works with profile feedback Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: ---
Consider this simple example:
volatile int count;
int main()
{
        int i;
        for (i = 0; i < 10; i++) {
                if (i == 999)
                        count *= 2;
                count++;
        }
}
The default "EQ is unlikely" heuristic in predict.* predicts that the if (i == 999) is unlikely. So the tracer moves the count *= 2 basic block out of line to conserve instruction cache.
gcc50 -O2 -S thotcold.c
        movl    $1, %edx
        jmp     .L2
        .p2align 4,,10
        .p2align 3
.L4:
        addl    $1, %edx
.L2:
        cmpl    $1000, %edx
        movl    count(%rip), %eax
        je      .L6
        addl    $1, %eax
        cmpl    $10, %edx
        movl    %eax, count(%rip)
        jne     .L4
        xorl    %eax, %eax
        ret
# out of line code
.L6:
        addl    %eax, %eax
        movl    %eax, count(%rip)
        movl    count(%rip), %eax
        addl    $1, %eax
        movl    %eax, count(%rip)
        jmp     .L4
Now if we enable -freorder-blocks-and-partition I would expect it to also be put into .text.unlikely to give an even better cache layout. But that is not happening. It generates the same code. Only when I use actual profile feedback and -freorder-blocks-and-partition does the code actually end up in a separate section (it also unrolled the loop, so the code looks a bit different):
gcc -O2 -fprofile-generate -freorder-blocks-and-partition thotcold.c
./a.out
gcc -O2 -fprofile-use -freorder-blocks-and-partition thotcold.c
...
        .cfi_endproc
        .section        .text.unlikely
        .cfi_startproc
.L55:
        movl    count(%rip), %ecx
        addl    $1, %eax
        addl    $1, %ecx
        cmpl    $10, %eax
        movl    %ecx, count(%rip)
        je      .L6
        cmpl    $1, %edx
        je      .L5
        cmpl    $2, %edx
        je      .L28
        cmpl    $3, %edx
-freorder-blocks-and-partition should already use the extra section even without profile feedback. I tested some larger programs and without profile feedback the unlikely section is always empty.
The heuristics in predict.* often work quite well, and a lot of code would benefit from moving cold code out of the way of the caches. This would allow using the option to improve frontend-bound code without needing full profile feedback.
[Bug lto/61635] LTO partitioner does not handle label in statics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61635 --- Comment #7 from Andi Kleen andi-gcc at firstfloor dot org --- Still happens with current trunk and with newer LTO Linux kernels (4.0-rc*)
[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946 --- Comment #8 from Andi Kleen andi-gcc at firstfloor dot org --- I still get that one with current trunk on my fedora 21 system.
[Bug c/65620] New: Incorrect warning for !! with -Wlogical-not-parentheses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65620 Bug ID: 65620 Summary: Incorrect warning for !! with -Wlogical-not-parentheses Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Created attachment 35172 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35172&action=edit test case
When building the Linux 4.0-rc5 kernel with GCC 5.0 there are several imho bogus warnings like
warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
for constructs like this:
!!test_bit(...) != ...
The warning shouldn't trigger for !!, which is reasonably common. Looking at the c/cp parsers there is already code to check for this, but it doesn't seem to work here. In the kernel case test_bit actually expands to a complex macro like
if (usage->type == 0x01 && !!(__builtin_constant_p((usage->code)) ? constant_test_bit((usage->code), (input->key)) : variable_test_bit((usage->key), (input->key)))
I'm attaching an (already delta'ed but still quite big) test case. C++ likely has the same problem (but not tested).
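For readers unfamiliar with the idiom, here is a minimal sketch of why `!!` appears on the left of such a comparison. It is not the attached test case, and whether this reduced form triggers the warning on any given GCC version is not established here. `test_bit_raw` and `bit_differs` are hypothetical helpers, loosely modelled on a bit test that returns "nonzero" rather than exactly 1.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical, simplified stand-in for a kernel-style bit test.
 * It returns the masked word, so the result is nonzero but not
 * necessarily 1 when the bit is set.
 */
static unsigned long test_bit_raw(unsigned nr, const unsigned long *addr)
{
    const unsigned bits = sizeof(unsigned long) * CHAR_BIT;
    return addr[nr / bits] & (1UL << (nr % bits));
}

/* Returns 1 when the stored bit differs from the 0/1 value `value`. */
static int bit_differs(unsigned nr, const unsigned long *addr, int value)
{
    assert(value == 0 || value == 1);
    /* `!!` normalizes any nonzero result to 1 before the comparison. */
    return !!test_bit_raw(nr, addr) != value;
}
```

Without the `!!`, a set bit 3 would compare the raw value 8 against 1 and wrongly report a difference, which is why the double negation is intentional in this pattern and not a misplaced single `!`.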
[Bug bootstrap/65621] New: boot strap with checking enabled ICEs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65621 Bug ID: 65621 Summary: boot strap with checking enabled ICEs Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org
target: x86_64-linux
../../../../gcc/libstdc++-v3/libsupc++/tinfo.cc:82:1: internal compiler error: in mark_functions_to_output, at cgraphunit.c:1307
 }
 ^
0xb25f0b mark_functions_to_output
        ../../gcc/gcc/cgraphunit.c:1302
0xb29137 symbol_table::compile()
        ../../gcc/gcc/cgraphunit.c:2330
0xb29313 symbol_table::finalize_compilation_unit()
        ../../gcc/gcc/cgraphunit.c:2444
0x884c9a cp_write_global_declarations()
        ../../gcc/gcc/cp/decl2.c:4755