[Bug c++/115091] New: Support value speculation in frontend

2024-05-14 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115091

Bug ID: 115091
   Summary: Support value speculation in frontend
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

This blog post describes an interesting optimization technique for memory
access. https://mazzo.li/posts/value-speculation.html

A linked list walk is often limited by the latency of the L1 cache. When the
program can guess the next address (e.g. because the nodes are often allocated
sequentially in memory) it is possible to use a construct like

if (node->next == node + 1)
  node++;
else
  node = node->next;

and rely on the CPU speculating the fast case.

However, this often runs into problems with the compiler. For example, the branch in

next = node->next;
node++;
if (node != next)
  node = next;

is often optimized away. While this can be worked around with some code
restructuring, this may not always work for more complex cases. I wonder if it
makes sense to formally support this technique with a "nocse" or similar
variable attribute that is honored by optimization passes.
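
A minimal sketch of one workaround, assuming GCC: an empty asm statement in
the fast path makes the compiler believe the guessed pointer may have changed,
so it keeps the branch. The node layout and the function names are made up for
illustration, and the trick is not guaranteed to work with other compilers or
to survive future optimizer changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type, for illustration only. */
struct node {
    uint64_t value;
    struct node *next;
};

/* Plain walk: every iteration waits for the load of n->next. */
uint64_t sum_plain(const struct node *n)
{
    uint64_t sum = 0;
    for (; n; n = n->next)
        sum += n->value;
    return sum;
}

/* Speculating walk: guess that the next node is adjacent in memory and
   let the branch predictor run ahead of the load. */
uint64_t sum_speculative(const struct node *n)
{
    uint64_t sum = 0;
    while (n) {
        const struct node *guess = n + 1;
        const struct node *next = n->next;
        sum += n->value;
        if (next == guess) {
            /* Empty asm claiming to modify guess, so the compiler
               cannot replace it with next and drop the branch. */
            __asm__("" : "+r"(guess));
            n = guess;
        } else {
            n = next;
        }
    }
    return sum;
}
```

Both functions return the same sum. Whether the second one is actually faster
depends on how often the nodes are laid out sequentially.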

[Bug gcov-profile/113765] ICE: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-02-05 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

--- Comment #3 from Andi Kleen  ---
-O1 fixes it, so an easy patch would be 

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 63d0c3dc36df..180ed7a8260f 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1758,7 +1758,7 @@ public:
   bool
   gate (function *) final override
   {
-    return flag_auto_profile;
+    return flag_auto_profile && optimize > 0;
   }
   unsigned int
   execute (function *) final override

[Bug gcov-profile/113765] autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-02-05 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

--- Comment #1 from Andi Kleen  ---
Seems to be a regression, I tested the same setup on gcc 13 and the test passes
there:

55:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation, 
-fprofile-generate -D_PROFILE_GENERATE
59:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution,   
-fprofile-generate -D_PROFILE_GENERATE
62:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation,  -fprofile-use
-D_PROFILE_USE
66:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution,-fprofile-use
-D_PROFILE_USE
76:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation,  -g
-DFOR_AUTOFDO_TESTING
108:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution,-g
-DFOR_AUTOFDO_TESTING
111:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation, 
-fauto-profile -DFOR_AUTOFDO_TESTING -fearly-inlining
115:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution,   
-fauto-profile -DFOR_AUTOFDO_TESTING -fearly-inlining

[Bug gcov-profile/113765] New: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-02-05 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

Bug ID: 113765
   Summary: autofdo: val-profiler-threads-1.c compilation,  error:
probability of edge from entry block not initialized
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

With recent trunk (019dc63819be)

When running the test suite on an Intel system with autofdo installed

Executing on host: /home/ak/gcc/obj-full/gcc/xgcc -B/home/ak/gcc/obj-full/gcc/
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
-fdiagnostics-plain-output -O0 -pthread -fprofile-update=atomic
-fauto-profile=/home/ak/gcc/obj-full/gcc/testsuite/gcc20/afdo.val-profiler-threads-1.gcda
-DFOR_AUTOFDO_TESTING -fearly-inlining -dumpbase-ext .x02 -lm -o
/home/ak/gcc/obj-full/gcc/testsuite/gcc20/val-profiler-threads-1.x02
(timeout = 300)
spawn -ignore SIGHUP /home/ak/gcc/obj-full/gcc/xgcc
-B/home/ak/gcc/obj-full/gcc/
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
-fdiagnostics-plain-output -O0 -pthread -fprofile-update=atomic
-fauto-profile=/home/ak/gcc/obj-full/gcc/testsuite/gcc20/afdo.val-profiler-threads-1.gcda
-DFOR_AUTOFDO_TESTING -fearly-inlining -dumpbase-ext .x02 -lm -o
/home/ak/gcc/obj-full/gcc/testsuite/gcc20/val-profiler-threads-1.x02
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c: In
function 'copy_memory':
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7:
error: probability of edge from entry block not initialized
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7:
error: probability of edge 2->4 not initialized
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7:
error: probability of edge 5->1 not initialized
during GIMPLE pass: fixup_cfg
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7:
internal compiler error: verify_flow_info failed
0xafb91e verify_flow_info()
../../gcc/gcc/cfghooks.cc:287
0xf0e8a7 execute_function_todo
../../gcc/gcc/passes.cc:2100
0xf0edde execute_todo
../../gcc/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1

I'm not attaching the source because it also needs the autofdo gcov file to
reproduce and the test case is already in tree.

[Bug lto/107779] Support implicit references from inline assembler to compiler symbols

2023-10-15 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779

--- Comment #4 from Andi Kleen  ---
This whole manual annotation idea (which is equivalent to marking the symbols
global and visible, and that is what a large part of the kernel LTO patchkit
does) is dead on arrival because the kernel people already rejected it. Their
argument is that they don't need it for LLVM, so why should they be forced to do
it for GCC. In LLVM it is just done by the assembler, and it works without any
extra program changes.

Since gcc is not the only game in town anymore they have a point.

It's either heuristics or integrating the assembler.

[Bug middle-end/111743] shifts in bit field accesses don't combine with other shifts

2023-10-09 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

--- Comment #5 from Andi Kleen  ---

config/i386/i386.h:#define SLOW_BYTE_ACCESS 0

You mean it doesn't define it?

[Bug middle-end/111743] shifts in bit field accesses don't combine with other shifts

2023-10-09 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

--- Comment #2 from Andi Kleen  ---
Okay, then it doesn't understand that SHL_signed and SHR_unsigned can be
combined when one of the values came from a shorter unsigned.

[Bug middle-end/111743] New: shifts in bit field accesses don't combine with other shifts

2023-10-09 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

Bug ID: 111743
   Summary: shifts in bit field accesses don't combine with other
shifts
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

(not sure it's the middle-end, picked arbitrarily)

The following code

struct bf { 
unsigned a : 10, b : 20, c : 10;
};
unsigned fbc(struct bf bf) { return bf.b | (bf.c << 20); }


generates:

movq    %rdi, %rax
shrq    $10, %rdi
shrq    $32, %rax
andl    $1048575, %edi
andl    $1023, %eax
sall    $20, %eax
orl     %edi, %eax
ret

It doesn't understand that the shift right can be combined with the shift left.
Also not sure why the shift left is arithmetic (this should be all unsigned) 

clang does the simplification which ends up one instruction shorter:
movl    %edi, %eax
shrl    $10, %eax
andl    $1048575, %eax          # imm = 0xFFFFF
shrq    $12, %rdi
andl    $1072693248, %edi       # imm = 0x3FF00000
orl     %edi, %eax
retq
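
The two instruction sequences can be written out in C to show why the shifts
combine. This is only a sketch on the raw 64-bit word, assuming the x86-64
SysV layout (a in bits 0..9, b in bits 10..29, c in bits 32..41 because it does
not fit into the first 32-bit unit); the function names are made up.

```c
#include <stdint.h>

/* What gcc emits: extract c with ">> 32", then move it with "<< 20". */
uint32_t fbc_two_shifts(uint64_t raw)
{
    uint32_t b = (uint32_t)(raw >> 10) & 0xFFFFF;   /* 1048575 */
    uint32_t c = (uint32_t)(raw >> 32) & 0x3FF;     /* 1023 */
    return b | (c << 20);
}

/* What clang emits: ">> 32" followed by "<< 20" folds into ">> 12",
   with the mask moved up by 20 bits to match. */
uint32_t fbc_combined(uint64_t raw)
{
    uint32_t b = (uint32_t)(raw >> 10) & 0xFFFFF;
    uint32_t c_in_place = (uint32_t)(raw >> 12) & 0x3FF00000; /* 1072693248 */
    return b | c_in_place;
}
```

Both functions give the same result for every input, since 0x3FF << 20 is
0x3FF00000 and bits 32..41 shifted right by 12 land in bits 20..29.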

[Bug lto/107779] New: Support implicit references from inline assembler to compiler symbols

2022-11-20 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779

Bug ID: 107779
   Summary: Support implicit references from inline assembler to
compiler symbols
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org,
mliska at suse dot cz
  Target Milestone: ---

Created attachment 53933
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53933&action=edit
prototype patch

So I looked into the problem the kernel people complained about: a
lot of assembler statements reference C symbols, which need externally_visible
and global for gcc LTO, otherwise they can end up in the wrong asm file
and cause missing symbols.

I came up with the attached (hackish) patch that tries to solve the problem
very partially: it parses the assembler strings and looks for anything that
could be an identifier, and then tries to mark it externally_visible.

It has the following open issues:

- The parsing is very approximate and doesn't handle some obscure cases.
With the approximation it's also impossible to give error messages,
but hopefully the linker takes care of that.
It also gives false positives with some assembler syntax,
but in the worst case would just lose some optimization from unnecessary
references.

- It doesn't handle the case (which happens in the kernel) that the C
declaration is after the asm statement. This could be fixed with some
more effort.

- It doesn't work for static which can get mangled (that's a lot of
the kernel cases)
static is a difficult problem because there could be conflicting names,
so we cannot just put it all in partition zero.

This would need some special handling in the LTO partitioning code to
create new partitions just for having unique name spaces, and then
avoid mangling.  A related problem is PR50676.

It's likely possible to create situations where it's impossible to
solve, there could be circular dependencies etc. But I assume in this
case the non LTO case would fail too.

Or maybe do something with redefining symbols at the assembler level.

This one is somewhat difficult and I don't have a simple solution
currently. Unfortunately, solving the kernel issue would need a
solution for static.
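
To illustrate what "looks for anything that could be an identifier" means in
practice, here is a rough standalone sketch of such a scan. It is not the code
from the attached patch; the helper name and its interface are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the attached patch: copy the next
   identifier-like token from *p into buf (buflen must be at least 1)
   and advance *p past it. Returns 1 if a token was found, 0 at the end
   of the string. Over-long tokens are truncated. */
int next_identifier(const char **p, char *buf, size_t buflen)
{
    const char *s = *p;
    size_t n = 0;

    /* Skip everything that cannot start an identifier. */
    while (*s && !(isalpha((unsigned char)*s) || *s == '_')) {
        if (isdigit((unsigned char)*s)) {
            /* Skip a whole number so that "0x10" does not yield "x10". */
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
        } else {
            s++;
        }
    }
    if (!*s) {
        *p = s;
        return 0;
    }

    /* Collect [A-Za-z0-9_]+ as the candidate symbol name. */
    while (isalnum((unsigned char)*s) || *s == '_') {
        if (n + 1 < buflen)
            buf[n++] = *s;
        s++;
    }
    buf[n] = '\0';
    *p = s;
    return 1;
}
```

Run over a string like "call my_func", this yields both "call" and "my_func",
which shows where the false positives come from: mnemonics and register names
look like identifiers too, so each candidate still has to be checked against
the symbols the compiler actually knows.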

[Bug lto/107014] flatten+lto fails the kernel build

2022-09-25 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107014

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #9 from Andi Kleen  ---
I suspect what happens is that it hits some kernel initialization function.
If they don't use initcall the LTO build can inline them all into each other
(because they are only called once), creating a single big initialization
function. With flatten that will create an extremely large function that takes
a long time to process.

I suspect any use of flatten is better replaced with always_inline, since that
affects only a single function. Should probably be fixed upstream in the kernel.
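
A small sketch of the difference between the two attributes, with made-up
function names. flatten is applied to the caller and pulls in its whole call
tree where possible, while always_inline is applied to one callee and forces
only that function to be inlined.

```c
/* Hypothetical functions, for illustration only. */

/* always_inline affects only this function: it is inlined into its
   callers, but nothing else is forced anywhere. */
static inline __attribute__((always_inline)) int scale(int x)
{
    return x * 3;
}

int offset(int x)
{
    return x + 1;
}

/* flatten asks the compiler to inline every call in the body, and the
   calls inside those, where possible. With LTO the call tree can be
   very large. */
__attribute__((flatten)) int compute_flat(int x)
{
    return offset(scale(x));
}

/* Targeted alternative: only scale() is forced inline here, offset()
   is left to the normal inlining heuristics. */
int compute_targeted(int x)
{
    return offset(scale(x));
}
```

Both callers compute the same value; only the amount of code forced into them
differs.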

[Bug preprocessor/45227] libcpp Makefile does not enable instrumentation

2022-01-04 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45227

--- Comment #5 from Andi Kleen  ---
I think it was the method from the info file.

But I can't quite remember. If you cannot reproduce it I guess it's ok to
close. Maybe I made some mistake.

[Bug middle-end/99578] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal

2021-05-01 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #12 from Andi Kleen  ---
It looks to me like separate bugs are mixed together here.

For example I looked at the preallocate_pmd warning again and I don't think
there is any union there. Also I noticed that when I replace the *foo[N] with
**foo it disappears. So I think that is something different.

So there seem to be instances where such warnings happen without union members.
Perhaps that one (and perhaps some others) need to be reanalyzed.

I also looked at the intel_pm.c and I think that one is a real kernel bug,
where the field accessed is really too small. I'll submit a patch for that.

[Bug lto/99828] inlining failed in call to ‘always_inline’ ‘memcpy’: --param max-inline-insns-auto limit reached

2021-03-30 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99828

--- Comment #3 from Andi Kleen  ---
So what do you want to fix in the kernel? 

Use a wrapper for taking the address of the memcpy?
(I hope nothing in gcc would remove such a wrapper)

[Bug lto/95928] LTO through ar breaks weak function resolution

2020-06-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #12 from Andi Kleen  ---
Okay. I only compared gcc-7 (working) vs gcc-9 (broken), but always with LTO.

Looking at the kernel link it also uses --whole-archive. Perhaps that makes a
difference?

I'll redo the test case with --whole-archive (will need some fixes)

[Bug lto/95928] LTO through ar breaks weak function resolution

2020-06-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #9 from Andi Kleen  ---
I think the STB_SECONDARY stuff is only needed if ld -r is used, but not for ar

[Bug lto/95928] LTO through ar breaks weak function resolution

2020-06-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #8 from Andi Kleen  ---
It works fine without LTO.

Otherwise the Linux kernel wouldn't work. It relies on this behavior for its
syscalls.

The test case is extracted from there.

[Bug lto/95928] LTO through ar breaks weak function resolution

2020-06-26 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #5 from Andi Kleen  ---
It doesn't seem to be the plugin itself, I compiled trunk with the gcc-7
lto-plugin.c and it fails too.

[Bug lto/95928] LTO through ar breaks weak function resolution

2020-06-26 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #4 from Andi Kleen  ---
Reproduced on trunk too

11.0-200626  e74c281bf4955eea7fdc5f21b43e29fa0235a5b0

[Bug bootstrap/95934] New: bootstrap fails in compiler assert in sanitizer_platform_limits_posix.cpp:1136

2020-06-26 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95934

Bug ID: 95934
   Summary: bootstrap fails in compiler assert in
sanitizer_platform_limits_posix.cpp:1136
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

commit e74c281bf4955eea7fdc5f21b43e29fa0235a5b0 (HEAD -> trunk, origin/trunk,
origin/master, origin/HEAD)

make bootstrap fails with 


../../../../gcc/libsanitizer/sanitizer_common/sanitizer_internal_defs.h:336:30:
note: in expansion of macro 'IMPL_COMPILER_ASSERT'
  336 | #define COMPILER_CHECK(pred) IMPL_COMPILER_ASSERT(pred, __LINE__)
  |  ^~~~
../../../../gcc/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h:1442:3:
note: in expansion of macro 'COMPILER_CHECK'
 1442 |   COMPILER_CHECK(sizeof(((__sanitizer_##CLASS *)NULL)->MEMBER) == \
  |   ^~
../../../../gcc/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:1136:1:
note: in expansion of macro 'CHECK_SIZE_AND_OFFSET'
 1136 | CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
  | ^


Works fine when I comment out the assert. There already is an ifdef checking for
lots of cases, but it seems it doesn't work on my system either. This is with a
recent

glibc-2.31-5.9.x86_64

(opensuse glibc)


Perhaps the assert should just be disabled like this?
(patch is likely white space damaged)


diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
index b4f8f67b664..bb6377b70cb 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
@@ -1133,7 +1133,7 @@ CHECK_SIZE_AND_OFFSET(ipc_perm, cgid);
 /* On aarch64 glibc 2.20 and earlier provided incorrect mode field.  */
 /* On Arm newer glibc provide a different mode field, it's hard to detect
so just disable the check.  */
-CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
+//CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
 #endif

 CHECK_TYPE_SIZE(shmid_ds);

[Bug lto/95928] LTO through ar breaks weak function resolution

2020-06-26 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #3 from Andi Kleen  ---
Versions reproduced:

gcc version 10.1.1 20200507 [revision dd38686d9c810cecbaa80bb82ed91caaa58ad635]
(SUSE Linux) 

gcc-9 (SUSE Linux) 9.3.1 20200406 [revision
6db837a5288ee3ca5ec504fbd5a765817e556ac2]


Version which worked correctly:

gcc version 7.5.0 (SUSE Linux) 



binutils:

GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.34.0.20200325-1

[Bug lto/95928] LTO through ar breaks weak function resolution

2020-06-26 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #1 from Andi Kleen  ---
Created attachment 48792
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48792&action=edit
sys_ni.i

[Bug lto/95928] LTO through ar breaks weak function resolution

2020-06-26 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

--- Comment #2 from Andi Kleen  ---
Created attachment 48793
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48793&action=edit
capability.i

[Bug lto/95928] New: LTO through ar breaks weak function resolution

2020-06-26 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

Bug ID: 95928
   Summary: LTO through ar breaks weak function resolution
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-linux

Created attachment 48791
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48791&action=edit
dummy.c

With the attached test case (extracted from the Linux kernel) the expected
behavior is that the strong version of __x64_sys_capget overrides the weak
version in sys_ni.i

This works with LTO when the object files are linked directly, but doesn't work
(weak version of function is output) when the linking is through a .a file.

Works

gcc -flto -c sys_ni.i
gcc -flto -c capability.i
gcc -O2 -flto dummy.c sys_ni.o capability.o
# sys_ni_syscall doesn't appear, so the strong version is chosen
objdump --disassemble=__x64_sys_capget | grep sys_ni_syscall

Breaks:

gcc -flto -c sys_ni.i 
gcc -flto -c capability.i 
rm -f x.a
gcc-ar q x.a sys_ni.o capability.o 
gcc -O2 -flto dummy.c x.a
# sys_ni_syscall appears, so the weak version is incorrectly chosen
objdump --disassemble=__x64_sys_capget | grep sys_ni_syscall

This seems to be a regression: it worked on gcc-7, but breaks on gcc 9/10.
I don't have any intermediate versions to test.

[Bug target/93346] gcc not generate BZHI

2020-01-20 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93346

--- Comment #1 from Andi Kleen  ---
typedef unsigned u;
u bzhi(u src, u inx) { return src & ((1 << inx) - 1); } 


with -O2 -march=skylake generates

movl    %esi, %r8d
movl    $1, %esi
shlx    %r8d, %esi, %esi
leal    -1(%rsi), %eax
andl    %edi, %eax
ret


clang generates the expected
bzhil   %esi, %edi, %eax
retq
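
For reference, the idiom in the test case keeps the low inx bits of src and
clears the rest, which is what BZHI does in one instruction. The sketch below
spells that out next to a bit-by-bit model. The function names are made up,
and the shift form uses an unsigned literal so it is well defined for index
values 0..31.

```c
#include <stdint.h>

/* Mask form from the test case: build a mask of n one bits and apply it.
   Only valid for n in 0..31, since shifting by 32 is undefined in C. */
uint32_t low_bits_shift(uint32_t src, unsigned n)
{
    return src & ((1u << n) - 1);
}

/* Bit-by-bit model: copy bits 0..n-1 of src, leave bits n and above
   zero. For n >= 32 this returns src unchanged. */
uint32_t low_bits_model(uint32_t src, unsigned n)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < n && i < 32; i++)
        out |= src & (1u << i);
    return out;
}
```

For n in 0..31 both functions agree, so a compiler targeting BMI2 can replace
the shift, subtract and and sequence with a single bzhi.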

[Bug target/93346] New: gcc not generate BZHI

2020-01-20 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93346

Bug ID: 93346
   Summary: gcc not generate BZHI
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---
Target: x86_64

[Bug inline-asm/89839] New: section not reset to text for top level asm

2019-03-26 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89839

Bug ID: 89839
   Summary: section not reset to text for top level asm
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: inline-asm
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

The ELF section from the previous function doesn't get reset before top level
asm statements:

e.g.

__attribute__((section("foo"))) void func(void)
{
}

asm("foo:\n");

gcc -S gives

 .section foo,"ax",@progbits <- sets the section
 .globl func
 .type func, @function
func:
.LFB0:
 .cfi_startproc
 pushq %rbp
 .cfi_def_cfa_offset 16
 .cfi_offset 6, -16
 movq %rsp, %rbp  
 .cfi_def_cfa_register 6
 nop
 popq %rbp
 .cfi_def_cfa 7, 8
 ret
 .cfi_endproc
.LFE0:
 .size func, .-func
<--- no section reset before the asm
#APP
 foo:


The problem is if foo is some section with special behavior (for example
initcall sections in the Linux kernel) this can cause crashes. I've seen such
problems with LTO on the Linux kernel.

gcc should always reset the section to .text before emitting top level asm.

Seen with 8.x, but also trunk.
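
Until this is fixed, a possible workaround is to make the top level asm select
its own section explicitly. This is only a sketch, assuming GNU as on an ELF
target; the label name is made up and differs from the one in the example
above.

```c
/* Function placed in a custom ELF section, as in the report. */
__attribute__((section("foo"))) int func(void)
{
    return 42;
}

/* Switch to .text explicitly and restore the previous section afterwards,
   so the label does not land in whatever section the compiler used last. */
__asm__(".pushsection .text\n"
        "toplevel_label:\n"
        ".popsection\n");
```

With .pushsection/.popsection the asm no longer depends on the section state
left behind by the preceding function, and the compiler's own section tracking
is not disturbed either.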

[Bug testsuite/86404] UNRESOLVED/UNSUPPORTED gcov test results due to Permission error mapping pages

2019-02-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86404

--- Comment #4 from Andi Kleen  ---
Does something like this help?
(untested, cut-n-pasted, possibly with other values)

diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 5da5c63cd845..8744b9f091df 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -67,4 +67,4 @@ model*:\ 53) E="cpu/event=0x88,umask=0x41/p$FLAGS" ;;
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to
update script."
exit 1 ;;
 esac
-exec perf record -e $E -b "$@"
+exec perf record -m 256K -e $E -b "$@"

[Bug target/88622] ICE when changing -mpreferred-stack-boundary for different files with LTO

2018-12-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88622

--- Comment #4 from Andi Kleen  ---
Ok that means that this code you pasted in ix86_option_override_internal
somehow doesn't get executed correctly for LTO switching between different
options.

Adding Honza.

[Bug target/88622] ICE when changing -mpreferred-stack-boundary for different files with LTO

2018-12-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88622

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #3 from Andi Kleen  ---
Ok that means that this code you pasted in ix86_option_override_internal
somehow doesn't get executed correctly for LTO switching between different
options.

Adding Honza.

[Bug target/88622] ICE when changing -mpreferred-stack-boundary for different files with LTO

2018-12-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88622

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-linux
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Andi Kleen  ---
Don't have a small reproducer, but on a large LTO build where a few files are
built with

-mpreferred-stack-boundary=4

with the others using the default

I hit the following ICE in ix86_minimum_alignment

  /* Don't do dynamic stack realignment for long long objects with
     -mpreferred-stack-boundary=2.  */
  if ((mode == DImode || (type && TYPE_MODE (type) == DImode))
      && (!type || !TYPE_USER_ALIGN (type))
      && (!decl || !DECL_USER_ALIGN (decl)))
    {
      gcc_checking_assert (!TARGET_STV);
      return 32;
    }


I suspect the right fix is to just remove the assert?


Adding Jakub who added it originally in:

commit 1f1475a7e758328a59db17aef5d1ccd81232ea95
Author: jakub 
Date:   Thu Feb 4 09:02:01 2016 +

PR target/69454
* config/i386/i386.c (convert_scalars_to_vector): Remove
stack alignment fixes.
(ix86_option_override_internal): Disable TARGET_STV if stack
might not be aligned enough.
(ix86_minimum_alignment): Assert that TARGET_STV is false.

* gcc.target/i386/pr69454-1.c: New test.
* gcc.target/i386/pr69454-2.c: New test.

[Bug target/88622] New: ICE when changing -mpreferred-stack-boundary for different files with LTO

2018-12-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88622

Bug ID: 88622
   Summary: ICE when changing -mpreferred-stack-boundary for
different files with LTO
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

[Bug c/88583] New: -Wpacked-not-aligned shouldn't be in -Wall

2018-12-23 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88583

Bug ID: 88583
   Summary: -Wpacked-not-aligned shouldn't be in -Wall
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

gcc 9 added -Wpacked-not-aligned to -Wall. In Linux kernel builds this warning
is very noisy. There's a Linux kernel patch now to disable it. But I suspect
other software using packed will be affected too.

It's especially pointless on x86 where unaligned only matters in some special
cases (with vectors)

When the programmer specified packed they should know what they are doing.

[Bug middle-end/88573] 9 regression: error: type mismatch in component reference

2018-12-21 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88573

--- Comment #1 from Andi Kleen  ---
Created attachment 45281
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45281&action=edit
test case (unminimized)

gcc-9 -O2 -S -flto arch/x86/events/intel/pt.i
/home/ak/lsrc/linux/arch/x86/events/intel/pt.c: In function
'pt_buffer_reset_offsets':
/home/ak/lsrc/linux/arch/x86/events/intel/pt.c:1539:1: error: type mismatch in
component reference
 1539 | arch_initcall(pt_init);
  | ^~
struct topa_entry *[0:]

struct topa_entry *[0:]

_13 = buf->topa_index[pg];
/home/ak/lsrc/linux/arch/x86/events/intel/pt.c:1539:1: error: type mismatch in
component reference
struct topa_entry *[0:]

struct topa_entry *[0:]

_17 = buf->topa_index[pg];
during IPA pass: *free_lang_data

[Bug middle-end/88573] New: 9 regression: error: type mismatch in component reference

2018-12-21 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88573

Bug ID: 88573
   Summary: 9 regression: error: type mismatch in component
reference
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Don't have a small test case currently, happens during a large LTO Linux kernel
build.

With gcc trunk (20181222) with checking enabled I get

/home/ak/lsrc/linux/kernel/events/callchain.c: In function
'get_callchain_entry':
/home/ak/lsrc/linux/kernel/events/callchain.c:260:1: error: type mismatch in
component reference
  260 | }
  | ^
struct perf_callchain_entry *[0:]

struct perf_callchain_entry *[0:]

_3 = entries->cpu_entries[cpu];
during IPA pass: *free_lang_data
/home/ak/lsrc/linux/kernel/events/callchain.c:260:1: internal compiler error:
verify_gimple failed

gcc 8 doesn't show this problem.

[Bug sanitizer/88277] ASAN stack poisoning is using unaligned stores on e.g. x86_64

2018-11-30 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88277

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #2 from Andi Kleen  ---
FWIW modern x86 CPUs are fairly good at unaligned accesses, so it might not be
worth it for performance.

[Bug ipa/88231] aligned functions laid down inefficiently

2018-11-29 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88231

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #4 from Andi Kleen  ---
I'm not sure it's a good idea to do this. Often the goal is not to get the
absolute smallest code, but to get code that minimizes cache line usage.
This is important for "frontend bound" code like gcc itself often is.

It would be rather better to use an algorithm like Pettis-Hansen or the one in
hfsort (see
https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf)
to lay out the code based on expected call order to minimize footprint. For
best results it would need profile feedback of course, but it might already do a
reasonable job based on static call frequencies.

[Bug target/88096] wrong inline AVX512F optimization

2018-11-28 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88096

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #1 from Andi Kleen  ---
Can you please attach a pre-processed test case of a file that shows the bug?

It's ok if it doesn't run, as long as the problem is clearly identified in the
assembler.

Then the test case could be likely minimized.

[Bug target/88195] New: misleading error message for unsupported builtin

2018-11-25 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88195

Bug ID: 88195
   Summary: misleading error message for unsupported builtin
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---
Target: x86_64

On x86, when using a builtin that is not supported by the target configuration,
e.g.

gcc -c -m32 -mptwrite t.c
with t.c being
void f(void)
{
__builtin_ia32_ptwrite64 (1);
}

I get

t.c:4:2: error: '__builtin_ia32_ptwrite64' needs isa option -mx32 -mptwrite



While technically correct (-mx32 would enable the 64bit builtin), I suspect
nearly all users would rather use -m64, or better, not specify -m32 at all. So
it should mention that the builtin is incompatible with -m32.

[Bug tree-optimization/42587] bswap not recognized for memory

2018-11-19 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42587

--- Comment #11 from Andi Kleen  ---
Only when the first test case is fixed too

[Bug c/61727] #pragma simd is undocumented

2018-11-15 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61727

--- Comment #4 from Andi Kleen  ---
This was originally about the #pragma simd in Cilk+, which has been removed.
But it lives on in #pragma omp simd.

[Bug lto/83375] partitioner partitions static arrays with label references

2018-10-10 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375

--- Comment #6 from Andi Kleen  ---
This breaks Linux kernel LTO builds. I currently have a workaround (disabling
LTO for that file), but I don't think your "is not common" argument is valid.

[Bug other/50639] -flto=jobserver broken on large LTO build

2018-04-19 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50639

--- Comment #4 from Andi Kleen  ---
I doubt it's fixed. It's a race so can be unstable.

Especially since, judging from the growing cc list, other people keep seeing it.

It may not be something that gcc can fix, if anything it's more likely in make
or in Linux.

[Bug other/50639] -flto=jobserver broken on large LTO build

2018-04-18 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50639

--- Comment #2 from Andi Kleen  ---
FWIW the problem disappeared for me at some point (could have been newer kernel
or different make). I don't see it anymore.

I think it was some problem with the pipes used by the job server losing a
token.

[Bug c/83397] void f() { } has zero arguments

2017-12-12 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83397

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #4 from Andi Kleen  ---
If you want to skip the rax setup you can use -mskip-rax-setup

But in general it's dangerous, because code compiled by old gcc can jump to
random locations if a real varargs function gets called with undefined rax.

[Bug ipa/83346] inliner crash with attribute always_inline/flatten on a destructor

2017-12-12 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346

--- Comment #3 from Andi Kleen  ---
Fixed by https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00764.html

[Bug lto/83388] New: reference statement index not found error with -fsanitize=null

2017-12-11 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83388

Bug ID: 83388
   Summary: reference statement index not found error with
-fsanitize=null
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42844
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42844&action=edit
test case

With the attached test case

gcc8  -m32 -O2 -flto -fsanitize=null -c core.i
gcc8 -r -nostdlib core.o

gives

In function 'i':
lto1: fatal error: Reference statement index not found
compilation terminated.

Happens with gcc 7 and trunk

[Bug lto/83376] ICE in LTO streamer

2017-12-11 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83376

Andi Kleen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andi Kleen  ---
Looks like it was a case of incompatible LTO object file from a different gcc
build. With a clean build it doesn't happen anymore.

[Bug lto/83380] New: disk full while writing LTO files leads to ICE

2017-12-11 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83380

Bug ID: 83380
   Summary: disk full while writing LTO files leads to ICE
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

lto1: fatal error: error writing to vmlinux.ltrans15.s: No space left on device
gcc: internal compiler error: Aborted signal terminated program lto1
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.

Should just exit in this case

[Bug gcov-profile/83355] autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE

2017-12-11 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355

--- Comment #3 from Andi Kleen  ---
patch checked in

[Bug lto/83376] New: ICE in LTO streamer

2017-12-11 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83376

Bug ID: 83376
   Summary: ICE in LTO streamer
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Don't have a small test case right now, but will bisect

When building Linux kernel LTO with gcc 8 I currently get an ICE. Doesn't
happen on 7 and I think it's also recent on 8.

In this case data_in->current_decl_data is NULL while reading a reference.

0xa58fe7 crash_signal
../../gcc/gcc/toplev.c:325
0x957a39 lto_file_decl_data_get_var_decl
../../gcc/gcc/lto-streamer.h:1210
0x957a39 lto_input_tree_ref(lto_input_block*, data_in*, function*, LTO_tags)
../../gcc/gcc/lto-streamer-in.c:366
0x957c1d lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
../../gcc/gcc/lto-streamer-in.c:1475
0x6bdc8c lto_read_decls
../../gcc/gcc/lto/lto.c:1791
0x6bdc8c lto_file_finalize
../../gcc/gcc/lto/lto.c:2055
0x6bdc8c lto_create_files_from_ids
../../gcc/gcc/lto/lto.c:2065
0x6bdc8c lto_file_read
../../gcc/gcc/lto/lto.c:2106
0x6bdc8c read_cgraph_and_symbols
../../gcc/gcc/lto/lto.c:2818
0x6bfdb1 lto_main()
../../gcc/gcc/lto/lto.c:3323

[Bug lto/83375] partitioner partitions static arrays with label references

2017-12-11 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375

--- Comment #1 from Andi Kleen  ---
Actually -flto-partition=max

[Bug lto/83375] New: partitioner partitions static arrays with label references

2017-12-11 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375

Bug ID: 83375
   Summary: partitioner partitions static arrays with label
references
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42842
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42842&action=edit
test case

I thought there was already a bug for this, but can't find it right now.

When &&label addresses are put into static arrays the LTO partitioner can put
the static into a different partition, which causes an assembler error because
the code labels are local.

This breaks Linux kernel LTO builds.

See attached test case.

I think ipa-comdats should put the function and the static into the same
partition, but for some reason it doesn't work.

Attached test case shows the problem with -flto-partition=1to1 -flto -O2

[Bug gcov-profile/83355] New: autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE

2017-12-10 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355

Bug ID: 83355
   Summary: autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Running in gdb shows that there is a very deep recursion in get_index_by_decl
until it overflows the stack.

This patch seems to fix it (but not sure why the abstract origin would point to
itself)

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 5134a795331..403709bad6b 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -477,7 +477,7 @@ string_table::get_index_by_decl (tree decl) const
   ret = get_index (lang_hooks.dwarf_name (decl, 0));
   if (ret != -1)
 return ret;
-  if (DECL_ABSTRACT_ORIGIN (decl))
+  if (DECL_ABSTRACT_ORIGIN (decl) && DECL_ABSTRACT_ORIGIN (decl) != decl)
 return get_index_by_decl (DECL_ABSTRACT_ORIGIN (decl));

   return -1;
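
To illustrate what the guard in the patch above protects against, here is a
minimal standalone sketch in C. It is not the GCC code: struct decl, the string
table argument and both helper functions are hypothetical stand-ins for tree
decls and autofdo::string_table. The only point is that a decl whose abstract
origin points back to itself would recurse forever without the extra
comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a tree decl: a name plus an abstract origin.  */
struct decl {
    const char *name;
    const struct decl *abstract_origin;
};

/* Hypothetical stand-in for string_table::get_index: linear name lookup.  */
static int get_index(const char *const *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i], name) == 0)
            return (int)i;
    return -1;
}

/* Mirrors the shape of the patched function: try the decl's own name,
   then follow the abstract origin only if it is a different decl.  */
static int get_index_by_decl(const char *const *table, size_t n,
                             const struct decl *d)
{
    assert(d != NULL);
    int ret = get_index(table, n, d->name);
    if (ret != -1)
        return ret;
    /* Without the "!= d" test a self-referencing origin never terminates.  */
    if (d->abstract_origin && d->abstract_origin != d)
        return get_index_by_decl(table, n, d->abstract_origin);
    return -1;
}
```

Note the guard only catches a decl pointing directly at itself, which matches
the patch; a longer cycle of origins would still loop.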


Backtrace:


Program received signal SIGSEGV, Segmentation fault.
0x016c4ab2 in pp_emit_prefix (pp=0x229b1a0 ) at
/home/andi/gcc/git/gcc/gcc/pretty-print.c:1485
1485{
(gdb) up
#1  0x016c4c90 in pp_append_text(pretty_printer*, char const*, char
const*) ()
at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1556
1556  pp_emit_prefix (pp);
(gdb) bt
#0  0x016c4ab2 in pp_emit_prefix (pp=0x229b1a0 )
at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1485
#1  0x016c4c90 in pp_append_text(pretty_printer*, char const*, char
const*) ()
at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1556
#2  0x00b12c83 in pp_c_identifier (pp=0x229b1a0
, id=)
at /home/andi/gcc/git/gcc/gcc/c-family/c-pretty-print.c:1203
#3  0x00992b46 in dump_decl (flags=0, t=0x76d2ce40, pp=0x229b1a0
)
at /home/andi/gcc/git/gcc/gcc/tree.h:3226
#4  dump_function_name(cxx_pretty_printer*, tree_node*, int) () at
/home/andi/gcc/git/gcc/gcc/cp/error.c:1852
#5  0x009940a4 in lang_decl_name(tree_node*, int, bool) () at
/home/andi/gcc/git/gcc/gcc/cp/error.c:3005
#6  0x00994133 in lang_decl_dwarf_name (decl=,
v=, translate=)
at /home/andi/gcc/git/gcc/gcc/cp/error.c:2977
#7  0x0156762a in autofdo::string_table::get_index_by_decl(tree_node*)
const ()
at /home/andi/gcc/git/gcc/gcc/auto-profile.c:477


[Bug ipa/83346] inliner crash with always inline and templates

2017-12-09 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346

--- Comment #1 from Andi Kleen  ---
This fixes it. Don't know why that node has no decl.

Will submit after a test cycle.

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 7846e93d119..dcd8a3de1ac 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2391,7 +2391,8 @@ ipa_inline (void)
 entry of cycles, possibly cloning that entry point and
 try to flatten itself turning it into a self-recursive
 function.  */
-  if (lookup_attribute ("flatten",
+  if (node->decl
+&& lookup_attribute ("flatten",
DECL_ATTRIBUTES (node->decl)) != NULL)
{
  if (dump_file)

[Bug ipa/83346] New: inliner crash with always inline and templates

2017-12-09 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346

Bug ID: 83346
   Summary: inliner crash with always inline and templates
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42820
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42820&action=edit
test case

Attached test case segfaults with -O2 on gcc 7 and 8 trunk
g++ -O2 -S ch-crash.i

ch-crash.i:30:1: internal compiler error: Segmentation fault
 }
 ^
0xc030f7 crash_signal
../../gcc/gcc/toplev.c:325
0x125b189 ipa_inline
../../gcc/gcc/ipa-inline.c:2388
0x125b189 execute
../../gcc/gcc/ipa-inline.c:2807

[Bug target/83052] [8 Regression] ICE in extract_insn, at recog.c:2305 starting from r254560

2017-11-20 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83052

--- Comment #1 from Andi Kleen  ---
I'm not sure why you call it a regression? You must be running the test suite
manually with the new option. 

I haven't tested, but likely it will fail if you run that test with
-mcmodel=large too. The -mforce-indirect-call patch is really only a subset
of -mcmodel=large.  Then it would be more a latent bug.

[Bug tree-optimization/82854] New: more missing simplifcations

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82854

Bug ID: 82854
   Summary: more missing simplifcations
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

These all come from a paper

"Optgen: A Generator for Local Optimizations" (Buchwald et al.).
https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald15cc.pdf

These were found by a SAT solver.

I wrote them in partial pseudo match.pd syntax (untested, likely buggy)

I'm not sure how useful they really are for real programs, but with the auto
generated matchers scaling well to more rules they wouldn't hurt, I suppose.

/* x + (x & 0x8000) -> x & 0x7fff */
(simplify
  (plus:c @0 (bit_and @0 integer_msb_onlyp@1))
  (bit_and @0 { @1 - 1; } ))

/* (x | 0x8000) + 0x8000 -> x & 0x7FFF */
(simplify
  (plus:c (bit_ior @0 integer_msb_onlyp) msb_setp)
  (bit_and @0 { msb_minus_one_val(type); } ))

/* x & (x + 0x8000) -> x & 0x7FFF */
(simplify
  (bit_and:c (plus @0 msb_setp) @0)
  (bit_and @0 { msb_minus_one_val(type); } ))

/* x & (0x7FFF - x) -> x & 0x8000 */
(simplify
  (bit_and:c @0 (minus msb_minus_onep @0))
  (bit_and @0 { msb_val(type); } ))

/* is_power_of_2(c1) && c0 & (2 * c1 - 1) == c1 - 1 ->
   (c0 - x) & c1 -> x & c1 */

/* x | (x + 0x8000) -> x | 0x8000 */
(simplify
  (bit_ior:c @0 (plus @0 msb_onlyp))
  (bit_ior @0 { msb_val(type); } ))

/* x | (0x7FFF - x) -> x | 0x7FFF */
(simplify
  (bit_ior:c @0 (minus 0x7FFF @0))
  (bit_ior @0 0x7FFF))

/* x | (x ^ y) -> x | y */
(simplify
  (bit_ior:c @0 (bit_xor:c @0 @1))
  (bit_ior @0 @1))

/* ((c0 | -c0) & ∼c1) == 0 AND (x + c0) | c1 -> x | c1 */

/* is_power_of_2(∼c1) && c0 & (2 * ∼c1 - 1) == ∼c1 - 1 AND
   (c0 - x) | c1 ->
   x | c1 */

/* -x | 0xFFFE -> x | 0xFFFE */
(simplify
  (bit_ior (negate @0) 0xFFFE)
  (bit_ior @0 0xFFFE))

/* 0 - (x & 0x8000) -> x & 0x8000 */
(simplify
  (minus 0 (bit_and:c @0 0x8000))
  (bit_and @0 0x8000))

/* 0x7FFF - (x & 0x8000) -> x | 0x7FFF */
(simplify
  (minus 0x7FFF (bit_and @0 0x8000))
  (bit_ior @0 0x7FFF))

/* 0x7FFF - (x | 0x7FFF) -> x & 0x8000 */
(simplify
  (minus 0x7FFF (bit_ior:c @0 0x7FFF))
  (bit_and @0 0x8000))

/* 0xFFFE - (x | 0x7FFF) -> x | 0x7FFF */
(simplify
  (minus 0xFFFE (bit_ior:c @0 0x7FFF))
  (bit_ior @0 0x7FFF))

/* (x & 0x7FFF) - x -> x & 0x8000 */
(simplify
  (minus (bit_and:c @0 0x7FFF) @0)
  (bit_and @0 0x8000))

/* x ^ (x + 0x8000) -> 0x8000 */
(simplify
  (bit_xor:c @0 (plus:c @0 0x8000))
  0x8000)

/* x ^ (0x7FFF - x) -> 0x7FFF */
(simplify
  (bit_xor:c @0 (minus 0x7FFF @0))
  0x7FFF)

/* (x + 0x7FFF) ^ 0x7FFF -> -x */
(simplify
  (bit_xor:c (plus:c @0 0x7FFF) 0x7FFF)
  (negate @0))

/* -x ^ 0x8000 -> 0x8000 - x */
(simplify
  (bit_xor:c (negate @0) 0x8000)
  (minus 0x8000 @0))

/* (0x7FFF - x) ^ 0x7FFF -> x */
(simplify
  (bit_xor:c (minus 0x7FFF @0) 0x7FFF)
  @0)

/* ~(x + c) -> ~c - x */
(simplify
  (bit_not (plus:c @0 CONSTANT_CLASS_P@1))
  (minus (bit_not @1) @0))

/* -x ^ 0x7FFF -> x + 0x7FFF */
(simplify
  (bit_xor (negate @0) 0x7FFF)
  (plus @0 0x7FFF))

/* (x | c) - c -> x & ∼c */
(simplify
  (minus (bit_ior @0 CONSTANT_CLASS_P@1) @1)
  (bit_and @0 (bit_not @1)))

/* ~(c - x) -> x + ∼c */
(simplify
  (bit_not (minus CONSTANT_CLASS_P@0 @1))
  (plus @1 (bit_not @0)))

/* -c0 == c1 AND (x | c0) + c1 -> x & ∼c1 */
(simplify
  (plus (bit_ior @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
  (if (...)
   (bit_and @0 (bit_not @2))))

/* (c0 & ∼c1) == 0 AND (x ^ c0) | c1 -> x | c1 */

/* 0x7FFF - (x ^ c) -> x ^ (0x7FFF - c) */
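
A rough way to sanity check some of the rules above is brute force. The sketch
below is C, not match.pd, and it assumes the constants are read exactly as
printed here, i.e. 0x8000 and 0x7FFF as the msb and msb-minus-one patterns of a
wrapping 16-bit word. It covers only a handful of the rules; the function names
are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a selection of the rewrites holds for x and y (y plays the
   role of the constant c) in wrapping 16-bit unsigned arithmetic.  The casts
   undo C's integer promotion to int.  */
static int rules_hold(uint16_t x, uint16_t y)
{
    const uint16_t msb = 0x8000, low = 0x7FFF;

    /* x + (x & 0x8000) -> x & 0x7FFF */
    if ((uint16_t)(x + (x & msb)) != (uint16_t)(x & low)) return 0;
    /* x & (x + 0x8000) -> x & 0x7FFF */
    if ((uint16_t)(x & (uint16_t)(x + msb)) != (uint16_t)(x & low)) return 0;
    /* x | (x + 0x8000) -> x | 0x8000 */
    if ((uint16_t)(x | (uint16_t)(x + msb)) != (uint16_t)(x | msb)) return 0;
    /* x | (x ^ y) -> x | y */
    if ((uint16_t)(x | (x ^ y)) != (uint16_t)(x | y)) return 0;
    /* x ^ (x + 0x8000) -> 0x8000 */
    if ((uint16_t)(x ^ (uint16_t)(x + msb)) != msb) return 0;
    /* (x + 0x7FFF) ^ 0x7FFF -> -x */
    if ((uint16_t)((uint16_t)(x + low) ^ low) != (uint16_t)(-x)) return 0;
    /* ~(x + c) -> ~c - x */
    if ((uint16_t)(~(x + y)) != (uint16_t)(~y - x)) return 0;
    /* (x | c) - c -> x & ~c */
    if ((uint16_t)((x | y) - y) != (uint16_t)(x & ~y)) return 0;
    return 1;
}

/* Exhaustive over x, strided sample over y (0, 257, ..., 0xFFFF).  */
static void check_rules_16bit(void)
{
    for (uint32_t x = 0; x <= 0xFFFF; x++)
        for (uint32_t y = 0; y <= 0xFFFF; y += 257)
            assert(rules_hold((uint16_t)x, (uint16_t)y));
}
```

Passing this check says nothing about signed types or overflow semantics in
GIMPLE, so it is only a first filter before writing real patterns.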

[Bug tree-optimization/82854] more missing simplifcations

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82854

--- Comment #1 from Andi Kleen  ---
Also I suppose a lot of them could be generalized to 8/16/64bit.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #8 from Andi Kleen  ---
I'm not sure if it works with other numbers too.

(need to dig through Hacker's delight & Matters Computational to see if they
have anything on it)

But it could be extended for other word lengths at least

BTW there are some other cases, will file a bug shortly on those too.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #5 from Andi Kleen  ---
Also I'm not sure why you would want it in the middle end. It should all work
at the tree level

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #4 from Andi Kleen  ---
Right it's about special casing the complete expression

[Bug tree-optimization/82853] New: Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Bug ID: 82853
   Summary: Optimize x % 3 == 0 without modulo
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Raph Levien pointed out as part of FizzBuzz optimization:

Turns out you can compute x%3 == 0 with even fewer steps, it's (x*0xaaaaaaab)
< 0x55555556 (assuming wrapping unsigned 32 bit arithmetic).

gcc currently generates the full modulo and then checks.

Could be done in match.pd I suppose.

Test case

unsigned mod3(unsigned a) { return 0==(a%3); }
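
For reference, a small C sketch of the trick, assuming the intended constants
are 0xAAAAAAAB (the multiplicative inverse of 3 modulo 2^32) and the bound
0x55555556. Multiplying by the inverse maps the multiples of 3 onto
0 .. 0x55555555 and everything else above that range. The function names are
made up for the example; the second function is the plain modulo form to
compare against.

```c
#include <assert.h>
#include <stdint.h>

/* Divisibility by 3 without a modulo: 3 * 0xAAAAAAAB == 1 (mod 2^32), so
   x * 0xAAAAAAAB equals x / 3 exactly when x is a multiple of 3, and that
   quotient is at most 0x55555555.  */
static int is_multiple_of_3(uint32_t x)
{
    return (uint32_t)(x * UINT32_C(0xAAAAAAAB)) < UINT32_C(0x55555556);
}

/* Reference version using the plain modulo.  */
static int is_multiple_of_3_ref(uint32_t x)
{
    return x % 3 == 0;
}

/* Compares both versions on a strided sample of inputs; the odd stride
   wraps around the 32-bit range several times.  */
static void check_mod3_sample(void)
{
    uint32_t x = 0;
    for (uint32_t i = 0; i < 1000000; i++) {
        assert(is_multiple_of_3(x) == is_multiple_of_3_ref(x));
        x += 40503;
    }
    assert(is_multiple_of_3(UINT32_MAX) == is_multiple_of_3_ref(UINT32_MAX));
}
```

The same construction should work for any odd divisor, since every odd number
has an inverse modulo 2^32; only the two constants change.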

[Bug other/82784] Remove semicolon after "do {} while (0)" macros

2017-11-04 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82784

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #5 from Andi Kleen  ---
Sounds like a good candidate for a new warning

[Bug c/82013] New: better error message for missing semicolon in prototype

2017-08-28 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82013

Bug ID: 82013
   Summary: better error message for missing semicolon in
prototype
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

gcc gives quite poor error messages when forgetting a semicolon after a
prototype (common mistake when cut'n'pasting a function definition into a
header)

It's especially confusing when the prototype is the last in the include file,
because then the errors appear in another file.

As a minimum it should warn about a missing semicolon at the end of a file.

Possibly this could be also used for fix-it, but that's likely more
complicated.

[Bug target/80742] New: attribute target no- does not work

2017-05-14 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80742

Bug ID: 80742
   Summary: attribute target no- does not work
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Disabling ISAs with attribute target doesn't seem to work on x86_64

e.g. 

typedef float __m128 __attribute__ ((vector_size (16)));

__attribute__((target("no-sse2"))) __m128 func (__m128 x, __m128 y)
{
__m128 xmm0 = x, xmm1 = y, xmm2;
xmm0 = __builtin_ia32_xorps (xmm1, xmm1);
return xmm0;
}

does not error out.

[Bug testsuite/79067] gcc.dg/tree-prof/cold_partition_label.c runs a million times longer than it used to and times out

2017-05-10 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79067

--- Comment #3 from Andi Kleen  ---
sandra,

does this patch fix it?

diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 6214e3629f2..924a270e1bd 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -2,6 +2,7 @@
gets a label.  */
 /* { dg-require-effective-target freorder } */
 /* { dg-options "-O2 -freorder-blocks-and-partition -save-temps" } */
+/* { dg-require-profiling "-fprofile-generate" } */

 #define SIZE 1

[Bug testsuite/79067] gcc.dg/tree-prof/cold_partition_label.c runs a million times longer than it used to and times out

2017-05-10 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79067

--- Comment #2 from Andi Kleen  ---
There's a separate fix for the random failures (or, as a workaround, increase
/proc/sys/kernel/perf_event_mlock_kb), see PR 77684

Not running the test on systems without FDO seems best. I don't think it does
anything useful there anyways.

[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2017-05-10 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684

--- Comment #5 from Andi Kleen  ---
Created attachment 41337
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41337&action=edit
limit perf buffer size

This patch allows parallelism up to 16 with the default setting.
Currently testing.

[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2017-05-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684

--- Comment #4 from Andi Kleen  ---
Thanks for tracing that down. 

So perf runs out of memory for the locked trace buffers

Increasing the limit is a good workaround
ulimit -l may also work, but also needs root.

We could just pass a smaller -m value to perf

Does it work when you change the last line in config/i386/gcc-auto-profile
to add -m 128k?

(or possibly other values; they have to be a power of two)

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-24 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #8 from Andi Kleen  ---
__builtin_constant_p does not cover variable range information, which is what
we're looking for here to prevent security bugs.

Also in my experience these explicit expressions tend to be somewhat fragile
and are not well specified.  They have to assume that the optimizer does
specific operations which are nowhere guaranteed.

An explicit builtin could be much more tightly defined.

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-24 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #6 from Andi Kleen  ---
In the kernel there is also an upper limit on allocations.

Perhaps just a generic assert builtin that:
- uses value range information
- uses constant propagation
- is a nop when the compiler doesn't have either of these available
- otherwise warns at build time

__builtin_compile_assert(size >= 0 && size < MAX_ALLOC_SIZE);

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-24 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #4 from Andi Kleen  ---
I tested it now and the inline trick doesn't work. Here's a test case

extern void *do_alloc(int a, int b);

static inline __attribute__((alloc_size(1))) void check_alloc_size(int size)
{
}

static inline void *alloc(int a, int b)
{
check_alloc_size(a + b);
return do_alloc(a, b);
}

void func(void)
{
alloc(-1, 0);
}

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-09 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #3 from Andi Kleen  ---
Hmm, that trick may work for the shift too. Let me try.

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-09 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #1 from Andi Kleen  ---
Small correction: argument 4 would need to be a constant for the shifted-by case.

[Bug lto/80379] New: Redundant note: code may be misoptimized unless -fno-strict-aliasing is used

2017-04-09 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80379

Bug ID: 80379
   Summary: Redundant  note: code may be misoptimized unless
-fno-strict-aliasing is used
   Product: gcc
   Version: 6.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

I get an extra

 note: code may be misoptimized unless -fno-strict-aliasing is used

note for type mismatches in LTO builds. But -fno-strict-aliasing is already
set. In this case the extra note is pointless and should be suppressed.

[Bug c/80378] New: Extend alloc_size attribute for better Linux kernel checking

2017-04-09 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

Bug ID: 80378
   Summary: Extend alloc_size attribute for better Linux kernel
checking
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

I've been adding alloc_size attributes to the Linux kernel allocators.

However there are some allocator patterns that can currently not be correctly
described. It would be nice if the attribute could be extended with more
parameters to handle this.

One is 

void *alloc(int size_a, int size_b)

where the allocation size is size_a + size_b

The other is

void *alloc_order(int order)

where the allocation size is constant << order

This could be handled by two extra parameters to alloc_size, one to give a sum
argument and another to give a shifted-by argument. The arguments 2,3 would
also need to support an "ignore" parameter (e.g. -1)
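
To make the gap concrete, here is a small C sketch. The first two functions use
the forms of alloc_size that gcc accepts today (size in one argument, or the
product of two arguments); the last two are the patterns described above, for
which there is no matching form. All function names and SKETCH_BASE_SIZE are
hypothetical, and malloc/calloc merely stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical base size for the order-based allocator (an assumption for
   this example, not taken from the kernel).  */
#define SKETCH_BASE_SIZE 4096u

/* Expressible today: the allocation size is argument 1.  */
__attribute__((alloc_size(1)))
static void *alloc_bytes(size_t size)
{
    return malloc(size);
}

/* Expressible today: the allocation size is argument 1 times argument 2.  */
__attribute__((alloc_size(1, 2)))
static void *alloc_array(size_t count, size_t elem_size)
{
    return calloc(count, elem_size);
}

/* Not expressible: the allocation size is size_a + size_b.  */
static void *alloc_sum(int size_a, int size_b)
{
    assert(size_a >= 0 && size_b >= 0);
    return malloc((size_t)size_a + (size_t)size_b);
}

/* Not expressible: the allocation size is a constant shifted by order.  */
static void *alloc_order(int order)
{
    assert(order >= 0 && order < 16);
    return malloc((size_t)SKETCH_BASE_SIZE << order);
}
```

With the annotated functions the compiler can reason about the object size at
call sites; for alloc_sum and alloc_order it has nothing to go on.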

[Bug lto/60016] gcc-nm does not report static symbols

2016-09-12 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60016

--- Comment #2 from Andi Kleen  ---
This is needed for example to generate backtraces, if the symbol table should
be built in instead of read from the binary.

The Linux kernel cannot read its own binary, so the symbol table has to be
built in.

[Bug gcov-profile/71672] New: inlining indirect calls does not work with autofdo

2016-06-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71672

Bug ID: 71672
   Summary: inlining indirect calls does not work with autofdo
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

The current mainline version of autofdo doesn't inline indirect calls based on
profiling data.

I instrumented a bootstrap and it never triggers.

gcc.dg/tree-prof/indir-call-prof.c

also fails (needs the patch kit in
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01786.html applied first). 

I did some debugging and it seems to give up in update_inlined_ind_target()
here

  /* Program behavior changed, original promoted (and inlined) target is not
     hot any more. Will avoid promote the original target.

     To check if original promoted target is still hot, we check the total
     count of the unpromoted targets (stored in old_info). If it is no less
     than half of the callsite count (stored in INFO), the original promoted
     target is considered not hot any more.  */
  if (total >= info->count / 2)

but even with the test commented out it doesn't work.

[Bug target/71659] New: _xgetbv intrinsic missing

2016-06-25 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71659

Bug ID: 71659
   Summary: _xgetbv intrinsic missing
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

icc and Microsoft have a _xgetbv intrinsic for the XGETBV instruction, which is
needed to check if AVX or MPX are supported by the kernel.

gcc is missing an intrinsic for that, so everyone has to write inline
assembler. Should add one.
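
For illustration, the kind of inline assembler wrapper people end up writing
looks roughly like the C sketch below. The function names are made up;
__get_cpuid comes from gcc's <cpuid.h>. It assumes an assembler that knows the
xgetbv mnemonic, and on non-x86 targets the helpers are stubbed out so the file
still builds. XGETBV faults unless the OS has enabled XSAVE, so the CPUID
OSXSAVE bit is checked first.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* CPUID leaf 1, ECX bit 27 (OSXSAVE): the OS has enabled XSAVE/XGETBV.  */
static int have_osxsave(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 27) & 1;
}

/* Hand-written stand-in for the missing intrinsic: reads the extended
   control register selected by index (0 is XCR0) into EDX:EAX.  */
static uint64_t xgetbv_sketch(uint32_t index)
{
    uint32_t lo, hi;
    __asm__ volatile ("xgetbv" : "=a" (lo), "=d" (hi) : "c" (index));
    return ((uint64_t)hi << 32) | lo;
}
#else
/* Stubs for other targets so the example still compiles.  */
static int have_osxsave(void) { return 0; }
static uint64_t xgetbv_sketch(uint32_t index) { (void)index; return 0; }
#endif

/* OS side of the AVX check only: XCR0 bits 1 (SSE state) and 2 (AVX state)
   must both be set.  A complete check would also test the CPUID AVX bit.  */
static int os_enables_avx_state(void)
{
    if (!have_osxsave())
        return 0;
    uint64_t xcr0 = xgetbv_sketch(0);
    assert(xcr0 & 1);   /* bit 0 (x87 state) is always set in XCR0 */
    return (xcr0 & 0x6) == 0x6;
}
```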

[Bug c/70618] New: better error messages for missing/too many arguments

2016-04-10 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70618

Bug ID: 70618
   Summary: better error messages for missing/too many arguments
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

When doing API refactorings it is reasonably common to have too many or not
enough arguments in function calls. The existing errors in gcc/g++ are not very
good for that: I get at least two consecutive ones and they are not very clear.

Since that is common it would be much better if the compiler could compute the
minimum edit distance to the real prototype (or the nearest for C++) and then
directly suggest which arguments are missing or which are too many.

void foo(int *xp, float *yp, double *zp)
{
}

int x;
float y;
double z;
short k;

void f2(void)
{
foo(, );/* forgot x */
foo(, );/* forgot y */
foo(, );/* forgot z */
foo();/* forgot y and z */
foo();/* forgot x and y*/

foo(, , , );/* x too many at end */
foo(, , , );/* x too man at start */
foo(, , , );/* y too much in the middle */
foo(, , , );/* different y in middle */
foo(, , , );/* different x at start */
foo(, , , );/* different x at end */
}
gcc/tsrc/tmissing.c: In function ‘f2’:
gcc/tsrc/tmissing.c:14:6: warning: passing argument 1 of ‘foo’ from
incompatible pointer type [-Wincompatible-pointer-types]
  foo(, );  /* forgot x */
  ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘int *’ but argument is of type ‘float
*’
 void foo(int *xp, float *yp, double *zp)
  ^
gcc/tsrc/tmissing.c:14:10: warning: passing argument 2 of ‘foo’ from
incompatible pointer type [-Wincompatible-pointer-types]
  foo(, );  /* forgot x */
  ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘float *’ but argument is of type
‘double *’
 void foo(int *xp, float *yp, double *zp)
  ^
gcc/tsrc/tmissing.c:14:2: error: too few arguments to function ‘foo’
  foo(, );  /* forgot x */
  ^
gcc/tsrc/tmissing.c:3:6: note: declared here
 void foo(int *xp, float *yp, double *zp)
  ^
gcc/tsrc/tmissing.c:15:10: warning: passing argument 2 of ‘foo’ from
incompatible pointer type [-Wincompatible-pointer-types]
  foo(, ); /* forgot y */
  ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘float *’ but argument is of type
‘double *’
 void foo(int *xp, float *yp, double *zp)
  ^
gcc/tsrc/tmissing.c:15:2: error: too few arguments to function ‘foo’
  foo(, ); /* forgot y */

[Bug tree-optimization/70427] autofdo bootstrap generates wrong code

2016-03-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

--- Comment #3 from Andi Kleen  ---

Analyzing the code more it looks like the compiler generates it correctly, the
edge returned should not be 0 here.

[Bug tree-optimization/70427] autofdo bootstrap generates wrong code

2016-03-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

--- Comment #2 from Andi Kleen  ---
Created attachment 38110
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38110&action=edit
somewhat reduced input file, only single function

[Bug tree-optimization/70427] autofdo bootstrap generates wrong code

2016-03-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

--- Comment #1 from Andi Kleen  ---
Created attachment 38109
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38109&action=edit
ipa-profile input

Here's the source of the miscompiled file from the compiler

cc1plus -O2 ipa-profile.i  -S

unfortunately have to inspect assembler to see the miscompilation:

look for ipa_profile_generate_summary

then look for get_edge

call_ZN11cgraph_node8get_edgeEP6gimple
testq   %rax, %rax
movq%rax, %r15 
je  .L836< jump if rax/r15 is 0
testb   $2, 96(%rax)
je  .L837
.L836:   <--- it can be here
movq16(%r12), %rax
movq64(%r15), %rsi <-- BAD

same miscompilation here (just with another register). r15 is referenced after
being tested for NULL.

[Bug tree-optimization/70427] New: autofdo bootstrap generates wrong code

2016-03-27 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

Bug ID: 70427
   Summary: autofdo bootstrap generates wrong code
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

I've been working on building gcc with an autofdo bootstrap.

Currently I always run into a crash while rebuilding tree.c with the stage2
compiler and the autofdo information.

Looking at the code it is clearly miscompiled in ipa_profile_generate_summary:

struct cgraph_edge * e = node->get_edge (stmt);
if (e && !e->indirect_unknown_callee)
  continue;


   0x0093bb16 <+326>:   callq  0x7be530 <_ZN11cgraph_node8get_edgeEP6gimple>
   0x0093bb1b <+331>:   test   %rax,%rax        # check for NULL
   0x0093bb1e <+334>:   mov    %rax,%r8
   0x0093bb21 <+337>:   je     0x93bb2d <_ZL28ipa_profile_generate_summaryv+349>
   0x0093bb23 <+339>:   testb  $0x2,0x60(%rax)
   0x0093bb27 <+343>:   je     0x93baa7 <_ZL28ipa_profile_generate_summaryv+215>
   0x0093bb2d <+349>:   mov    0x10(%r13),%rax  # go here because of NULL
=> 0x0093bb31 <+353>:   mov    0x40(%r8),%rsi   # but we still reference!

(gdb) p $r8
$4 = 0

The crash is on bb31 because r8 is NULL. The code checked the return value of
the call, but then references it afterwards before doing the continue.
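
As a reading aid, here is a minimal C sketch of the control flow around the
check. The struct and function names are hypothetical stand-ins, not the GCC
sources. It only illustrates that when the lookup returns NULL the condition
for the continue is false, so execution falls through to whatever follows the
check, and a dereference there is reachable with a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cgraph_edge; only the tested flag is modeled. */
struct edge {
    int indirect_unknown_callee;
};

/* Counts how many entries reach the code after the check.
   An entry takes the "continue" path only if it is non-NULL and its
   callee is known. NULL entries fall through as well. */
int count_fallthrough(struct edge *const *edges, size_t n)
{
    int reached = 0;

    assert(edges != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct edge *e = edges[i];

        if (e && !e->indirect_unknown_callee)
            continue;   /* known callee: nothing more to do */

        /* Reached for e == NULL and for unknown callees alike, so e
           must not be dereferenced here without its own NULL test. */
        reached++;
    }
    return reached;
}
```

With one known edge, one NULL and one unknown edge, two of the three entries
reach the code after the check.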

Command line option:

cc1plus -fauto-profile=cc1plus.fda  -g -O2 tree.i

cc1plus.fda is at http://halobates.de/cc1plus.fda (too big to attach)

[Bug c/28901] -Wunused-variable ignores unused const initialised variables

2015-11-30 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28901

--- Comment #17 from Andi Kleen  ---
There were a few false or useless ones (e.g. related to macros and specific
build configs).  I didn't look through them all, but several were
semi-legitimate yet very minor, so fixing them won't help much. I think
one or two of the ones I looked at may have been real bugs.

I still think the warning should not be in -Wall. A thousand-plus warnings in
real projects is just not acceptable.

[Bug c/28901] -Wunused-variable ignores unused const initialised variables

2015-11-29 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28901

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #14 from Andi Kleen  ---
I'm building a current Linux kernel with allyesconfig, and this new warning
causes 1383(!) new warnings in the build.

I think this should be revisited and the warning be turned off again.

[Bug target/68602] New: i386: -mtune/arch options not all output by -v --help

2015-11-28 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68602

Bug ID: 68602
   Summary: i386: -mtune/arch options not all output by -v --help
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

gcc -v --help does not output all the possible options for -mtune=/-march=

For example, corei7-avx, which is Sandy Bridge, is missing for arch. tune is
also missing all CPU names.



  -march=CPU[,+EXTENSION...]
  generate code for CPU and EXTENSION, CPU is one of:
   generic32, generic64, i386, i486, i586, i686,
   pentium, pentiumpro, pentiumii, pentiumiii, pentium4,
   prescott, nocona, core, core2, corei7, l1om, k1om,
   k6, k6_2, athlon, opteron, k8, amdfam10, bdver1,
   bdver2, bdver3, bdver4, btver1, btver2
  EXTENSION is combination of:
   8087, 287, 387, no87, mmx, nommx, sse, sse2, sse3,
   ssse3, sse4.1, sse4.2, sse4, nosse, avx, avx2,
   avx512f, avx512cd, avx512er, avx512pf, avx512dq,
   avx512bw, avx512vl, noavx, vmx, vmfunc, smx, xsave,
   xsaveopt, xsavec, xsaves, aes, pclmul, fsgsbase,
   rdrnd, f16c, bmi2, fma, fma4, xop, lwp, movbe, cx16,
   ept, lzcnt, hle, rtm, invpcid, clflush, nop, syscall,
   rdtscp, 3dnow, 3dnowa, padlock, svme, sse4a, abm,
   bmi, tbm, adx, rdseed, prfchw, smap, mpx, sha,
   clflushopt, prefetchwt1, se1, clwb, pcommit,
   avx512ifma, avx512vbmi
  -mtune=CPU  optimize for CPU, CPU is one of:
   generic32, generic64, i8086, i186, i286, i386, i486,
   i586, i686, pentium, pentiumpro, pentiumii,
   pentiumiii, pentium4, prescott, nocona, core, core2,
   corei7, l1om, k1om, k6, k6_2, athlon, opteron, k8,
   amdfam10, bdver1, bdver2, bdver3, bdver4, btver1,
   btver2

[Bug lto/66229] LTO fails with -fauto-profile on mcf

2015-11-28 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66229

--- Comment #2 from Andi Kleen  ---
Some analysis of the problem:

At the time cc1 streams out profile_data, autofdo has not yet set it to
anything. So the LTO files contain all-zero profile data, which later causes
the ICE here.

Seems to be some kind of ordering problem.

Strangely the autofdo pass gets executed in the frontend run, but for unknown
reasons the profile data doesn't survive until the LTO data is written.

[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed

2015-09-28 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946

Andi Kleen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

--- Comment #10 from Andi Kleen  ---
Turned out to be an issue with an old binutils.


[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed

2015-09-25 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946

--- Comment #9 from Andi Kleen  ---
Created attachment 36391
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36391&action=edit
workaround

This workaround fixes it. Disable -gc-section for libstdc++.

It seems like a linker bug. I opened a binutils bug report
https://sourceware.org/bugzilla/show_bug.cgi?id=19008


[Bug lto/50676] Partitioning may fail with presence of static variables referring to function labels

2015-07-18 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50676

--- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org ---
The patch doesn't seem to be checked in yet. Is there a reason for that?


[Bug rtl-optimization/66890] function splitting only works with profile feedback

2015-07-17 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 36008
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36008&action=edit
Updated patch with documentation and param

I updated the patch with proper documentation and a param for the cut off.
In some tests it appears to do the right thing when building a Linux kernel.


[Bug rtl-optimization/66890] function splitting only works with profile feedback

2015-07-16 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org ---
I suspect the patch may be too simple because it could get stuck on unlikely
but high-frequency edges in the cold area. Perhaps more of the code from the
non-partitioning reordering needs to be adapted.


[Bug rtl-optimization/66890] function splitting only works with profile feedback

2015-07-16 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 35993
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35993&action=edit
Potential patch


This patch fixes the problem for my simple test case. It adds a fallback path
to the partition check: if no profile information is available, only edges are
checked, and everything that has only incoming edges of 20% frequency or less
is considered cold.

20% is fairly arbitrary, likely needs tuning and should be a param. But seems
to work for the test case.
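
To make the fallback rule concrete, here is a minimal C sketch of one possible
reading of it. This is not the attached patch: the function name, the array of
per-edge probabilities in percent, and the treatment of blocks without
incoming edges are all assumptions made for illustration. Only the 20% cutoff
comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Cutoff in percent. 20 is the value mentioned above; a real
   implementation would presumably read it from a param. */
#define COLD_CUTOFF_PERCENT 20

/* Returns 1 if every incoming edge has a predicted probability at or
   below the cutoff, 0 otherwise. Probabilities are given in percent.
   Assumption: a block without incoming edges (such as the entry block)
   is never treated as cold. */
int block_is_cold(const int *incoming_prob_percent, size_t n_edges)
{
    assert(incoming_prob_percent != NULL || n_edges == 0);

    if (n_edges == 0)
        return 0;
    for (size_t i = 0; i < n_edges; i++) {
        if (incoming_prob_percent[i] > COLD_CUTOFF_PERCENT)
            return 0;   /* at least one likely way in: keep it hot */
    }
    return 1;
}
```

Under this reading a block reached only through edges of 1% and 20% is cold,
while a block with one 90% incoming edge stays in the hot section.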

Comments?


[Bug rtl-optimization/66890] function splitting only works with profile feedback

2015-07-15 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---

The problem seems to be that
bb-reorder.c:find_rarely_executed_basic_blocks_and_crossing_edges
returns no edges without profile feedback, which prevents generation of a
section split note.


[Bug rtl-optimization/66890] New: function splitting only works with profile feedback

2015-07-15 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

Bug ID: 66890
   Summary: function splitting only works with profile feedback
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Consider this simple example:

volatile int count;

int main()
{
        int i;
        for (i = 0; i < 10; i++) {
                if (i == 999)
                        count *= 2;
                count++;
        }
}

The default "EQ is unlikely" heuristic in predict.* predicts that the
if (i == 999) is unlikely. So the tracer moves the count *= 2 basic block out
of line to preserve instruction cache.

gcc50 -O2 -S thotcold.c

        movl    $1, %edx
        jmp     .L2
        .p2align 4,,10
        .p2align 3
.L4:
        addl    $1, %edx
.L2:
        cmpl    $1000, %edx
        movl    count(%rip), %eax
        je      .L6
        addl    $1, %eax
        cmpl    $10, %edx
        movl    %eax, count(%rip)
        jne     .L4
        xorl    %eax, %eax
        ret
# out of line code
.L6:
        addl    %eax, %eax
        movl    %eax, count(%rip)
        movl    count(%rip), %eax
        addl    $1, %eax
        movl    %eax, count(%rip)
        jmp     .L4


Now if we enable -freorder-blocks-and-partition I would expect it to also be
put into .text.unlikely to give an even better cache layout. But that is not
happening. It generates the same code.

Only when I use actual profile feedback and -freorder-blocks-and-partition
does the code actually end up in a separate section.

(it also unrolled the loop, so the code looks a bit different)

gcc -O2 -fprofile-generate -freorder-blocks-and-partition thotcold.c
./a.out 
gcc -O2 -fprofile-use -freorder-blocks-and-partition thotcold.c 
...
        .cfi_endproc
        .section        .text.unlikely
        .cfi_startproc
.L55:
        movl    count(%rip), %ecx
        addl    $1, %eax
        addl    $1, %ecx
        cmpl    $10, %eax
        movl    %ecx, count(%rip)
        je      .L6
        cmpl    $1, %edx
        je      .L5
        cmpl    $2, %edx
        je      .L28
        cmpl    $3, %edx


-freorder-blocks-and-partition should already use the extra section even
without profile feedback. 

I tested some larger programs and without profile feedback the unlikely section
is always empty.

The heuristics in predict.* often work quite well and a lot of code would
benefit from moving cold code out of the way of the caches.

This would allow using the option to improve frontend-bound code without
needing to do full profile feedback.

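For comparison, here is a manual workaround sketched in C. It is not what this
report asks for, which is for the compiler to do the split on its own. The
rare work is moved into a separate function marked with GCC's cold attribute,
and the branch is annotated with __builtin_expect. The names rare_path and
run_loop are made up for the example, and the loop bound is a parameter rather
than the constant from the test case. GCC documents that cold functions are
placed in a special text subsection on many targets, typically .text.unlikely
on ELF.

```c
#include <assert.h>

/* Same global as in the test case above. */
volatile int count;

/* Rarely executed work in its own function. The cold attribute marks it
   as unlikely to run; noinline keeps it out of the hot loop body. */
__attribute__((cold, noinline))
static void rare_path(void)
{
    count *= 2;
}

/* Same loop shape as the test case, with the bound passed in. */
int run_loop(int iterations)
{
    assert(iterations >= 0);

    for (int i = 0; i < iterations; i++) {
        if (__builtin_expect(i == 999, 0))   /* hint: almost never true */
            rare_path();
        count++;
    }
    return count;
}
```

This relies on source annotations, so it does not scale to large code bases
the way a heuristic-driven partition would.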

[Bug lto/61635] LTO partitioner does not handle label in statics

2015-03-29 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61635

--- Comment #7 from Andi Kleen andi-gcc at firstfloor dot org ---
Still happens with current trunk and with newer LTO Linux kernels (4.0-rc*)


[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed

2015-03-29 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946

--- Comment #8 from Andi Kleen andi-gcc at firstfloor dot org ---
I still get that one with current trunk on my fedora 21 system.


[Bug c/65620] New: Incorrect warning for !! with -Wlogical-not-parentheses

2015-03-29 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65620

Bug ID: 65620
   Summary: Incorrect warning for !! with
-Wlogical-not-parentheses
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

Created attachment 35172
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35172&action=edit
test case

When building the Linux 4.0-rc5 kernel with GCC 5.0 there are several imho
bogus warnings like

warning: logical not is only applied to the left hand side of comparison
[-Wlogical-not-parentheses]

for constructs like this:

  !!test_bit(...) != ...

The warning shouldn't trigger for !!, which is reasonably common. Looking at
the c/cp parsers there is already code to check for this, but it doesn't seem
to work here.

In the kernel case test_bit actually expands to a complex macro like

 if (usage->type == 0x01 && !!(__builtin_constant_p((usage->code)) ?
      constant_test_bit((usage->code), (input->key)) :
      variable_test_bit((usage->key), (input->key)))

I'm attaching an (already delta'ed but still quite big) test case

C++ likely has the same problem (but not tested)


[Bug bootstrap/65621] New: boot strap with checking enabled ICEs

2015-03-29 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65621

Bug ID: 65621
   Summary: boot strap with checking enabled ICEs
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

target: x86_64-linux

../../../../gcc/libstdc++-v3/libsupc++/tinfo.cc:82:1: internal compiler error:
in mark_functions_to_output, at cgraphunit.c:1307
 }
 ^
0xb25f0b mark_functions_to_output
        ../../gcc/gcc/cgraphunit.c:1302
0xb29137 symbol_table::compile()
        ../../gcc/gcc/cgraphunit.c:2330
0xb29313 symbol_table::finalize_compilation_unit()
        ../../gcc/gcc/cgraphunit.c:2444
0x884c9a cp_write_global_declarations()
        ../../gcc/gcc/cp/decl2.c:4755

