[Bug analyzer/115044] New: -Wanalyzer-malloc-leak diagnostic in terminal path
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115044 Bug ID: 115044 Summary: -Wanalyzer-malloc-leak diagnostic in terminal path Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: analyzer Assignee: dmalcolm at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- I've been experimenting with -fanalyzer and attribute(malloc) in Xen. For a simple case, it reported: drivers/passthrough/x86/iommu.c: In function 'iommu_init_domid': ./include/xen/bug.h:141:13: warning: leak of '_xzalloc(4096, 8)' [CWE-401] [-Wanalyzer-malloc-leak] 141 | do { if ( unlikely(!(p)) ) assert_failed(#p); } while (0) | ^ drivers/passthrough/x86/iommu.c:551:9: note: in expansion of macro 'ASSERT' 551 | ASSERT(reserve > DOMID_MASK); | ^~ If I'm reading this correctly, it's saying that allocated memory is being leaked by ASSERT(). The whole function is: unsigned long *__init iommu_init_domid(domid_t reserve) { unsigned long *map; if ( !iommu_quarantine ) return ZERO_BLOCK_PTR; BUILD_BUG_ON(DOMID_MASK * 2U >= UINT16_MAX); map = xzalloc_array(unsigned long, BITS_TO_LONGS(UINT16_MAX - DOMID_MASK)); if ( map && reserve != DOMID_INVALID ) { ASSERT(reserve > DOMID_MASK); __set_bit(reserve & DOMID_MASK, map); } return map; } but I can't seem to reproduce the -fanalyzer warning in a simpler example. Assuming it might be something to do with our implementation of ASSERT(), I reduced it to just do { if (!(x)) __builtin_trap(); } while ( 0 ) in place, and that still did reproduce the warning. So while the analyser is technically accurate (i.e. the memory is leaked when we encounter a fatal path), I'm not sure it's a helpful diagnostic. Is there a reason why it's reported like this?
[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 --- Comment #15 from Andrew Cooper --- Wow that's a lot of junk getting included for the minimal include set I could easily make. It occurs to me only after posting that you're liable to fail at: asm ( ".include \"arch/x86/include/asm/asm-macros.h\"" ); which always trips things up. You can safely drop it if you're just interested in the analyser behaviour.
[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 --- Comment #14 from Andrew Cooper --- Created attachment 54572 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54572=edit Preprocessed example
[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 --- Comment #13 from Andrew Cooper --- I've constructed an example which might be the knockon effect you were worried about? void foo(char *other) { char *ptr = NULL; if ( current->domain ) ptr = other; asm volatile ("cmc"); if ( current->domain ) ptr[0] = ~ptr[0]; } yields arch/x86/tmp.c: In function 'foo': arch/x86/tmp.c:14:22: error: dereference of NULL 'ptr' [CWE-476] [-Werror=analyzer-null-dereference] 14 | ptr[0] = ~ptr[0]; | ~~~^~~ 'foo': events 1-5 | |8 | if ( current->domain ) | |^ | || | |(1) following 'false' branch... |.. | 11 | asm volatile ("cmc"); | | ~~~ | | | | | (2) ...to here | 12 | | 13 | if ( current->domain ) | |~ | || | |(3) following 'true' branch... | 14 | ptr[0] = ~ptr[0]; | | ~~~ ~~ | | || | | |(5) dereference of NULL 'ptr' | | (4) ...to here |
[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 --- Comment #10 from Andrew Cooper --- >From trying this out, a const attribute doesn't alter the code generation in the slightest, so I presume GCC has already figured the const-ness out.
[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 --- Comment #9 from Andrew Cooper --- Thank-you for the fix. I've recompiled master and this uninitialised warning has gone. Unfortunately, Xen isn't GCC-13 clean (seems like a real bug in Xen), and the analyser has pointed out various other things which I'm still looking in to. I don't see anything which looks like it is a new knock-on effect from this change. Our code does fundamentally rely on get_cpu_info() always returning the same pointer (on a single CPU). For example, `current` is defined as `get_cpu_info()->current` and we do expect that to yield the same pointer when used multiple times. Even if the analyser was interpreting the generated asm, there's no way it could prove this without knowing the size/alignment constraints of our stacks. Would a const annotation on get_cpu_info() be likely to help? It occurs to me that this is true in all cases that the compiler could legitimately reason about. (It would only cease being true if we fell off our stack, at which point UB is the very least of our worries.)
[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 --- Comment #6 from Andrew Cooper --- (In reply to David Malcolm from comment #5) > Minimal reproducer: https://godbolt.org/z/E6EEY1WT6 > > Am I right in understanding that: > register unsigned long sp asm("rsp"); > is intended as a way to read the %rsp register? Ultimately, yes. More generally, this just creates a way to access the specific register. It is the only way for example to create an asm constraint on e.g. %r8, so the following is common to see for MSABI: register unsigned long param asm ("%r8"); param = 4; asm ("tdcall/whatever" : "+r" (param) ...); (Example here is from Intel's new TDX technology, but the actual asm instruction isn't important.)
[Bug c/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 Andrew Cooper changed: What|Removed |Added Component|analyzer|c --- Comment #4 from Andrew Cooper --- (In reply to Andreas Schwab from comment #3) > Perhaps it works if you declare the register variable in file scope. Huh. I honestly expected that not to compile, but it appears to, and it appears to work. There is minor perturbation in the build, but as far as I can see, it's just slightly different register/instruction scheduling. Why does being at global scope change the diagnostic?
[Bug c/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 --- Comment #2 from Andrew Cooper --- __builtin_frame_address() does appear to resolve the warning, but the knock-on effect for code generation is even worse than the asm() block. It forces a frame-pointer setup in all functions that use it (which is most functions in Xen), even leaf functions, and despite -fomit-frame-pointer, which in turn causes spilling of other registers now that %rbp isn't usable.
[Bug c/108968] New: fanalyzer false positive with the uninitalised-ness of the stack pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968 Bug ID: 108968 Summary: fanalyzer false positive with the uninitalised-ness of the stack pointer Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- I experimented with -fanalyzer on Xen, given all the recent work on Linux. We're quite similar, but one area where we are very different is accessing per-cpu variables. For architectural reasons (i.e. because we were virtualising Linux, and Linux uses %gs for its per-cpu variables), Xen doesn't. In Xen, we have a block of metadata at the base of the stack, and the stack suitably aligned such that we can do something like this: static inline struct cpu_info *get_cpu_info(void) { register unsigned long sp asm("rsp"); return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1; } Which turns into roughly: ptr = ((rsp | 0x7fff) + 1) - sizeof(struct cpu_info) which is correct and work suitably due to the alignment of the stack in the first place. Unfortunately, it triggers: ./arch/x86/include/asm/current.h:95:5: error: use of uninitialized value 'sp' [CWE-457] [-Werror=analyzer-use-of-uninitialized-value] reliably, every time macros such as `current` get expanded, which is everywhere. The reality is that the stack pointer is never uninitialised. It is unpredictable in the general case, but implementations can account for and remove that unpredictability. The normal trick to hide a variable from uninitialised handling (e.g. to asm("" : "+g"(var)); ) doesn't work, as it suffers from the same error. Is there any way to tell fanalyzer that this value really isn't uninitialised? I can't see anything obvious. I can work around the warning by doing: unsigned long sp; asm ( "mov %%rsp, %0" : "=r" (sp) ); but this impacts code generation quite substantially. This primitive is used all over the place, and the regular C form undergoes far better CSE than the explicit mov to retrieve the stack pointer.
[Bug middle-end/108799] Improper deprecation diagnostic for rsp clobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799 --- Comment #2 from Andrew Cooper --- Adding a memory clobber doesn't make any difference that I can see, and I'm not aware of any reason why it ought to make a difference. I suppose that my real request here is to figure out what is the correct way to indicate that the redzone is clobbered, and to document it.
[Bug c/108799] New: Improper deprecation diagnostic for rsp clobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799 Bug ID: 108799 Summary: Improper deprecation diagnostic for rsp clobber Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- Originally from LKML. https://lore.kernel.org/lkml/y9lfmq%2fr1%2fpep...@biznet-home.integral.gnuweeb.org/ Slightly modified example: https://godbolt.org/z/xx76nEvKM Given: static void clobber_redzone_buggy(void) { register unsigned long rsp asm("rsp"); unsigned long fl; asm volatile ("pushf; popq %[fl]" : [fl] "=r" (fl) , "+r" (rsp) : : //"rsp" ); } static void set_red_zone(long *mem, long val) { __asm__ volatile ("movq %[val], %[mem]" : [mem] "=m" (*mem) : [val] "r" (val)); } static long get_red_zone(long *mem) { long ret; __asm__ volatile ("movq %[in], %[out]" : [out] "=r" (ret) : [in] "m" (*mem)); return ret; } long a_leaf_func_with_red_zone(void) { long x; set_red_zone(, 1); clobber_redzone_buggy(); /* The correct retval is 1 */ return get_red_zone(); } gcc generates: a_leaf_func_with_red_zone: movl$1, %eax movq%rax, -8(%rsp) pushf popq%rax movq-8(%rsp), %rax ret which is buggy because the asm clobbers the same redzone slot as `x` occupies. Swapping the "+r"(rsp) constraint for an explicit "rsp" clobber generates: a_leaf_func_with_red_zone: pushq %rbp movl$1, %eax movq%rsp, %rbp subq$16, %rsp movq%rax, -8(%rbp) pushf popq%rax movq-8(%rbp), %rax leave ret which seems to do the right thing. It sets up a full stack frame and avoids using the redzone. However, doing so yields: warning: listing the stack pointer register 'rsp' in a clobber list is deprecated [-Wdeprecated] note: the value of the stack pointer after an 'asm' statement must be the same as it was before the statement The note is incorrect. For ABIs with a redzone, the requirement is stricter than simply preserving the value of the stack pointer. The warning suggests that there ought to be a different way to express "this clobbers the redzone", but there doesn't appear to be any other way. If this is the case, why is it deprecated?
[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971 --- Comment #3 from Andrew Cooper --- So yes - my experimentation did start from investigating the memory ordering behaviour of these builtins, based on a thread on LKML. The pushf in readflags and popf in writeflags have wildly different ordering requirements, depending on which flags are wanted/modified. AC for example (and IF for kernels) need to not be reordered with respect to any memory access. As you observe, readflags in particular needs to not be reordered with any instruction that modifies the arithmetic flags (which is most of them). IMO, it would be safe to omit the pushf from readflags if the result is not not used, because there are no unexpected side effects for pushf. The same is not true of popf in writeflags, which has side effects even when written twice with the same value.
[Bug middle-end/104971] New: Optimisation for __builtin_ia32_readeflags corrupts the stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971 Bug ID: 104971 Summary: Optimisation for __builtin_ia32_readeflags corrupts the stack Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- Full example: https://godbolt.org/z/xGq3c4Mnc Given: int broken(void) { int fl = __builtin_ia32_readeflags_u64(); } gcc -O2 generates: broken: pushfq ret Which is going explode very quickly. Code generation appears to be safe without optimisation, but even -O alone is enough to create problems. At a guess, the optimiser has concluded that the result is unused, drops the `pop %reg`, but fails to also drop the `pushf` too. Looking through history on Godbolt, it appears that GCC 4.9 (which introduced this builtin) has correct optimised code generation, and it regressed between 4.9 and 5.1.
[Bug target/104816] -fcf-protection=branch should generate endbr instead of notrack jumps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104816 --- Comment #7 from Andrew Cooper --- (In reply to H.J. Lu from comment #5) > Are you suggesting to add an option to generate jump table with ENDBR? Jump tables are a legitimate optimisation. NOTRACK is a weakness in CET protections, and fully hardened userspace (as well as kernels) will want to run with MSR_{U,S}_CET.NOTRACK_EN=0. There should be some future where jump tables can be used in combination with NOTRACK_EN=0.
[Bug target/104816] -fcf-protection=branch should generate endbr instead of notrack jumps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104816 --- Comment #4 from Andrew Cooper --- I've worked around this in Xen with: https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=9d4a44380d273de22d5753883cbf5581795ff24d and https://lore.kernel.org/lkml/yixpv0q88paph...@hirez.programming.kicks-ass.net/ is pending for Linux. IMO, it's an error that -fcf-protection=branch is not obeyed for jump tables, and we don't want to end up in a situation where jump tables are unusable with CET.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #25 from Andrew Cooper --- (In reply to H.J. Lu from comment #24) > (In reply to Andrew Cooper from comment #23) > > Apologies for the delay, but I do now have a working prototype of Xen with > > CET-IBT active, using the current version of these patches. > > > > The result actually builds back to older versions of GCCs, but the lack of > > cf_check-ness typechecking makes this a fragile activity. > > Should I polish my patches and submit them now? Sorry - I missed this request. Yes please. These changes have been used for a while now with no issues identified.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #40 from Andrew Cooper --- I've given the GCC-11 branch a test and everything appears to be in order.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #33 from Andrew Cooper --- Looks good to me
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #30 from Andrew Cooper --- (In reply to CVS Commits from comment #27) > > x86: Add -mharden-sls=[none|all|return|indirect-branch] > It occurs to me that `indirect-branch` needs renaming to be `indirect-jmp` as the logic specifically does not make changes to code generation relating to `call` instructions.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #23 from Andrew Cooper --- Apologies for the delay, but I do now have a working prototype of Xen with CET-IBT active, using the current version of these patches. The result actually builds back to older versions of GCCs, but the lack of cf_check-ness typechecking makes this a fragile activity.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #21 from Andrew Cooper --- Another possibly-bug, but possibly mis-expectations on my behalf. I've found some code in the depths of Xen which is causing a failure on final link due to a missing `__x86_indirect_thunk_nt_rax` symbol. $ cat fnptr-typeof.c extern void (*fnptrs[])(char); void foo(int a) { typeof(foo) *bar = (void *)fnptrs[0]; bar(a); } I realise this is wildly undefined behaviour, and I will try to address it in due course. However, the instruction generation is bizarre. When I compile with -fcf-protection=branch -mmanual-endbr, I get a plain `jmp *fnptrs(%rip)` instruction. (This is fine.) When I compile with -fcf-check-attribute=no as well, then I get `notrack jmp *fnptrs(%rip)`. I'm not sure why the notrack is warranted here; for all GCC knows, the target does have a suitable ENDBR64 instruction. When I compile with -mindirect-branch=thunk as well, I get a load into %rax and a normal looking retpoline thunk. (This is as expected too.) However, when I switch to -mindirect-branch=thunk-extern, I get the the same load into %rax, and then a jump to `__x86_indirect_thunk_nt_rax`. Presumably the nt is short for notrack. Irrespective of whether there should be a notrack or not on the jmp form, it weird for the retpoline thunk ABI to be changing based on extern or not. What is the reasoning behind this?
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #20 from Andrew Cooper --- (In reply to H.J. Lu from comment #19) > (In reply to Andrew Cooper from comment #17) > > I think I've found a bug in the -fcf-check-attribute implementation. > > Please try the v5 patch. Thanks. That does fix the issue. > BTW, do you have a testcase to show how > -fcf-check-attribute=yes is used? So, this was something I was going to leave until I'd got CET-IBT working, so as to have time to consider all parts before proposing improvements. I don't have a usecase for -fcf-check-attribute=yes, because it is almost totally redundant with regular -fcf-protection in the first place. When you are are applying control flow checks, every function is either checked or not checked. But GCC currently has a 3-way model of {unknown, explicit yes, explicit no} on which it builds its typechecking. Furthermore, -mmanual-endbr is a gross hack which by default leaves you with a broken binary. If I were building this from scratch, I'd not have -mmanual-endbr or -fcf-check-attribute at all, because they're exposing complexity which ought not to exist. I get why the default for -fcf-protection=branch puts an ENDBR* instruction everywhere. It is the quick, easy and non-invasive way to make libraries compatible with CET, which is a net improvement, even if not ideal. The ideal way, and definitely future work, is for GCC to calculate the minimum set of required ENDBR*. At a guess, all non-local symbols (except those LTO can determine are not publicly visible), and any local symbols used by function pointers. What I'm trying to do is a stopgap in the middle. No ENDBR*'s by default, but have the compiler tell me where I've got function pointers to a non-ENDBR'd function, so when the result compiles, it stands a reasonable chance of functioning correctly. Personally, I'd suggest having these as sub-modes of -fcf-protection=branch, instead of exposing all the internals on the command line.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #17 from Andrew Cooper --- I think I've found a bug in the -fcf-check-attribute implementation. $ cat fnptr-array-arg.c static int __attribute__((cf_check)) foo(char a[], int b) { return 0; } int (*ptr)(char[], int) = foo; $ gcc -Wall -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no -c fnptr-array-arg.c -o tmp.o && objdump -d tmp.o fnptr-array-arg.c:5:27: warning: initialization of 'int (*)(char *, int)' from incompatible pointer type 'int (__attribute__((nocf_check)) *)(char *, int)' [-Wincompatible-pointer-types] 5 | int (*ptr)(char[], int) = foo; | ^~~ tmp.o: file format elf64-x86-64 Disassembly of section .text: : 0: 31 c0 xor%eax,%eax 2: c3 retq Despite the explicit cf_check, a diagnostic is raised complaining about cf_check-ness of the pointer, and the generated code has no `endbr64` instruction. This issue only manifests when using array arguments in the function. When switching `char[]` for `char*`, everything works as expected. Also, dropping -fcf-check-attribute=no also causes things to work. $ gcc -Wall -fcf-protection=branch -mmanual-endbr -c fnptr-array-arg.c -o tmp.o && objdump -d tmp.o tmp.o: file format elf64-x86-64 Disassembly of section .text: : 0: f3 0f 1e fa endbr64 4: 31 c0 xor%eax,%eax 6: c3 retq Something about the array type seems to cause the explicit cf_check attribute to be lost.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #22 from Andrew Cooper --- One curious thing I have discovered. While auditing the -mharden-sls=all code generation in Xen, I found examples where I got "ret int3 ret int3" with no intervening instructions. It turns out this is not a regression in this change. It is a pre-existing missing optimisation, which is made more obvious when every ret is extended with an int3. It occurs for functions with either no stack frame at all, or functions which have an early exit before setting up the stack frame. Some examples which occur at -O1 do not occur at -O2. One curious example which does still repro at -O2 is this. We have a hash lookup function: struct context *sidtab_search(struct sidtab *s, u32 sid) { int hvalue; struct sidtab_node *cur; if ( !s ) return NULL; hvalue = SIDTAB_HASH(sid); cur = s->htable[hvalue]; while ( cur != NULL && sid > cur->sid ) cur = cur->next; if ( cur == NULL || sid != cur->sid ) { /* Remap invalid SIDs to the unlabeled SID. */ sid = SECINITSID_UNLABELED; hvalue = SIDTAB_HASH(sid); cur = s->htable[hvalue]; while ( cur != NULL && sid > cur->sid ) cur = cur->next; if ( !cur || sid != cur->sid ) return NULL; } return >context; } which compiles (reformatted a little for width - unmodified: https://paste.debian.net/hidden/7bf675d6/) to: : 48 85 ff test %rdi,%rdi /--- 74 63je |48 8b 17 mov(%rdi),%rdx |89 f0mov%esi,%eax |83 e0 7f and$0x7f,%eax |48 8b 04 c2 mov(%rdx,%rax,8),%rax |48 85 c0 test %rax,%rax | /--- 75 13jne | /|--- eb 17jmp | ||0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) | ||00 | ||/-> 48 8b 40 48 mov0x48(%rax),%rax | ||| 48 85 c0 test %rax,%rax | +||-- 74 06je | |\|-> 39 30cmp%esi,(%rax) | | \-- 72 f3jb | /| 74 24je | |\---> 48 8b 42 28 mov0x28(%rdx),%rax | | 48 85 c0 test %rax,%rax | | /--- 75 11jne |/|-|--- eb 32jmp // (1) ||| |66 0f 1f 44 00 00nopw 0x0(%rax,%rax,1) ||| |/-> 48 8b 40 48 mov0x48(%rax),%rax ||| || 48 85 c0 test %rax,%rax |||/||-- 74 17je // (2) \|-> 83 38 04 cmpl $0x4,(%rax) \-- 76 f2jbe 83 38 05 cmpl $0x5,(%rax) +||| 75 15jne ||\|---> 48 83 c0 08 add$0x8,%rax || | c3 retq || | cc int3 || | 0f 1f 80 00 00 00 00 nopl 0x0(%rax) || \---> c3 retq// Target of (2) || cc int3 || 66 0f 1f 44 00 00nopw 0x0(%rax,%rax,1) \|-> 31 c0xor%eax,%eax | c3 retq | cc int3 \-> c3 retq// Target of (1) cc int3 66 90xchg %ax,%ax There are 4 exits in total. Two have to set up %eax, so they can't usefully be merged. However, the unconditional jmp at (1) is 2 bytes, and could fully contain its target ret;int3 without even impacting the surrounding padding. Whether it inlines or merges, this drops 4 bytes. The conditional jump at (2) could be folded in to any of the other exit paths, dropping 16 bytes from the total size size. I have no idea how easy/hard this may be to track down, or whether it is worth pursuing urgently, but it probably does want looking at, seeing as SLS hardening doubles the hit.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #14 from Andrew Cooper --- (In reply to H.J. Lu from comment #13) > (In reply to Andrew Cooper from comment #11) > > > > There should be a diagnostic, but it ought to include cf_check in the type > > it prints. > > Try the v3 patch. Thanks. Now get: proto.c:2:37: error: conflicting types with implied 'nocf_check' attribute for 'foo'; have 'void(void)' 2 | static void __attribute__((unused)) foo(void) | ^~~ proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)' 1 | static void __attribute__((cf_check)) foo(void); | ^~~ which at least highlights the issue. Any variant like this, but possibly even simply reporting 'void __attribute__((nocf_check))(void)' should be fine.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #11 from Andrew Cooper --- (In reply to H.J. Lu from comment #10) > (In reply to Andrew Cooper from comment #8) > > Actually, there is a (possibly pre-existing) diagnostics issue: > > > > $ cat proto.c > > static void __attribute__((cf_check)) foo(void); > > static void __attribute__((unused)) foo(void) > > { > > } > > void (*ptr)(void) = foo; > > > > $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr > > -fcf-check-attribute=no -c proto.c -o proto.o > > proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)' > > 2 | static void __attribute__((unused)) foo(void) > > | ^~~ > > proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)' > > 1 | static void __attribute__((cf_check)) foo(void); > > | ^~~ > > > > > > The diagnostic complaining that the forward declaration doesn't match the > > definition gives 'void(void)' as the type in both cases, leaving out the > > fact that they differ by cf_check-ness. > > Please try the v2 patch. I appear to get no diagnostic at all now. This seems like a regression from v1. There should be a diagnostic, but it ought to include cf_check in the type it prints.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #8 from Andrew Cooper --- Actually, there is a (possibly pre-existing) diagnostics issue: $ cat proto.c static void __attribute__((cf_check)) foo(void); static void __attribute__((unused)) foo(void) { } void (*ptr)(void) = foo; $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no -c proto.c -o proto.o proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)' 2 | static void __attribute__((unused)) foo(void) | ^~~ proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)' 1 | static void __attribute__((cf_check)) foo(void); | ^~~ The diagnostic complaining that the forward declaration doesn't match the definition gives 'void(void)' as the type in both cases, leaving out the fact that they differ by cf_check-ness.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #18 from Andrew Cooper --- Yes to both.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #15 from Andrew Cooper --- So this is the irritating corner case where the two options are linked. *If* we are using -mindirect-branch-cs-prefix, then we intend to rewrite `jmp __x86_indirect_thunk_*` to `jmp *%reg` or `lfence; jmp *%reg` based on boot time configuration/settings. In this case, we still need to fit the `int3` for SLS protection in somewhere. The two options are: 1) Special case `jmp __x86_indirect_thunk_*` as if it were an indirect jump and write out an `int3` directly, or 2) Pad one extra %cs prefix on the jmp, so we've got space to insert one at boot time.
[Bug c/102967] confusing location in -Waddress for a subexpression of a ternary expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102967 --- Comment #2 from Andrew Cooper --- (In reply to Martin Sebor from comment #1) > The warning is intended: it points out that the second operand of the > conditional expression is necessarily true: > > if ( !(pa ? >c : NULL) ) > ^^ > > There's no point in testing the address of a member for equality to null > because the member of no object can reside at that address. The above can > be simplified to > > if (!pa) Hmm, true. I suppose I got hung up on the statement made by the diagnostic, rather than the implication that a simplification could be made. Moving the underlining would certainly help. > When reporting bugs, please be sure to include the full test case and its > output as requested at https://gcc.gnu.org/bugs/#need. My apologies. I'll try to be better next time.
[Bug target/102953] Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 --- Comment #7 from Andrew Cooper --- Thankyou. I've tried these two patches and they do appear to be behaving as intended. I've put together a slightly extended version of the original test. Compile with gcc -Wall -fno-pic -Os -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no static void __attribute__((noinline)) a(void) { asm volatile ("sti":::"memory"); } void __attribute__((nocf_check, noinline)) b(void) { asm volatile ("std":::"memory"); } void __attribute__((cf_check, noinline)) c(void) { asm volatile ("cmc":::"memory"); } void (*ptr_a)(void) = a; // Now raises a diagnostic void (*ptr_b)(void) = b; // Did diagnose previously, still does void (*ptr_c)(void) = c; int test(void) { ptr_a(); ptr_b(); ptr_c(); a(); b(); c(); // When fully linked, does skip c's ENDBR64 instruction return 0; } int main(void) { return 0; } I'll try applying this to a bigger codebase now.
[Bug c/102967] New: -Waddress with nested structures: Incorrect "the comparison will always evaluate as 'true'"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102967 Bug ID: 102967 Summary: -Waddress with nested structures: Incorrect "the comparison will always evaluate as 'true'" Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- Hello, I have been compiling Xen with gcc/master, and found what appears to be a regression from GCC 11. https://godbolt.org/z/qa61fs1hK is a simplified repro. The address comparison seems to be incorrectly applied to only of the two possible options. The expression as a whole is most certainly not unconditionally true.
[Bug target/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #2 from Andrew Cooper --- PeterZ has suggested that the straight line speculation case can be dis-entangled with the thunk inlining case. If an `int3` is emitted following any `jmp __x86_indirect_thunk_*` instruction (i.e. treated as an indirect jump despite retpoline), then the inlining logic need not worry about straight line speculation at all. However, this does depend on not generating `jcc __x86_indirect_thunk_*` as inlining that would require an additional `int3` for SLS safety.
[Bug c/102953] New: Improvements to CET-IBT and ENDBR generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953 Bug ID: 102953 Summary: Improvements to CET-IBT and ENDBR generation Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- Hello, With CET-IBT, ENDBR{32,64} instructions are used to mark legitimate forward edges for indirect branches. GCC can generate a CET-IBT binary with -fcf-protection, but the default behaviour is to generate ENDBR instructions for every function. This creates more "legal" forward edges than necessary in the eyes of CET-IBT. https://godbolt.org/z/M15rjMb4G is almost excellent, but it would be far more helpful if all functions were implicitly nocf_check, so this example produces a diagnostic. That way, GCC can point out all functions used by function pointers, rather than the result compiling and failing to be CET-IBT compatible. This on its own would be enough to let embedded projects minimise their ENDBR* count while having some compiler assistance while doing so. More generally, a lot of common cases (e.g. Linux) could be computed automatically. Drivers filling in ops structures typically refer to local symbols, so these defaulting to cf_check would be an improvement still. This would leave only global symbols needing explicit cf_check. (Maybe LTO could even figure out the global symbols cases correctly?) Finally, one minor code generation improvement. When GCC emits a direct call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation to skip the ENDBR instruction (not needed for direct call/jmp's) and save on decode bandwidth.
[Bug c/102952] New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 --- Comment #1 from Andrew Cooper --- https://bugs.llvm.org/show_bug.cgi?id=52323 for Clang cross-request.
[Bug c/102952] New: New code-gen options for retpolines and straight line speculation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 Bug ID: 102952 Summary: New code-gen options for retpolines and straight line speculation Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- Hello [FYI, this is being cross-requested of Clang too] Linux and other kernel level software makes use of -mindirect-branch=thunk-extern to be able to alter the handling of indirect branches at boot. It turns out to be advantageous to inline the thunks when retpoline is not in use. https://lore.kernel.org/lkml/20211026120132.613201...@infradead.org/ is some infrastructure to make this work. In some cases, we want to be able to inline an `lfence; jmp *%reg` thunk. This is fine for the low 8 registers, but not fine for %r{8..15} where the REX prefix pushes the replacement size to being 6 bytes. It would be very useful to have a code-gen option to write out `call %cs:__x86_indirect_thunk_r{8..15}` where the redundant %cs prefix will increase the instruction length to 6, allowing the non-retpoline form to be inlined. Relatedly, x86 straight line speculation has been discussed before, but without any action taken. It would be helpful to have a code gen option which would emit `int3` following any `ret` instruction, and any indirect jump, as neither of these two cases have following architectural execution. The reason these two are related is that if both options are in use, we want an extra byte of replacement space to be able to inline `lfence; jmp *%reg; int3`. Third (and possibly only for future optimisations), Clang has been observed to spot conditional tail calls as `Jcc __x86_indirect_thunk_*`. This is a 6 byte source size, but needs up to 9 bytes of space for inlining including an `int3` for straight line speculation reasons (See https://lore.kernel.org/lkml/20211026120310.359986...@infradead.org/ for full details). It might be enough to simply prohibit an optimisation like this when trying to pad retpolines for inlineability.
[Bug middle-end/99578] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578 --- Comment #15 from Andrew Cooper --- (In reply to Martin Sebor from comment #1) > For now, I recommend suppressing the warning either by #pragma GCC > diagnostic or by making the pointer volatile. Trying this with the code sample from comment 14 doesn't work. Not only does the -Werror=array-bounds diagnostic remain, a second is picked up from: passing argument 1 of ‘__builtin_memcpy’ discards ‘volatile’ qualifier from pointer target type [-Werror=discarded-qualifiers]
[Bug middle-end/99578] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578 Andrew Cooper changed: What|Removed |Added CC||andrew.cooper3 at citrix dot com --- Comment #14 from Andrew Cooper --- I too have had what appears to be this bug, raised against a microkernel project. The logic is a test case involving interactions at the x86-64 lower canonical boundary, and reads like so: ... uint64_t *ptr = (void *)0x7ff8ul; memcpy(ptr, "\x90\x90\xf\xbxen\x90", 8); ... This yields: include/xtf/libc.h:36:37: error: ‘__builtin_memcpy’ offset [0, 7] is out of the bounds [0, 0] [-Werror=array-bounds] 36 | #define memcpy(d, s, n) __builtin_memcpy(d, s, n) | ^ main.c:81:5: note: in expansion of macro ‘memcpy’ 81 | memcpy(ptr, "\x90\x90\xf\xbxen\x90", 8); | ^~ It is worth pointing out that it is common for kernels to have some virtual addresses derived from compile-time constants, notably the fixmap and frametable.
[Bug target/93654] Inappropriate "-fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654 --- Comment #11 from Andrew Cooper --- Thankyou very much for your help
[Bug target/93654] Inappropriate "-fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654 --- Comment #7 from Andrew Cooper --- Yes - that works. Confirmed that I have both retpolines and endbr64 instructions, and the resulting hypervisor boots. (I don't have CET capable hardware, but that is fine. There is at minimum there is MSR setup to do, so any other issues can be addressed at that time.)
[Bug target/93654] Inappropriate "-fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654 --- Comment #5 from Andrew Cooper --- Thanks. Just compiling the patch. As a note however, https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html will need adjusting to remove the final note for -mindirect-branch=choice
[Bug target/93654] New: Inappropriate "- -fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654 Bug ID: 93654 Summary: Inappropriate "- -fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction Product: gcc Version: 9.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- Bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87412 prohibited the use of -fcf-protection and -mindirect-branch=thunk in combination. However, it also breaks kernels which use -mindirect-branch=thunk-extern When retpoline protections were developed, I specifically requested thunk-extern to exist for kernels which provide their own, so that it can be made compatible with CET. A kernel which provides its own thunks will boot-time modify them to be appropriate for the system, and may not be a retpoline gadget. (They may be lfence; jmp *%reg which is recommended on AMD, or just jmp *%reg with IBRS) -mindirect-branch=thunk-extern specifically should be permitted with -fcf-protection, because this *was* the plan to make a single binary capable of using CET on applicable hardware, yet being safe to Spectre v2 on older hardware.
[Bug middle-end/92237] New: [x86] Missed optimisation opportunity with bit tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92237 Bug ID: 92237 Summary: [x86] Missed optimisation opportunity with bit tests Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- See https://godbolt.org/z/mP-8Y7 An expression such as: bool foo(uint64_t val) { return (val & 0x120) == 0x20; } gets assembled to: : 0: 81 e7 20 01 00 00 and$0x120,%edi 6: 48 83 ff 20 cmp$0x20,%rdi a: 0f 94 c0sete %al d: c3 retq Some part of optimisation has noticed that, due to the 32bit constant, the AND can be performed on %edi, but hasn't spotted that the same is true for the following CMP. In this example, the CMP could use %edi as well, and save emitting the REX prefix into the instruction stream.
[Bug c/91745] New: Documentation for __builtin_speculation_safe_value() doesn't compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91745 Bug ID: 91745 Summary: Documentation for __builtin_speculation_safe_value() doesn't compile Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fspeculation_005fsafe_005fvalue-1 The 3rd code sample in the documentation talks about how to transform the code when element 0 of the array isn't safe to speculatively access. https://godbolt.org/z/98LNs8 Unfortunately, it doesn't compile, citing: [x86-64 gcc 9.2 #1] error: both arguments must be compatible Replacing NULL with (int *)0 does allow the sample to compile. I'm not wel- versed enough in the C spec to comment on whether NULL and (int *)0 are compatible or not, but the intention of the documentation is clearly that NULL should be acceptable in this circumstance. Is this a documentation bug, or a diagnostics bug?
[Bug lto/90434] [regression from 8.x] Incorrect code generation for __builtin_strcmp with LTO and freestanding binary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90434 --- Comment #2 from Andrew Cooper --- I'm afraid that so far, I haven't found a way to reduce the test case. I'm still trying.
[Bug lto/90434] New: [regression from 8.x] Incorrect code generation for __builtin_strcmp with LTO and freestanding binary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90434 Bug ID: 90434 Summary: [regression from 8.x] Incorrect code generation for __builtin_strcmp with LTO and freestanding binary Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Hello, Apologies for this being a complicated bug report, but I think I may need some help tracking things down further. Nevertheless, I am moderately confident it is a regression in GCC 9.x, because 8.x works correctly. I have two versions of GCC built today from the tip of stable branches. $ PATH=/local/bin/gcc-8.x/bin:$PATH gcc --version gcc (GCC) 8.3.1 20190511 ... $ PATH=/local/bin/gcc-9.x/bin:$PATH gcc --version gcc (GCC) 9.1.1 20190511 The 8.x version works correctly, while the 9.x version results in incorrect code generation, which leads to a crash when executed. Both versions work fine when LTO is not in use. The environment is http://xenbits.xen.org/docs/xtf/ which is a set of microkerenls. As such, they are freestanding builds and linked to a single final executable, but are standard x86 32bit and 64bit ELF binaries. Because -ffreestanding disables a whole swathe of builtin optimisations, the headers locally reinsert the optimisations in the following manner: int strcmp(const char *s1, const char *s2); #define strcmp(s1, s2) __builtin_strcmp(s1, s2) An implementation of strcmp() is also provided for when a real call is emitted by the builtin: int (strcmp)(const char *_s1, const char *_s2) { unsigned char s1, s2; do { s1 = *_s1++; s2 = *_s2++; } while ( s1 && s1 == s2 ); return (s1 < s2) ? -1 : (s1 > s2); } The problematic test in question calls strcmp() from its main function. With 8.x, the builtin is expanded inline, and (having no callers) the strcmp() function itself is omitted from the final binary. With 9.x, the builtin is not expanded inline, and a call to strcmp() is emitted. strcmp() itself crashes when it first dereferences the second parameter. First of all, I found that if I replace the above strcmp() implementation with this alternative (which I borrowed from the Linux Kernel): int (strcmp)(const char *_s1, const char *_s2) { unsigned char s1, s2; while ( 1 ) { s1 = *_s1++; s2 = *_s2++; if ( s1 != s2 ) return s1 < s2 ? -1 : 1; if ( !s1 ) break; } return 0; } then 9.x *does* expand the builtin inline, and the strcmp() function itself is omitted from the final binary. This binary also executes correctly. Therefore, something is clearly inspecting the content of my locally provided strcmp() function and deciding that it can't expand the builtin inline. However, I'm not sure what it is checking for, because AFAICT, the ABI of those two variations is identical. That on its own would only cause an efficiently problem, not a correctness problem. When inspecting the disassembly of the bad binary, I see that strcmp() has its parameters passed in registers rather than on the stack. AFAICT, the value of the _s1 parameter is always correct, while the value of _s2 seems to be less than 256 (possibly a single char?). The crash occurs because the deference of _s2 hits the zero page. As a result, it is clear that the strcmp() function has been compiled with a non-standard ABI. AFAICT, the calling code in main() does conform to the expected ABI, by passing the parameters on the stack. Therefore, I wonder whether the code actually emitted for strcmp() was supposed to be a specialised version (strcmp.isra.$foo perhaps, or something else), which actually got called with the original ABI. Anyway - that is as far as I've got debugging. Can anyone suggest further steps to help narrow this down?