[Bug analyzer/115044] New: -Wanalyzer-malloc-leak diagnostic in terminal path

2024-05-11 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115044

Bug ID: 115044
   Summary: -Wanalyzer-malloc-leak diagnostic in terminal path
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

I've been experimenting with -fanalyzer and attribute(malloc) in Xen.

For a simple case, it reported:

drivers/passthrough/x86/iommu.c: In function 'iommu_init_domid':
./include/xen/bug.h:141:13: warning: leak of '_xzalloc(4096, 8)' [CWE-401]
[-Wanalyzer-malloc-leak]
  141 | do { if ( unlikely(!(p)) ) assert_failed(#p); } while (0)
  | ^
drivers/passthrough/x86/iommu.c:551:9: note: in expansion of macro 'ASSERT'
  551 | ASSERT(reserve > DOMID_MASK);
  | ^~


If I'm reading this correctly, it's saying that allocated memory is being
leaked by ASSERT().

The whole function is:

unsigned long *__init iommu_init_domid(domid_t reserve)
{
unsigned long *map;

if ( !iommu_quarantine )
return ZERO_BLOCK_PTR;

BUILD_BUG_ON(DOMID_MASK * 2U >= UINT16_MAX);

map = xzalloc_array(unsigned long, BITS_TO_LONGS(UINT16_MAX - DOMID_MASK));
if ( map && reserve != DOMID_INVALID )
{
ASSERT(reserve > DOMID_MASK);
__set_bit(reserve & DOMID_MASK, map);
}

return map;
}

but I can't seem to reproduce the -fanalyzer warning in a simpler example. 
Assuming it might be something to do with our implementation of ASSERT(), I
reduced it to just

  do { if (!(x)) __builtin_trap(); } while ( 0 )

in place, and that still did reproduce the warning.

So while the analyser is technically accurate (i.e. the memory is leaked when
we encounter a fatal path), I'm not sure it's a helpful diagnostic.

Is there a reason why it's reported like this?

[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-03-02 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

--- Comment #15 from Andrew Cooper  ---
Wow that's a lot of junk getting included for the minimal include set I could
easily make.

It occurs to me only after posting that you're liable to fail at:

  asm ( ".include \"arch/x86/include/asm/asm-macros.h\"" );

which always trips things up.  You can safely drop it if you're just interested
in the analyser behaviour.

[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-03-02 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

--- Comment #14 from Andrew Cooper  ---
Created attachment 54572
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54572=edit
Preprocessed example

[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-03-02 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

--- Comment #13 from Andrew Cooper  ---
I've constructed an example which might be the knockon effect you were worried
about?

void foo(char *other)
{
char *ptr = NULL;

if ( current->domain )
ptr = other;

asm volatile ("cmc");

if ( current->domain )
ptr[0] = ~ptr[0];
}

yields 

arch/x86/tmp.c: In function 'foo':
arch/x86/tmp.c:14:22: error: dereference of NULL 'ptr' [CWE-476]
[-Werror=analyzer-null-dereference]
   14 | ptr[0] = ~ptr[0];
  |   ~~~^~~
  'foo': events 1-5
|
|8 | if ( current->domain )
|  |^
|  ||
|  |(1) following 'false' branch...
|..
|   11 | asm volatile ("cmc");
|  | ~~~ 
|  | |
|  | (2) ...to here
|   12 | 
|   13 | if ( current->domain )
|  |~
|  ||
|  |(3) following 'true' branch...
|   14 | ptr[0] = ~ptr[0];
|  | ~~~   ~~
|  | ||
|  | |(5) dereference of NULL 'ptr'
|  | (4) ...to here
|

[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-03-02 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

--- Comment #10 from Andrew Cooper  ---
>From trying this out, a const attribute doesn't alter the code generation in
the slightest, so I presume GCC has already figured the const-ness out.

[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-03-02 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

--- Comment #9 from Andrew Cooper  ---
Thank-you for the fix.

I've recompiled master and this uninitialised warning has gone.

Unfortunately, Xen isn't GCC-13 clean (seems like a real bug in Xen), and the
analyser has pointed out various other things which I'm still looking in to.  I
don't see anything which looks like it is a new knock-on effect from this
change.

Our code does fundamentally rely on get_cpu_info() always returning the same
pointer (on a single CPU).  For example, `current` is defined as
`get_cpu_info()->current` and we do expect that to yield the same pointer when
used multiple times.

Even if the analyser was interpreting the generated asm, there's no way it
could prove this without knowing the size/alignment constraints of our stacks.

Would a const annotation on get_cpu_info() be likely to help?  It occurs to me
that this is true in all cases that the compiler could legitimately reason
about.  (It would only cease being true if we fell off our stack, at which
point UB is the very least of our worries.)

[Bug analyzer/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-02-28 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

--- Comment #6 from Andrew Cooper  ---
(In reply to David Malcolm from comment #5)
> Minimal reproducer: https://godbolt.org/z/E6EEY1WT6
> 
> Am I right in understanding that:
> register unsigned long sp asm("rsp");
> is intended as a way to read the %rsp register?

Ultimately, yes.

More generally, this just creates a way to access the specific register.

It is the only way for example to create an asm constraint on e.g. %r8, so the
following is common to see for MSABI:

  register unsigned long param asm ("%r8");

  param = 4;
  asm ("tdcall/whatever"
   : "+r" (param) ...);

(Example here is from Intel's new TDX technology, but the actual asm
instruction isn't important.)

[Bug c/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-02-28 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

Andrew Cooper  changed:

   What|Removed |Added

  Component|analyzer|c

--- Comment #4 from Andrew Cooper  ---
(In reply to Andreas Schwab from comment #3)
> Perhaps it works if you declare the register variable in file scope.

Huh.  I honestly expected that not to compile, but it appears to, and it
appears to work.

There is minor perturbation in the build, but as far as I can see, it's just
slightly different register/instruction scheduling.

Why does being at global scope change the diagnostic?

[Bug c/108968] fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-02-28 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

--- Comment #2 from Andrew Cooper  ---
__builtin_frame_address() does appear to resolve the warning, but the knock-on
effect for code generation is even worse than the asm() block.

It forces a frame-pointer setup in all functions that use it (which is most
functions in Xen), even leaf functions, and despite -fomit-frame-pointer, which
in turn causes spilling of other registers now that %rbp isn't usable.

[Bug c/108968] New: fanalyzer false positive with the uninitalised-ness of the stack pointer

2023-02-28 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108968

Bug ID: 108968
   Summary: fanalyzer false positive with the uninitalised-ness of
the stack pointer
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

I experimented with -fanalyzer on Xen, given all the recent work on Linux. 
We're quite similar, but one area where we are very different is accessing
per-cpu variables.

For architectural reasons (i.e. because we were virtualising Linux, and Linux
uses %gs for its per-cpu variables), Xen doesn't.  In Xen, we have a block of
metadata at the base of the stack, and the stack suitably aligned such that we
can do something like this:

  static inline struct cpu_info *get_cpu_info(void)
  {
register unsigned long sp asm("rsp");

return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
  }

Which turns into roughly:

  ptr = ((rsp | 0x7fff) + 1) - sizeof(struct cpu_info)

which is correct and work suitably due to the alignment of the stack in the
first place.

Unfortunately, it triggers:

  ./arch/x86/include/asm/current.h:95:5: error: use of uninitialized value 'sp'
[CWE-457] [-Werror=analyzer-use-of-uninitialized-value]

reliably, every time macros such as `current` get expanded, which is
everywhere.


The reality is that the stack pointer is never uninitialised.  It is
unpredictable in the general case, but implementations can account for and
remove that unpredictability.

The normal trick to hide a variable from uninitialised handling (e.g. to asm(""
: "+g"(var)); ) doesn't work, as it suffers from the same error.

Is there any way to tell fanalyzer that this value really isn't uninitialised? 
I can't see anything obvious.


I can work around the warning by doing:

  unsigned long sp;
  asm ( "mov %%rsp, %0" : "=r" (sp) );

but this impacts code generation quite substantially.  This primitive is used
all over the place, and the regular C form undergoes far better CSE than the
explicit mov to retrieve the stack pointer.

[Bug middle-end/108799] Improper deprecation diagnostic for rsp clobber

2023-02-15 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799

--- Comment #2 from Andrew Cooper  ---
Adding a memory clobber doesn't make any difference that I can see, and I'm not
aware of any reason why it ought to make a difference.

I suppose that my real request here is to figure out what is the correct way to
indicate that the redzone is clobbered, and to document it.

[Bug c/108799] New: Improper deprecation diagnostic for rsp clobber

2023-02-15 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799

Bug ID: 108799
   Summary: Improper deprecation diagnostic for rsp clobber
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

Originally from LKML. 
https://lore.kernel.org/lkml/y9lfmq%2fr1%2fpep...@biznet-home.integral.gnuweeb.org/

Slightly modified example: https://godbolt.org/z/xx76nEvKM

Given:
  static void clobber_redzone_buggy(void)
  {
  register unsigned long rsp asm("rsp");
  unsigned long fl;

  asm volatile ("pushf; popq %[fl]"
: [fl] "=r" (fl)
   , "+r" (rsp)
:
: //"rsp"
  );
  }

  static void set_red_zone(long *mem, long val)
  {
  __asm__ volatile ("movq %[val], %[mem]"
: [mem] "=m" (*mem)
: [val] "r" (val));
  }

  static long get_red_zone(long *mem)
  {
  long ret;

  __asm__ volatile ("movq %[in], %[out]"
: [out] "=r" (ret)
: [in] "m" (*mem));
  return ret;
  }

  long a_leaf_func_with_red_zone(void)
  {
  long x;

  set_red_zone(, 1);
  clobber_redzone_buggy();
  /* The correct retval is 1 */
  return get_red_zone();
  }

gcc generates:

  a_leaf_func_with_red_zone:
  movl$1, %eax
  movq%rax, -8(%rsp)
  pushf
  popq%rax
  movq-8(%rsp), %rax
  ret

which is buggy because the asm clobbers the same redzone slot as `x` occupies.

Swapping the "+r"(rsp) constraint for an explicit "rsp" clobber generates:

  a_leaf_func_with_red_zone:
  pushq   %rbp
  movl$1, %eax
  movq%rsp, %rbp
  subq$16, %rsp
  movq%rax, -8(%rbp)
  pushf
  popq%rax
  movq-8(%rbp), %rax
  leave
  ret

which seems to do the right thing.  It sets up a full stack frame and avoids
using the redzone.

However, doing so yields:

  warning: listing the stack pointer register 'rsp' in a clobber list is
deprecated [-Wdeprecated]
  note: the value of the stack pointer after an 'asm' statement must be the
same as it was before the statement

The note is incorrect.  For ABIs with a redzone, the requirement is stricter
than simply preserving the value of the stack pointer.

The warning suggests that there ought to be a different way to express "this
clobbers the redzone", but there doesn't appear to be any other way.  If this
is the case, why is it deprecated?

[Bug middle-end/104971] [9/10/11/12 Regression] Optimisation for __builtin_ia32_readeflags corrupts the stack

2022-03-17 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

--- Comment #3 from Andrew Cooper  ---
So yes - my experimentation did start from investigating the memory ordering
behaviour of these builtins, based on a thread on LKML.

The pushf in readflags and popf in writeflags have wildly different ordering
requirements, depending on which flags are wanted/modified.  AC for example
(and IF for kernels) need to not be reordered with respect to any memory
access.

As you observe, readflags in particular needs to not be reordered with any
instruction that modifies the arithmetic flags (which is most of them).

IMO, it would be safe to omit the pushf from readflags if the result is not not
used, because there are no unexpected side effects for pushf.

The same is not true of popf in writeflags, which has side effects even when
written twice with the same value.

[Bug middle-end/104971] New: Optimisation for __builtin_ia32_readeflags corrupts the stack

2022-03-17 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

Bug ID: 104971
   Summary: Optimisation for __builtin_ia32_readeflags corrupts
the stack
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

Full example: https://godbolt.org/z/xGq3c4Mnc

Given:

int broken(void)
{
int fl = __builtin_ia32_readeflags_u64();
}

gcc -O2 generates:

broken:
pushfq
ret

Which is going explode very quickly.

Code generation appears to be safe without optimisation, but even -O alone is
enough to create problems.

At a guess, the optimiser has concluded that the result is unused, drops the
`pop %reg`, but fails to also drop the `pushf` too.

Looking through history on Godbolt, it appears that GCC 4.9 (which introduced
this builtin) has correct optimised code generation, and it regressed between
4.9 and 5.1.

[Bug target/104816] -fcf-protection=branch should generate endbr instead of notrack jumps

2022-03-07 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104816

--- Comment #7 from Andrew Cooper  ---
(In reply to H.J. Lu from comment #5)
> Are you suggesting to add an option to generate jump table with ENDBR?

Jump tables are a legitimate optimisation.  NOTRACK is a weakness in CET
protections, and fully hardened userspace (as well as kernels) will want to run
with MSR_{U,S}_CET.NOTRACK_EN=0.

There should be some future where jump tables can be used in combination with
NOTRACK_EN=0.

[Bug target/104816] -fcf-protection=branch should generate endbr instead of notrack jumps

2022-03-07 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104816

--- Comment #4 from Andrew Cooper  ---
I've worked around this in Xen with:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=9d4a44380d273de22d5753883cbf5581795ff24d
and 
https://lore.kernel.org/lkml/yixpv0q88paph...@hirez.programming.kicks-ass.net/
is pending for Linux.

IMO, it's an error that -fcf-protection=branch is not obeyed for jump tables,
and we don't want to end up in a situation where jump tables are unusable with
CET.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2022-02-23 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #25 from Andrew Cooper  ---
(In reply to H.J. Lu from comment #24)
> (In reply to Andrew Cooper from comment #23)
> > Apologies for the delay, but I do now have a working prototype of Xen with
> > CET-IBT active, using the current version of these patches.
> > 
> > The result actually builds back to older versions of GCCs, but the lack of
> > cf_check-ness typechecking makes this a fragile activity.
> 
> Should I polish my patches and submit them now?

Sorry - I missed this request.  Yes please.  These changes have been used for a
while now with no issues identified.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2022-02-07 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #40 from Andrew Cooper  ---
I've given the GCC-11 branch a test and everything appears to be in order.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2022-01-06 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #33 from Andrew Cooper  ---
Looks good to me

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2022-01-06 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #30 from Andrew Cooper  ---
(In reply to CVS Commits from comment #27)
> 
> x86: Add -mharden-sls=[none|all|return|indirect-branch]
> 
It occurs to me that `indirect-branch` needs renaming to be `indirect-jmp` as
the logic specifically does not make changes to code generation relating to
`call` instructions.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-11-05 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #23 from Andrew Cooper  ---
Apologies for the delay, but I do now have a working prototype of Xen with
CET-IBT active, using the current version of these patches.

The result actually builds back to older versions of GCCs, but the lack of
cf_check-ness typechecking makes this a fragile activity.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-29 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #21 from Andrew Cooper  ---
Another possibly-bug, but possibly mis-expectations on my behalf.

I've found some code in the depths of Xen which is causing a failure on final
link due to a missing `__x86_indirect_thunk_nt_rax` symbol.

  $ cat fnptr-typeof.c
  extern void (*fnptrs[])(char);

  void foo(int a)
  {
  typeof(foo) *bar = (void *)fnptrs[0];
  bar(a);
  }

I realise this  is wildly undefined behaviour, and I will try to address it in
due course.  However, the instruction generation is bizarre.

When I compile with -fcf-protection=branch -mmanual-endbr, I get a plain `jmp
*fnptrs(%rip)` instruction.  (This is fine.)

When I compile with -fcf-check-attribute=no as well, then I get `notrack jmp
*fnptrs(%rip)`.  I'm not sure why the notrack is warranted here; for all GCC
knows, the target does have a suitable ENDBR64 instruction.

When I compile with -mindirect-branch=thunk as well, I get a load into %rax and
a normal looking retpoline thunk.  (This is as expected too.)

However, when I switch to -mindirect-branch=thunk-extern, I get the the same
load into %rax, and then a jump to `__x86_indirect_thunk_nt_rax`.  Presumably
the nt is short for notrack.


Irrespective of whether there should be a notrack or not on the jmp form, it
weird for the retpoline thunk ABI to be changing based on extern or not.  What
is the reasoning behind this?

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-29 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #20 from Andrew Cooper  ---
(In reply to H.J. Lu from comment #19)
> (In reply to Andrew Cooper from comment #17)
> > I think I've found a bug in the -fcf-check-attribute implementation.
> 
> Please try the v5 patch.

Thanks.  That does fix the issue.

>  BTW, do you have a testcase to show how
> -fcf-check-attribute=yes is used?

So, this was something I was going to leave until I'd got CET-IBT working, so
as to have time to consider all parts before proposing improvements.

I don't have a usecase for -fcf-check-attribute=yes, because it is almost
totally redundant with regular -fcf-protection in the first place.

When you are are applying control flow checks, every function is either checked
or not checked.  But GCC currently has a 3-way model of {unknown, explicit yes,
explicit no} on which it builds its typechecking.

Furthermore, -mmanual-endbr is a gross hack which by default leaves you with a
broken binary.

If I were building this from scratch, I'd not have -mmanual-endbr or
-fcf-check-attribute at all, because they're exposing complexity which ought
not to exist.

I get why the default for -fcf-protection=branch puts an ENDBR* instruction
everywhere.  It is the quick, easy and non-invasive way to make libraries
compatible with CET, which is a net improvement, even if not ideal.

The ideal way, and definitely future work, is for GCC to calculate the minimum
set of required ENDBR*.  At a guess, all non-local symbols (except those LTO
can determine are not publicly visible), and any local symbols used by function
pointers.

What I'm trying to do is a stopgap in the middle.  No ENDBR*'s by default, but
have the compiler tell me where I've got function pointers to a non-ENDBR'd
function, so when the result compiles, it stands a reasonable chance of
functioning correctly.

Personally, I'd suggest having these as sub-modes of -fcf-protection=branch,
instead of exposing all the internals on the command line.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-29 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #17 from Andrew Cooper  ---
I think I've found a bug in the -fcf-check-attribute implementation.

  $ cat fnptr-array-arg.c
  static int __attribute__((cf_check)) foo(char a[], int b)
  {
  return 0;
  }
  int (*ptr)(char[], int) = foo;

  $ gcc -Wall -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no -c
fnptr-array-arg.c -o tmp.o && objdump -d tmp.o
  fnptr-array-arg.c:5:27: warning: initialization of 'int (*)(char *, int)'
from incompatible pointer type 'int (__attribute__((nocf_check)) *)(char *,
int)' [-Wincompatible-pointer-types]
  5 | int (*ptr)(char[], int) = foo;
|   ^~~

  tmp.o: file format elf64-x86-64


  Disassembly of section .text:

   :
 0: 31 c0   xor%eax,%eax
 2: c3  retq   


Despite the explicit cf_check, a diagnostic is raised complaining about
cf_check-ness of the pointer, and the generated code has no `endbr64`
instruction.

This issue only manifests when using array arguments in the function.  When
switching `char[]` for `char*`, everything works as expected.  Also, dropping
-fcf-check-attribute=no also causes things to work.

  $ gcc -Wall -fcf-protection=branch -mmanual-endbr -c fnptr-array-arg.c -o
tmp.o && objdump -d tmp.o

  tmp.o: file format elf64-x86-64


  Disassembly of section .text:

   :
 0: f3 0f 1e fa endbr64 
 4: 31 c0   xor%eax,%eax
 6: c3  retq   


Something about the array type seems to cause the explicit cf_check attribute
to be lost.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-28 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #22 from Andrew Cooper  ---
One curious thing I have discovered.  While auditing the -mharden-sls=all code
generation in Xen, I found examples where I got "ret int3 ret int3" with no
intervening instructions.

It turns out this is not a regression in this change.  It is a pre-existing
missing optimisation, which is made more obvious when every ret is extended
with an int3.

It occurs for functions with either no stack frame at all, or functions which
have an early exit before setting up the stack frame.  Some examples which
occur at -O1 do not occur at -O2.

One curious example which does still repro at -O2 is this.  We have a hash
lookup function:

struct context *sidtab_search(struct sidtab *s, u32 sid)
{
int hvalue;
struct sidtab_node *cur;

if ( !s )
return NULL;

hvalue = SIDTAB_HASH(sid);
cur = s->htable[hvalue];
while ( cur != NULL && sid > cur->sid )
cur = cur->next;

if ( cur == NULL || sid != cur->sid )
{
/* Remap invalid SIDs to the unlabeled SID. */
sid = SECINITSID_UNLABELED;
hvalue = SIDTAB_HASH(sid);
cur = s->htable[hvalue];
while ( cur != NULL && sid > cur->sid )
cur = cur->next;
if ( !cur || sid != cur->sid )
return NULL;
}

return >context;
}

which compiles (reformatted a little for width - unmodified:
https://paste.debian.net/hidden/7bf675d6/) to:

:
 48 85 ff test   %rdi,%rdi
/--- 74 63je 
|48 8b 17 mov(%rdi),%rdx
|89 f0mov%esi,%eax
|83 e0 7f and$0x7f,%eax
|48 8b 04 c2  mov(%rdx,%rax,8),%rax
|48 85 c0 test   %rax,%rax
|   /--- 75 13jne
|  /|--- eb 17jmp
|  ||0f 1f 84 00 00 00 00 nopl   0x0(%rax,%rax,1)
|  ||00 
|  ||/-> 48 8b 40 48  mov0x48(%rax),%rax
|  |||   48 85 c0 test   %rax,%rax
|  +||-- 74 06je 
|  |\|-> 39 30cmp%esi,(%rax)
|  | \-- 72 f3jb 
| /| 74 24je 
| |\---> 48 8b 42 28  mov0x28(%rdx),%rax
| |  48 85 c0 test   %rax,%rax
| | /--- 75 11jne
|/|-|--- eb 32jmp // (1)
||| |66 0f 1f 44 00 00nopw   0x0(%rax,%rax,1)
||| |/-> 48 8b 40 48  mov0x48(%rax),%rax
||| ||   48 85 c0 test   %rax,%rax
|||/||-- 74 17je  // (2)
\|-> 83 38 04 cmpl   $0x4,(%rax)
 \-- 76 f2jbe
 83 38 05 cmpl   $0x5,(%rax)
+||| 75 15jne
||\|---> 48 83 c0 08  add$0x8,%rax
|| | c3   retq   
|| | cc   int3   
|| | 0f 1f 80 00 00 00 00 nopl   0x0(%rax)
|| \---> c3   retq// Target of (2)
||   cc   int3   
||   66 0f 1f 44 00 00nopw   0x0(%rax,%rax,1)
\|-> 31 c0xor%eax,%eax
 |   c3   retq   
 |   cc   int3   
 \-> c3   retq// Target of (1)
 cc   int3   
 66 90xchg   %ax,%ax

There are 4 exits in total.  Two have to set up %eax, so they can't usefully be
merged.

However, the unconditional jmp at (1) is 2 bytes, and could fully contain its
target ret;int3 without even impacting the surrounding padding.  Whether it
inlines or merges, this drops 4 bytes.

The conditional jump at (2) could be folded in to any of the other exit paths,
dropping 16 bytes from the total size size.

I have no idea how easy/hard this may be to track down, or whether it is worth
pursuing urgently, but it probably does want looking at, seeing as SLS
hardening doubles the hit.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-28 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #14 from Andrew Cooper  ---
(In reply to H.J. Lu from comment #13)
> (In reply to Andrew Cooper from comment #11)
> > 
> > There should be a diagnostic, but it ought to include cf_check in the type
> > it prints.
> 
> Try the v3 patch.

Thanks.  Now get:

proto.c:2:37: error: conflicting types with implied 'nocf_check' attribute for
'foo'; have 'void(void)'
2 | static void __attribute__((unused)) foo(void)
  | ^~~
proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
1 | static void __attribute__((cf_check)) foo(void);
  |   ^~~

which at least highlights the issue.  Any variant like this, but possibly even
simply reporting 'void __attribute__((nocf_check))(void)' should be fine.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-28 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #11 from Andrew Cooper  ---
(In reply to H.J. Lu from comment #10)
> (In reply to Andrew Cooper from comment #8)
> > Actually, there is a (possibly pre-existing) diagnostics issue:
> > 
> > $ cat proto.c
> > static void __attribute__((cf_check)) foo(void);
> > static void __attribute__((unused)) foo(void)
> > {
> > }
> > void (*ptr)(void) = foo;
> > 
> > $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr
> > -fcf-check-attribute=no -c proto.c -o proto.o
> > proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)'
> > 2 | static void __attribute__((unused)) foo(void)
> >   | ^~~
> > proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
> > 1 | static void __attribute__((cf_check)) foo(void);
> >   |   ^~~
> > 
> > 
> > The diagnostic complaining that the forward declaration doesn't match the
> > definition gives 'void(void)' as the type in both cases, leaving out the
> > fact that they differ by cf_check-ness.
> 
> Please try the v2 patch.

I appear to get no diagnostic at all now.  This seems like a regression from
v1.

There should be a diagnostic, but it ought to include cf_check in the type it
prints.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #8 from Andrew Cooper  ---
Actually, there is a (possibly pre-existing) diagnostics issue:

$ cat proto.c
static void __attribute__((cf_check)) foo(void);
static void __attribute__((unused)) foo(void)
{
}
void (*ptr)(void) = foo;

$ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no
-c proto.c -o proto.o
proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)'
2 | static void __attribute__((unused)) foo(void)
  | ^~~
proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
1 | static void __attribute__((cf_check)) foo(void);
  |   ^~~


The diagnostic complaining that the forward declaration doesn't match the
definition gives 'void(void)' as the type in both cases, leaving out the fact
that they differ by cf_check-ness.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #18 from Andrew Cooper  ---
Yes to both.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #15 from Andrew Cooper  ---
So this is the irritating corner case where the two options are linked.

*If* we are using -mindirect-branch-cs-prefix, then we intend to rewrite `jmp
__x86_indirect_thunk_*` to `jmp *%reg` or `lfence; jmp *%reg` based on boot
time configuration/settings.

In this case, we still need to fit the `int3` for SLS protection in somewhere.

The two options are:
1) Special case `jmp __x86_indirect_thunk_*` as if it were an indirect jump and
write out an `int3` directly, or
2) Pad one extra %cs prefix on the jmp, so we've got space to insert one at
boot time.

[Bug c/102967] confusing location in -Waddress for a subexpression of a ternary expression

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102967

--- Comment #2 from Andrew Cooper  ---
(In reply to Martin Sebor from comment #1)
> The warning is intended: it points out that the second operand of the
> conditional expression is necessarily true:
> 
>   if ( !(pa ? >c : NULL) )
>   ^^
> 
> There's no point in testing the address of a member for equality to null
> because  the member of no object can reside at that address.  The above can
> be simplified to
> 
>   if (!pa)

Hmm, true.  I suppose I got hung up on the statement made by the diagnostic,
rather than the implication that a simplification could be made.

Moving the underlining would certainly help.

> When reporting bugs, please be sure to include the full test case and its
> output as requested at https://gcc.gnu.org/bugs/#need.

My apologies.  I'll try to be better next time.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #7 from Andrew Cooper  ---
Thankyou.  I've tried these two patches and they do appear to be behaving as
intended.

I've put together a slightly extended version of the original test.  Compile
with gcc -Wall -fno-pic -Os -fcf-protection=branch -mmanual-endbr
-fcf-check-attribute=no

static void __attribute__((noinline)) a(void)
{
asm volatile ("sti":::"memory");
}

void __attribute__((nocf_check, noinline)) b(void)
{
asm volatile ("std":::"memory");
}

void __attribute__((cf_check, noinline)) c(void)
{
asm volatile ("cmc":::"memory");
}

void (*ptr_a)(void) = a; // Now raises a diagnostic
void (*ptr_b)(void) = b; // Did diagnose previously, still does
void (*ptr_c)(void) = c;

int test(void)
{
ptr_a();
ptr_b();
ptr_c();

a();
b();
c(); // When fully linked, does skip c's ENDBR64 instruction

return 0;
}

int main(void)
{
return 0;
}

I'll try applying this to a bigger codebase now.

[Bug c/102967] New: -Waddress with nested structures: Incorrect "the comparison will always evaluate as 'true'"

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102967

Bug ID: 102967
   Summary: -Waddress with nested structures: Incorrect "the
comparison will always evaluate as 'true'"
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

Hello,

I have been compiling Xen with gcc/master, and found what appears to be a
regression from GCC 11.

https://godbolt.org/z/qa61fs1hK is a simplified repro.

The address comparison seems to be incorrectly applied to only of the two
possible options.  The expression as a whole is most certainly not
unconditionally true.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-26 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #2 from Andrew Cooper  ---
PeterZ has suggested that the straight line speculation case can be
dis-entangled with the thunk inlining case.

If an `int3` is emitted following any `jmp __x86_indirect_thunk_*` instruction
(i.e. treated as an indirect jump despite retpoline), then the inlining logic
need not worry about straight line speculation at all.

However, this does depend on not generating `jcc __x86_indirect_thunk_*` as
inlining that would require an additional `int3` for SLS safety.

[Bug c/102953] New: Improvements to CET-IBT and ENDBR generation

2021-10-26 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

Bug ID: 102953
   Summary: Improvements to CET-IBT and ENDBR generation
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

Hello,

With CET-IBT, ENDBR{32,64} instructions are used to mark legitimate forward
edges for indirect branches.

GCC can generate a CET-IBT binary with -fcf-protection, but the default
behaviour is to generate ENDBR instructions for every function.  This creates
more "legal" forward edges than necessary in the eyes of CET-IBT.

https://godbolt.org/z/M15rjMb4G is almost excellent, but it would be far more
helpful if all functions were implicitly nocf_check, so this example produces a
diagnostic.  That way, GCC can point out all functions used by function
pointers, rather than the result compiling and failing to be CET-IBT
compatible.

This on its own would be enough to let embedded projects minimise their ENDBR*
count while having some compiler assistance while doing so.


More generally, a lot of common cases (e.g. Linux) could be computed
automatically.  Drivers filling in ops structures typically refer to local
symbols, so these defaulting to cf_check would be an improvement still.  This
would leave only global symbols needing explicit cf_check.  (Maybe LTO could
even figure out the global symbols cases correctly?)

Finally, one minor code generation improvement.  When GCC emits a direct
call/jmp to an ENDBR'd symbol, it can actually use sym+4 as an optimisation to
skip the ENDBR instruction (not needed for direct call/jmp's) and save on
decode bandwidth.

[Bug c/102952] New code-gen options for retpolines and straight line speculation

2021-10-26 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #1 from Andrew Cooper  ---
https://bugs.llvm.org/show_bug.cgi?id=52323 for Clang cross-request.

[Bug c/102952] New: New code-gen options for retpolines and straight line speculation

2021-10-26 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

Bug ID: 102952
   Summary: New code-gen options for retpolines and straight line
speculation
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

Hello

[FYI, this is being cross-requested of Clang too]

Linux and other kernel level software makes use of
-mindirect-branch=thunk-extern to be able to alter the handling of indirect
branches at boot.  It turns out to be advantageous to inline the thunks when
retpoline is not in use. 
https://lore.kernel.org/lkml/20211026120132.613201...@infradead.org/ is some
infrastructure to make this work.

In some cases, we want to be able to inline an `lfence; jmp *%reg` thunk.  This
is fine for the low 8 registers, but not fine for %r{8..15} where the REX
prefix pushes the replacement size to being 6 bytes.

It would be very useful to have a code-gen option to write out `call
%cs:__x86_indirect_thunk_r{8..15}` where the redundant %cs prefix will increase
the instruction length to 6, allowing the non-retpoline form to be inlined.


Relatedly, x86 straight line speculation has been discussed before, but without
any action taken.  It would be helpful to have a code gen option which would
emit `int3` following any `ret` instruction, and any indirect jump, as neither
of these two cases have following architectural execution.

The reason these two are related is that if both options are in use, we want an
extra byte of replacement space to be able to inline `lfence; jmp *%reg; int3`.


Third (and possibly only for future optimisations), Clang has been observed to
spot conditional tail calls as `Jcc __x86_indirect_thunk_*`.  This is a 6 byte
source size, but needs up to 9 bytes of space for inlining including an `int3`
for straight line speculation reasons (See
https://lore.kernel.org/lkml/20211026120310.359986...@infradead.org/ for full
details).  It might be enough to simply prohibit an optimisation like this when
trying to pad retpolines for inlineability.

[Bug middle-end/99578] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal

2021-05-19 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578

--- Comment #15 from Andrew Cooper  ---
(In reply to Martin Sebor from comment #1)
> For now, I recommend suppressing the warning either by #pragma GCC
> diagnostic or by making the pointer volatile.

Trying this with the code sample from comment 14 doesn't work.

Not only does the -Werror=array-bounds diagnostic remain, a second is picked up
from:

passing argument 1 of ‘__builtin_memcpy’ discards ‘volatile’ qualifier from
pointer target type [-Werror=discarded-qualifiers]

[Bug middle-end/99578] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal

2021-05-19 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578

Andrew Cooper  changed:

   What|Removed |Added

 CC||andrew.cooper3 at citrix dot 
com

--- Comment #14 from Andrew Cooper  ---
I too have had what appears to be this bug, raised against a microkernel
project.

The logic is a test case involving interactions at the x86-64 lower canonical
boundary, and reads like so:

...
uint64_t *ptr = (void *)0x7ff8ul;

memcpy(ptr, "\x90\x90\xf\xbxen\x90", 8);
...

This yields:

include/xtf/libc.h:36:37: error: ‘__builtin_memcpy’ offset [0, 7] is out of
the bounds [0, 0] [-Werror=array-bounds]
   36 | #define memcpy(d, s, n) __builtin_memcpy(d, s, n)
  | ^
main.c:81:5: note: in expansion of macro ‘memcpy’
   81 | memcpy(ptr, "\x90\x90\xf\xbxen\x90", 8);
  | ^~

It is worth pointing out that it is common for kernels to have some virtual
addresses derived from compile-time constants, notably the fixmap and
frametable.

[Bug target/93654] Inappropriate "-fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction

2020-04-29 Thread andrew.cooper3 at citrix dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654

--- Comment #11 from Andrew Cooper  ---
Thankyou very much for your help

[Bug target/93654] Inappropriate "-fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction

2020-04-28 Thread andrew.cooper3 at citrix dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654

--- Comment #7 from Andrew Cooper  ---
Yes - that works.  Confirmed that I have both retpolines and endbr64
instructions, and the resulting hypervisor boots.

(I don't have CET capable hardware, but that is fine.  There is at minimum
there is MSR setup to do, so any other issues can be addressed at that time.)

[Bug target/93654] Inappropriate "-fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction

2020-04-28 Thread andrew.cooper3 at citrix dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654

--- Comment #5 from Andrew Cooper  ---
Thanks.  Just compiling the patch.

As a note however, https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html will
need adjusting to remove the final note for -mindirect-branch=choice

[Bug target/93654] New: Inappropriate "- -fcf-protection and -mindirect-branch=thunk are incompatible on x86_64" restriction

2020-02-10 Thread andrew.cooper3 at citrix dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654

Bug ID: 93654
   Summary: Inappropriate "- -fcf-protection and
-mindirect-branch=thunk are incompatible on x86_64"
restriction
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

Bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87412 prohibited the use of
-fcf-protection and -mindirect-branch=thunk in combination.

However, it also breaks kernels which use -mindirect-branch=thunk-extern

When retpoline protections were developed, I specifically requested
thunk-extern to exist for kernels which provide their own, so that it can be
made compatible with CET.

A kernel which provides its own thunks will boot-time modify them to be
appropriate for the system, and may not be a retpoline gadget.  (They may be
lfence; jmp *%reg which is recommended on AMD, or just jmp *%reg with IBRS)

-mindirect-branch=thunk-extern specifically should be permitted with
-fcf-protection, because this *was* the plan to make a single binary capable of
using CET on applicable hardware, yet being safe to Spectre v2 on older
hardware.

[Bug middle-end/92237] New: [x86] Missed optimisation opportunity with bit tests

2019-10-26 Thread andrew.cooper3 at citrix dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92237

Bug ID: 92237
   Summary: [x86] Missed optimisation opportunity with bit tests
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

See https://godbolt.org/z/mP-8Y7

An expression such as:

bool foo(uint64_t val)
{
return (val & 0x120) == 0x20;
}

gets assembled to:
:
   0:   81 e7 20 01 00 00   and$0x120,%edi
   6:   48 83 ff 20 cmp$0x20,%rdi
   a:   0f 94 c0sete   %al
   d:   c3  retq


Some part of optimisation has noticed that, due to the 32bit constant, the AND
can be performed on %edi, but hasn't spotted that the same is true for the
following CMP.

In this example, the CMP could use %edi as well, and save emitting the REX
prefix into the instruction stream.

[Bug c/91745] New: Documentation for __builtin_speculation_safe_value() doesn't compile

2019-09-11 Thread andrew.cooper3 at citrix dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91745

Bug ID: 91745
   Summary: Documentation for __builtin_speculation_safe_value()
doesn't compile
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
  Target Milestone: ---

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fspeculation_005fsafe_005fvalue-1

The 3rd code sample in the documentation talks about how to transform the code
when element 0 of the array isn't safe to speculatively access.

https://godbolt.org/z/98LNs8

Unfortunately, it doesn't compile, citing:

[x86-64 gcc 9.2 #1] error: both arguments must be compatible

Replacing NULL with (int *)0 does allow the sample to compile.

I'm not wel- versed enough in the C spec to comment on whether NULL and (int
*)0 are compatible or not, but the intention of the documentation is clearly
that NULL should be acceptable in this circumstance.

Is this a documentation bug, or a diagnostics bug?

[Bug lto/90434] [regression from 8.x] Incorrect code generation for __builtin_strcmp with LTO and freestanding binary

2019-05-15 Thread andrew.cooper3 at citrix dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90434

--- Comment #2 from Andrew Cooper  ---
I'm afraid that so far, I haven't found a way to reduce the test case.  I'm
still trying.

[Bug lto/90434] New: [regression from 8.x] Incorrect code generation for __builtin_strcmp with LTO and freestanding binary

2019-05-11 Thread andrew.cooper3 at citrix dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90434

Bug ID: 90434
   Summary: [regression from 8.x] Incorrect code generation for
__builtin_strcmp with LTO and freestanding binary
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrew.cooper3 at citrix dot com
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Hello,

Apologies for this being a complicated bug report, but I think I may need some
help tracking things down further.  Nevertheless, I am moderately confident it
is a regression in GCC 9.x, because 8.x works correctly.


I have two versions of GCC built today from the tip of stable branches.

$ PATH=/local/bin/gcc-8.x/bin:$PATH gcc --version
gcc (GCC) 8.3.1 20190511
...

$ PATH=/local/bin/gcc-9.x/bin:$PATH gcc --version
gcc (GCC) 9.1.1 20190511

The 8.x version works correctly, while the 9.x version results in incorrect
code generation, which leads to a crash when executed.  Both versions work fine
when LTO is not in use.


The environment is http://xenbits.xen.org/docs/xtf/ which is a set of
microkerenls.  As such, they are freestanding builds and linked to a single
final executable, but are standard x86 32bit and 64bit ELF binaries.

Because -ffreestanding disables a whole swathe of builtin optimisations, the
headers locally reinsert the optimisations in the following manner:

int strcmp(const char *s1, const char *s2);
#define strcmp(s1, s2) __builtin_strcmp(s1, s2)

An implementation of strcmp() is also provided for when a real call is emitted
by the builtin:

int (strcmp)(const char *_s1, const char *_s2)
{
unsigned char s1, s2;

do {
s1 = *_s1++;
s2 = *_s2++;
} while ( s1 && s1 == s2 );

return (s1 < s2) ? -1 : (s1 > s2);
}


The problematic test in question calls strcmp() from its main function.  With
8.x, the builtin is expanded inline, and (having no callers) the strcmp()
function itself is omitted from the final binary.

With 9.x, the builtin is not expanded inline, and a call to strcmp() is
emitted.  strcmp() itself crashes when it first dereferences the second
parameter.


First of all, I found that if I replace the above strcmp() implementation with
this alternative (which I borrowed from the Linux Kernel):

int (strcmp)(const char *_s1, const char *_s2)
{
unsigned char s1, s2;

while ( 1 )
{
s1 = *_s1++;
s2 = *_s2++;

if ( s1 != s2 )
return s1 < s2 ? -1 : 1;
if ( !s1 )
break;
}
return 0;
}

then 9.x *does* expand the builtin inline, and the strcmp() function itself is
omitted from the final binary.  This binary also executes correctly.


Therefore, something is clearly inspecting the content of my locally provided
strcmp() function and deciding that it can't expand the builtin inline. 
However, I'm not sure what it is checking for, because AFAICT, the ABI of those
two variations is identical.


That on its own would only cause an efficiently problem, not a correctness
problem.

When inspecting the disassembly of the bad binary, I see that strcmp() has its
parameters passed in registers rather than on the stack.  AFAICT, the value of
the _s1 parameter is always correct, while the value of _s2 seems to be less
than 256 (possibly a single char?).  The crash occurs because the deference of
_s2 hits the zero page.

As a result, it is clear that the strcmp() function has been compiled with a
non-standard ABI.  AFAICT, the calling code in main() does conform to the
expected ABI, by passing the parameters on the stack.

Therefore, I wonder whether the code actually emitted for strcmp() was supposed
to be a specialised version (strcmp.isra.$foo perhaps, or something else),
which actually got called with the original ABI.


Anyway - that is as far as I've got debugging.  Can anyone suggest further
steps to help narrow this down?