Re: Passing the complex args in the GPR's

2023-06-07 Thread Michael Matz
Hey,

On Tue, 6 Jun 2023, Umesh Kalappa via Gcc wrote:

> Question is : Why does GCC choose to use GPR's here and have any
> reference to support this decision  ?

You explicitely used -m32 ppc, so 
https://www.polyomino.org.uk/publications/2011/Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf
 
applies.  It explicitely states in "B.1 ATR-Linux Inclusion and 
Conformance" that it is "ATR-PASS-COMPLEX-IN-GPRS", and other sections 
detail what that means (namely passing complex args in r3 .. r10, whatever 
fits).  GCC adheres to that, and has to.

The history how that came to be was explained in the thread.


Ciao,
Michael.

 > 
> Thank you
> ~Umesh
> 
> 
> 
> On Tue, Jun 6, 2023 at 10:16 PM Segher Boessenkool
>  wrote:
> >
> > Hi!
> >
> > On Tue, Jun 06, 2023 at 08:35:22PM +0530, Umesh Kalappa wrote:
> > > Hi Adnrew,
> > > Thank you for the quick response and for PPC64 too ,we do have
> > > mismatches in ABI b/w complex operations like
> > > https://godbolt.org/z/bjsYovx4c .
> > >
> > > Any reason why GCC chose to use GPR 's here ?
> >
> > What did you expect, what happened instead?  Why did you expect that,
> > and why then is it an error what did happen?
> >
> > You used -O0.  As long as the code works, all is fine.  But unoptimised
> > code frequently is hard to read, please use -O2 instead?
> >
> > As Andrew says, why did you use -m32 for GCC but -m64 for LLVM?  It is
> > hard to compare those at all!  32-bit PowerPC Linux ABI (based on 32-bit
> > PowerPC ELF ABI from 1995, BE version) vs. 64-bit ELFv2 ABI from 2015
> > (LE version).
> >
> >
> > Segher
> 

Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-14 Thread Michael Matz
Hello,

On Wed, 14 Sep 2022, Peter Zijlstra wrote:

> > Maybe this is semantics, but I wouldn't characterize objtool's existence
> > as being based on the mistrust of tools.  It's main motivation is to
> > fill in the toolchain's blind spots in asm and inline-asm, which exist
> > by design.
> 
> That and a fairly deep seated loathing for the regular CFI annotations
> and DWARF in general. Linus was fairly firm he didn't want anything to
> do with DWARF for in-kernel unwinding.

I was referring only to the check-stuff functionality of objtool, not to 
its other parts.  Altough, of course, "deep seated loathing" is a special 
form of mistrust as well ;-)

> That left us in a spot that we needed unwind information in a 'better'
> format than DWARF.
> 
> Objtool was born out of those contraints. ORC not needing the CFI
> annotations and ORC being *much* faster at unwiding and generation
> (debug builds are slow) were all good.

Don't mix DWARF debug info with DWARF-based unwinding info, the latter 
doesn't imply the former.  Out of interest: how does ORC get around the 
need for CFI annotations (or equivalents to restore registers) and what 
makes it fast?  I want faster unwinding for DWARF as well, when there's 
feature parity :-)  Maybe something can be learned for integration into 
dwarf-unwind.


Ciao,
Michael.


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-14 Thread Michael Matz
Hello,

On Wed, 14 Sep 2022, Josh Poimboeuf wrote:

> > >This information is needed because the
> > >code after the call to such a function is optimized out as
> > >unreachable and objtool has no way of knowing that.
> > 
> > Since June we (GCC) have -funreachable-traps.  This creates a trap insn
> > wherever control flow would otherwise go into limbo.
> 
> Ah, that's interesting, though I'm not sure if we'd be able to
> distinguish between "call doesn't return" traps and other traps or
> reasons for UD2.

There are two reasons (which will turn out to be the same) for a trap (say 
'UD2' on x86-64) directly after a call insn:
1) "the call shall not have returned"
2) something else jumps to that trap because it was __builtin_unreachable 
   (or equivalent), and the compiler happened to put that ud2 directly 
   after the call.  It could have done that only when the call itself was 
   noreturn:
 cmp $foo, %rax
 jne do_trap
 call noret
do_trap:
 ud2

So, it's all the same.  If there's an ud2 (or whatever the trap maker is) 
after a call then it was because it's noreturn.

(But, of course this costs (little) code size, unlike the non-alloc 
checker sections)


Ciao,
Michael.


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-12 Thread Michael Matz
Hey,

On Mon, 12 Sep 2022, Borislav Petkov wrote:

> Micha, any opinions on the below are appreciated.
> 
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:

> > difficult to ensure correctness.  Also, due to kernel live patching, the
> > kernel relies on 100% correctness of unwinding metadata, whereas the
> > toolchain treats it as a best effort.

Unwinding certainly is not best effort.  It's 100% reliable as far as the 
source language or compilation options require.  But as it doesn't 
touch the discussed features I won't belabor that point.

I will mention that objtool's existence is based on mistrust, of persons 
(not correctly annotating stuff) and of tools (not correctly heeding those 
annotations).  The mistrust in persons is understandable and can be dealt 
with by tools, but the mistrust in tools can't be fixed by making tools 
more complicated by emitting even more information; there's no good reason 
to assume that one piece of info can be trusted more than other pieces.  
So, if you mistrust the tools you have already lost.  That's somewhat 
philosophical, so I won't beat that horse much more either.

Now, recovering the CFG.  I'll switch order of your two items:

2) noreturn function

> >   .pushsection .annotate.noreturn
> > .quad func1
> > .quad func2
> > .quad func3
> >   .popsection

This won't work for indirect calls to noreturn functions:

  void (* __attribute__((noreturn)) noretptr)(void);
  int callnoret (int i)
  {
noretptr();
return i + 32;
  }

The return statement is unreachable (and removed by GCC).  To know that 
you would have to mark the call statements, not the individual functions.  
All schemes that mark functions that somehow indicates a meaningful 
difference in the calling sequence (e.g. the ABI of functions) have the 
same problem: it's part of the call expressions type, not of individual 
decls.

Second problem: it's not extensible.  Today it's noreturn functions you 
want to know, and tomorrow?  So, add a flag word per entry, define bit 0 
for now to be NORETURN, and see what comes.  Add a header with a version 
(and/or identifier) as well and it's properly extensible.  For easy 
linking and identifying the blobs in the linked result include a length in 
the header.  If this were in an allocated section it would be a good idea 
to refer to the symbols in a PC-relative manner, so as to not result in 
runtime relocations.  In this case, as it's within a non-alloc section 
that doesn't matter.  So:

.section .annotate.functions
.long 1   # version
.long 0xcafe  # ident
.long 2f-1f   # length
1:
.quad func1, 1   # noreturn
.quad func2, 1   # noreturn
.quad func3, 32  # something_else_but_not_noreturn
...
2:
.long 1b-2b   # align and "checksum"

It might be that the length-and-header scheme is cumbersome if you need to 
write those section commands by hand, in which case another scheme might 
be preferrable, but it should somehow be self-delimiting.

For the above problem of indirect calls to noreturns, instead do:

  .text
  noretcalllabel:
call noreturn
  othercall:
call really_special_thing
  .section .annotate.noretcalls
  .quad noretcalllabel, 1  # noreturn call
  .quad othercall, 32  # call to some special(-ABI?) function

Same thoughts re extensibility and self-delimitation apply.

1) jump tables

> > Create an .annotate.jump_table section which is an array of the
> > following variable-length structure:
> > 
> >   struct annotate_jump_table {
> > void *indirect_jmp;
> > long num_targets;
> > void *targets[];
> >   };

It's very often the case that the compiler already emits what your 
.targets[] member would encode, just at some unknown place, length and 
encoding.  So you would save space if you instead only remember the 
encoding and places of those jump tables:

struct {
  void *indirect_jump;
  long num_tables;
  struct {
unsigned num_entries;
unsigned encoding;
void *start_of_table;
  } tables[];
};

The usual encodings are: direct, PC-relative, relative-to-start-of-table.  
Usually for a specific jump instruction there's only one table, so 
optimizing for that makes sense.  For strange unthought-of cases it's 
probably a good idea to have your initial scheme as fallback, which could 
be indicated by a special .encoding value.

> > For example, given the following switch statement code:
> > 
> >   .Lswitch_jmp:
> > // %rax is .Lcase_1 or .Lcase_2
> > jmp %rax

So, usually %rax would point into a table (somewhere in .rodata/.text) 
that looks like so:

.Ljump_table:
 .quad .Lcase_1 - .Ljump_table
 .quad .Lcase_2 - .Ljump_table

(for position-independend code)

and hence you would emit this as annotation:

.quad .Lswitch_jmp
.quad 1   # only a single table
.long 2   # with two entries
.long RELATIVE_TO_START   # all entries are X - start_of_table
.quad .Ljump_table

In this case you won't save anything of course, but as soon as there's a