Re: violating function pointer signature

2020-11-19 Thread Nick Desaulniers
On Thu, Nov 19, 2020 at 9:05 AM Alexei Starovoitov
 wrote:
>
> On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt  wrote:
> > Linux obviously
> > supports multiple architectures (more than any other OS), but it is pretty
> > stuck to gcc as a compiler (with LLVM just starting to work too).
> >
> > We are fine with being stuck to a compiler if it gives us what we want.
>
> I beg to disagree.
> android, chrome and others changed their kernel builds to
> "make LLVM=1" some time ago.
> It's absolutely vital for the health of the kernel to be built with
> both gcc and llvm.

Our fleet of machines in the data centers is currently mid-ramp, at
around or slightly just over 50% of kernels built with Clang.  Soon to
be 100%.  So "a good chunk of Google services," too, FWIW.

OpenMandriva is on track for their 4.2 release to use LLVM for their kernels.
-- 
Thanks,
~Nick Desaulniers


Re: violating function pointer signature

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 05:42:34PM +, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 19 November 2020 16:35
> > I just meant "valid C language code as defined by the standards".  Many
> > people want all UB to just go away, while that is *impossible* to do for
> > many compilers: for example where different architectures or different
> > ABIs have contradictory requirements.
> 
> Some of the UB in the C language are (probably) there because
> certain (now obscure) hardware behaved that way.

Yes.

> For instance integer arithmetic may saturate on overflow
> (or do even stranger things if the sign is a separate bit).

And some still does!

> I'm not quite sure it was ever possible to write a C compiler
> for a cpu that processed numbers in ASCII (up to 10 digits),
> binary arithmetic was almost impossible.

A machine that really stores decimal numbers?  Not BCD or the like?
Yeah wow, that will be hard.

> There are also the CPU that only have 'word' addressing - so
> that 'pointers to characters' take extra instructions.

Such machines are still made, and are programmed in C as well.

> ISTM that a few years ago the gcc developers started looking
> at some of these 'UB' and decided they could make use of
> them to make some code faster (and break other code).

When UB would happen in some situation, the compiler can simply assume
that situation does not happen.  This makes it possible to do a lot of
optimisations (many to do with loops) that cannot be done otherwise
(including those to do with signed overflow).  And many of those
optimisations are worthwhile.

> One of the problems with UB is that whereas you might expect
> UB arithmetic to generate an unexpected result and/or signal
> it is completely open-ended and could fire an ICBM at the coder.

Yes, UB is undefined behaviour.  Unspecified is something else (and C
has that as well, also implementation-defined, etc.)

In some cases GCC (and any other modern compiler) can make UB be IB
instead, with some flag for example, like -fno-strict-* does.  In other
cases it isn't so easy at all.  In cases like you have here (where the
validity of what you want to do depends on the ABI in effect) things are
not easy :-/


Segher


RE: violating function pointer signature

2020-11-19 Thread David Laight
From: Segher Boessenkool
> Sent: 19 November 2020 16:35
...
> I just meant "valid C language code as defined by the standards".  Many
> people want all UB to just go away, while that is *impossible* to do for
> many compilers: for example where different architectures or different
> ABIs have contradictory requirements.

Some of the UB in the C language are (probably) there because
certain (now obscure) hardware behaved that way.
For instance integer arithmetic may saturate on overflow
(or do even stranger things if the sign is a separate bit).
I'm not quite sure it was ever possible to write a C compiler
for a cpu that processed numbers in ASCII (up to 10 digits),
binary arithmetic was almost impossible.
There are also the CPU that only have 'word' addressing - so
that 'pointers to characters' take extra instructions.

ISTM that a few years ago the gcc developers started looking
at some of these 'UB' and decided they could make use of
them to make some code faster (and break other code).

One of the problems with UB is that whereas you might expect
UB arithmetic to generate an unexpected result and/or signal
it is completely open-ended and could fire an ICBM at the coder.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: violating function pointer signature

2020-11-19 Thread Steven Rostedt
On Thu, 19 Nov 2020 09:04:57 -0800
Alexei Starovoitov  wrote:

> On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt  wrote:
> > Linux obviously
> > supports multiple architectures (more than any other OS), but it is pretty
> > stuck to gcc as a compiler (with LLVM just starting to work too).
> >
> > We are fine with being stuck to a compiler if it gives us what we want.  
> 
> I beg to disagree.

I think you misunderstood.

> android, chrome and others changed their kernel builds to
> "make LLVM=1" some time ago.
> It's absolutely vital for the health of the kernel to be built with
> both gcc and llvm.

That's what I meant with "LLVM just starting to work too".

LLVM has been working hard to make sure that it can do the same tricks that
the kernel depends on gcc for. And LLVM appears to be working fine with the
nop stub logic (it's already in 5.10-rc with with the static callers).

We can easily create a boot up test (config option) that will test it, and
if a compiler breaks it, this test would trigger a failure.

Again, both static calls and tracepoint callbacks are limited in what they
can do. Both return void, and both do are not variadic functions. Although
passing in a struct as a parameter is possible, we could add testing to
detect this, as that's rather slow to begin with.

-- Steve


Re: violating function pointer signature

2020-11-19 Thread Alexei Starovoitov
On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt  wrote:
> Linux obviously
> supports multiple architectures (more than any other OS), but it is pretty
> stuck to gcc as a compiler (with LLVM just starting to work too).
>
> We are fine with being stuck to a compiler if it gives us what we want.

I beg to disagree.
android, chrome and others changed their kernel builds to
"make LLVM=1" some time ago.
It's absolutely vital for the health of the kernel to be built with
both gcc and llvm.


Re: violating function pointer signature

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 09:59:51AM -0500, Steven Rostedt wrote:
> On Thu, 19 Nov 2020 08:37:35 -0600
> Segher Boessenkool  wrote:
> > > Note that we have a fairly extensive tradition of defining away UB with
> > > language extentions, -fno-strict-overflow, -fno-strict-aliasing,  
> > 
> > These are options to make a large swath of not correct C programs
> > compile (and often work) anyway.  This is useful because there are so
> > many such programs, because a) people did not lint; and/or b) the
> > problem never was obvious with some other (or older) compiler; and/or
> > c) people do not care about writing portable C and prefer writing in
> > their own non-C dialect.
> 
> Note, this is not about your average C program. This is about the Linux
> kernel, which already does a lot of tricks in C.

Yes, I know.  And some of that can be supported just fine (usually
because the compiler explicitly supports it); some of that will not
cause problems in practice (e.g. the scope it could cause problems in
is very limited); and some of that is just waiting to break down.  The
question is what category your situation here is in.  The middle one I
think.

> There's a lot of code in
> assembly that gets called from C (and vise versa).

That is just fine, that is what ABIs are for :-)

> We modify code on the
> fly (which tracepoints use two methods of that - with asm-goto/jump-labels
> and static functions).

And that is a lot trickier.  It can be made to work, but there are many
dark nooks and crannies problems can hide in.

> As for your point c), I'm not sure what you mean about portable C

I just meant "valid C language code as defined by the standards".  Many
people want all UB to just go away, while that is *impossible* to do for
many compilers: for example where different architectures or different
ABIs have contradictory requirements.

> (stuck to
> a single compiler, or stuck to a single architecture?). Linux obviously
> supports multiple architectures (more than any other OS), but it is pretty
> stuck to gcc as a compiler (with LLVM just starting to work too).
> 
> We are fine with being stuck to a compiler if it gives us what we want.

Right.  Just know that a compiler can make defined behaviour for *some*
things that are UB in standard C (possibly at a runtime performance
cost), but most things are not feasible.

Alexei's SPARC example (thanks!) shows that even "obvious" things are
not so obviously (or at all) implemented the way you expect on all
systems you care about.


Segher


Re: violating function pointer signature

2020-11-19 Thread Steven Rostedt
On Thu, 19 Nov 2020 08:37:35 -0600
Segher Boessenkool  wrote:

> > Note that we have a fairly extensive tradition of defining away UB with
> > language extentions, -fno-strict-overflow, -fno-strict-aliasing,  
> 
> These are options to make a large swath of not correct C programs
> compile (and often work) anyway.  This is useful because there are so
> many such programs, because a) people did not lint; and/or b) the
> problem never was obvious with some other (or older) compiler; and/or
> c) people do not care about writing portable C and prefer writing in
> their own non-C dialect.


Note, this is not about your average C program. This is about the Linux
kernel, which already does a lot of tricks in C. There's a lot of code in
assembly that gets called from C (and vise versa). We modify code on the
fly (which tracepoints use two methods of that - with asm-goto/jump-labels
and static functions).

As for your point c), I'm not sure what you mean about portable C (stuck to
a single compiler, or stuck to a single architecture?). Linux obviously
supports multiple architectures (more than any other OS), but it is pretty
stuck to gcc as a compiler (with LLVM just starting to work too).

We are fine with being stuck to a compiler if it gives us what we want.

-- Steve


Re: violating function pointer signature

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 09:36:48AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 18, 2020 at 01:11:27PM -0600, Segher Boessenkool wrote:
> > Calling this via a different declared function type is undefined
> > behaviour, but that is independent of how the function is *defined*.
> > Your program can make ducks appear from your nose even if that function
> > is never called, if you do that.  Just don't do UB, not even once!
> 
> Ah, see, here I think we disagree. UB is a flaw of the spec, but the
> real world often has very sane behaviour there (sometimes also very
> much not).

That attitude summons ducks.

> In this particular instance the behaviour is UB because the C spec
> doesn't want to pin down the calling convention, which is something I
> can understand.

How do you know?  Were you at the meetings where this was decided?

The most frequent reason something is made UB is when there are multiple
existing implementations with irreconcilable differences.

> But once you combine the C spec with the ABI(s) at hand,
> there really isn't two ways about it. This has to work, under the
> premise that the ABI defines a caller cleanup calling convention.

This is not clear at all (and what "caller cleanup calling convention"
would mean isn't either).  A function call at the C level does not
necessarily correspond at all with a function call at the ABI level, to
begin with.

> So in the view that the compiler is a glorified assembler,

But it isn't.

> Note that we have a fairly extensive tradition of defining away UB with
> language extentions, -fno-strict-overflow, -fno-strict-aliasing,

These are options to make a large swath of not correct C programs
compile (and often work) anyway.  This is useful because there are so
many such programs, because a) people did not lint; and/or b) the
problem never was obvious with some other (or older) compiler; and/or
c) people do not care about writing portable C and prefer writing in
their own non-C dialect.

> -fno-delete-null-pointer-checks etc..

This was added as a security hardening feature.  It of course also is
useful for other things -- most flags are.  It was not added to make yet
another dialect.


Segher


Re: violating function pointer signature

2020-11-19 Thread Peter Zijlstra
On Wed, Nov 18, 2020 at 01:11:27PM -0600, Segher Boessenkool wrote:
> Calling this via a different declared function type is undefined
> behaviour, but that is independent of how the function is *defined*.
> Your program can make ducks appear from your nose even if that function
> is never called, if you do that.  Just don't do UB, not even once!

Ah, see, here I think we disagree. UB is a flaw of the spec, but the
real world often has very sane behaviour there (sometimes also very
much not).

In this particular instance the behaviour is UB because the C spec
doesn't want to pin down the calling convention, which is something I
can understand. But once you combine the C spec with the ABI(s) at hand,
there really isn't two ways about it. This has to work, under the
premise that the ABI defines a caller cleanup calling convention.

So in the view that the compiler is a glorified assembler, I'll take UB
every day if it means I can get the thing to do what I want it to.

Obviously in the interest of co-operation and longer term viability, it
would be nice if we can agree on the behaviour and get a language
extention covering it.

Note that we have a fairly extensive tradition of defining away UB with
language extentions, -fno-strict-overflow, -fno-strict-aliasing,
-fno-delete-null-pointer-checks etc..




Re: violating function pointer signature

2020-11-19 Thread Peter Zijlstra
On Wed, Nov 18, 2020 at 01:48:37PM -0600, Segher Boessenkool wrote:

> If you have at most four or so args, what you wnat to do will work on
> all systems the kernel currently supports, as far as I can tell.  It
> is not valid C, and none of the compilers have an extension for this
> either.  But it will likely work.

So this is where we rely on the calling convention being caller-cleanup
(cdecl has that).

I looked at the GCC preprocessor symbols but couldn't find anything that
seems relevant to the calling convention in use, so barring that, the
best option might to be have a boot-time self-test that triggers this.
Then we'll quickly know if all architectures handle this correctly.


Re: violating function pointer signature

2020-11-18 Thread Steven Rostedt
On Wed, 18 Nov 2020 13:48:37 -0600
Segher Boessenkool  wrote:

> > With it_func being the func from the struct tracepoint_func, which is a
> > void pointer, it is typecast to the function that is defined by the
> > tracepoint. args is defined as the arguments that match the proto.  
> 
> If you have at most four or so args, what you wnat to do will work on
> all systems the kernel currently supports, as far as I can tell.  It
> is not valid C, and none of the compilers have an extension for this
> either.  But it will likely work.

Well, unfortunately, there's tracepoints with many more than 4 arguments. I
think there's one with up to 13!

> 
> > The problem we are solving is on the removal case, if the memory is tight,
> > it is possible that the new array can not be allocated. But we must still
> > remove the called function. The idea in this case is to replace the
> > function saved with a stub. The above loop will call the stub and not the
> > removed function until another update happens.
> > 
> > This thread is about how safe is it to call:
> > 
> > void tp_stub_func(void) { return ; }
> > 
> > instead of the function that was removed?  
> 
> Exactly as safe as calling a stub defined in asm.  The undefined
> behaviour happens if your program has such a call, it doesn't matter
> how the called function is defined, it doesn't have to be C.
> 
> > Thus, we are indeed calling that stub function from a call site that is not
> > using the same parameters.
> > 
> > The question is, will this break?  
> 
> It is unlikely to break if you use just a few arguments, all of simple
> scalar types.  Just hope you will never encounter a crazy ABI :-)

But in most cases, all the arguments are of scaler types, as anything else
is not recommended, because copying is always slower than just passing a
pointer, especially since it would need to be copied for every instance of
that loop. I could do an audit to see if there's any that exist, and perhaps
even add some static checker to make sure they don't.

-- Steve


Re: violating function pointer signature

2020-11-18 Thread Steven Rostedt
On Wed, 18 Nov 2020 11:46:02 -0800
Alexei Starovoitov  wrote:

> On Wed, Nov 18, 2020 at 6:22 AM Steven Rostedt  wrote:
> >
> > Thus, all functions will be non-variadic in these cases.  
> 
> That's not the only case where it will blow up.
> Try this on sparc:
> struct foo {
> int a;
> };
> 
> struct foo foo_struct(void) {
> struct foo f = {};
> return f;
> }
> int foo_int(void) {
> return 0;
> }
> or this link:
> https://godbolt.org/z/EdM47b
> 
> Notice:
> jmp %i7+12
> The function that returns small struct will jump to a different
> instruction in the caller.
> 
> I think none of the tracepoints return structs and void foo(void) is
> fine on x86.
> Just pointing out that it's more than just variadic.

I also said that this is limited to only functions that have void return.

-- Steve


Re: violating function pointer signature

2020-11-18 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 02:33:43PM -0500, Steven Rostedt wrote:
> On Wed, 18 Nov 2020 13:11:27 -0600
> Segher Boessenkool  wrote:
> 
> > Calling this via a different declared function type is undefined
> > behaviour, but that is independent of how the function is *defined*.
> > Your program can make ducks appear from your nose even if that function
> > is never called, if you do that.  Just don't do UB, not even once!
> 
> But that's the whole point of this conversation. We are going to call this
> from functions that are going to have some random set of parameters.



> And you see the above, the macro does:
> 
>   ((void(*)(void *, proto))(it_func))(__data, args);

Yup.

> With it_func being the func from the struct tracepoint_func, which is a
> void pointer, it is typecast to the function that is defined by the
> tracepoint. args is defined as the arguments that match the proto.

If you have at most four or so args, what you wnat to do will work on
all systems the kernel currently supports, as far as I can tell.  It
is not valid C, and none of the compilers have an extension for this
either.  But it will likely work.

> The problem we are solving is on the removal case, if the memory is tight,
> it is possible that the new array can not be allocated. But we must still
> remove the called function. The idea in this case is to replace the
> function saved with a stub. The above loop will call the stub and not the
> removed function until another update happens.
> 
> This thread is about how safe is it to call:
> 
> void tp_stub_func(void) { return ; }
> 
> instead of the function that was removed?

Exactly as safe as calling a stub defined in asm.  The undefined
behaviour happens if your program has such a call, it doesn't matter
how the called function is defined, it doesn't have to be C.

> Thus, we are indeed calling that stub function from a call site that is not
> using the same parameters.
> 
> The question is, will this break?

It is unlikely to break if you use just a few arguments, all of simple
scalar types.  Just hope you will never encounter a crazy ABI :-)


Segher


Re: violating function pointer signature

2020-11-18 Thread Alexei Starovoitov
On Wed, Nov 18, 2020 at 6:22 AM Steven Rostedt  wrote:
>
> Thus, all functions will be non-variadic in these cases.

That's not the only case where it will blow up.
Try this on sparc:
struct foo {
int a;
};

struct foo foo_struct(void) {
struct foo f = {};
return f;
}
int foo_int(void) {
return 0;
}
or this link:
https://godbolt.org/z/EdM47b

Notice:
jmp %i7+12
The function that returns small struct will jump to a different
instruction in the caller.

I think none of the tracepoints return structs and void foo(void) is
fine on x86.
Just pointing out that it's more than just variadic.


Re: violating function pointer signature

2020-11-18 Thread Steven Rostedt
On Wed, 18 Nov 2020 13:11:27 -0600
Segher Boessenkool  wrote:

> Calling this via a different declared function type is undefined
> behaviour, but that is independent of how the function is *defined*.
> Your program can make ducks appear from your nose even if that function
> is never called, if you do that.  Just don't do UB, not even once!

But that's the whole point of this conversation. We are going to call this
from functions that are going to have some random set of parameters.

But there is a limit to that. All the callers will expect a void return,
and none of the callers will have a variable number of parameters.

The code in question is tracepoints and static calls. For this
conversation, I'll stick with tracepoints (even though static calls are
used too, but including that in the conversation is confusing).

Let me define what is happening:

We have a macro that creates a defined tracepoint with a defined set of
parameters. But each tracepoint can have a different set of parameters. All
of them will have "void *" as the first parameter, but what comes after
that is unique to each tracepoint (defined by a macro). None of them will
be a variadic function call.

The macro looks like this:

int __traceiter_##_name(void *__data, proto)\
{   \
struct tracepoint_func *it_func_ptr;\
void *it_func;  \
\
it_func_ptr =   \
rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
do {\
it_func = (it_func_ptr)->func;  \
__data = (it_func_ptr)->data;   \
((void(*)(void *, proto))(it_func))(__data, args); \
} while ((++it_func_ptr)->func);\
return 0;   \
}


There's an array of struct tracepoint_func pointers, which has the
definition of:

struct tracepoint_func {
void *func;
void *data;
int prio;
};


And you see the above, the macro does:

((void(*)(void *, proto))(it_func))(__data, args);

With it_func being the func from the struct tracepoint_func, which is a
void pointer, it is typecast to the function that is defined by the
tracepoint. args is defined as the arguments that match the proto.

The way the array is updated, is to use an RCU method, which is to create a
new array, copy the changes to the new array, then switch the "->funcs"
over to the new copy, and after a RCU grace period is finished, we can free
the old array.

The problem we are solving is on the removal case, if the memory is tight,
it is possible that the new array can not be allocated. But we must still
remove the called function. The idea in this case is to replace the
function saved with a stub. The above loop will call the stub and not the
removed function until another update happens.

This thread is about how safe is it to call:

void tp_stub_func(void) { return ; }

instead of the function that was removed?

Thus, we are indeed calling that stub function from a call site that is not
using the same parameters.

The question is, will this break?

-- Steve


Re: violating function pointer signature

2020-11-18 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 01:58:23PM -0500, Steven Rostedt wrote:
> I wonder if we should define on all architectures a void void_stub(void),
> in assembly, that just does a return, an not worry about gcc messing up the
> creation of the stub function.
> 
> On x86_64:
> 
> GLOBAL(void_stub)
>   retq
> 
> 
> And so on.

I don't see how you imagine a compiler to mess this up?

void void_stub(void) { }

will do fine?

.globl  void_stub
.type   void_stub, @function
void_stub:
.LFB0:
.cfi_startproc
ret
.cfi_endproc
.LFE0:
.size   void_stub, .-void_stub


Calling this via a different declared function type is undefined
behaviour, but that is independent of how the function is *defined*.
Your program can make ducks appear from your nose even if that function
is never called, if you do that.  Just don't do UB, not even once!


Segher


Re: violating function pointer signature

2020-11-18 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 07:31:50PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
> >> I could change the stub from (void) to () if that would be better.
> >
> > Don't?  In a function definition they mean exactly the same thing (and
> > the kernel uses (void) everywhere else, which many people find clearer).
> 
> And I think () functions expected a caller-provided parameter save
> area on powerpc64le, while (void) functions do not.

Like I said (but you cut off, didn't realise it matters I guess):

> > In a function declaration that is not part of a definition it means no
> > information about the arguments is specified, a quite different thing.

Since the caller does not know if the callee will need a save area, it
has to assume it does.  Similar is true for many ABIs.

> It does not
> matter for an empty function, but GCC prefers to use the parameter
> save area instead of setting up a stack frame if it is present.  So
> you get stack corruption if you call a () function as a (void)
> function.  (The other way round is fine.)

If you have no prototype for a function, you have to assume worst case,
yes.  Calling things "a () function" can mean two things (a declaration
that is or isn't a definition, two very different things), so it helps
to be explicit about it.

Just use (void) and do not worry :-)


Segher


Re: violating function pointer signature

2020-11-18 Thread Steven Rostedt
On Wed, 18 Nov 2020 13:58:23 -0500
Steven Rostedt  wrote:

>  an not worry about gcc 

or LLVM, or whatever is used to build the kernel.

-- Steve


Re: violating function pointer signature

2020-11-18 Thread Steven Rostedt
On Wed, 18 Nov 2020 19:31:50 +0100
Florian Weimer  wrote:

> * Segher Boessenkool:
> 
> > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:  
> >> I could change the stub from (void) to () if that would be better.  
> >
> > Don't?  In a function definition they mean exactly the same thing (and
> > the kernel uses (void) everywhere else, which many people find clearer).  
> 
> And I think () functions expected a caller-provided parameter save
> area on powerpc64le, while (void) functions do not.  It does not
> matter for an empty function, but GCC prefers to use the parameter
> save area instead of setting up a stack frame if it is present.  So
> you get stack corruption if you call a () function as a (void)
> function.  (The other way round is fine.)

I wonder if we should define on all architectures a void void_stub(void),
in assembly, that just does a return, an not worry about gcc messing up the
creation of the stub function.

On x86_64:

GLOBAL(void_stub)
retq


And so on.

-- Steve


Re: violating function pointer signature

2020-11-18 Thread Florian Weimer
* Segher Boessenkool:

> On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
>> I could change the stub from (void) to () if that would be better.
>
> Don't?  In a function definition they mean exactly the same thing (and
> the kernel uses (void) everywhere else, which many people find clearer).

And I think () functions expected a caller-provided parameter save
area on powerpc64le, while (void) functions do not.  It does not
matter for an empty function, but GCC prefers to use the parameter
save area instead of setting up a stack frame if it is present.  So
you get stack corruption if you call a () function as a (void)
function.  (The other way round is fine.)


Re: violating function pointer signature

2020-11-18 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
> I could change the stub from (void) to () if that would be better.

Don't?  In a function definition they mean exactly the same thing (and
the kernel uses (void) everywhere else, which many people find clearer).

In a function declaration that is not part of a definition it means no
information about the arguments is specified, a quite different thing.

This is an obsolescent feature, too.  Many many years from now it could
perhaps mean the same as (void), just like in C++, but not yet.


Segher


Re: violating function pointer signature

2020-11-18 Thread Steven Rostedt
On Wed, 18 Nov 2020 08:50:37 -0800
Nick Desaulniers  wrote:

> On Wed, Nov 18, 2020 at 5:23 AM Peter Zijlstra  wrote:
> >
> > On Tue, Nov 17, 2020 at 03:34:51PM -0500, Steven Rostedt wrote:
> >  
> > > > > Since all tracepoints callbacks have at least one parameter (__data), 
> > > > > we
> > > > > could declare tp_stub_func as:
> > > > >
> > > > > static void tp_stub_func(void *data, ...)
> > > > > {
> > > > >   return;
> > > > > }
> > > > >
> > > > > And now C knows that tp_stub_func() can be called with one or more
> > > > > parameters, and had better be able to deal with it!  
> > > >
> > > > AFAIU this won't work.
> > > >
> > > > C99 6.5.2.2 Function calls
> > > >
> > > > "If the function is defined with a type that is not compatible with the 
> > > > type (of the
> > > > expression) pointed to by the expression that denotes the called 
> > > > function, the behavior is
> > > > undefined."  
> > >
> > > But is it really a problem in practice. I'm sure we could create an 
> > > objtool
> > > function to check to make sure we don't break anything at build time.  
> >
> > I think that as long as the function is completely empty (it never
> > touches any of the arguments) this should work in practise.
> >
> > That is:
> >
> >   void tp_nop_func(void) { }  
> 
> or `void tp_nop_func()` if you plan to call it with different
> parameter types that are all unused in the body.  If you do plan to
> use them, maybe a pointer to a tagged union would be safer?

This stub function will never use the parameters passed to it.

You can see the patch I have for the tracepoint issue here:

 https://lore.kernel.org/r/20201118093405.7a6d2...@gandalf.local.home

I could change the stub from (void) to () if that would be better.

> 
> >
> > can be used as an argument to any function pointer that has a void
> > return. In fact, I already do that, grep for __static_call_nop().
> >
> > I'm not sure what the LLVM-CFI crud makes of it, but that's their
> > problem.  
> 
> If you have instructions on how to exercise the code in question, we
> can help test it with CFI.  Better to find any potential issues before
> they get committed.

If you apply the patch to the Linux kernel, and then apply:

  https://lore.kernel.org/r/20201116181638.6b0de...@gandalf.local.home

Which will force the failed case (to use the stubs). And build and boot the
kernel with those patches applied, you can test it with:


 # mount -t tracefs nodev /sys/kernel/tracing
 # cd /sys/kernel/tracing
 # echo 1 > events/sched/sched_switch/enable
 # mkdir instances/foo
 # echo 1 > instances/foo/events/sched/sched_switch/enable
 # echo 0 > events/sched/sched_switch/enable

Which add two callbacks to the function array for the sched_switch
tracepoint. The remove the first one, which would add the stub instead.

-- Steve


Re: violating function pointer signature

2020-11-18 Thread Nick Desaulniers
On Wed, Nov 18, 2020 at 5:23 AM Peter Zijlstra  wrote:
>
> On Tue, Nov 17, 2020 at 03:34:51PM -0500, Steven Rostedt wrote:
>
> > > > Since all tracepoints callbacks have at least one parameter (__data), we
> > > > could declare tp_stub_func as:
> > > >
> > > > static void tp_stub_func(void *data, ...)
> > > > {
> > > >   return;
> > > > }
> > > >
> > > > And now C knows that tp_stub_func() can be called with one or more
> > > > parameters, and had better be able to deal with it!
> > >
> > > AFAIU this won't work.
> > >
> > > C99 6.5.2.2 Function calls
> > >
> > > "If the function is defined with a type that is not compatible with the 
> > > type (of the
> > > expression) pointed to by the expression that denotes the called 
> > > function, the behavior is
> > > undefined."
> >
> > But is it really a problem in practice. I'm sure we could create an objtool
> > function to check to make sure we don't break anything at build time.
>
> I think that as long as the function is completely empty (it never
> touches any of the arguments) this should work in practise.
>
> That is:
>
>   void tp_nop_func(void) { }

or `void tp_nop_func()` if you plan to call it with different
parameter types that are all unused in the body.  If you do plan to
use them, maybe a pointer to a tagged union would be safer?

>
> can be used as an argument to any function pointer that has a void
> return. In fact, I already do that, grep for __static_call_nop().
>
> I'm not sure what the LLVM-CFI crud makes of it, but that's their
> problem.

If you have instructions on how to exercise the code in question, we
can help test it with CFI.  Better to find any potential issues before
they get committed.
-- 
Thanks,
~Nick Desaulniers


RE: violating function pointer signature

2020-11-18 Thread David Laight
From: Mathieu Desnoyers
> Sent: 18 November 2020 16:01
...
> > If it is already done elsewhere in the kernel, then I will call this
> > precedence, and keep the original version.
> 
> It works for me. Bonus points if you can document in a comment that this
> trick depends on the cdecl calling convention.

It has nothing to do with 'cdecl' - which IIRC is a microsoft term.

Historically C just pushed arguments on the stack (no prototypes)
The calling code knew nothing about the called code or whether a
function might expect to have a variable number of arguments.
To stop this going horribly wrong the stack is tidied up by the caller.

PASCAL (which doesn't really support linking!) didn't support
variable argument lists and would get the called code to remove
the arguments (which is why x86 has a 'ret n' instruction).
In principle this generates smaller/faster code and many of the
32bit windows functions use it - probably due to turbo-pascal).

Modern calling conventions tend to pass some arguments in registers.
All the ones that get used (by default) on linux will get the
caller to tidy the stack.
Although some may use a simpler calling convention for varargs functions.

So a common 'return constant' function can be called from any call site.
But it you actually call a real function (that looks at the arguments)
you better have a matching prototype.
(eg cast the function pointer back to the correct one before the call.)

There are calling conventions where pointer and integer parameters
and results are passed in different registers.
The usual definition of ioctl() is typically broken.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: violating function pointer signature

2020-11-18 Thread Mathieu Desnoyers
- On Nov 18, 2020, at 9:02 AM, rostedt rost...@goodmis.org wrote:

> On Wed, 18 Nov 2020 14:21:36 +0100
> Peter Zijlstra  wrote:
> 
>> I think that as long as the function is completely empty (it never
>> touches any of the arguments) this should work in practise.
>> 
>> That is:
>> 
>>   void tp_nop_func(void) { }
> 
> My original version (the OP of this thread) had this:
> 
> +static void tp_stub_func(void)
> +{
> + return;
> +}
> 
>> 
>> can be used as an argument to any function pointer that has a void
>> return. In fact, I already do that, grep for __static_call_nop().
>> 
>> I'm not sure what the LLVM-CFI crud makes of it, but that's their
>> problem.
> 
> If it is already done elsewhere in the kernel, then I will call this
> precedence, and keep the original version.

It works for me. Bonus points if you can document in a comment that this
trick depends on the cdecl calling convention.

Thanks,

Mathieu

> 
> This way Alexei can't complain about adding a check in the fast path of
> more than one callback attached.
> 
> -- Steve

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: violating function pointer signature

2020-11-18 Thread Steven Rostedt
On Wed, 18 Nov 2020 14:59:29 +0100
Florian Weimer  wrote:

> * Peter Zijlstra:
> 
> > I think that as long as the function is completely empty (it never
> > touches any of the arguments) this should work in practise.
> >
> > That is:
> >
> >   void tp_nop_func(void) { }
> >
> > can be used as an argument to any function pointer that has a void
> > return. In fact, I already do that, grep for __static_call_nop().  
> 
> You can pass it as a function parameter, but in general, you cannot
> call the function with a different prototype.  Even trivial
> differences such as variadic vs non-variadic prototypes matter.

In this case, I don't believe we need to worry about that, for either
tracepoints or static calls. As both don't have any variadic functions.

The function prototypes are defined by macros. For tracepoints, it's
TP_PROTO() and they require matching arguments. And to top it off, the
functions defined, are added to an array of indirect functions and called
separately. It would take a bit of work to even allow tracepoint callbacks
to be variadic functions. The same is true for static calls I believe.

Thus, all functions will be non-variadic in these cases.

> 
> The default Linux calling conventions are all of the cdecl family,
> where the caller pops the argument off the stack.  You didn't quote
> enough to context to tell whether other calling conventions matter in
> your case.
> 
> > I'm not sure what the LLVM-CFI crud makes of it, but that's their
> > problem.  
> 
> LTO can cause problems as well, particularly with whole-program
> optimization.

Again, for tracepoints and static calls that will likely not be an issue.
Because tracepoint callbacks are function parameters. So are static calls.
What happens is, when you update these locations, you pass in a function
you want as a callback, and it's added to an array (and this code is used
for all tracepoints with all different kinds of prototypes, as the function
is simply a void pointer). Then at the call sites, the function pointers are
typecast to the type of the callback function needed, and called.

It basically can not be optimized even when looking at the entire kernel.

-- Steve


Re: violating function pointer signature

2020-11-18 Thread Florian Weimer
* Peter Zijlstra:

>> The default Linux calling conventions are all of the cdecl family,
>> where the caller pops the argument off the stack.  You didn't quote
>> enough to context to tell whether other calling conventions matter in
>> your case.
>
> This is strictly in-kernel, and I think we're all cdecl, of which the
> important part is caller-cleanup. The function compiles to:
>
>   RET
>
> so whatever the arguments are is irrelevant.

Yes, then the stub is ABI-compatible, as far as I know.


Re: violating function pointer signature

2020-11-18 Thread Peter Zijlstra
On Wed, Nov 18, 2020 at 02:59:29PM +0100, Florian Weimer wrote:
> * Peter Zijlstra:
> 
> > I think that as long as the function is completely empty (it never
> > touches any of the arguments) this should work in practise.
> >
> > That is:
> >
> >   void tp_nop_func(void) { }
> >
> > can be used as an argument to any function pointer that has a void
> > return. In fact, I already do that, grep for __static_call_nop().
> 
> You can pass it as a function parameter, but in general, you cannot
> call the function with a different prototype.  Even trivial
> differences such as variadic vs non-variadic prototypes matter.

I don't think any tracepoint uses variadic argument.

> The default Linux calling conventions are all of the cdecl family,
> where the caller pops the argument off the stack.  You didn't quote
> enough to context to tell whether other calling conventions matter in
> your case.

This is strictly in-kernel, and I think we're all cdecl, of which the
important part is caller-cleanup. The function compiles to:

RET

so whatever the arguments are is irrelevant.

> > I'm not sure what the LLVM-CFI crud makes of it, but that's their
> > problem.
> 
> LTO can cause problems as well, particularly with whole-program
> optimization.

I don't think LTO can de-virtualize a dynamic array of function
pointers, so there's very little risk. That said, the __static_call_nop
case, where everything is inlined, is compiled sub-optimally for both
LLVM and GCC.


Re: violating function pointer signature

2020-11-18 Thread Steven Rostedt
On Wed, 18 Nov 2020 14:21:36 +0100
Peter Zijlstra  wrote:

> I think that as long as the function is completely empty (it never
> touches any of the arguments) this should work in practise.
> 
> That is:
> 
>   void tp_nop_func(void) { }

My original version (the OP of this thread) had this:

+static void tp_stub_func(void)
+{
+   return;
+}

> 
> can be used as an argument to any function pointer that has a void
> return. In fact, I already do that, grep for __static_call_nop().
> 
> I'm not sure what the LLVM-CFI crud makes of it, but that's their
> problem.

If it is already done elsewhere in the kernel, then I will call this
precedence, and keep the original version.

This way Alexei can't complain about adding a check in the fast path of
more than one callback attached.

-- Steve


Re: violating function pointer signature

2020-11-18 Thread Florian Weimer
* Peter Zijlstra:

> I think that as long as the function is completely empty (it never
> touches any of the arguments) this should work in practise.
>
> That is:
>
>   void tp_nop_func(void) { }
>
> can be used as an argument to any function pointer that has a void
> return. In fact, I already do that, grep for __static_call_nop().

You can pass it as a function parameter, but in general, you cannot
call the function with a different prototype.  Even trivial
differences such as variadic vs non-variadic prototypes matter.

The default Linux calling conventions are all of the cdecl family,
where the caller pops the argument off the stack.  You didn't quote
enough to context to tell whether other calling conventions matter in
your case.

> I'm not sure what the LLVM-CFI crud makes of it, but that's their
> problem.

LTO can cause problems as well, particularly with whole-program
optimization.


violating function pointer signature

2020-11-18 Thread Peter Zijlstra
On Tue, Nov 17, 2020 at 03:34:51PM -0500, Steven Rostedt wrote:

> > > Since all tracepoints callbacks have at least one parameter (__data), we
> > > could declare tp_stub_func as:
> > > 
> > > static void tp_stub_func(void *data, ...)
> > > {
> > >   return;
> > > }
> > > 
> > > And now C knows that tp_stub_func() can be called with one or more
> > > parameters, and had better be able to deal with it!  
> > 
> > AFAIU this won't work.
> > 
> > C99 6.5.2.2 Function calls
> > 
> > "If the function is defined with a type that is not compatible with the 
> > type (of the
> > expression) pointed to by the expression that denotes the called function, 
> > the behavior is
> > undefined."
> 
> But is it really a problem in practice. I'm sure we could create an objtool
> function to check to make sure we don't break anything at build time.

I think that as long as the function is completely empty (it never
touches any of the arguments) this should work in practise.

That is:

  void tp_nop_func(void) { }

can be used as an argument to any function pointer that has a void
return. In fact, I already do that, grep for __static_call_nop().

I'm not sure what the LLVM-CFI crud makes of it, but that's their
problem.