Re: violating function pointer signature
On Thu, Nov 19, 2020 at 9:05 AM Alexei Starovoitov wrote: > > On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt wrote: > > Linux obviously > > supports multiple architectures (more than any other OS), but it is pretty > > stuck to gcc as a compiler (with LLVM just starting to work too). > > > > We are fine with being stuck to a compiler if it gives us what we want. > > I beg to disagree. > android, chrome and others changed their kernel builds to > "make LLVM=1" some time ago. > It's absolutely vital for the health of the kernel to be built with > both gcc and llvm. Our fleet of machines in the data centers is currently mid-ramp, at around or slightly just over 50% of kernels built with Clang. Soon to be 100%. So "a good chunk of Google services," too, FWIW. OpenMandriva is on track for their 4.2 release to use LLVM for their kernels. -- Thanks, ~Nick Desaulniers
Re: violating function pointer signature
On Thu, Nov 19, 2020 at 05:42:34PM +, David Laight wrote: > From: Segher Boessenkool > > Sent: 19 November 2020 16:35 > > I just meant "valid C language code as defined by the standards". Many > > people want all UB to just go away, while that is *impossible* to do for > > many compilers: for example where different architectures or different > > ABIs have contradictory requirements. > > Some of the UB in the C language are (probably) there because > certain (now obscure) hardware behaved that way. Yes. > For instance integer arithmetic may saturate on overflow > (or do even stranger things if the sign is a separate bit). And some still does! > I'm not quite sure it was ever possible to write a C compiler > for a cpu that processed numbers in ASCII (up to 10 digits), > binary arithmetic was almost impossible. A machine that really stores decimal numbers? Not BCD or the like? Yeah wow, that will be hard. > There are also the CPU that only have 'word' addressing - so > that 'pointers to characters' take extra instructions. Such machines are still made, and are programmed in C as well. > ISTM that a few years ago the gcc developers started looking > at some of these 'UB' and decided they could make use of > them to make some code faster (and break other code). When UB would happen in some situation, the compiler can simply assume that situation does not happen. This makes it possible to do a lot of optimisations (many to do with loops) that cannot be done otherwise (including those to do with signed overflow). And many of those optimisations are worthwhile. > One of the problems with UB is that whereas you might expect > UB arithmetic to generate an unexpected result and/or signal > it is completely open-ended and could fire an ICBM at the coder. Yes, UB is undefined behaviour. Unspecified is something else (and C has that as well, also implementation-defined, etc.) In some cases GCC (and any other modern compiler) can make UB be IB instead, with some flag for example, like -fno-strict-* does. In other cases it isn't so easy at all. In cases like you have here (where the validity of what you want to do depends on the ABI in effect) things are not easy :-/ Segher
RE: violating function pointer signature
From: Segher Boessenkool > Sent: 19 November 2020 16:35 ... > I just meant "valid C language code as defined by the standards". Many > people want all UB to just go away, while that is *impossible* to do for > many compilers: for example where different architectures or different > ABIs have contradictory requirements. Some of the UB in the C language are (probably) there because certain (now obscure) hardware behaved that way. For instance integer arithmetic may saturate on overflow (or do even stranger things if the sign is a separate bit). I'm not quite sure it was ever possible to write a C compiler for a cpu that processed numbers in ASCII (up to 10 digits), binary arithmetic was almost impossible. There are also the CPU that only have 'word' addressing - so that 'pointers to characters' take extra instructions. ISTM that a few years ago the gcc developers started looking at some of these 'UB' and decided they could make use of them to make some code faster (and break other code). One of the problems with UB is that whereas you might expect UB arithmetic to generate an unexpected result and/or signal it is completely open-ended and could fire an ICBM at the coder. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: violating function pointer signature
On Thu, 19 Nov 2020 09:04:57 -0800 Alexei Starovoitov wrote: > On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt wrote: > > Linux obviously > > supports multiple architectures (more than any other OS), but it is pretty > > stuck to gcc as a compiler (with LLVM just starting to work too). > > > > We are fine with being stuck to a compiler if it gives us what we want. > > I beg to disagree. I think you misunderstood. > android, chrome and others changed their kernel builds to > "make LLVM=1" some time ago. > It's absolutely vital for the health of the kernel to be built with > both gcc and llvm. That's what I meant with "LLVM just starting to work too". LLVM has been working hard to make sure that it can do the same tricks that the kernel depends on gcc for. And LLVM appears to be working fine with the nop stub logic (it's already in 5.10-rc with with the static callers). We can easily create a boot up test (config option) that will test it, and if a compiler breaks it, this test would trigger a failure. Again, both static calls and tracepoint callbacks are limited in what they can do. Both return void, and both do are not variadic functions. Although passing in a struct as a parameter is possible, we could add testing to detect this, as that's rather slow to begin with. -- Steve
Re: violating function pointer signature
On Thu, Nov 19, 2020 at 6:59 AM Steven Rostedt wrote: > Linux obviously > supports multiple architectures (more than any other OS), but it is pretty > stuck to gcc as a compiler (with LLVM just starting to work too). > > We are fine with being stuck to a compiler if it gives us what we want. I beg to disagree. android, chrome and others changed their kernel builds to "make LLVM=1" some time ago. It's absolutely vital for the health of the kernel to be built with both gcc and llvm.
Re: violating function pointer signature
On Thu, Nov 19, 2020 at 09:59:51AM -0500, Steven Rostedt wrote: > On Thu, 19 Nov 2020 08:37:35 -0600 > Segher Boessenkool wrote: > > > Note that we have a fairly extensive tradition of defining away UB with > > > language extentions, -fno-strict-overflow, -fno-strict-aliasing, > > > > These are options to make a large swath of not correct C programs > > compile (and often work) anyway. This is useful because there are so > > many such programs, because a) people did not lint; and/or b) the > > problem never was obvious with some other (or older) compiler; and/or > > c) people do not care about writing portable C and prefer writing in > > their own non-C dialect. > > Note, this is not about your average C program. This is about the Linux > kernel, which already does a lot of tricks in C. Yes, I know. And some of that can be supported just fine (usually because the compiler explicitly supports it); some of that will not cause problems in practice (e.g. the scope it could cause problems in is very limited); and some of that is just waiting to break down. The question is what category your situation here is in. The middle one I think. > There's a lot of code in > assembly that gets called from C (and vise versa). That is just fine, that is what ABIs are for :-) > We modify code on the > fly (which tracepoints use two methods of that - with asm-goto/jump-labels > and static functions). And that is a lot trickier. It can be made to work, but there are many dark nooks and crannies problems can hide in. > As for your point c), I'm not sure what you mean about portable C I just meant "valid C language code as defined by the standards". Many people want all UB to just go away, while that is *impossible* to do for many compilers: for example where different architectures or different ABIs have contradictory requirements. > (stuck to > a single compiler, or stuck to a single architecture?). Linux obviously > supports multiple architectures (more than any other OS), but it is pretty > stuck to gcc as a compiler (with LLVM just starting to work too). > > We are fine with being stuck to a compiler if it gives us what we want. Right. Just know that a compiler can make defined behaviour for *some* things that are UB in standard C (possibly at a runtime performance cost), but most things are not feasible. Alexei's SPARC example (thanks!) shows that even "obvious" things are not so obviously (or at all) implemented the way you expect on all systems you care about. Segher
Re: violating function pointer signature
On Thu, 19 Nov 2020 08:37:35 -0600 Segher Boessenkool wrote: > > Note that we have a fairly extensive tradition of defining away UB with > > language extentions, -fno-strict-overflow, -fno-strict-aliasing, > > These are options to make a large swath of not correct C programs > compile (and often work) anyway. This is useful because there are so > many such programs, because a) people did not lint; and/or b) the > problem never was obvious with some other (or older) compiler; and/or > c) people do not care about writing portable C and prefer writing in > their own non-C dialect. Note, this is not about your average C program. This is about the Linux kernel, which already does a lot of tricks in C. There's a lot of code in assembly that gets called from C (and vise versa). We modify code on the fly (which tracepoints use two methods of that - with asm-goto/jump-labels and static functions). As for your point c), I'm not sure what you mean about portable C (stuck to a single compiler, or stuck to a single architecture?). Linux obviously supports multiple architectures (more than any other OS), but it is pretty stuck to gcc as a compiler (with LLVM just starting to work too). We are fine with being stuck to a compiler if it gives us what we want. -- Steve
Re: violating function pointer signature
On Thu, Nov 19, 2020 at 09:36:48AM +0100, Peter Zijlstra wrote: > On Wed, Nov 18, 2020 at 01:11:27PM -0600, Segher Boessenkool wrote: > > Calling this via a different declared function type is undefined > > behaviour, but that is independent of how the function is *defined*. > > Your program can make ducks appear from your nose even if that function > > is never called, if you do that. Just don't do UB, not even once! > > Ah, see, here I think we disagree. UB is a flaw of the spec, but the > real world often has very sane behaviour there (sometimes also very > much not). That attitude summons ducks. > In this particular instance the behaviour is UB because the C spec > doesn't want to pin down the calling convention, which is something I > can understand. How do you know? Were you at the meetings where this was decided? The most frequent reason something is made UB is when there are multiple existing implementations with irreconcilable differences. > But once you combine the C spec with the ABI(s) at hand, > there really isn't two ways about it. This has to work, under the > premise that the ABI defines a caller cleanup calling convention. This is not clear at all (and what "caller cleanup calling convention" would mean isn't either). A function call at the C level does not necessarily correspond at all with a function call at the ABI level, to begin with. > So in the view that the compiler is a glorified assembler, But it isn't. > Note that we have a fairly extensive tradition of defining away UB with > language extentions, -fno-strict-overflow, -fno-strict-aliasing, These are options to make a large swath of not correct C programs compile (and often work) anyway. This is useful because there are so many such programs, because a) people did not lint; and/or b) the problem never was obvious with some other (or older) compiler; and/or c) people do not care about writing portable C and prefer writing in their own non-C dialect. > -fno-delete-null-pointer-checks etc.. This was added as a security hardening feature. It of course also is useful for other things -- most flags are. It was not added to make yet another dialect. Segher
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 01:11:27PM -0600, Segher Boessenkool wrote: > Calling this via a different declared function type is undefined > behaviour, but that is independent of how the function is *defined*. > Your program can make ducks appear from your nose even if that function > is never called, if you do that. Just don't do UB, not even once! Ah, see, here I think we disagree. UB is a flaw of the spec, but the real world often has very sane behaviour there (sometimes also very much not). In this particular instance the behaviour is UB because the C spec doesn't want to pin down the calling convention, which is something I can understand. But once you combine the C spec with the ABI(s) at hand, there really isn't two ways about it. This has to work, under the premise that the ABI defines a caller cleanup calling convention. So in the view that the compiler is a glorified assembler, I'll take UB every day if it means I can get the thing to do what I want it to. Obviously in the interest of co-operation and longer term viability, it would be nice if we can agree on the behaviour and get a language extention covering it. Note that we have a fairly extensive tradition of defining away UB with language extentions, -fno-strict-overflow, -fno-strict-aliasing, -fno-delete-null-pointer-checks etc..
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 01:48:37PM -0600, Segher Boessenkool wrote: > If you have at most four or so args, what you wnat to do will work on > all systems the kernel currently supports, as far as I can tell. It > is not valid C, and none of the compilers have an extension for this > either. But it will likely work. So this is where we rely on the calling convention being caller-cleanup (cdecl has that). I looked at the GCC preprocessor symbols but couldn't find anything that seems relevant to the calling convention in use, so barring that, the best option might to be have a boot-time self-test that triggers this. Then we'll quickly know if all architectures handle this correctly.
Re: violating function pointer signature
On Wed, 18 Nov 2020 13:48:37 -0600 Segher Boessenkool wrote: > > With it_func being the func from the struct tracepoint_func, which is a > > void pointer, it is typecast to the function that is defined by the > > tracepoint. args is defined as the arguments that match the proto. > > If you have at most four or so args, what you wnat to do will work on > all systems the kernel currently supports, as far as I can tell. It > is not valid C, and none of the compilers have an extension for this > either. But it will likely work. Well, unfortunately, there's tracepoints with many more than 4 arguments. I think there's one with up to 13! > > > The problem we are solving is on the removal case, if the memory is tight, > > it is possible that the new array can not be allocated. But we must still > > remove the called function. The idea in this case is to replace the > > function saved with a stub. The above loop will call the stub and not the > > removed function until another update happens. > > > > This thread is about how safe is it to call: > > > > void tp_stub_func(void) { return ; } > > > > instead of the function that was removed? > > Exactly as safe as calling a stub defined in asm. The undefined > behaviour happens if your program has such a call, it doesn't matter > how the called function is defined, it doesn't have to be C. > > > Thus, we are indeed calling that stub function from a call site that is not > > using the same parameters. > > > > The question is, will this break? > > It is unlikely to break if you use just a few arguments, all of simple > scalar types. Just hope you will never encounter a crazy ABI :-) But in most cases, all the arguments are of scaler types, as anything else is not recommended, because copying is always slower than just passing a pointer, especially since it would need to be copied for every instance of that loop. I could do an audit to see if there's any that exist, and perhaps even add some static checker to make sure they don't. -- Steve
Re: violating function pointer signature
On Wed, 18 Nov 2020 11:46:02 -0800 Alexei Starovoitov wrote: > On Wed, Nov 18, 2020 at 6:22 AM Steven Rostedt wrote: > > > > Thus, all functions will be non-variadic in these cases. > > That's not the only case where it will blow up. > Try this on sparc: > struct foo { > int a; > }; > > struct foo foo_struct(void) { > struct foo f = {}; > return f; > } > int foo_int(void) { > return 0; > } > or this link: > https://godbolt.org/z/EdM47b > > Notice: > jmp %i7+12 > The function that returns small struct will jump to a different > instruction in the caller. > > I think none of the tracepoints return structs and void foo(void) is > fine on x86. > Just pointing out that it's more than just variadic. I also said that this is limited to only functions that have void return. -- Steve
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 02:33:43PM -0500, Steven Rostedt wrote: > On Wed, 18 Nov 2020 13:11:27 -0600 > Segher Boessenkool wrote: > > > Calling this via a different declared function type is undefined > > behaviour, but that is independent of how the function is *defined*. > > Your program can make ducks appear from your nose even if that function > > is never called, if you do that. Just don't do UB, not even once! > > But that's the whole point of this conversation. We are going to call this > from functions that are going to have some random set of parameters. > And you see the above, the macro does: > > ((void(*)(void *, proto))(it_func))(__data, args); Yup. > With it_func being the func from the struct tracepoint_func, which is a > void pointer, it is typecast to the function that is defined by the > tracepoint. args is defined as the arguments that match the proto. If you have at most four or so args, what you wnat to do will work on all systems the kernel currently supports, as far as I can tell. It is not valid C, and none of the compilers have an extension for this either. But it will likely work. > The problem we are solving is on the removal case, if the memory is tight, > it is possible that the new array can not be allocated. But we must still > remove the called function. The idea in this case is to replace the > function saved with a stub. The above loop will call the stub and not the > removed function until another update happens. > > This thread is about how safe is it to call: > > void tp_stub_func(void) { return ; } > > instead of the function that was removed? Exactly as safe as calling a stub defined in asm. The undefined behaviour happens if your program has such a call, it doesn't matter how the called function is defined, it doesn't have to be C. > Thus, we are indeed calling that stub function from a call site that is not > using the same parameters. > > The question is, will this break? It is unlikely to break if you use just a few arguments, all of simple scalar types. Just hope you will never encounter a crazy ABI :-) Segher
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 6:22 AM Steven Rostedt wrote: > > Thus, all functions will be non-variadic in these cases. That's not the only case where it will blow up. Try this on sparc: struct foo { int a; }; struct foo foo_struct(void) { struct foo f = {}; return f; } int foo_int(void) { return 0; } or this link: https://godbolt.org/z/EdM47b Notice: jmp %i7+12 The function that returns small struct will jump to a different instruction in the caller. I think none of the tracepoints return structs and void foo(void) is fine on x86. Just pointing out that it's more than just variadic.
Re: violating function pointer signature
On Wed, 18 Nov 2020 13:11:27 -0600 Segher Boessenkool wrote: > Calling this via a different declared function type is undefined > behaviour, but that is independent of how the function is *defined*. > Your program can make ducks appear from your nose even if that function > is never called, if you do that. Just don't do UB, not even once! But that's the whole point of this conversation. We are going to call this from functions that are going to have some random set of parameters. But there is a limit to that. All the callers will expect a void return, and none of the callers will have a variable number of parameters. The code in question is tracepoints and static calls. For this conversation, I'll stick with tracepoints (even though static calls are used too, but including that in the conversation is confusing). Let me define what is happening: We have a macro that creates a defined tracepoint with a defined set of parameters. But each tracepoint can have a different set of parameters. All of them will have "void *" as the first parameter, but what comes after that is unique to each tracepoint (defined by a macro). None of them will be a variadic function call. The macro looks like this: int __traceiter_##_name(void *__data, proto)\ { \ struct tracepoint_func *it_func_ptr;\ void *it_func; \ \ it_func_ptr = \ rcu_dereference_raw((&__tracepoint_##_name)->funcs); \ do {\ it_func = (it_func_ptr)->func; \ __data = (it_func_ptr)->data; \ ((void(*)(void *, proto))(it_func))(__data, args); \ } while ((++it_func_ptr)->func);\ return 0; \ } There's an array of struct tracepoint_func pointers, which has the definition of: struct tracepoint_func { void *func; void *data; int prio; }; And you see the above, the macro does: ((void(*)(void *, proto))(it_func))(__data, args); With it_func being the func from the struct tracepoint_func, which is a void pointer, it is typecast to the function that is defined by the tracepoint. args is defined as the arguments that match the proto. The way the array is updated, is to use an RCU method, which is to create a new array, copy the changes to the new array, then switch the "->funcs" over to the new copy, and after a RCU grace period is finished, we can free the old array. The problem we are solving is on the removal case, if the memory is tight, it is possible that the new array can not be allocated. But we must still remove the called function. The idea in this case is to replace the function saved with a stub. The above loop will call the stub and not the removed function until another update happens. This thread is about how safe is it to call: void tp_stub_func(void) { return ; } instead of the function that was removed? Thus, we are indeed calling that stub function from a call site that is not using the same parameters. The question is, will this break? -- Steve
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 01:58:23PM -0500, Steven Rostedt wrote: > I wonder if we should define on all architectures a void void_stub(void), > in assembly, that just does a return, an not worry about gcc messing up the > creation of the stub function. > > On x86_64: > > GLOBAL(void_stub) > retq > > > And so on. I don't see how you imagine a compiler to mess this up? void void_stub(void) { } will do fine? .globl void_stub .type void_stub, @function void_stub: .LFB0: .cfi_startproc ret .cfi_endproc .LFE0: .size void_stub, .-void_stub Calling this via a different declared function type is undefined behaviour, but that is independent of how the function is *defined*. Your program can make ducks appear from your nose even if that function is never called, if you do that. Just don't do UB, not even once! Segher
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 07:31:50PM +0100, Florian Weimer wrote: > * Segher Boessenkool: > > > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote: > >> I could change the stub from (void) to () if that would be better. > > > > Don't? In a function definition they mean exactly the same thing (and > > the kernel uses (void) everywhere else, which many people find clearer). > > And I think () functions expected a caller-provided parameter save > area on powerpc64le, while (void) functions do not. Like I said (but you cut off, didn't realise it matters I guess): > > In a function declaration that is not part of a definition it means no > > information about the arguments is specified, a quite different thing. Since the caller does not know if the callee will need a save area, it has to assume it does. Similar is true for many ABIs. > It does not > matter for an empty function, but GCC prefers to use the parameter > save area instead of setting up a stack frame if it is present. So > you get stack corruption if you call a () function as a (void) > function. (The other way round is fine.) If you have no prototype for a function, you have to assume worst case, yes. Calling things "a () function" can mean two things (a declaration that is or isn't a definition, two very different things), so it helps to be explicit about it. Just use (void) and do not worry :-) Segher
Re: violating function pointer signature
On Wed, 18 Nov 2020 13:58:23 -0500 Steven Rostedt wrote: > an not worry about gcc or LLVM, or whatever is used to build the kernel. -- Steve
Re: violating function pointer signature
On Wed, 18 Nov 2020 19:31:50 +0100 Florian Weimer wrote: > * Segher Boessenkool: > > > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote: > >> I could change the stub from (void) to () if that would be better. > > > > Don't? In a function definition they mean exactly the same thing (and > > the kernel uses (void) everywhere else, which many people find clearer). > > And I think () functions expected a caller-provided parameter save > area on powerpc64le, while (void) functions do not. It does not > matter for an empty function, but GCC prefers to use the parameter > save area instead of setting up a stack frame if it is present. So > you get stack corruption if you call a () function as a (void) > function. (The other way round is fine.) I wonder if we should define on all architectures a void void_stub(void), in assembly, that just does a return, an not worry about gcc messing up the creation of the stub function. On x86_64: GLOBAL(void_stub) retq And so on. -- Steve
Re: violating function pointer signature
* Segher Boessenkool: > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote: >> I could change the stub from (void) to () if that would be better. > > Don't? In a function definition they mean exactly the same thing (and > the kernel uses (void) everywhere else, which many people find clearer). And I think () functions expected a caller-provided parameter save area on powerpc64le, while (void) functions do not. It does not matter for an empty function, but GCC prefers to use the parameter save area instead of setting up a stack frame if it is present. So you get stack corruption if you call a () function as a (void) function. (The other way round is fine.)
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote: > I could change the stub from (void) to () if that would be better. Don't? In a function definition they mean exactly the same thing (and the kernel uses (void) everywhere else, which many people find clearer). In a function declaration that is not part of a definition it means no information about the arguments is specified, a quite different thing. This is an obsolescent feature, too. Many many years from now it could perhaps mean the same as (void), just like in C++, but not yet. Segher
Re: violating function pointer signature
On Wed, 18 Nov 2020 08:50:37 -0800 Nick Desaulniers wrote: > On Wed, Nov 18, 2020 at 5:23 AM Peter Zijlstra wrote: > > > > On Tue, Nov 17, 2020 at 03:34:51PM -0500, Steven Rostedt wrote: > > > > > > > Since all tracepoints callbacks have at least one parameter (__data), > > > > > we > > > > > could declare tp_stub_func as: > > > > > > > > > > static void tp_stub_func(void *data, ...) > > > > > { > > > > > return; > > > > > } > > > > > > > > > > And now C knows that tp_stub_func() can be called with one or more > > > > > parameters, and had better be able to deal with it! > > > > > > > > AFAIU this won't work. > > > > > > > > C99 6.5.2.2 Function calls > > > > > > > > "If the function is defined with a type that is not compatible with the > > > > type (of the > > > > expression) pointed to by the expression that denotes the called > > > > function, the behavior is > > > > undefined." > > > > > > But is it really a problem in practice. I'm sure we could create an > > > objtool > > > function to check to make sure we don't break anything at build time. > > > > I think that as long as the function is completely empty (it never > > touches any of the arguments) this should work in practise. > > > > That is: > > > > void tp_nop_func(void) { } > > or `void tp_nop_func()` if you plan to call it with different > parameter types that are all unused in the body. If you do plan to > use them, maybe a pointer to a tagged union would be safer? This stub function will never use the parameters passed to it. You can see the patch I have for the tracepoint issue here: https://lore.kernel.org/r/20201118093405.7a6d2...@gandalf.local.home I could change the stub from (void) to () if that would be better. > > > > > can be used as an argument to any function pointer that has a void > > return. In fact, I already do that, grep for __static_call_nop(). > > > > I'm not sure what the LLVM-CFI crud makes of it, but that's their > > problem. > > If you have instructions on how to exercise the code in question, we > can help test it with CFI. Better to find any potential issues before > they get committed. If you apply the patch to the Linux kernel, and then apply: https://lore.kernel.org/r/20201116181638.6b0de...@gandalf.local.home Which will force the failed case (to use the stubs). And build and boot the kernel with those patches applied, you can test it with: # mount -t tracefs nodev /sys/kernel/tracing # cd /sys/kernel/tracing # echo 1 > events/sched/sched_switch/enable # mkdir instances/foo # echo 1 > instances/foo/events/sched/sched_switch/enable # echo 0 > events/sched/sched_switch/enable Which add two callbacks to the function array for the sched_switch tracepoint. The remove the first one, which would add the stub instead. -- Steve
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 5:23 AM Peter Zijlstra wrote: > > On Tue, Nov 17, 2020 at 03:34:51PM -0500, Steven Rostedt wrote: > > > > > Since all tracepoints callbacks have at least one parameter (__data), we > > > > could declare tp_stub_func as: > > > > > > > > static void tp_stub_func(void *data, ...) > > > > { > > > > return; > > > > } > > > > > > > > And now C knows that tp_stub_func() can be called with one or more > > > > parameters, and had better be able to deal with it! > > > > > > AFAIU this won't work. > > > > > > C99 6.5.2.2 Function calls > > > > > > "If the function is defined with a type that is not compatible with the > > > type (of the > > > expression) pointed to by the expression that denotes the called > > > function, the behavior is > > > undefined." > > > > But is it really a problem in practice. I'm sure we could create an objtool > > function to check to make sure we don't break anything at build time. > > I think that as long as the function is completely empty (it never > touches any of the arguments) this should work in practise. > > That is: > > void tp_nop_func(void) { } or `void tp_nop_func()` if you plan to call it with different parameter types that are all unused in the body. If you do plan to use them, maybe a pointer to a tagged union would be safer? > > can be used as an argument to any function pointer that has a void > return. In fact, I already do that, grep for __static_call_nop(). > > I'm not sure what the LLVM-CFI crud makes of it, but that's their > problem. If you have instructions on how to exercise the code in question, we can help test it with CFI. Better to find any potential issues before they get committed. -- Thanks, ~Nick Desaulniers
RE: violating function pointer signature
From: Mathieu Desnoyers > Sent: 18 November 2020 16:01 ... > > If it is already done elsewhere in the kernel, then I will call this > > precedence, and keep the original version. > > It works for me. Bonus points if you can document in a comment that this > trick depends on the cdecl calling convention. It has nothing to do with 'cdecl' - which IIRC is a microsoft term. Historically C just pushed arguments on the stack (no prototypes) The calling code knew nothing about the called code or whether a function might expect to have a variable number of arguments. To stop this going horribly wrong the stack is tidied up by the caller. PASCAL (which doesn't really support linking!) didn't support variable argument lists and would get the called code to remove the arguments (which is why x86 has a 'ret n' instruction). In principle this generates smaller/faster code and many of the 32bit windows functions use it - probably due to turbo-pascal). Modern calling conventions tend to pass some arguments in registers. All the ones that get used (by default) on linux will get the caller to tidy the stack. Although some may use a simpler calling convention for varargs functions. So a common 'return constant' function can be called from any call site. But it you actually call a real function (that looks at the arguments) you better have a matching prototype. (eg cast the function pointer back to the correct one before the call.) There are calling conventions where pointer and integer parameters and results are passed in different registers. The usual definition of ioctl() is typically broken. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: violating function pointer signature
- On Nov 18, 2020, at 9:02 AM, rostedt rost...@goodmis.org wrote: > On Wed, 18 Nov 2020 14:21:36 +0100 > Peter Zijlstra wrote: > >> I think that as long as the function is completely empty (it never >> touches any of the arguments) this should work in practise. >> >> That is: >> >> void tp_nop_func(void) { } > > My original version (the OP of this thread) had this: > > +static void tp_stub_func(void) > +{ > + return; > +} > >> >> can be used as an argument to any function pointer that has a void >> return. In fact, I already do that, grep for __static_call_nop(). >> >> I'm not sure what the LLVM-CFI crud makes of it, but that's their >> problem. > > If it is already done elsewhere in the kernel, then I will call this > precedence, and keep the original version. It works for me. Bonus points if you can document in a comment that this trick depends on the cdecl calling convention. Thanks, Mathieu > > This way Alexei can't complain about adding a check in the fast path of > more than one callback attached. > > -- Steve -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: violating function pointer signature
On Wed, 18 Nov 2020 14:59:29 +0100 Florian Weimer wrote: > * Peter Zijlstra: > > > I think that as long as the function is completely empty (it never > > touches any of the arguments) this should work in practise. > > > > That is: > > > > void tp_nop_func(void) { } > > > > can be used as an argument to any function pointer that has a void > > return. In fact, I already do that, grep for __static_call_nop(). > > You can pass it as a function parameter, but in general, you cannot > call the function with a different prototype. Even trivial > differences such as variadic vs non-variadic prototypes matter. In this case, I don't believe we need to worry about that, for either tracepoints or static calls. As both don't have any variadic functions. The function prototypes are defined by macros. For tracepoints, it's TP_PROTO() and they require matching arguments. And to top it off, the functions defined, are added to an array of indirect functions and called separately. It would take a bit of work to even allow tracepoint callbacks to be variadic functions. The same is true for static calls I believe. Thus, all functions will be non-variadic in these cases. > > The default Linux calling conventions are all of the cdecl family, > where the caller pops the argument off the stack. You didn't quote > enough to context to tell whether other calling conventions matter in > your case. > > > I'm not sure what the LLVM-CFI crud makes of it, but that's their > > problem. > > LTO can cause problems as well, particularly with whole-program > optimization. Again, for tracepoints and static calls that will likely not be an issue. Because tracepoint callbacks are function parameters. So are static calls. What happens is, when you update these locations, you pass in a function you want as a callback, and it's added to an array (and this code is used for all tracepoints with all different kinds of prototypes, as the function is simply a void pointer). Then at the call sites, the function pointers are typecast to the type of the callback function needed, and called. It basically can not be optimized even when looking at the entire kernel. -- Steve
Re: violating function pointer signature
* Peter Zijlstra: >> The default Linux calling conventions are all of the cdecl family, >> where the caller pops the argument off the stack. You didn't quote >> enough to context to tell whether other calling conventions matter in >> your case. > > This is strictly in-kernel, and I think we're all cdecl, of which the > important part is caller-cleanup. The function compiles to: > > RET > > so whatever the arguments are is irrelevant. Yes, then the stub is ABI-compatible, as far as I know.
Re: violating function pointer signature
On Wed, Nov 18, 2020 at 02:59:29PM +0100, Florian Weimer wrote: > * Peter Zijlstra: > > > I think that as long as the function is completely empty (it never > > touches any of the arguments) this should work in practise. > > > > That is: > > > > void tp_nop_func(void) { } > > > > can be used as an argument to any function pointer that has a void > > return. In fact, I already do that, grep for __static_call_nop(). > > You can pass it as a function parameter, but in general, you cannot > call the function with a different prototype. Even trivial > differences such as variadic vs non-variadic prototypes matter. I don't think any tracepoint uses variadic argument. > The default Linux calling conventions are all of the cdecl family, > where the caller pops the argument off the stack. You didn't quote > enough to context to tell whether other calling conventions matter in > your case. This is strictly in-kernel, and I think we're all cdecl, of which the important part is caller-cleanup. The function compiles to: RET so whatever the arguments are is irrelevant. > > I'm not sure what the LLVM-CFI crud makes of it, but that's their > > problem. > > LTO can cause problems as well, particularly with whole-program > optimization. I don't think LTO can de-virtualize a dynamic array of function pointers, so there's very little risk. That said, the __static_call_nop case, where everything is inlined, is compiled sub-optimally for both LLVM and GCC.
Re: violating function pointer signature
On Wed, 18 Nov 2020 14:21:36 +0100 Peter Zijlstra wrote: > I think that as long as the function is completely empty (it never > touches any of the arguments) this should work in practise. > > That is: > > void tp_nop_func(void) { } My original version (the OP of this thread) had this: +static void tp_stub_func(void) +{ + return; +} > > can be used as an argument to any function pointer that has a void > return. In fact, I already do that, grep for __static_call_nop(). > > I'm not sure what the LLVM-CFI crud makes of it, but that's their > problem. If it is already done elsewhere in the kernel, then I will call this precedence, and keep the original version. This way Alexei can't complain about adding a check in the fast path of more than one callback attached. -- Steve
Re: violating function pointer signature
* Peter Zijlstra: > I think that as long as the function is completely empty (it never > touches any of the arguments) this should work in practise. > > That is: > > void tp_nop_func(void) { } > > can be used as an argument to any function pointer that has a void > return. In fact, I already do that, grep for __static_call_nop(). You can pass it as a function parameter, but in general, you cannot call the function with a different prototype. Even trivial differences such as variadic vs non-variadic prototypes matter. The default Linux calling conventions are all of the cdecl family, where the caller pops the argument off the stack. You didn't quote enough to context to tell whether other calling conventions matter in your case. > I'm not sure what the LLVM-CFI crud makes of it, but that's their > problem. LTO can cause problems as well, particularly with whole-program optimization.