Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread H. Peter Anvin
On 1/14/19 9:01 PM, H. Peter Anvin wrote: > > This could be as simple as spinning for a limited time waiting for > states 0 or 3 if we are not the patching CPU. It is also not necessary > to wait for the mask to become zero for the first sync if we find > ourselves suddenly in state 4. > So

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread H. Peter Anvin
On 1/14/19 7:05 PM, Andy Lutomirski wrote: > On Mon, Jan 14, 2019 at 2:55 PM H. Peter Anvin wrote: >> >> I think this sequence ought to work (keep in mind we are already under a >> mutex, so the global data is safe even if we are preempted): > > I'm trying to wrap my head around this. The

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread Andy Lutomirski
On Mon, Jan 14, 2019 at 2:55 PM H. Peter Anvin wrote: > > I think this sequence ought to work (keep in mind we are already under a > mutex, so the global data is safe even if we are preempted): I'm trying to wrap my head around this. The states are: 0: normal operation 1: writing 0xcc, can be

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread hpa
On January 14, 2019 3:27:55 PM PST, Andy Lutomirski wrote: >On Mon, Jan 14, 2019 at 2:01 PM H. Peter Anvin wrote: >> >> So I was already in the middle of composing this message when Andy >posted: >> >> > I don't even think this is sufficient. I think we also need >everyone >> > who clears the

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread Nadav Amit
> On Jan 14, 2019, at 3:27 PM, Andy Lutomirski wrote: > > On Mon, Jan 14, 2019 at 2:01 PM H. Peter Anvin wrote: >> So I was already in the middle of composing this message when Andy posted: >> >>> I don't even think this is sufficient. I think we also need everyone >>> who clears the bit to

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread Andy Lutomirski
On Mon, Jan 14, 2019 at 2:01 PM H. Peter Anvin wrote: > > So I was already in the middle of composing this message when Andy posted: > > > I don't even think this is sufficient. I think we also need everyone > > who clears the bit to check if all bits are clear and, if so, remove > > the

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread H. Peter Anvin
I think this sequence ought to work (keep in mind we are already under a mutex, so the global data is safe even if we are preempted): set up page table entries invlpg set up bp patching global data cpu = get_cpu() bp_old_value = atomic_read(bp_write_addr)

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread H. Peter Anvin
So I was already in the middle of composing this message when Andy posted: > I don't even think this is sufficient. I think we also need everyone > who clears the bit to check if all bits are clear and, if so, remove > the breakpoint. Otherwise we have a situation where, if you are in >

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread Andy Lutomirski
On Sun, Jan 13, 2019 at 6:41 PM H. Peter Anvin wrote: > > On 1/13/19 6:31 PM, H. Peter Anvin wrote: > > > > static cpumask_t text_poke_cpumask; > > > > static void text_poke_sync(void) > > { > > smp_wmb(); > > text_poke_cpumask = cpu_online_mask; > > smp_wmb(); /* Should be

Re: [PATCH v3 0/6] Static calls

2019-01-14 Thread Peter Zijlstra
On Fri, Jan 11, 2019 at 01:05:20PM -0800, Andy Lutomirski wrote: > On Fri, Jan 11, 2019 at 12:54 PM Linus Torvalds > wrote: > > > > On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf wrote: > > > > > > I was referring to the fact that a single static call key update will > > > usually result in

Re: [PATCH v3 0/6] Static calls

2019-01-13 Thread H. Peter Anvin
On 1/13/19 6:31 PM, H. Peter Anvin wrote: > > static cpumask_t text_poke_cpumask; > > static void text_poke_sync(void) > { > smp_wmb(); > text_poke_cpumask = cpu_online_mask; > smp_wmb(); /* Should be optional on x86 */ > cpumask_clear_cpu(_poke_cpumask,

Re: [PATCH v3 0/6] Static calls

2019-01-13 Thread H. Peter Anvin
On 1/11/19 11:39 AM, Jiri Kosina wrote: > On Fri, 11 Jan 2019, h...@zytor.com wrote: > >> I still don't see why can't simply spin in the #BP handler until the >> patch is complete. > > I think this brings us to the already discussed possible deadlock, when > one CPU#0 is in the middle of

Re: [PATCH v3 0/6] Static calls

2019-01-12 Thread hpa
On January 11, 2019 11:34:34 AM PST, Linus Torvalds wrote: >On Fri, Jan 11, 2019 at 11:24 AM wrote: >> >> I still don't see why can't simply spin in the #BP handler until the >patch is complete. > >So here's at least one problem: > >text_poke_bp() > text_poke(addr, , sizeof(int3)); >

Re: [PATCH v3 0/6] Static calls

2019-01-12 Thread hpa
On January 11, 2019 11:34:34 AM PST, Linus Torvalds wrote: >On Fri, Jan 11, 2019 at 11:24 AM wrote: >> >> I still don't see why can't simply spin in the #BP handler until the >patch is complete. > >So here's at least one problem: > >text_poke_bp() > text_poke(addr, , sizeof(int3)); >

Re: [PATCH v3 0/6] Static calls

2019-01-12 Thread Andy Lutomirski
On Fri, Jan 11, 2019 at 1:22 PM Josh Poimboeuf wrote: > > On Fri, Jan 11, 2019 at 12:46:39PM -0800, Linus Torvalds wrote: > > On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf wrote: > > > > > > I was referring to the fact that a single static call key update will > > > usually result in patching

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Nadav Amit
> On Jan 11, 2019, at 1:55 PM, Steven Rostedt wrote: > > On Fri, 11 Jan 2019 15:41:22 -0600 > Josh Poimboeuf wrote: > >>> I don’t see RCU-sched solves the problem if you don’t disable preemption. On >>> a fully preemptable kernel, you can get preempted between the push and the >>> call (jmp)

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Nadav Amit
> On Jan 11, 2019, at 1:41 PM, Josh Poimboeuf wrote: > > On Fri, Jan 11, 2019 at 09:36:59PM +, Nadav Amit wrote: >>> On Jan 11, 2019, at 1:22 PM, Josh Poimboeuf wrote: >>> >>> On Fri, Jan 11, 2019 at 12:46:39PM -0800, Linus Torvalds wrote: On Fri, Jan 11, 2019 at 12:31 PM Josh

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Steven Rostedt
On Fri, 11 Jan 2019 15:41:22 -0600 Josh Poimboeuf wrote: > > I don’t see RCU-sched solves the problem if you don’t disable preemption. On > > a fully preemptable kernel, you can get preempted between the push and the > > call (jmp) or before the push. RCU-sched can then finish, and the preempted

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 09:36:59PM +, Nadav Amit wrote: > > On Jan 11, 2019, at 1:22 PM, Josh Poimboeuf wrote: > > > > On Fri, Jan 11, 2019 at 12:46:39PM -0800, Linus Torvalds wrote: > >> On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf > >> wrote: > >>> I was referring to the fact that a

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Nadav Amit
> On Jan 11, 2019, at 1:22 PM, Josh Poimboeuf wrote: > > On Fri, Jan 11, 2019 at 12:46:39PM -0800, Linus Torvalds wrote: >> On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf wrote: >>> I was referring to the fact that a single static call key update will >>> usually result in patching multiple

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 01:10:47PM -0800, Linus Torvalds wrote: > On Fri, Jan 11, 2019 at 1:05 PM Andy Lutomirski wrote: > > > > > Yeah, my suggestion doesn't allow for batching, since it would > > > basically generate one trampoline for every rewritten instruction. > > > > Sure it does. Just

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 03:22:10PM -0600, Josh Poimboeuf wrote: > b) Adding a gap to the #DB entry stack And that should be #BP of course... Not sure why my fingers keep doing that! -- Josh

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 03:22:10PM -0600, Josh Poimboeuf wrote: > trampoline: > push %rax > call patched-dest That should be a JMP of course. -- Josh

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 12:46:39PM -0800, Linus Torvalds wrote: > On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf wrote: > > > > I was referring to the fact that a single static call key update will > > usually result in patching multiple call sites. But you're right, it's > > only 1-2

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Linus Torvalds
On Fri, Jan 11, 2019 at 1:05 PM Andy Lutomirski wrote: > > > Yeah, my suggestion doesn't allow for batching, since it would > > basically generate one trampoline for every rewritten instruction. > > Sure it does. Just make 1000 trampolines and patch 1000 sites in a > batch :) As long as the

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Andy Lutomirski
On Fri, Jan 11, 2019 at 12:54 PM Linus Torvalds wrote: > > On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf wrote: > > > > I was referring to the fact that a single static call key update will > > usually result in patching multiple call sites. But you're right, it's > > only 1-2 trampolines per

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Linus Torvalds
On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf wrote: > > I was referring to the fact that a single static call key update will > usually result in patching multiple call sites. But you're right, it's > only 1-2 trampolines per text_poke_bp() invocation. Though eventually > we may want to

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 12:12:30PM -0800, Linus Torvalds wrote: > On Fri, Jan 11, 2019 at 12:04 PM Josh Poimboeuf wrote: > > > > But really, to me, having to create and manage all those custom > > trampolines still feels a lot more complex than just making a gap on the > > stack. > > There are

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Linus Torvalds
On Fri, Jan 11, 2019 at 12:04 PM Josh Poimboeuf wrote: > > But really, to me, having to create and manage all those custom > trampolines still feels a lot more complex than just making a gap on the > stack. There are no "all those custom trampolines". There is literally *one* custom trampoline

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 11:03:30AM -0800, Linus Torvalds wrote: > The we'd change the end of poke_int3_handler() to do something like > this instead: > > void *newip = bp_int3_handler; > .. > if (new == magic_static_call_bp_int3_handler) { > if (regs->flags

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Jiri Kosina
On Fri, 11 Jan 2019, h...@zytor.com wrote: > I still don't see why can't simply spin in the #BP handler until the > patch is complete. I think this brings us to the already discussed possible deadlock, when one CPU#0 is in the middle of text_poke_bp(), CPU#1 is spinning inside

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Linus Torvalds
On Fri, Jan 11, 2019 at 11:24 AM wrote: > > I still don't see why can't simply spin in the #BP handler until the patch is > complete. So here's at least one problem: text_poke_bp() text_poke(addr, , sizeof(int3)); *interrupt* interrupt has a static call *BP*

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Nadav Amit
> On Jan 11, 2019, at 11:23 AM, h...@zytor.com wrote: > > On January 11, 2019 11:03:30 AM PST, Linus Torvalds > wrote: >> On Fri, Jan 11, 2019 at 7:15 AM Josh Poimboeuf >> wrote: Now, in the int3 handler can you take the faulting RIP and search >> for it in the “static-calls” table,

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread hpa
On January 11, 2019 11:03:30 AM PST, Linus Torvalds wrote: >On Fri, Jan 11, 2019 at 7:15 AM Josh Poimboeuf >wrote: >> >> > >> > Now, in the int3 handler can you take the faulting RIP and search >for it in >> > the “static-calls” table, writing the RIP+5 (offset) into R10 >(return >> > address)

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Nadav Amit
> On Jan 11, 2019, at 11:03 AM, Linus Torvalds > wrote: > > On Fri, Jan 11, 2019 at 7:15 AM Josh Poimboeuf wrote: >>> Now, in the int3 handler can you take the faulting RIP and search for it in >>> the “static-calls” table, writing the RIP+5 (offset) into R10 (return >>> address) and the

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Linus Torvalds
On Fri, Jan 11, 2019 at 7:15 AM Josh Poimboeuf wrote: > > > > > Now, in the int3 handler can you take the faulting RIP and search for it in > > the “static-calls” table, writing the RIP+5 (offset) into R10 (return > > address) and the target into R11. You make the int3 handler to divert the > >

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Nadav Amit
> On Jan 11, 2019, at 8:07 AM, Josh Poimboeuf wrote: > > On Fri, Jan 11, 2019 at 03:48:59PM +, Nadav Amit wrote: >>> I liked the idea, BUT, how would it work for callee-saved PV ops? In >>> that case there's only one clobbered register to work with (rax). >> >> That’s would be more tricky.

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 03:48:59PM +, Nadav Amit wrote: > > I liked the idea, BUT, how would it work for callee-saved PV ops? In > > that case there's only one clobbered register to work with (rax). > > That’s would be more tricky. How about using a per-CPU trampoline code to > hold a direct

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Nadav Amit
> On Jan 11, 2019, at 7:15 AM, Josh Poimboeuf wrote: > > On Fri, Jan 11, 2019 at 01:47:01AM +, Nadav Amit wrote: >> Here is an alternative idea (although similar to Steven’s and my code). >> >> Assume that we always clobber R10, R11 on static-calls explicitly, as anyhow >> should be done by

Re: [PATCH v3 0/6] Static calls

2019-01-11 Thread Josh Poimboeuf
On Fri, Jan 11, 2019 at 01:47:01AM +, Nadav Amit wrote: > Here is an alternative idea (although similar to Steven’s and my code). > > Assume that we always clobber R10, R11 on static-calls explicitly, as anyhow > should be done by the calling convention (and gcc plugin should allow us to >

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Nadav Amit
> On Jan 10, 2019, at 4:56 PM, Andy Lutomirski wrote: > > On Thu, Jan 10, 2019 at 3:02 PM Linus Torvalds > wrote: >> On Thu, Jan 10, 2019 at 12:52 PM Josh Poimboeuf wrote: >>> Right, emulating a call instruction from the #BP handler is ugly, >>> because you have to somehow grow the stack to

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Andy Lutomirski
On Thu, Jan 10, 2019 at 3:02 PM Linus Torvalds wrote: > > On Thu, Jan 10, 2019 at 12:52 PM Josh Poimboeuf wrote: > > > > Right, emulating a call instruction from the #BP handler is ugly, > > because you have to somehow grow the stack to make room for the return > > address. Personally I liked

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 12:52 PM Josh Poimboeuf wrote: > > Right, emulating a call instruction from the #BP handler is ugly, > because you have to somehow grow the stack to make room for the return > address. Personally I liked the idea of shifting the iret frame by 16 > bytes in the #DB entry

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Nadav Amit
> On Jan 10, 2019, at 12:57 PM, Josh Poimboeuf wrote: > > On Thu, Jan 10, 2019 at 08:48:31PM +, Nadav Amit wrote: >>> On Jan 10, 2019, at 12:32 PM, Josh Poimboeuf wrote: >>> >>> On Thu, Jan 10, 2019 at 07:45:26PM +, Nadav Amit wrote: >> I’m not GCC expert either and writing this

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Josh Poimboeuf
On Thu, Jan 10, 2019 at 08:48:31PM +, Nadav Amit wrote: > > On Jan 10, 2019, at 12:32 PM, Josh Poimboeuf wrote: > > > > On Thu, Jan 10, 2019 at 07:45:26PM +, Nadav Amit wrote: > I’m not GCC expert either and writing this code was not making me full of > joy, etc.. I’ll be happy

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread H. Peter Anvin
On 1/10/19 9:31 AM, Linus Torvalds wrote: > On Wed, Jan 9, 2019 at 2:59 PM Josh Poimboeuf wrote: >> >> NOTE: At least experimentally, the call destination writes seem to be >> atomic with respect to instruction fetching. On Nehalem I can easily >> trigger crashes when writing a call destination

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Josh Poimboeuf
On Thu, Jan 10, 2019 at 09:30:23PM +0100, Peter Zijlstra wrote: > On Wed, Jan 09, 2019 at 04:59:35PM -0600, Josh Poimboeuf wrote: > > With this version, I stopped trying to use text_poke_bp(), and instead > > went with a different approach: if the call site destination doesn't > > cross a

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Nadav Amit
> On Jan 10, 2019, at 12:32 PM, Josh Poimboeuf wrote: > > On Thu, Jan 10, 2019 at 07:45:26PM +, Nadav Amit wrote: I’m not GCC expert either and writing this code was not making me full of joy, etc.. I’ll be happy that my code would be reviewed, but it does work. I don’t

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Josh Poimboeuf
On Thu, Jan 10, 2019 at 07:45:26PM +, Nadav Amit wrote: > >> I’m not GCC expert either and writing this code was not making me full of > >> joy, etc.. I’ll be happy that my code would be reviewed, but it does work. > >> I > >> don’t think an early pass is needed, as long as hardware registers

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Peter Zijlstra
On Wed, Jan 09, 2019 at 04:59:35PM -0600, Josh Poimboeuf wrote: > With this version, I stopped trying to use text_poke_bp(), and instead > went with a different approach: if the call site destination doesn't > cross a cacheline boundary, just do an atomic write. Otherwise, keep > using the

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Nadav Amit
> On Jan 10, 2019, at 10:18 AM, Josh Poimboeuf wrote: > > On Thu, Jan 10, 2019 at 05:32:08PM +, Nadav Amit wrote: >>> On Jan 10, 2019, at 8:44 AM, Josh Poimboeuf wrote: >>> >>> On Thu, Jan 10, 2019 at 01:21:00AM +, Nadav Amit wrote: > On Jan 9, 2019, at 2:59 PM, Josh Poimboeuf

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Josh Poimboeuf
On Thu, Jan 10, 2019 at 05:32:08PM +, Nadav Amit wrote: > > On Jan 10, 2019, at 8:44 AM, Josh Poimboeuf wrote: > > > > On Thu, Jan 10, 2019 at 01:21:00AM +, Nadav Amit wrote: > >>> On Jan 9, 2019, at 2:59 PM, Josh Poimboeuf wrote: > >>> > >>> With this version, I stopped trying to use

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Nadav Amit
> On Jan 10, 2019, at 8:44 AM, Josh Poimboeuf wrote: > > On Thu, Jan 10, 2019 at 01:21:00AM +, Nadav Amit wrote: >>> On Jan 9, 2019, at 2:59 PM, Josh Poimboeuf wrote: >>> >>> With this version, I stopped trying to use text_poke_bp(), and instead >>> went with a different approach: if the

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Linus Torvalds
On Wed, Jan 9, 2019 at 2:59 PM Josh Poimboeuf wrote: > > NOTE: At least experimentally, the call destination writes seem to be > atomic with respect to instruction fetching. On Nehalem I can easily > trigger crashes when writing a call destination across cachelines while > reading the

Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Josh Poimboeuf
On Thu, Jan 10, 2019 at 01:21:00AM +, Nadav Amit wrote: > > On Jan 9, 2019, at 2:59 PM, Josh Poimboeuf wrote: > > > > With this version, I stopped trying to use text_poke_bp(), and instead > > went with a different approach: if the call site destination doesn't > > cross a cacheline

Re: [PATCH v3 0/6] Static calls

2019-01-09 Thread Nadav Amit
> On Jan 9, 2019, at 2:59 PM, Josh Poimboeuf wrote: > > With this version, I stopped trying to use text_poke_bp(), and instead > went with a different approach: if the call site destination doesn't > cross a cacheline boundary, just do an atomic write. Otherwise, keep > using the trampoline