Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2018-01-25 Thread Peter Zijlstra
On Thu, Jan 25, 2018 at 01:13:21PM +0100, Petr Mladek wrote:
> > What I was getting at: the klp stuff is the very first thing we run when
> > we schedule the idle task, but it's placed at the very end of the
> > function. This is confusing.
> 
> I see.
> 
> 
> > The above still doesn't help with solving that. Do you want to run
> > something before we go idle, or before we leave idle? In neither case
> > would I place it where it is.
> 
> In fact, both ways are fine. We require the idle task to go through
> the entire cycle anyway, because either situation, idling for too long
> or never idling, would block finishing the patch transition.
> 
> Feel free to move it right before schedule_idle() or
> __current_set_polling().

OK, I'll move it. Thanks!


Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2018-01-25 Thread Petr Mladek
On Thu 2018-01-25 11:38:55, Peter Zijlstra wrote:
> On Thu, Jan 25, 2018 at 11:24:14AM +0100, Petr Mladek wrote:
> > On Thu 2018-01-25 10:04:44, Peter Zijlstra wrote:
> > > On Mon, Feb 13, 2017 at 07:42:40PM -0600, Josh Poimboeuf wrote:
> > > > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > > > index 6a4bae0..a8b3f1a 100644
> > > > --- a/kernel/sched/idle.c
> > > > +++ b/kernel/sched/idle.c
> > > > @@ -9,6 +9,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  
> > > >  #include 
> > > >  
> > > > @@ -264,6 +265,9 @@ static void do_idle(void)
> > > >  
> > > > sched_ttwu_pending();
> > > > schedule_preempt_disabled();
> > > > +
> > > > +   if (unlikely(klp_patch_pending(current)))
> > > > +   klp_update_patch_state(current);
> > > >  }
> > > 
> > > Can someone explain this one? This is a very weird place to add things.
> > > What was the expectation?
> > 
> > AFAIK, it was the least ugly and most minimalist solution that we came
> > up with.
> > 
> > The tasks are migrated to the new patch when none of the patched
> > functions is on the stack. The stack can be checked safely only when
> > the task is not running. It might be very hard to catch the idle
> > task in such a state if we patch something that is used there.
> > 
> > Without this hook, you would need to create some fake load on the
> > system, try to migrate the idle task, and then stop the fake load
> > on the CPU.
> > 
> > The above code makes the idle task migrate itself at a sane place.
> > You just need to schedule some minimal job on the CPU. The idle
> > task will do one loop, get migrated, and might be scheduled again
> > immediately.
> 
What I was getting at: the klp stuff is the very first thing we run when
we schedule the idle task, but it's placed at the very end of the
function. This is confusing.

I see.


> The above still doesn't help with solving that. Do you want to run
> something before we go idle, or before we leave idle? In neither case
> would I place it where it is.

In fact, both ways are fine. We require the idle task to go through
the entire cycle anyway, because either situation, idling for too long
or never idling, would block finishing the patch transition.

Feel free to move it right before schedule_idle() or
__current_set_polling().

Or should I send a patch?

Best Regards,
Petr


Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2018-01-25 Thread Peter Zijlstra
On Thu, Jan 25, 2018 at 11:24:14AM +0100, Petr Mladek wrote:
> On Thu 2018-01-25 10:04:44, Peter Zijlstra wrote:
> > On Mon, Feb 13, 2017 at 07:42:40PM -0600, Josh Poimboeuf wrote:
> > > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > > index 6a4bae0..a8b3f1a 100644
> > > --- a/kernel/sched/idle.c
> > > +++ b/kernel/sched/idle.c
> > > @@ -9,6 +9,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  
> > >  #include 
> > >  
> > > @@ -264,6 +265,9 @@ static void do_idle(void)
> > >  
> > >   sched_ttwu_pending();
> > >   schedule_preempt_disabled();
> > > +
> > > + if (unlikely(klp_patch_pending(current)))
> > > + klp_update_patch_state(current);
> > >  }
> > 
> > Can someone explain this one? This is a very weird place to add things.
> > What was the expectation?
> 
> AFAIK, it was the least ugly and most minimalist solution that we came
> up with.
> 
> The tasks are migrated to the new patch when none of the patched
> functions is on the stack. The stack can be checked safely only when
> the task is not running. It might be very hard to catch the idle
> task in such a state if we patch something that is used there.
> 
> Without this hook, you would need to create some fake load on the
> system, try to migrate the idle task, and then stop the fake load
> on the CPU.
> 
> The above code makes the idle task migrate itself at a sane place.
> You just need to schedule some minimal job on the CPU. The idle
> task will do one loop, get migrated, and might be scheduled again
> immediately.

What I was getting at: the klp stuff is the very first thing we run when
we schedule the idle task, but it's placed at the very end of the
function. This is confusing.

The above still doesn't help with solving that. Do you want to run
something before we go idle, or before we leave idle? In neither case
would I place it where it is.


Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2018-01-25 Thread Petr Mladek
On Thu 2018-01-25 10:04:44, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 07:42:40PM -0600, Josh Poimboeuf wrote:
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index 6a4bae0..a8b3f1a 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -9,6 +9,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  
> > @@ -264,6 +265,9 @@ static void do_idle(void)
> >  
> > sched_ttwu_pending();
> > schedule_preempt_disabled();
> > +
> > +   if (unlikely(klp_patch_pending(current)))
> > +   klp_update_patch_state(current);
> >  }
> 
> Can someone explain this one? This is a very weird place to add things.
> What was the expectation?

AFAIK, it was the least ugly and most minimalist solution that we came
up with.

The tasks are migrated to the new patch when none of the patched
functions is on the stack. The stack can be checked safely only when
the task is not running. It might be very hard to catch the idle
task in such a state if we patch something that is used there.

Without this hook, you would need to create some fake load on the
system, try to migrate the idle task, and then stop the fake load
on the CPU.

The above code makes the idle task migrate itself at a sane place.
You just need to schedule some minimal job on the CPU. The idle
task will do one loop, get migrated, and might be scheduled again
immediately.

Best Regards,
Petr


Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2018-01-25 Thread Peter Zijlstra
On Mon, Feb 13, 2017 at 07:42:40PM -0600, Josh Poimboeuf wrote:
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 6a4bae0..a8b3f1a 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -264,6 +265,9 @@ static void do_idle(void)
>  
>   sched_ttwu_pending();
>   schedule_preempt_disabled();
> +
> + if (unlikely(klp_patch_pending(current)))
> + klp_update_patch_state(current);
>  }

Can someone explain this one? This is a very weird place to add things.
What was the expectation?


Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-04-11 Thread Petr Mladek
On Mon 2017-02-13 19:42:40, Josh Poimboeuf wrote:
> Change livepatch to use a basic per-task consistency model.  This is the
> foundation which will eventually enable us to patch those ~10% of
> security patches which change function or data semantics.  This is the
> biggest remaining piece needed to make livepatch more generally useful.
> 
> 
> Signed-off-by: Josh Poimboeuf 

Just for the record, this last version looks fine to me. I do not see
any problems any longer. Everything looks consistent now ;-)
It is great work. Feel free to use:

Reviewed-by: Petr Mladek 

Thanks a lot for your patience.

Best Regards,
Petr


Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-03-07 Thread Miroslav Benes
On Mon, 13 Feb 2017, Josh Poimboeuf wrote:

> Change livepatch to use a basic per-task consistency model.  This is the
> foundation which will eventually enable us to patch those ~10% of
> security patches which change function or data semantics.  This is the
> biggest remaining piece needed to make livepatch more generally useful.
> 
> This code stems from the design proposal made by Vojtech [1] in November
> 2014.  It's a hybrid of kGraft and kpatch: it uses kGraft's per-task
> consistency and syscall barrier switching combined with kpatch's stack
> trace switching.  There are also a number of fallback options which make
> it quite flexible.
> 
> Patches are applied on a per-task basis, when the task is deemed safe to
> switch over.  When a patch is enabled, livepatch enters into a
> transition state where tasks are converging to the patched state.
> Usually this transition state can complete in a few seconds.  The same
> sequence occurs when a patch is disabled, except the tasks converge from
> the patched state to the unpatched state.
> 
> An interrupt handler inherits the patched state of the task it
> interrupts.  The same is true for forked tasks: the child inherits the
> patched state of the parent.
> 
> Livepatch uses several complementary approaches to determine when it's
> safe to patch tasks:
> 
> 1. The first and most effective approach is stack checking of sleeping
>tasks.  If no affected functions are on the stack of a given task,
>the task is patched.  In most cases this will patch most or all of
>the tasks on the first try.  Otherwise it'll keep trying
>periodically.  This option is only available if the architecture has
>reliable stacks (HAVE_RELIABLE_STACKTRACE).
> 
> 2. The second approach, if needed, is kernel exit switching.  A
>task is switched when it returns to user space from a system call, a
>user space IRQ, or a signal.  It's useful in the following cases:
> 
>a) Patching I/O-bound user tasks which are sleeping on an affected
>   function.  In this case you have to send SIGSTOP and SIGCONT to
>   force it to exit the kernel and be patched.
>b) Patching CPU-bound user tasks.  If the task is highly CPU-bound
>   then it will get patched the next time it gets interrupted by an
>   IRQ.
>c) In the future it could be useful for applying patches for
>   architectures which don't yet have HAVE_RELIABLE_STACKTRACE.  In
>   this case you would have to signal most of the tasks on the
>   system.  However this isn't supported yet because there's
>   currently no way to patch kthreads without
>   HAVE_RELIABLE_STACKTRACE.
> 
> 3. For idle "swapper" tasks, since they don't ever exit the kernel, they
>instead have a klp_update_patch_state() call in the idle loop which
>allows them to be patched before the CPU enters the idle state.
> 
>(Note there's not yet such an approach for kthreads.)
> 
> All the above approaches may be skipped by setting the 'immediate' flag
> in the 'klp_patch' struct, which will disable per-task consistency and
> patch all tasks immediately.  This can be useful if the patch doesn't
> change any function or data semantics.  Note that, even with this flag
> set, it's possible that some tasks may still be running with an old
> version of the function, until that function returns.
> 
> There's also an 'immediate' flag in the 'klp_func' struct which allows
> you to specify that certain functions in the patch can be applied
> without per-task consistency.  This might be useful if you want to patch
> a common function like schedule(), and the function change doesn't need
> consistency but the rest of the patch does.
> 
> For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user
> must set patch->immediate which causes all tasks to be patched
> immediately.  This option should be used with care, only when the patch
> doesn't change any function or data semantics.
> 
> In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE
> may be allowed to use per-task consistency if we can come up with
> another way to patch kthreads.
> 
> The /sys/kernel/livepatch//transition file shows whether a patch
> is in transition.  Only a single patch (the topmost patch on the stack)
> can be in transition at a given time.  A patch can remain in transition
> indefinitely, if any of the tasks are stuck in the initial patch state.
> 
> A transition can be reversed and effectively canceled by writing the
> opposite value to the /sys/kernel/livepatch//enabled file while
> the transition is in progress.  Then all the tasks will attempt to
> converge back to the original patch state.
> 
> [1] https://lkml.kernel.org/r/20141107140458.ga21...@suse.cz
> 
> Signed-off-by: Josh Poimboeuf 

I looked at the patch again and could not see any problem with it. I 
tested it with a couple of live patches too and it worked as expected. 
Good job.

Acked-by: Miroslav Benes 

Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-02-22 Thread Miroslav Benes
On Tue, 21 Feb 2017, Josh Poimboeuf wrote:

> On Fri, Feb 17, 2017 at 09:51:29AM +0100, Miroslav Benes wrote:
> > On Thu, 16 Feb 2017, Josh Poimboeuf wrote:
> > > What do you think about the following?  I tried to put the logic in
> > > klp_complete_transition(), so the module_put()'s would be in one place.
> > > But it was too messy, so I put it in klp_cancel_transition() instead.
> > >
> > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > > index e96346e..bd1c1fd 100644
> > > --- a/kernel/livepatch/transition.c
> > > +++ b/kernel/livepatch/transition.c
> > > @@ -121,8 +121,28 @@ static void klp_complete_transition(void)
> > >   */
> > >  void klp_cancel_transition(void)
> > >  {
> > > + bool immediate_func = false;
> > > +
> > >   klp_target_state = !klp_target_state;
> > >   klp_complete_transition();
> > > +
> > > + if (klp_target_state == KLP_PATCHED)
> > > + return;
> > 
> > This is not needed, I think. We call klp_cancel_transition() in 
> > __klp_enable_patch() only. klp_target_state is set to KLP_PATCHED there 
> > (through klp_init_transition()) and negated here. We know it must be 
> > KLP_UNPATCHED.
> 
> Yeah, I was trying to hedge against the possibility of future code
> calling this function in the disable path.  But that probably won't
> happen, and it would probably be cleaner to just add a warning if
> klp_target_state isn't KLP_PATCHED.
> 
> > Moreover, klp_target_state is always KLP_UNDEFINED after
> > klp_complete_transition() runs.
> > 
> > > +
> > > + /*
> > > +  * In the enable error path, even immediate patches can be safely
> > > +  * removed because the transition hasn't been started yet.
> > > +  *
> > > +  * klp_complete_transition() doesn't have a module_put() for immediate
> > > +  * patches, so do it here.
> > > +  */
> > > + klp_for_each_object(klp_transition_patch, obj)
> > > + klp_for_each_func(obj, func)
> > > + if (func->immediate)
> > > + immediate_func = true;
> > > +
> > > + if (klp_transition_patch->immediate || immediate_func)
> > > + module_put(klp_transition_patch->mod);
> > 
> > Almost correct. The only problem is that klp_transition_patch is NULL at
> > this point. klp_complete_transition() sets it to NULL, and in my opinion
> > that should stay there to keep it simple.
> > 
> > So we could either move all this to __klp_enable_patch(), where the patch
> > variable is defined, or we could store klp_transition_patch in a local
> > variable here in klp_cancel_transition() before klp_complete_transition()
> > is called. That should be safe. I like the latter more, because it keeps
> > the code in klp_cancel_transition() where it belongs.
> 
> Good points.  Here's another try:
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index e96346e..a23c63c 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -121,8 +121,31 @@ static void klp_complete_transition(void)
>   */
>  void klp_cancel_transition(void)
>  {
> - klp_target_state = !klp_target_state;
> + struct klp_patch *patch = klp_transition_patch;
> + struct klp_object *obj;
> + struct klp_func *func;
> + bool immediate_func = false;
> +
> + if (WARN_ON_ONCE(klp_target_state != KLP_PATCHED))
> + return;
> +
> + klp_target_state = KLP_UNPATCHED;
>   klp_complete_transition();
> +
> + /*
> +  * In the enable error path, even immediate patches can be safely
> +  * removed because the transition hasn't been started yet.
> +  *
> +  * klp_complete_transition() doesn't have a module_put() for immediate
> +  * patches, so do it here.
> +  */
> + klp_for_each_object(patch, obj)
> + klp_for_each_func(obj, func)
> + if (func->immediate)
> + immediate_func = true;
> +
> + if (patch->immediate || immediate_func)
> + module_put(patch->mod);
>  }

Looks ok.

Thanks,
Miroslav


Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-02-21 Thread Josh Poimboeuf
On Fri, Feb 17, 2017 at 09:51:29AM +0100, Miroslav Benes wrote:
> On Thu, 16 Feb 2017, Josh Poimboeuf wrote:
> > What do you think about the following?  I tried to put the logic in
> > klp_complete_transition(), so the module_put()'s would be in one place.
> > But it was too messy, so I put it in klp_cancel_transition() instead.
> >
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index e96346e..bd1c1fd 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -121,8 +121,28 @@ static void klp_complete_transition(void)
> >   */
> >  void klp_cancel_transition(void)
> >  {
> > +   bool immediate_func = false;
> > +
> > klp_target_state = !klp_target_state;
> > klp_complete_transition();
> > +
> > +   if (klp_target_state == KLP_PATCHED)
> > +   return;
> 
> This is not needed, I think. We call klp_cancel_transition() in 
> __klp_enable_patch() only. klp_target_state is set to KLP_PATCHED there 
> (through klp_init_transition()) and negated here. We know it must be 
> KLP_UNPATCHED.

Yeah, I was trying to hedge against the possibility of future code
calling this function in the disable path.  But that probably won't
happen, and it would probably be cleaner to just add a warning if
klp_target_state isn't KLP_PATCHED.

> Moreover, klp_target_state is always KLP_UNDEFINED after
> klp_complete_transition() runs.
> 
> > +
> > +   /*
> > +* In the enable error path, even immediate patches can be safely
> > +* removed because the transition hasn't been started yet.
> > +*
> > +* klp_complete_transition() doesn't have a module_put() for immediate
> > +* patches, so do it here.
> > +*/
> > +   klp_for_each_object(klp_transition_patch, obj)
> > +   klp_for_each_func(obj, func)
> > +   if (func->immediate)
> > +   immediate_func = true;
> > +
> > +   if (klp_transition_patch->immediate || immediate_func)
> > +   module_put(klp_transition_patch->mod);
> 
> Almost correct. The only problem is that klp_transition_patch is NULL at
> this point. klp_complete_transition() sets it to NULL, and in my opinion
> that should stay there to keep it simple.
> 
> So we could either move all this to __klp_enable_patch(), where the patch
> variable is defined, or we could store klp_transition_patch in a local
> variable here in klp_cancel_transition() before klp_complete_transition()
> is called. That should be safe. I like the latter more, because it keeps
> the code in klp_cancel_transition() where it belongs.

Good points.  Here's another try:

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index e96346e..a23c63c 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -121,8 +121,31 @@ static void klp_complete_transition(void)
  */
 void klp_cancel_transition(void)
 {
-   klp_target_state = !klp_target_state;
+   struct klp_patch *patch = klp_transition_patch;
+   struct klp_object *obj;
+   struct klp_func *func;
+   bool immediate_func = false;
+
+   if (WARN_ON_ONCE(klp_target_state != KLP_PATCHED))
+   return;
+
+   klp_target_state = KLP_UNPATCHED;
klp_complete_transition();
+
+   /*
+* In the enable error path, even immediate patches can be safely
+* removed because the transition hasn't been started yet.
+*
+* klp_complete_transition() doesn't have a module_put() for immediate
+* patches, so do it here.
+*/
+   klp_for_each_object(patch, obj)
+   klp_for_each_func(obj, func)
+   if (func->immediate)
+   immediate_func = true;
+
+   if (patch->immediate || immediate_func)
+   module_put(patch->mod);
 }
 
 /*


Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-02-17 Thread Miroslav Benes
On Thu, 16 Feb 2017, Josh Poimboeuf wrote:

> On Thu, Feb 16, 2017 at 03:33:26PM +0100, Miroslav Benes wrote:
> > 
> > > @@ -347,22 +356,36 @@ static int __klp_enable_patch(struct klp_patch 
> > > *patch)
> > >  
> > >   pr_notice("enabling patch '%s'\n", patch->mod->name);
> > >  
> > > + klp_init_transition(patch, KLP_PATCHED);
> > > +
> > > + /*
> > > +  * Enforce the order of the func->transition writes in
> > > +  * klp_init_transition() and the ops->func_stack writes in
> > > +  * klp_patch_object(), so that klp_ftrace_handler() will see the
> > > +  * func->transition updates before the handler is registered and the
> > > +  * new funcs become visible to the handler.
> > > +  */
> > > + smp_wmb();
> > > +
> > >   klp_for_each_object(patch, obj) {
> > >   if (!klp_is_object_loaded(obj))
> > >   continue;
> > >  
> > >   ret = klp_patch_object(obj);
> > > - if (ret)
> > > - goto unregister;
> > > + if (ret) {
> > > + pr_warn("failed to enable patch '%s'\n",
> > > + patch->mod->name);
> > > +
> > > + klp_cancel_transition();
> > > + return ret;
> > > + }
> > 
> > [...]
> > 
> > > +static void klp_complete_transition(void)
> > > +{
> > > + struct klp_object *obj;
> > > + struct klp_func *func;
> > > + struct task_struct *g, *task;
> > > + unsigned int cpu;
> > > +
> > > + if (klp_target_state == KLP_UNPATCHED) {
> > > + /*
> > > +  * All tasks have transitioned to KLP_UNPATCHED so we can now
> > > +  * remove the new functions from the func_stack.
> > > +  */
> > > + klp_unpatch_objects(klp_transition_patch);
> > > +
> > > + /*
> > > +  * Make sure klp_ftrace_handler() can no longer see functions
> > > +  * from this patch on the ops->func_stack.  Otherwise, after
> > > +  * func->transition gets cleared, the handler may choose a
> > > +  * removed function.
> > > +  */
> > > + synchronize_rcu();
> > > + }
> > > +
> > > + if (klp_transition_patch->immediate)
> > > + goto done;
> > > +
> > > + klp_for_each_object(klp_transition_patch, obj)
> > > + klp_for_each_func(obj, func)
> > > + func->transition = false;
> > > +
> > > + /* Prevent klp_ftrace_handler() from seeing KLP_UNDEFINED state */
> > > + if (klp_target_state == KLP_PATCHED)
> > > + synchronize_rcu();
> > > +
> > > + read_lock(&tasklist_lock);
> > > + for_each_process_thread(g, task) {
> > > + WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
> > > + task->patch_state = KLP_UNDEFINED;
> > > + }
> > > + read_unlock(&tasklist_lock);
> > > +
> > > + for_each_possible_cpu(cpu) {
> > > + task = idle_task(cpu);
> > > + WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
> > > + task->patch_state = KLP_UNDEFINED;
> > > + }
> > > +
> > > +done:
> > > + klp_target_state = KLP_UNDEFINED;
> > > + klp_transition_patch = NULL;
> > > +}
> > > +
> > > +/*
> > > + * This is called in the error path, to cancel a transition before it has
> > > + * started, i.e. klp_init_transition() has been called but
> > > + * klp_start_transition() hasn't.  If the transition *has* been started,
> > > + * klp_reverse_transition() should be used instead.
> > > + */
> > > +void klp_cancel_transition(void)
> > > +{
> > > + klp_target_state = !klp_target_state;
> > > + klp_complete_transition();
> > > +}
> > 
> > If we fail to enable a patch in __klp_enable_patch(), we call 
> > klp_cancel_transition() and get to klp_complete_transition(). If the patch 
> > has immediate set to true, the module would not be allowed to go away (the 
> > changes are in the last patch unfortunately, but my remark is closely 
> > related to klp_cancel_transition() and __klp_enable_patch()). This could 
> > annoy a user if the failure was due to a trivial reason. So we could call 
> > module_put() in this case. It should be safe, as no task could be in a new 
> > function thanks to the klp_ftrace_handler() logic.
> > 
> > Pity I did not spot this earlier.
> > 
> > Putting module_put(patch->mod) right after the klp_cancel_transition() call 
> > in __klp_enable_patch() would be the simplest fix (as a part of the 15/15 
> > patch). Maybe with a comment that it is safe to do it there.
> > 
> > What do you think?
> 
> Good catch.  I agree that 15/15 should have something like that.
> 
> Also, the module_put() will be needed for non-immediate patches which
> have a func->immediate set.

Right. This further complicates it.
 
> What do you think about the following?  I tried to put the logic in
> klp_complete_transition(), so the module_put()'s would be in one place.
> But it was too messy, so I put it in klp_cancel_transition() instead.
>
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index e96346e..bd1c1fd 100644
> --- a/kernel/livepatch/transition.c
> +++ 

Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-02-16 Thread Josh Poimboeuf
On Thu, Feb 16, 2017 at 03:33:26PM +0100, Miroslav Benes wrote:
> 
> > @@ -347,22 +356,36 @@ static int __klp_enable_patch(struct klp_patch *patch)
> >  
> > pr_notice("enabling patch '%s'\n", patch->mod->name);
> >  
> > +   klp_init_transition(patch, KLP_PATCHED);
> > +
> > +   /*
> > +* Enforce the order of the func->transition writes in
> > +* klp_init_transition() and the ops->func_stack writes in
> > +* klp_patch_object(), so that klp_ftrace_handler() will see the
> > +* func->transition updates before the handler is registered and the
> > +* new funcs become visible to the handler.
> > +*/
> > +   smp_wmb();
> > +
> > klp_for_each_object(patch, obj) {
> > if (!klp_is_object_loaded(obj))
> > continue;
> >  
> > ret = klp_patch_object(obj);
> > -   if (ret)
> > -   goto unregister;
> > +   if (ret) {
> > +   pr_warn("failed to enable patch '%s'\n",
> > +   patch->mod->name);
> > +
> > +   klp_cancel_transition();
> > +   return ret;
> > +   }
> 
> [...]
> 
> > +static void klp_complete_transition(void)
> > +{
> > +   struct klp_object *obj;
> > +   struct klp_func *func;
> > +   struct task_struct *g, *task;
> > +   unsigned int cpu;
> > +
> > +   if (klp_target_state == KLP_UNPATCHED) {
> > +   /*
> > +* All tasks have transitioned to KLP_UNPATCHED so we can now
> > +* remove the new functions from the func_stack.
> > +*/
> > +   klp_unpatch_objects(klp_transition_patch);
> > +
> > +   /*
> > +* Make sure klp_ftrace_handler() can no longer see functions
> > +* from this patch on the ops->func_stack.  Otherwise, after
> > +* func->transition gets cleared, the handler may choose a
> > +* removed function.
> > +*/
> > +   synchronize_rcu();
> > +   }
> > +
> > +   if (klp_transition_patch->immediate)
> > +   goto done;
> > +
> > +   klp_for_each_object(klp_transition_patch, obj)
> > +   klp_for_each_func(obj, func)
> > +   func->transition = false;
> > +
> > +   /* Prevent klp_ftrace_handler() from seeing KLP_UNDEFINED state */
> > +   if (klp_target_state == KLP_PATCHED)
> > +   synchronize_rcu();
> > +
> > +   read_lock(&tasklist_lock);
> > +   for_each_process_thread(g, task) {
> > +   WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
> > +   task->patch_state = KLP_UNDEFINED;
> > +   }
> > +   read_unlock(&tasklist_lock);
> > +
> > +   for_each_possible_cpu(cpu) {
> > +   task = idle_task(cpu);
> > +   WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
> > +   task->patch_state = KLP_UNDEFINED;
> > +   }
> > +
> > +done:
> > +   klp_target_state = KLP_UNDEFINED;
> > +   klp_transition_patch = NULL;
> > +}
> > +
> > +/*
> > + * This is called in the error path, to cancel a transition before it has
> > + * started, i.e. klp_init_transition() has been called but
> > + * klp_start_transition() hasn't.  If the transition *has* been started,
> > + * klp_reverse_transition() should be used instead.
> > + */
> > +void klp_cancel_transition(void)
> > +{
> > +   klp_target_state = !klp_target_state;
> > +   klp_complete_transition();
> > +}
> 
> If we fail to enable a patch in __klp_enable_patch(), we call 
> klp_cancel_transition() and get to klp_complete_transition(). If the patch 
> has immediate set to true, the module would not be allowed to go away (the 
> changes are in the last patch unfortunately, but my remark is closely 
> related to klp_cancel_transition() and __klp_enable_patch()). This could 
> annoy a user if the failure was due to a trivial reason. So we could call 
> module_put() in this case. It should be safe, as no task could be in a new 
> function thanks to the klp_ftrace_handler() logic.
> 
> Pity I did not spot this earlier.
> 
> Putting module_put(patch->mod) right after the klp_cancel_transition() call 
> in __klp_enable_patch() would be the simplest fix (as a part of the 15/15 
> patch). Maybe with a comment that it is safe to do it there.
> 
> What do you think?

Good catch.  I agree that 15/15 should have something like that.

Also, the module_put() will be needed for non-immediate patches which
have a func->immediate set.

What do you think about the following?  I tried to put the logic in
klp_complete_transition(), so the module_put()'s would be in one place.
But it was too messy, so I put it in klp_cancel_transition() instead.


diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index e96346e..bd1c1fd 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -121,8 +121,28 @@ static void klp_complete_transition(void)
  */
 void klp_cancel_transition(void)
 {
+   bool immediate_func = false;
+
klp_target_state = !klp_target_state;

Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-02-16 Thread Miroslav Benes

> @@ -347,22 +356,36 @@ static int __klp_enable_patch(struct klp_patch *patch)
>  
>   pr_notice("enabling patch '%s'\n", patch->mod->name);
>  
> + klp_init_transition(patch, KLP_PATCHED);
> +
> + /*
> +  * Enforce the order of the func->transition writes in
> +  * klp_init_transition() and the ops->func_stack writes in
> +  * klp_patch_object(), so that klp_ftrace_handler() will see the
> +  * func->transition updates before the handler is registered and the
> +  * new funcs become visible to the handler.
> +  */
> + smp_wmb();
> +
>   klp_for_each_object(patch, obj) {
>   if (!klp_is_object_loaded(obj))
>   continue;
>  
>   ret = klp_patch_object(obj);
> - if (ret)
> - goto unregister;
> + if (ret) {
> + pr_warn("failed to enable patch '%s'\n",
> + patch->mod->name);
> +
> + klp_cancel_transition();
> + return ret;
> + }

[...]

> +static void klp_complete_transition(void)
> +{
> + struct klp_object *obj;
> + struct klp_func *func;
> + struct task_struct *g, *task;
> + unsigned int cpu;
> +
> + if (klp_target_state == KLP_UNPATCHED) {
> + /*
> +  * All tasks have transitioned to KLP_UNPATCHED so we can now
> +  * remove the new functions from the func_stack.
> +  */
> + klp_unpatch_objects(klp_transition_patch);
> +
> + /*
> +  * Make sure klp_ftrace_handler() can no longer see functions
> +  * from this patch on the ops->func_stack.  Otherwise, after
> +  * func->transition gets cleared, the handler may choose a
> +  * removed function.
> +  */
> + synchronize_rcu();
> + }
> +
> + if (klp_transition_patch->immediate)
> + goto done;
> +
> + klp_for_each_object(klp_transition_patch, obj)
> + klp_for_each_func(obj, func)
> + func->transition = false;
> +
> + /* Prevent klp_ftrace_handler() from seeing KLP_UNDEFINED state */
> + if (klp_target_state == KLP_PATCHED)
> + synchronize_rcu();
> +
> + read_lock(&tasklist_lock);
> + for_each_process_thread(g, task) {
> + WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
> + task->patch_state = KLP_UNDEFINED;
> + }
> + read_unlock(&tasklist_lock);
> +
> + for_each_possible_cpu(cpu) {
> + task = idle_task(cpu);
> + WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
> + task->patch_state = KLP_UNDEFINED;
> + }
> +
> +done:
> + klp_target_state = KLP_UNDEFINED;
> + klp_transition_patch = NULL;
> +}
> +
> +/*
> + * This is called in the error path, to cancel a transition before it has
> + * started, i.e. klp_init_transition() has been called but
> + * klp_start_transition() hasn't.  If the transition *has* been started,
> + * klp_reverse_transition() should be used instead.
> + */
> +void klp_cancel_transition(void)
> +{
> + klp_target_state = !klp_target_state;
> + klp_complete_transition();
> +}

If we fail to enable the patch in __klp_enable_patch(), we call 
klp_cancel_transition() and get to klp_complete_transition(). If the patch 
has immediate set to true, the module would not be allowed to go away (the 
changes are in the last patch unfortunately, but my remark is closely 
related to klp_cancel_transition() and __klp_enable_patch()). This could 
annoy a user if the failure was due to a trivial reason. So we could call 
module_put() in this case. It should be safe as no task could be in a new 
function thanks to the klp_ftrace_handler() logic.

Pity I have not spotted this earlier.

Putting module_put(patch->mod) right after klp_cancel_transition() call in 
__klp_enable_patch() would be the simplest fix (as a part of 15/15 patch). 
Maybe with a comment that it is safe to do it there.

What do you think?

Miroslav


[PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-02-13 Thread Josh Poimboeuf
Change livepatch to use a basic per-task consistency model.  This is the
foundation which will eventually enable us to patch those ~10% of
security patches which change function or data semantics.  This is the
biggest remaining piece needed to make livepatch more generally useful.

This code stems from the design proposal made by Vojtech [1] in November
2014.  It's a hybrid of kGraft and kpatch: it uses kGraft's per-task
consistency and syscall barrier switching combined with kpatch's stack
trace switching.  There are also a number of fallback options which make
it quite flexible.

Patches are applied on a per-task basis, when the task is deemed safe to
switch over.  When a patch is enabled, livepatch enters into a
transition state where tasks are converging to the patched state.
Usually this transition state can complete in a few seconds.  The same
sequence occurs when a patch is disabled, except the tasks converge from
the patched state to the unpatched state.

An interrupt handler inherits the patched state of the task it
interrupts.  The same is true for forked tasks: the child inherits the
patched state of the parent.

Livepatch uses several complementary approaches to determine when it's
safe to patch tasks:

1. The first and most effective approach is stack checking of sleeping
   tasks.  If no affected functions are on the stack of a given task,
   the task is patched.  In most cases this will patch most or all of
   the tasks on the first try.  Otherwise it'll keep trying
   periodically.  This option is only available if the architecture has
   reliable stacks (HAVE_RELIABLE_STACKTRACE).

2. The second approach, if needed, is kernel exit switching.  A
   task is switched when it returns to user space from a system call, a
   user space IRQ, or a signal.  It's useful in the following cases:

   a) Patching I/O-bound user tasks which are sleeping on an affected
  function.  In this case you have to send SIGSTOP and SIGCONT to
  force it to exit the kernel and be patched.
   b) Patching CPU-bound user tasks.  If the task is highly CPU-bound
  then it will get patched the next time it gets interrupted by an
  IRQ.
   c) In the future it could be useful for applying patches for
  architectures which don't yet have HAVE_RELIABLE_STACKTRACE.  In
  this case you would have to signal most of the tasks on the
  system.  However this isn't supported yet because there's
  currently no way to patch kthreads without
  HAVE_RELIABLE_STACKTRACE.

3. For idle "swapper" tasks, since they don't ever exit the kernel, they
   instead have a klp_update_patch_state() call in the idle loop which
   allows them to be patched before the CPU enters the idle state.

   (Note there's not yet such an approach for kthreads.)

All the above approaches may be skipped by setting the 'immediate' flag
in the 'klp_patch' struct, which will disable per-task consistency and
patch all tasks immediately.  This can be useful if the patch doesn't
change any function or data semantics.  Note that, even with this flag
set, it's possible that some tasks may still be running with an old
version of the function, until that function returns.

There's also an 'immediate' flag in the 'klp_func' struct which allows
you to specify that certain functions in the patch can be applied
without per-task consistency.  This might be useful if you want to patch
a common function like schedule(), and the function change doesn't need
consistency but the rest of the patch does.

For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user
must set patch->immediate which causes all tasks to be patched
immediately.  This option should be used with care, only when the patch
doesn't change any function or data semantics.

In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE
may be allowed to use per-task consistency if we can come up with
another way to patch kthreads.

The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
is in transition.  Only a single patch (the topmost patch on the stack)
can be in transition at a given time.  A patch can remain in transition
indefinitely, if any of the tasks are stuck in the initial patch state.

A transition can be reversed and effectively canceled by writing the
opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
the transition is in progress.  Then all the tasks will attempt to
converge back to the original patch state.

[1] https://lkml.kernel.org/r/20141107140458.ga21...@suse.cz

Signed-off-by: Josh Poimboeuf 
---
 Documentation/ABI/testing/sysfs-kernel-livepatch |   8 +
 Documentation/livepatch/livepatch.txt            | 186 +++-
 include/linux/init_task.h                        |   9 +
 include/linux/livepatch.h                        |  42 +-
 include/linux/sched.h                            |   3 +
 kernel/fork.c                                    |   3 +
 kernel/livepatch/Makefile                        |