Re: [PATCH v2 2/3] livepatch: send a fake signal to all blocking tasks

2017-08-16 Thread Petr Mladek
On Thu 2017-08-10 12:48:14, Miroslav Benes wrote:
> Live patching consistency model is of LEAVE_PATCHED_SET and
> SWITCH_THREAD. This means that all tasks in the system have to be marked
> one by one as safe to call a new patched function. Safe means when a
> task is not (sleeping) in a set of patched functions. That is, no
> patched function is on the task's stack. Another clearly safe place is
> the boundary between kernel and userspace. The patching waits for all
> tasks to get outside of the patched set or to cross the boundary. The
> transition is completed afterwards.
> 
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index 79022b7eca2c..a359340c924d 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -452,7 +452,7 @@ EXPORT_SYMBOL_GPL(klp_enable_patch);
>  static ssize_t force_show(struct kobject *kobj,
>  			  struct kobj_attribute *attr, char *buf)
>  {
> -	return sprintf(buf, "No operation is currently permitted.\n");
> +	return sprintf(buf, "signal\n");

This invalidates the "NOTE:" above this function ;-)

Best Regards,
Petr


Re: [PATCH v2 2/3] livepatch: send a fake signal to all blocking tasks

2017-08-14 Thread Miroslav Benes
On Fri, 11 Aug 2017, Josh Poimboeuf wrote:

> On Thu, Aug 10, 2017 at 12:48:14PM +0200, Miroslav Benes wrote:
> > Last, sending the fake signal is not automatic. It is done only when
> > admin requests it by writing 1 to force sysfs attribute in livepatch
> > sysfs directory.
> 
> 'writing 1' -> 'writing "signal"'
> 
> (unless you take my suggestion to change to two separate sysfs files)

I'll take two separate sysfs files instead.
 
> > @@ -468,7 +468,12 @@ static ssize_t force_store(struct kobject *kobj, struct kobj_attribute *attr,
> >  		return -EINVAL;
> >  	}
> >  
> > -	return -EINVAL;
> > +	if (!memcmp("signal", buf, min(sizeof("signal")-1, count)))
> > +		klp_force_signals();
> 
> Any reason why you can't just do a strcmp()?

Not really IIRC. I just borrowed the code from 
mm/huge_memory.c:enabled_store().
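
For reference, a small userspace sketch of why the length-limited memcmp() is looser than a full string compare. The helper names match_memcmp() and match_exact() are made up for illustration, and min() is open-coded because the kernel macro is not available in userspace. In the kernel itself, sysfs_streq() is the usual helper for comparing sysfs input; that is a suggestion here, not something the patch uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the check in the patch: compare at most strlen("signal") bytes. */
bool match_memcmp(const char *buf, size_t count)
{
	size_t n = sizeof("signal") - 1;

	if (count < n)
		n = count;	/* open-coded min() */
	return memcmp("signal", buf, n) == 0;
}

/* Hypothetical stricter check: exact match, one trailing newline allowed. */
bool match_exact(const char *buf, size_t count)
{
	size_t n = sizeof("signal") - 1;

	if (count == n + 1 && buf[n] == '\n')
		count = n;	/* ignore the newline that echo appends */
	return count == n && memcmp("signal", buf, n) == 0;
}
```

With this, a write of "sig", "signalfoo" or even an empty write passes match_memcmp(), while match_exact() only accepts "signal" with or without the trailing newline.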

> > +++ b/kernel/livepatch/transition.c
> > @@ -577,3 +577,43 @@ void klp_copy_process(struct task_struct *child)
> >  
> >  	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
> >  }
> > +
> > +/*
> > + * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
> > + * Kthreads with TIF_PATCH_PENDING set are woken up. Only admin can request this
> > + * action currently.
> > + */
> > +void klp_force_signals(void)
> > +{
> > +	struct task_struct *g, *task;
> > +
> > +	pr_notice("signalling remaining tasks\n");
> 
> As a native US speaker with possible OCD spelling tendencies, it bothers
> me to see "signalling" with two l's instead of one.  According to
> Google, the UK spelling of the word has two l's, so maybe it's not a
> typo.  I'll forgive you if you don't fix it :-)

If it bothers you, I'll fix it. As a non-native speaker, I can live with 
both.

> > +
> > +	read_lock(&tasklist_lock);
> > +	for_each_process_thread(g, task) {
> > +		if (!klp_patch_pending(task))
> > +			continue;
> > +
> > +		/*
> > +		 * There is a small race here. We could see TIF_PATCH_PENDING
> > +		 * set and decide to wake up a kthread or send a fake signal.
> > +		 * Meanwhile the task could migrate itself and the action
> > +		 * would be meaningless. It is not serious though.
> > +		 */
> > +		if (task->flags & PF_KTHREAD) {
> > +			/*
> > +			 * Wake up a kthread which still has not been migrated.
> > +			 */
> > +			wake_up_process(task);
> > +		} else {
> > +			/*
> > +			 * Send fake signal to all non-kthread tasks which are
> > +			 * still not migrated.
> > +			 */
> > +			spin_lock_irq(&task->sighand->siglock);
> > +			signal_wake_up(task, 0);
> > +			spin_unlock_irq(&task->sighand->siglock);
> > +		}
> > +	}
> > +	read_unlock(&tasklist_lock);
> 
> I can't remember if we talked about this before, is it possible to also
> signal/wake the idle tasks?

Jiri mentioned that in his email. It is not that easy. Take a look at 
pick_next_task() in kernel/sched/core.c. idle_sched_class is always the 
last one to be checked. Of course we could do something like this 
there...

	if (klp_patch_pending(rq->idle)) {
		p = idle_sched_class.pick_next_task(rq, prev);

		return p;
	}

... but people may be watching, so I didn't say anything.

Thanks,
Miroslav


Re: [PATCH v2 2/3] livepatch: send a fake signal to all blocking tasks

2017-08-12 Thread Jiri Kosina
On Fri, 11 Aug 2017, Josh Poimboeuf wrote:

> > +	read_lock(&tasklist_lock);
> > +	for_each_process_thread(g, task) {
> > +		if (!klp_patch_pending(task))
> > +			continue;
> > +
> > +		/*
> > +		 * There is a small race here. We could see TIF_PATCH_PENDING
> > +		 * set and decide to wake up a kthread or send a fake signal.
> > +		 * Meanwhile the task could migrate itself and the action
> > +		 * would be meaningless. It is not serious though.
> > +		 */
> > +		if (task->flags & PF_KTHREAD) {
> > +			/*
> > +			 * Wake up a kthread which still has not been migrated.
> > +			 */
> > +			wake_up_process(task);
> > +		} else {
> > +			/*
> > +			 * Send fake signal to all non-kthread tasks which are
> > +			 * still not migrated.
> > +			 */
> > +			spin_lock_irq(&task->sighand->siglock);
> > +			signal_wake_up(task, 0);
> > +			spin_unlock_irq(&task->sighand->siglock);
> > +		}
> > +	}
> > +	read_unlock(&tasklist_lock);
> 
> I can't remember if we talked about this before, is it possible to also
> signal/wake the idle tasks?

The scheduler won't select the idle task while there is *anything* else
runnable in any other sched class. And if nothing else is runnable, there
is no need for an explicit wakeup, as the idle task gets scheduled
implicitly anyway.

So idle task is a little bit more difficult than that, unfortunately.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v2 2/3] livepatch: send a fake signal to all blocking tasks

2017-08-11 Thread Josh Poimboeuf
On Thu, Aug 10, 2017 at 12:48:14PM +0200, Miroslav Benes wrote:
> Last, sending the fake signal is not automatic. It is done only when
> admin requests it by writing 1 to force sysfs attribute in livepatch
> sysfs directory.

'writing 1' -> 'writing "signal"'

(unless you take my suggestion to change to two separate sysfs files)

> @@ -468,7 +468,12 @@ static ssize_t force_store(struct kobject *kobj, struct kobj_attribute *attr,
>  		return -EINVAL;
>  	}
>  
> -	return -EINVAL;
> +	if (!memcmp("signal", buf, min(sizeof("signal")-1, count)))
> +		klp_force_signals();

Any reason why you can't just do a strcmp()?

> +++ b/kernel/livepatch/transition.c
> @@ -577,3 +577,43 @@ void klp_copy_process(struct task_struct *child)
>  
>  	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
>  }
> +
> +/*
> + * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
> + * Kthreads with TIF_PATCH_PENDING set are woken up. Only admin can request this
> + * action currently.
> + */
> +void klp_force_signals(void)
> +{
> +	struct task_struct *g, *task;
> +
> +	pr_notice("signalling remaining tasks\n");

As a native US speaker with possible OCD spelling tendencies, it bothers
me to see "signalling" with two l's instead of one.  According to
Google, the UK spelling of the word has two l's, so maybe it's not a
typo.  I'll forgive you if you don't fix it :-)

> +
> +	read_lock(&tasklist_lock);
> +	for_each_process_thread(g, task) {
> +		if (!klp_patch_pending(task))
> +			continue;
> +
> +		/*
> +		 * There is a small race here. We could see TIF_PATCH_PENDING
> +		 * set and decide to wake up a kthread or send a fake signal.
> +		 * Meanwhile the task could migrate itself and the action
> +		 * would be meaningless. It is not serious though.
> +		 */
> +		if (task->flags & PF_KTHREAD) {
> +			/*
> +			 * Wake up a kthread which still has not been migrated.
> +			 */
> +			wake_up_process(task);
> +		} else {
> +			/*
> +			 * Send fake signal to all non-kthread tasks which are
> +			 * still not migrated.
> +			 */
> +			spin_lock_irq(&task->sighand->siglock);
> +			signal_wake_up(task, 0);
> +			spin_unlock_irq(&task->sighand->siglock);
> +		}
> +	}
> +	read_unlock(&tasklist_lock);

I can't remember if we talked about this before, is it possible to also
signal/wake the idle tasks?

-- 
Josh


[PATCH v2 2/3] livepatch: send a fake signal to all blocking tasks

2017-08-10 Thread Miroslav Benes
Live patching consistency model is of LEAVE_PATCHED_SET and
SWITCH_THREAD. This means that all tasks in the system have to be marked
one by one as safe to call a new patched function. Safe means when a
task is not (sleeping) in a set of patched functions. That is, no
patched function is on the task's stack. Another clearly safe place is
the boundary between kernel and userspace. The patching waits for all
tasks to get outside of the patched set or to cross the boundary. The
transition is completed afterwards.

The problem is that a task can block the transition for quite a long
time, if not forever. It could sleep in a set of patched functions, for
example.  Luckily we can force the task to leave the set by sending it a
fake signal, that is a signal with no data in signal pending structures
(no handler, no sign of proper signal delivered). Suspend/freezer use
this to freeze the tasks as well. The task gets TIF_SIGPENDING set and
is woken up (if it has been sleeping in the kernel before) or kicked by
rescheduling IPI (if it was running on other CPU). This causes the task
to go to kernel/userspace boundary where the signal would be handled and
the task would be marked as safe in terms of live patching.

There are tasks which are not affected by this technique though. The
fake signal is not sent to kthreads. They should be handled in a
different way. They can be woken up so they leave the patched set and
their TIF_PATCH_PENDING can be cleared thanks to stack checking.
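
The split between user tasks and kthreads described above can be sketched as a userspace toy model. This is not kernel code: struct toy_task, its fields and toy_force_signals() are invented for illustration, with plain booleans standing in for TIF_PATCH_PENDING and TIF_SIGPENDING, and no locking.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_USER, TOY_KTHREAD };

struct toy_task {
	enum toy_kind kind;
	bool patch_pending;	/* stands in for TIF_PATCH_PENDING */
	bool sig_pending;	/* stands in for TIF_SIGPENDING */
	bool woken;		/* set when the task was kicked */
};

/* Kicks every task that still blocks the transition; returns how many. */
size_t toy_force_signals(struct toy_task *tasks, size_t n)
{
	size_t kicked = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_task *t = &tasks[i];

		if (!t->patch_pending)
			continue;	/* already migrated, leave it alone */

		if (t->kind == TOY_KTHREAD) {
			/* kthreads never get the fake signal, only a wakeup */
			t->woken = true;
		} else {
			/* fake signal: only the flag is set, nothing is queued */
			t->sig_pending = true;
			t->woken = true;
		}
		kicked++;
	}
	return kicked;
}
```

In the model, a migrated task is skipped entirely, a kthread ends up woken with no signal flag, and a user task ends up with both.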

For the sake of completeness, if the task is in TASK_RUNNING state but
not currently running on some CPU it doesn't get the IPI, but it would
eventually handle the signal anyway. Second, if the task runs in the
kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not
handled on return from the interrupt. It would be handled on return to
the userspace in the future when the fake signal is sent again. Stack
checking deals with these cases in a better way.

If the task was sleeping in a syscall it would be woken by our fake
signal, it would check if TIF_SIGPENDING is set (by calling
signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with
ERESTART* return values are restarted in case of the fake signal (see
do_signal()). EINTR is propagated back to the userspace program. This
could disturb the program, but...

* each process dealing with signals should react accordingly to EINTR
  return values.
* syscalls returning EINTR are quite a common situation in the
  system even if no fake signal is sent.
* freezer sends the fake signal and does not deal with EINTR anyhow.
  Thus EINTR values are returned when the system is resumed.
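
For illustration, the usual way a userspace program copes with EINTR is a retry loop around the syscall. A minimal sketch follows; the helper name read_retry() is made up, and whether retrying is the right reaction depends on the program.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read() that retries when interrupted by a signal. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
	ssize_t ret;

	do {
		ret = read(fd, buf, len);
	} while (ret < 0 && errno == EINTR);

	return ret;
}
```

A program written this way does not notice the fake signal at all, because the interrupted read() is simply issued again.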

The very safe marking is done in architectures' "entry" on syscall and
interrupt/exception exit paths, and in the stack checking functions of
livepatch.  TIF_PATCH_PENDING is cleared and the next
recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also
call klp_update_patch_state() before do_signal(), so that
recalc_sigpending() in dequeue_signal() can clear TIF_SIGPENDING
immediately and thus prevent a double call of do_signal().
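
The ordering argument can be illustrated with a userspace toy model of the exit-to-user work loop. Everything here is an assumption made for the sketch: struct exit_task, toy_recalc_sigpending() and toy_exit_to_user() are invented names, and the rule that the signal flag stays set while a patch is still pending is a simplification of what the patch is described as doing.

```c
#include <stdbool.h>

struct exit_task {
	bool patch_pending;	/* stands in for TIF_PATCH_PENDING */
	bool sig_pending;	/* stands in for TIF_SIGPENDING */
	bool real_signal;	/* a genuine signal is queued */
};

/* Assumed rule: the flag stays set while a real signal or a patch is pending. */
static void toy_recalc_sigpending(struct exit_task *t)
{
	t->sig_pending = t->real_signal || t->patch_pending;
}

/* Runs the exit work loop; returns how many times "do_signal" ran. */
int toy_exit_to_user(struct exit_task *t, bool patch_state_first)
{
	int do_signal_calls = 0;

	while (t->patch_pending || t->sig_pending) {
		if (patch_state_first && t->patch_pending)
			t->patch_pending = false;	/* patch state update */

		if (t->sig_pending) {
			do_signal_calls++;
			t->real_signal = false;		/* dequeue whatever is there */
			toy_recalc_sigpending(t);
		}

		if (!patch_state_first && t->patch_pending)
			t->patch_pending = false;	/* patch state update, too late */
	}
	return do_signal_calls;
}
```

Starting with both flags set and no real signal, the model runs the signal path once when the patch state is updated first, and twice when it is updated afterwards.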

Note that the fake signal is not sent to stopped/traced tasks. Such a task
prevents the patching from finishing until it continues again (is not
traced anymore).

Last, sending the fake signal is not automatic. It is done only when
admin requests it by writing 1 to force sysfs attribute in livepatch
sysfs directory.

Signed-off-by: Miroslav Benes 
Cc: Oleg Nesterov 
Cc: Michael Ellerman 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Andy Lutomirski 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x...@kernel.org
---
 Documentation/ABI/testing/sysfs-kernel-livepatch |  4 ++-
 Documentation/livepatch/livepatch.txt            |  5 ++-
 arch/powerpc/kernel/signal.c                     |  6 ++--
 arch/x86/entry/common.c                          |  6 ++--
 kernel/livepatch/core.c                          |  9 --
 kernel/livepatch/transition.c                    | 40 
 kernel/livepatch/transition.h                    |  1 +
 kernel/signal.c                                  |  4 ++-
 8 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-livepatch b/Documentation/ABI/testing/sysfs-kernel-livepatch
index b7a487ca8852..45f4e3551d27 100644
--- a/Documentation/ABI/testing/sysfs-kernel-livepatch
+++ b/Documentation/ABI/testing/sysfs-kernel-livepatch
@@ -16,9 +16,11 @@ Contact: live-patch...@vger.kernel.org
 		The attribute allows administrator to affect the course of an
 		existing transition.
 
-		Reading from the file returns all available operations.
+		Reading from the file returns all available operations, which
+		may be "signal" (signalling remaining tasks).
 
 		Writing one of the strings to the file executes the operation.
+		"signal" sends a signal to all remaining blocking tasks.