Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-02-01 Thread Mark Rutland
On Wed, Feb 01, 2023 at 08:57:27AM -0800, Josh Poimboeuf wrote:
> On Wed, Feb 01, 2023 at 11:10:20AM +, Mark Rutland wrote:
> > On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote:
> > > On Tue, Jan 31, 2023 at 10:22:09AM +, Mark Rutland wrote:
> > > > > Hm, it might be nice if our out-of-line static call implementation 
> > > > > would
> > > > > automatically do a static key check as part of static_call_cond() for
> > > > > NULL-type static calls.
> > > > > 
> > > > > But the best answer is probably to just add inline static calls to
> > > > > arm64.  Is the lack of objtool the only thing blocking that?
> > > > 
> > > > The major issues were branch range limitations (and needing the linker 
> > > > to add
> > > > PLTs),
> > > 
> > > Does the compiler do the right thing (e.g., force PLT) if the branch
> > > target is outside the translation unit?  I'm wondering if we could for
> > > example use objtool to help enforce such rules at the call site.
> > 
> > It's the linker (rather than the compiler) that'll generate the PLT if the
> > caller and callee are out of range at link time. There are a few other 
> > issues
> > too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers,
> > CMODX patching requirements), so we'd have to generate a pseudo-PLT 
> > ourselves
> > at build time with a patching-friendly code sequence. Ard had a prototype 
> > for
> > that:
> > 
> >   
> > https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-a...@kernel.org/
> > 
> > ... but that was sufficiently painful that we went with the current static 
> > key
> > approach:
> > 
> >   https://lore.kernel.org/all/20211109172408.49641-1-mark.rutl...@arm.com/
> 
> Thanks for the background.
> 
> Was there a reason for putting it out-of-line rather than directly in
> _cond_resched()?

I think that's mostly a historical accident; I'm not aware of a reaason we
can't put that directly in _cond_resched(). 

Since we started from out-of-line static call trampolines, even the out-of-line
static key check looked nicer, and I think we just never considered moving the
static key check inline.

> If it were inline then it wouldn't be that much different from the
> static called version and I wonder if we could simplify by just using
> the static key for all PREEMPT_DYNAMIC configs.

That would be nice!

> > > > If we knew each call-site would only call a particular function or skip 
> > > > the
> > > > call, then we could do better (and would probably need something like 
> > > > objtool
> > > > to NOP that out at compile time), but since we don't know the callee at 
> > > > build
> > > > time we can't ensure we have a PLT in range when necessary.
> > > 
> > > Unfortunately most static calls have multiple destinations.
> > 
> > Sure, but here we're just enabling/disabling a call, which we could treat
> > differently, or wrap at a different level within the scheduler code. I'm 
> > happy
> > to take a look at that.
> 
> I can try to emulate what you did for PREEMPT_DYNAMIC.  I'll Cc you on
> my actual patch to come soon-ish.

I look forward to it! :)

> > > And most don't have the option of being NULL.
> > 
> > Oh, I was under the impression that all could be disabled/skipped, which is
> > what a NULL target implied.
> 
> I guess what I was trying to say is that if the target can be NULL, the
> call site has to use static_call_cond() to not break the
> !HAVE_STATIC_CALL case.  But most call sites use static_call().

Ah, sorry -- I had missed that we had distinct static_call_cond() and
static_call() helpers.

Thanks,
Mark.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-02-01 Thread Mark Rutland
On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote:
> On Tue, Jan 31, 2023 at 10:22:09AM +, Mark Rutland wrote:
> > > Hm, it might be nice if our out-of-line static call implementation would
> > > automatically do a static key check as part of static_call_cond() for
> > > NULL-type static calls.
> > > 
> > > But the best answer is probably to just add inline static calls to
> > > arm64.  Is the lack of objtool the only thing blocking that?
> > 
> > The major issues were branch range limitations (and needing the linker to 
> > add
> > PLTs),
> 
> Does the compiler do the right thing (e.g., force PLT) if the branch
> target is outside the translation unit?  I'm wondering if we could for
> example use objtool to help enforce such rules at the call site.

It's the linker (rather than the compiler) that'll generate the PLT if the
caller and callee are out of range at link time. There are a few other issues
too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers,
CMODX patching requirements), so we'd have to generate a pseudo-PLT ourselves
at build time with a patching-friendly code sequence. Ard had a prototype for
that:

  
https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-a...@kernel.org/

... but that was sufficiently painful that we went with the current static key
approach:

  https://lore.kernel.org/all/20211109172408.49641-1-mark.rutl...@arm.com/

> > and painful instruction patching requirements (e.g. the architecture's
> > "CMODX" rules for Concurrent MODification and eXecution of instructions). We
> > went with the static key scheme above because that was what our assembled 
> > code
> > generation would devolve to anyway.
> > 
> > If we knew each call-site would only call a particular function or skip the
> > call, then we could do better (and would probably need something like 
> > objtool
> > to NOP that out at compile time), but since we don't know the callee at 
> > build
> > time we can't ensure we have a PLT in range when necessary.
> 
> Unfortunately most static calls have multiple destinations.

Sure, but here we're just enabling/disabling a call, which we could treat
differently, or wrap at a different level within the scheduler code. I'm happy
to take a look at that.

> And most don't have the option of being NULL.

Oh, I was under the impression that all could be disabled/skipped, which is
what a NULL target implied.

Thanks,
Mark.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-31 Thread Mark Rutland
On Mon, Jan 30, 2023 at 11:48:23AM -0800, Josh Poimboeuf wrote:
> On Mon, Jan 30, 2023 at 06:36:32PM +, Mark Rutland wrote:
> > On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:
> > > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
> > > >  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
> > > >  int __sched dynamic_cond_resched(void)
> > > >  {
> > > > -   if (!static_branch_unlikely(_dynamic_cond_resched))
> > > > +   if (!static_branch_unlikely(_dynamic_cond_resched)) {
> > > > +   klp_sched_try_switch();
> > > > return 0;
> > > > +   }
> > > > return __cond_resched();
> > > >  }
> > > >  EXPORT_SYMBOL(dynamic_cond_resched);
> > > 
> > > I would make the klp_sched_try_switch() not depend on
> > > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
> > > pass through __schedule().
> > > 
> > > But you'll probably want to check with Mark here, this all might
> > > generate crap code on arm64.
> > 
> > IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll 
> > generate
> > at least a load, a conditional branch, and an indirect branch. That's not
> > ideal, but I'd have to benchmark it to find out whether it's a significant
> > overhead relative to the baseline of PREEMPT_DYNAMIC.
> > 
> > For arm64 it'd be a bit nicer to have another static key check, and a call 
> > to
> > __klp_sched_try_switch(). That way the static key check gets turned into a 
> > NOP
> > in the common case, and the call to __klp_sched_try_switch() can be a direct
> > call (potentially a tail-call if we made it return 0).
> 
> Hm, it might be nice if our out-of-line static call implementation would
> automatically do a static key check as part of static_call_cond() for
> NULL-type static calls.
> 
> But the best answer is probably to just add inline static calls to
> arm64.  Is the lack of objtool the only thing blocking that?

The major issues were branch range limitations (and needing the linker to add
PLTs), and painful instruction patching requirements (e.g. the architecture's
"CMODX" rules for Concurrent MODification and eXecution of instructions). We
went with the static key scheme above because that was what our assembled code
generation would devolve to anyway.

If we knew each call-site would only call a particular function or skip the
call, then we could do better (and would probably need something like objtool
to NOP that out at compile time), but since we don't know the callee at build
time we can't ensure we have a PLT in range when necessary.

> Objtool is now modular, so all the controversial CFG reverse engineering
> is now optional, so it shouldn't be too hard to just enable objtool for
> static call inlines.

Funnily enough, I spent some time yesterday looking at enabling a trivial
objtool for arm64 as I wanted some basic ELF rewriting functionality (to
manipulate the mcount_loc table). So I'll likely be looking at that soon
regardless of static calls. :)

Thanks,
Mark.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-31 Thread Peter Zijlstra
On Mon, Jan 30, 2023 at 11:59:30AM -0800, Josh Poimboeuf wrote:

> @@ -8662,16 +8665,19 @@ void sched_dynamic_update(int mode)
>  
>   switch (mode) {
>   case preempt_dynamic_none:
> - preempt_dynamic_enable(cond_resched);
> + if (!klp_override)
> + preempt_dynamic_enable(cond_resched);
>   preempt_dynamic_disable(might_resched);
>   preempt_dynamic_disable(preempt_schedule);
>   preempt_dynamic_disable(preempt_schedule_notrace);
>   preempt_dynamic_disable(irqentry_exit_cond_resched);
> + //FIXME avoid printk for klp restore

if (mode != preempt_dynamic_mode)

>   pr_info("Dynamic Preempt: none\n");
>   break;
>  
>   case preempt_dynamic_voluntary:
> - preempt_dynamic_enable(cond_resched);
> + if (!klp_override)
> + preempt_dynamic_enable(cond_resched);
>   preempt_dynamic_enable(might_resched);
>   preempt_dynamic_disable(preempt_schedule);
>   preempt_dynamic_disable(preempt_schedule_notrace);


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-30 Thread Mark Rutland
On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:
> > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
> >  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
> >  int __sched dynamic_cond_resched(void)
> >  {
> > -   if (!static_branch_unlikely(_dynamic_cond_resched))
> > +   if (!static_branch_unlikely(_dynamic_cond_resched)) {
> > +   klp_sched_try_switch();
> > return 0;
> > +   }
> > return __cond_resched();
> >  }
> >  EXPORT_SYMBOL(dynamic_cond_resched);
> 
> I would make the klp_sched_try_switch() not depend on
> sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
> pass through __schedule().
> 
> But you'll probably want to check with Mark here, this all might
> generate crap code on arm64.

IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate
at least a load, a conditional branch, and an indirect branch. That's not
ideal, but I'd have to benchmark it to find out whether it's a significant
overhead relative to the baseline of PREEMPT_DYNAMIC.

For arm64 it'd be a bit nicer to have another static key check, and a call to
__klp_sched_try_switch(). That way the static key check gets turned into a NOP
in the common case, and the call to __klp_sched_try_switch() can be a direct
call (potentially a tail-call if we made it return 0).

Thanks,
Mark.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-30 Thread Peter Zijlstra
On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:


> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 4df2b3e76b30..fbcd3acca25c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -36,6 +36,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  /* task_struct member predeclarations (sorted alphabetically): */
> @@ -2074,6 +2075,9 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched);
>  
>  static __always_inline int _cond_resched(void)
>  {
> + //FIXME this is a bit redundant with preemption disabled
> + klp_sched_try_switch();
> +
>   return static_call_mod(cond_resched)();
>  }

Right, I was thinking you'd do something like:

static_call_update(cond_resched, klp_cond_resched);

With:

static int klp_cond_resched(void)
{
klp_try_switch_task(current);
return __cond_resched();
}

That would force cond_resched() into doing the transition thing,
irrespective of the preemption mode at hand. And then, when KLP be done,
re-run sched_dynamic_update() to reset it to whatever it ought to be.

> @@ -401,8 +421,10 @@ void klp_try_complete_transition(void)
>*/
>   read_lock(_lock);
>   for_each_process_thread(g, task)
> - if (!klp_try_switch_task(task))
> + if (!klp_try_switch_task(task)) {
> + set_tsk_need_resched(task);
>   complete = false;
> + }

Yeah, no, that's broken -- preemption state live in more than just the
TIF bit.

>   read_unlock(_lock);
>  
>   /*
> @@ -413,6 +435,7 @@ void klp_try_complete_transition(void)
>   task = idle_task(cpu);
>   if (cpu_online(cpu)) {
>   if (!klp_try_switch_task(task)) {
> + set_tsk_need_resched(task);
>   complete = false;
>   /* Make idle task go through the main loop. */
>   wake_up_if_idle(cpu);

Idem.

Also, I don't see the point of this and the __schedule() hook here:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3a0ef2fefbd5..01e32d242ef6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6506,6 +6506,8 @@ static void __sched notrace __schedule(unsigned int 
> sched_mode)
>   struct rq *rq;
>   int cpu;
>  
> + klp_sched_try_switch();
> +
>   cpu = smp_processor_id();
>   rq = cpu_rq(cpu);
>   prev = rq->curr;

If it schedules, you'll get it with the normal switcheroo, because it'll
be inactive some of the time. If it doesn't schedule, it'll run into
cond_resched().

> @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
>  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
>  int __sched dynamic_cond_resched(void)
>  {
> - if (!static_branch_unlikely(_dynamic_cond_resched))
> + if (!static_branch_unlikely(_dynamic_cond_resched)) {
> + klp_sched_try_switch();
>   return 0;
> + }
>   return __cond_resched();
>  }
>  EXPORT_SYMBOL(dynamic_cond_resched);

I would make the klp_sched_try_switch() not depend on
sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
pass through __schedule().

But you'll probably want to check with Mark here, this all might
generate crap code on arm64.

Both ways this seems to make KLP 'depend' (or at least work lots better)
when PREEMPT_DYNAMIC=y. Do we want a PREEMPT_DYNAMIC=n fallback for
_cond_resched() too?


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-30 Thread Petr Mladek via Virtualization
On Fri 2023-01-27 08:57:40, Seth Forshee wrote:
> On Fri, Jan 27, 2023 at 12:19:03PM +0100, Petr Mladek wrote:
> > Could you please provide some more details about the test system?
> > Is there anything important to make it reproducible?
> > 
> > The following aspects come to my mind. It might require:
> > 
> >+ more workers running on the same system
> >+ have a dedicated CPU for the worker
> >+ livepatching the function called by work->fn()
> >+ running the same work again and again
> >+ huge and overloaded system
> 
> I'm isolating a CPU, starting a KVM guest with a virtio-net device, and
> setting the affinity of the vhost worker thread to only the isolated
> CPU. Thus the vhost-worker thread has a dedicated CPU, as you say. (I'll
> note that in real-world cases the systems have many CPUs, and while the
> vhost threads aren't each given a dedicated CPU, if the system load is
> light enough a thread can end up with exlusive use of a CPU).
> 
> Then all I do is run iperf between the guest and the host with several
> parallel streams. I seem to be hitting the limits of the guest vCPUs
> before the vhost thread is fully saturated, as this gets it to about 90%
> CPU utilization by the vhost thread.

Thanks for the info!

> > > > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
> > > > that the transition is tried only once per minute. It means that there
> > > > are "only" 60 attempts.
> > > > 
> > > > Just by chance, does it help you to increase the timeout, please?
> > > 
> > > To be honest my test setup reproduces the problem well enough to make
> > > KLP wait significant time due to vhost threads, but it seldom causes it
> > > to hit kpatch's timeout.
> > > 
> > > Our system management software will try to load a patch tens of times in
> > > a day, and we've seen real-world cases where patches couldn't load
> > > within kpatch's timeout for multiple days. But I don't have such an
> > > environment readily accessible for my own testing. I can try to refine
> > > my test case and see if I can get it to that point.
> > 
> > My understanding is that you try to load the patch repeatedly but
> > it always fails after the 1 minute timeout. It means that it always
> > starts from the beginning (no livepatched process).
> > 
> > Is there any chance to try it with a longer timeout, for example, one
> > hour? It should increase the chance if there are more problematic kthreads.
> 
> Yes, I can try it. But I think I already mentioned that we are somewhat
> limited by our system management software and how livepatch loading is
> currently implemented there. I'd need to consult with others about how
> long we could make the timeout, but 1 hour is definitely too long under
> our current system.

Another possibility is to do not wait at all. SUSE livepatch packages load
the livepatch module, remove not longer used livepatch modules and are
done with it.

Note that the module is loaded quickly. The transition is finished
asynchronously using workqueues.

Of course, there is a risk that the transition will never finish.
It would prevent loading any newer livepatch. But it might be handled
when the the newer livepatch is loaded. It might revert the pending
transition, ...

Of course, it would be great to make the transition more reliable.
It would be nice to add the hook into the scheduler as discussed
in another branch of this thread. But it might bring another problems,
for example, affect the system performance. Well, it probably can
be optimized or ratelimited.

Anyway, I wanted to say that there is a way to get rid of the timeout
completely.

Best Regards,
Petr
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-27 Thread Petr Mladek via Virtualization
On Fri 2023-01-27 11:37:02, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > > We've fairly regularaly seen liveptches which cannot transition 
> > > > > within kpatch's
> > > > > timeout period due to busy vhost worker kthreads.
> > > > 
> > > > I have missed this detail. Miroslav told me that we have solved
> > > > something similar some time ago, see
> > > > https://lore.kernel.org/all/20220507174628.2086373-1-s...@kernel.org/
> > > 
> > > Interesting thread. I had thought about something along the lines of the
> > > original patch, but there are some ideas in there that I hadn't
> > > considered.
> > 
> > Here's another idea, have we considered this?  Have livepatch set
> > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> > set.
> > 
> > Not sure how scheduler folks would feel about that ;-)
> 
> So, let me try and page all that back in :-)
> 
> KLP needs to unwind the stack to see if any of the patched functions are
> active, if not, flip task to new set.
> 
> Unwinding the stack of a task can be done when:
> 
>  - task is inactive (stable reg and stack) -- provided it stays inactive
>while unwinding etc..
> 
>  - task is current (guarantees stack doesn't dip below where we started
>due to being busy on top etc..)
> 
> Can NOT be done from interrupt context, because can hit in the middle of
> setting up stack frames etc..

All the above seems correct.

> The issue at hand is that some tasks run for a long time without passing
> through an explicit check.

There might actually be two possibilities why the transition fails
too often:

1. The task might be in the running state most of the time. Therefore
   the backtrace is not reliable most of the time.

   In this case, some cooperation with the scheduler would really
   help. We would need to stop the task and check the stack
   when it is stopped. Something like the patch you proposed.


2. The task might be sleeping but almost always in a livepatched
   function. Therefore it could not be transitioned.

   It might be the case with vhost_worker(). The main loop is "tiny".
   The kthread probaly spends most of the time with processing
   a vhost_work. And if the "works" are livepatched...

   In this case, it would help to call klp_try_switch_task(current)
   in the main loop in vhost_worker(). It would always succeed
   when vhost_worker() is not livepatched on its own.

   Note that even this would not help with kPatch when a single
   vhost_work might need more than the 1 minute timout to get proceed.

> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index f1b25ec581e0..06746095a724 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -9,6 +9,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include "core.h"
>  #include "patch.h"
>  #include "transition.h"
> @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
>   return !ret;
>  }
>  
> +static int __stop_try_switch(void *arg)
> +{
> + return klp_try_switch_task(arg) ? 0 : -EBUSY;
> +}
> +
> +static bool klp_try_switch_task_harder(struct task_struct *task)
> +{
> + return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
> +}
> +
>  /*
>   * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
>   * Kthreads with TIF_PATCH_PENDING set are woken up.

Nice. I am surprised that it can be implemented so easily.

Best Regards,
Petr
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-27 Thread Petr Mladek via Virtualization
On Thu 2023-01-26 15:12:35, Seth Forshee (DigitalOcean) wrote:
> On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > We've fairly regularaly seen liveptches which cannot transition within 
> > > kpatch's
> > > timeout period due to busy vhost worker kthreads.
> > 
> > I have missed this detail. Miroslav told me that we have solved
> > something similar some time ago, see
> > https://lore.kernel.org/all/20220507174628.2086373-1-s...@kernel.org/
> 
> Interesting thread. I had thought about something along the lines of the
> original patch, but there are some ideas in there that I hadn't
> considered.

Could you please provide some more details about the test system?
Is there anything important to make it reproducible?

The following aspects come to my mind. It might require:

   + more workers running on the same system
   + have a dedicated CPU for the worker
   + livepatching the function called by work->fn()
   + running the same work again and again
   + huge and overloaded system


> > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
> > that the transition is tried only once per minute. It means that there
> > are "only" 60 attempts.
> > 
> > Just by chance, does it help you to increase the timeout, please?
> 
> To be honest my test setup reproduces the problem well enough to make
> KLP wait significant time due to vhost threads, but it seldom causes it
> to hit kpatch's timeout.
> 
> Our system management software will try to load a patch tens of times in
> a day, and we've seen real-world cases where patches couldn't load
> within kpatch's timeout for multiple days. But I don't have such an
> environment readily accessible for my own testing. I can try to refine
> my test case and see if I can get it to that point.

My understanding is that you try to load the patch repeatedly but
it always fails after the 1 minute timeout. It means that it always
starts from the beginning (no livepatched process).

Is there any chance to try it with a longer timeout, for example, one
hour? It should increase the chance if there are more problematic kthreads.

> > This low timeout might be useful for testing. But in practice, it does
> > not matter when the transition is lasting one hour or even longer.
> > It takes much longer time to prepare the livepatch.
> 
> Agreed. And to be clear, we cope with the fact that patches may take
> hours or even days to get applied in some cases. The patches I sent are
> just about improving the only case I've identified which has lead to
> kpatch failing to load a patch for a day or longer.

If it is acceptable to wait hours or even days then the 1 minute
timeout is quite contra-productive. We actually do not use any timeout
at all in livepatches provided by SUSE.

Best Regards,
Petr
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-27 Thread Peter Zijlstra
On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > We've fairly regularaly seen liveptches which cannot transition within 
> > > > kpatch's
> > > > timeout period due to busy vhost worker kthreads.
> > > 
> > > I have missed this detail. Miroslav told me that we have solved
> > > something similar some time ago, see
> > > https://lore.kernel.org/all/20220507174628.2086373-1-s...@kernel.org/
> > 
> > Interesting thread. I had thought about something along the lines of the
> > original patch, but there are some ideas in there that I hadn't
> > considered.
> 
> Here's another idea, have we considered this?  Have livepatch set
> TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> set.
> 
> Not sure how scheduler folks would feel about that ;-)

So, let me try and page all that back in :-)

KLP needs to unwind the stack to see if any of the patched functions are
active, if not, flip task to new set.

Unwinding the stack of a task can be done when:

 - task is inactive (stable reg and stack) -- provided it stays inactive
   while unwinding etc..

 - task is current (guarantees stack doesn't dip below where we started
   due to being busy on top etc..)

Can NOT be done from interrupt context, because can hit in the middle of
setting up stack frames etc..

The issue at hand is that some tasks run for a long time without passing
through an explicit check.

The thread above tried sticking something in cond_resched() which is a
problem for PREEMPT=y since cond_resched() is a no-op.

Preempt notifiers were raised, and those would actually be nice, except
you can only install a notifier on current and you need some memory
allocated per task, which makes it less than ideal. Plus ...

... putting something in finish_task_switch() wouldn't be the end of the
world I suppose, but then you still need to force schedule the task --
imagine it being the only runnable task on the CPU, there's nothing
going to make it actually switch.

Which then leads me to suggest something daft like this.. does that
help?


diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..06746095a724 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
return !ret;
 }
 
+static int __stop_try_switch(void *arg)
+{
+   return klp_try_switch_task(arg) ? 0 : -EBUSY;
+}
+
+static bool klp_try_switch_task_harder(struct task_struct *task)
+{
+   return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
+}
+
 /*
  * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
  * Kthreads with TIF_PATCH_PENDING set are woken up.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-26 Thread Petr Mladek via Virtualization
On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> We've fairly regularaly seen liveptches which cannot transition within 
> kpatch's
> timeout period due to busy vhost worker kthreads.

I have missed this detail. Miroslav told me that we have solved
something similar some time ago, see
https://lore.kernel.org/all/20220507174628.2086373-1-s...@kernel.org/

Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
that the transition is tried only once per minute. It means that there
are "only" 60 attempts.

Just by chance, does it help you to increase the timeout, please?

This low timeout might be useful for testing. But in practice, it does
not matter when the transition is lasting one hour or even longer.
It takes much longer time to prepare the livepatch.

Best Regards,
Petr
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

2023-01-22 Thread Michael S. Tsirkin
On Fri, Jan 20, 2023 at 04:12:20PM -0600, Seth Forshee (DigitalOcean) wrote:
> We've fairly regularaly seen liveptches which cannot transition within 
> kpatch's
> timeout period due to busy vhost worker kthreads. In looking for a solution 
> the
> only answer I found was to call klp_update_patch_state() from a safe location.
> I tried adding this call to vhost_worker(), and it works, but this creates the
> potential for problems if a livepatch attempted to patch vhost_worker().
> Without a call to klp_update_patch_state() fully loaded vhost kthreads can
> never switch because vhost_worker() will always appear on the stack, but with
> the call these kthreads can switch but will still be running the old version 
> of
> vhost_worker().
> 
> To avoid this situation I've added a new function, klp_switch_current(), which
> switches the current task only if its stack does not include any function 
> being
> patched. This allows kthreads to safely attempt switching themselves if a 
> patch
> is pending. There is at least one downside, however. Since there's no way for
> the kthread to track whether it has already tried to switch for a pending 
> patch
> it can end up calling klp_switch_current() repeatedly when it can never be
> safely switched.
> 
> I don't know whether this is the right solution, and I'm happy to try out 
> other
> suggestions. But in my testing these patches proved effective in consistently
> switching heavily loaded vhost kthreads almost immediately.
> 
> To: Josh Poimboeuf 
> To: Jiri Kosina 
> To: Miroslav Benes 
> To: Petr Mladek 
> To: Joe Lawrence 
> To: "Michael S. Tsirkin" 
> To: Jason Wang 
> Cc: live-patch...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: k...@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: net...@vger.kernel.org
> Signed-off-by: Seth Forshee (DigitalOcean) 

Don't know enough about live patching to judge this.

I'll let livepatch maintainers judge this, and merge through
the livepatch tree if appropriate. For that:
Acked-by: Michael S. Tsirkin 
but pls underestand this is more a 'looks ok superficially and
I don't have better ideas'  than 'I have reviewed this thoroughly'.

> 
> ---
> Seth Forshee (DigitalOcean) (2):
>   livepatch: add an interface for safely switching kthreads
>   vhost: check for pending livepatches from vhost worker kthreads
> 
>  drivers/vhost/vhost.c |  4 
>  include/linux/livepatch.h |  2 ++
>  kernel/livepatch/transition.c | 11 +++
>  3 files changed, 17 insertions(+)
> ---
> base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
> change-id: 20230120-vhost-klp-switching-ba9a3ae38b8a
> 
> Best regards,
> -- 
> Seth Forshee (DigitalOcean) 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization