Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Wed, Feb 01, 2023 at 08:57:27AM -0800, Josh Poimboeuf wrote: > On Wed, Feb 01, 2023 at 11:10:20AM +, Mark Rutland wrote: > > On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote: > > > On Tue, Jan 31, 2023 at 10:22:09AM +, Mark Rutland wrote: > > > > > Hm, it might be nice if our out-of-line static call implementation > > > > > would > > > > > automatically do a static key check as part of static_call_cond() for > > > > > NULL-type static calls. > > > > > > > > > > But the best answer is probably to just add inline static calls to > > > > > arm64. Is the lack of objtool the only thing blocking that? > > > > > > > > The major issues were branch range limitations (and needing the linker > > > > to add > > > > PLTs), > > > > > > Does the compiler do the right thing (e.g., force PLT) if the branch > > > target is outside the translation unit? I'm wondering if we could for > > > example use objtool to help enforce such rules at the call site. > > > > It's the linker (rather than the compiler) that'll generate the PLT if the > > caller and callee are out of range at link time. There are a few other > > issues > > too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers, > > CMODX patching requirements), so we'd have to generate a pseudo-PLT > > ourselves > > at build time with a patching-friendly code sequence. Ard had a prototype > > for > > that: > > > > > > https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-a...@kernel.org/ > > > > ... but that was sufficiently painful that we went with the current static > > key > > approach: > > > > https://lore.kernel.org/all/20211109172408.49641-1-mark.rutl...@arm.com/ > > Thanks for the background. > > Was there a reason for putting it out-of-line rather than directly in > _cond_resched()? I think that's mostly a historical accident; I'm not aware of a reaason we can't put that directly in _cond_resched(). Since we started from out-of-line static call trampolines, even the out-of-line static key check looked nicer, and I think we just never considered moving the static key check inline. > If it were inline then it wouldn't be that much different from the > static called version and I wonder if we could simplify by just using > the static key for all PREEMPT_DYNAMIC configs. That would be nice! > > > > If we knew each call-site would only call a particular function or skip > > > > the > > > > call, then we could do better (and would probably need something like > > > > objtool > > > > to NOP that out at compile time), but since we don't know the callee at > > > > build > > > > time we can't ensure we have a PLT in range when necessary. > > > > > > Unfortunately most static calls have multiple destinations. > > > > Sure, but here we're just enabling/disabling a call, which we could treat > > differently, or wrap at a different level within the scheduler code. I'm > > happy > > to take a look at that. > > I can try to emulate what you did for PREEMPT_DYNAMIC. I'll Cc you on > my actual patch to come soon-ish. I look forward to it! :) > > > And most don't have the option of being NULL. > > > > Oh, I was under the impression that all could be disabled/skipped, which is > > what a NULL target implied. > > I guess what I was trying to say is that if the target can be NULL, the > call site has to use static_call_cond() to not break the > !HAVE_STATIC_CALL case. But most call sites use static_call(). Ah, sorry -- I had missed that we had distinct static_call_cond() and static_call() helpers. Thanks, Mark. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote: > On Tue, Jan 31, 2023 at 10:22:09AM +, Mark Rutland wrote: > > > Hm, it might be nice if our out-of-line static call implementation would > > > automatically do a static key check as part of static_call_cond() for > > > NULL-type static calls. > > > > > > But the best answer is probably to just add inline static calls to > > > arm64. Is the lack of objtool the only thing blocking that? > > > > The major issues were branch range limitations (and needing the linker to > > add > > PLTs), > > Does the compiler do the right thing (e.g., force PLT) if the branch > target is outside the translation unit? I'm wondering if we could for > example use objtool to help enforce such rules at the call site. It's the linker (rather than the compiler) that'll generate the PLT if the caller and callee are out of range at link time. There are a few other issues too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers, CMODX patching requirements), so we'd have to generate a pseudo-PLT ourselves at build time with a patching-friendly code sequence. Ard had a prototype for that: https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-a...@kernel.org/ ... but that was sufficiently painful that we went with the current static key approach: https://lore.kernel.org/all/20211109172408.49641-1-mark.rutl...@arm.com/ > > and painful instruction patching requirements (e.g. the architecture's > > "CMODX" rules for Concurrent MODification and eXecution of instructions). We > > went with the static key scheme above because that was what our assembled > > code > > generation would devolve to anyway. > > > > If we knew each call-site would only call a particular function or skip the > > call, then we could do better (and would probably need something like > > objtool > > to NOP that out at compile time), but since we don't know the callee at > > build > > time we can't ensure we have a PLT in range when necessary. > > Unfortunately most static calls have multiple destinations. Sure, but here we're just enabling/disabling a call, which we could treat differently, or wrap at a different level within the scheduler code. I'm happy to take a look at that. > And most don't have the option of being NULL. Oh, I was under the impression that all could be disabled/skipped, which is what a NULL target implied. Thanks, Mark. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Mon, Jan 30, 2023 at 11:48:23AM -0800, Josh Poimboeuf wrote: > On Mon, Jan 30, 2023 at 06:36:32PM +, Mark Rutland wrote: > > On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > > > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > > > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > > > > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > > > > int __sched dynamic_cond_resched(void) > > > > { > > > > - if (!static_branch_unlikely(_dynamic_cond_resched)) > > > > + if (!static_branch_unlikely(_dynamic_cond_resched)) { > > > > + klp_sched_try_switch(); > > > > return 0; > > > > + } > > > > return __cond_resched(); > > > > } > > > > EXPORT_SYMBOL(dynamic_cond_resched); > > > > > > I would make the klp_sched_try_switch() not depend on > > > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed > > > pass through __schedule(). > > > > > > But you'll probably want to check with Mark here, this all might > > > generate crap code on arm64. > > > > IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll > > generate > > at least a load, a conditional branch, and an indirect branch. That's not > > ideal, but I'd have to benchmark it to find out whether it's a significant > > overhead relative to the baseline of PREEMPT_DYNAMIC. > > > > For arm64 it'd be a bit nicer to have another static key check, and a call > > to > > __klp_sched_try_switch(). That way the static key check gets turned into a > > NOP > > in the common case, and the call to __klp_sched_try_switch() can be a direct > > call (potentially a tail-call if we made it return 0). > > Hm, it might be nice if our out-of-line static call implementation would > automatically do a static key check as part of static_call_cond() for > NULL-type static calls. > > But the best answer is probably to just add inline static calls to > arm64. Is the lack of objtool the only thing blocking that? The major issues were branch range limitations (and needing the linker to add PLTs), and painful instruction patching requirements (e.g. the architecture's "CMODX" rules for Concurrent MODification and eXecution of instructions). We went with the static key scheme above because that was what our assembled code generation would devolve to anyway. If we knew each call-site would only call a particular function or skip the call, then we could do better (and would probably need something like objtool to NOP that out at compile time), but since we don't know the callee at build time we can't ensure we have a PLT in range when necessary. > Objtool is now modular, so all the controversial CFG reverse engineering > is now optional, so it shouldn't be too hard to just enable objtool for > static call inlines. Funnily enough, I spent some time yesterday looking at enabling a trivial objtool for arm64 as I wanted some basic ELF rewriting functionality (to manipulate the mcount_loc table). So I'll likely be looking at that soon regardless of static calls. :) Thanks, Mark. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Mon, Jan 30, 2023 at 11:59:30AM -0800, Josh Poimboeuf wrote: > @@ -8662,16 +8665,19 @@ void sched_dynamic_update(int mode) > > switch (mode) { > case preempt_dynamic_none: > - preempt_dynamic_enable(cond_resched); > + if (!klp_override) > + preempt_dynamic_enable(cond_resched); > preempt_dynamic_disable(might_resched); > preempt_dynamic_disable(preempt_schedule); > preempt_dynamic_disable(preempt_schedule_notrace); > preempt_dynamic_disable(irqentry_exit_cond_resched); > + //FIXME avoid printk for klp restore if (mode != preempt_dynamic_mode) > pr_info("Dynamic Preempt: none\n"); > break; > > case preempt_dynamic_voluntary: > - preempt_dynamic_enable(cond_resched); > + if (!klp_override) > + preempt_dynamic_enable(cond_resched); > preempt_dynamic_enable(might_resched); > preempt_dynamic_disable(preempt_schedule); > preempt_dynamic_disable(preempt_schedule_notrace); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > > int __sched dynamic_cond_resched(void) > > { > > - if (!static_branch_unlikely(_dynamic_cond_resched)) > > + if (!static_branch_unlikely(_dynamic_cond_resched)) { > > + klp_sched_try_switch(); > > return 0; > > + } > > return __cond_resched(); > > } > > EXPORT_SYMBOL(dynamic_cond_resched); > > I would make the klp_sched_try_switch() not depend on > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed > pass through __schedule(). > > But you'll probably want to check with Mark here, this all might > generate crap code on arm64. IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate at least a load, a conditional branch, and an indirect branch. That's not ideal, but I'd have to benchmark it to find out whether it's a significant overhead relative to the baseline of PREEMPT_DYNAMIC. For arm64 it'd be a bit nicer to have another static key check, and a call to __klp_sched_try_switch(). That way the static key check gets turned into a NOP in the common case, and the call to __klp_sched_try_switch() can be a direct call (potentially a tail-call if we made it return 0). Thanks, Mark. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 4df2b3e76b30..fbcd3acca25c 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -36,6 +36,7 @@ > #include > #include > #include > +#include > #include > > /* task_struct member predeclarations (sorted alphabetically): */ > @@ -2074,6 +2075,9 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched); > > static __always_inline int _cond_resched(void) > { > + //FIXME this is a bit redundant with preemption disabled > + klp_sched_try_switch(); > + > return static_call_mod(cond_resched)(); > } Right, I was thinking you'd do something like: static_call_update(cond_resched, klp_cond_resched); With: static int klp_cond_resched(void) { klp_try_switch_task(current); return __cond_resched(); } That would force cond_resched() into doing the transition thing, irrespective of the preemption mode at hand. And then, when KLP be done, re-run sched_dynamic_update() to reset it to whatever it ought to be. > @@ -401,8 +421,10 @@ void klp_try_complete_transition(void) >*/ > read_lock(_lock); > for_each_process_thread(g, task) > - if (!klp_try_switch_task(task)) > + if (!klp_try_switch_task(task)) { > + set_tsk_need_resched(task); > complete = false; > + } Yeah, no, that's broken -- preemption state live in more than just the TIF bit. > read_unlock(_lock); > > /* > @@ -413,6 +435,7 @@ void klp_try_complete_transition(void) > task = idle_task(cpu); > if (cpu_online(cpu)) { > if (!klp_try_switch_task(task)) { > + set_tsk_need_resched(task); > complete = false; > /* Make idle task go through the main loop. */ > wake_up_if_idle(cpu); Idem. Also, I don't see the point of this and the __schedule() hook here: > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 3a0ef2fefbd5..01e32d242ef6 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6506,6 +6506,8 @@ static void __sched notrace __schedule(unsigned int > sched_mode) > struct rq *rq; > int cpu; > > + klp_sched_try_switch(); > + > cpu = smp_processor_id(); > rq = cpu_rq(cpu); > prev = rq->curr; If it schedules, you'll get it with the normal switcheroo, because it'll be inactive some of the time. If it doesn't schedule, it'll run into cond_resched(). > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > int __sched dynamic_cond_resched(void) > { > - if (!static_branch_unlikely(_dynamic_cond_resched)) > + if (!static_branch_unlikely(_dynamic_cond_resched)) { > + klp_sched_try_switch(); > return 0; > + } > return __cond_resched(); > } > EXPORT_SYMBOL(dynamic_cond_resched); I would make the klp_sched_try_switch() not depend on sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed pass through __schedule(). But you'll probably want to check with Mark here, this all might generate crap code on arm64. Both ways this seems to make KLP 'depend' (or at least work lots better) when PREEMPT_DYNAMIC=y. Do we want a PREEMPT_DYNAMIC=n fallback for _cond_resched() too? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Fri 2023-01-27 08:57:40, Seth Forshee wrote: > On Fri, Jan 27, 2023 at 12:19:03PM +0100, Petr Mladek wrote: > > Could you please provide some more details about the test system? > > Is there anything important to make it reproducible? > > > > The following aspects come to my mind. It might require: > > > >+ more workers running on the same system > >+ have a dedicated CPU for the worker > >+ livepatching the function called by work->fn() > >+ running the same work again and again > >+ huge and overloaded system > > I'm isolating a CPU, starting a KVM guest with a virtio-net device, and > setting the affinity of the vhost worker thread to only the isolated > CPU. Thus the vhost-worker thread has a dedicated CPU, as you say. (I'll > note that in real-world cases the systems have many CPUs, and while the > vhost threads aren't each given a dedicated CPU, if the system load is > light enough a thread can end up with exlusive use of a CPU). > > Then all I do is run iperf between the guest and the host with several > parallel streams. I seem to be hitting the limits of the guest vCPUs > before the vhost thread is fully saturated, as this gets it to about 90% > CPU utilization by the vhost thread. Thanks for the info! > > > > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note > > > > that the transition is tried only once per minute. It means that there > > > > are "only" 60 attempts. > > > > > > > > Just by chance, does it help you to increase the timeout, please? > > > > > > To be honest my test setup reproduces the problem well enough to make > > > KLP wait significant time due to vhost threads, but it seldom causes it > > > to hit kpatch's timeout. > > > > > > Our system management software will try to load a patch tens of times in > > > a day, and we've seen real-world cases where patches couldn't load > > > within kpatch's timeout for multiple days. But I don't have such an > > > environment readily accessible for my own testing. I can try to refine > > > my test case and see if I can get it to that point. > > > > My understanding is that you try to load the patch repeatedly but > > it always fails after the 1 minute timeout. It means that it always > > starts from the beginning (no livepatched process). > > > > Is there any chance to try it with a longer timeout, for example, one > > hour? It should increase the chance if there are more problematic kthreads. > > Yes, I can try it. But I think I already mentioned that we are somewhat > limited by our system management software and how livepatch loading is > currently implemented there. I'd need to consult with others about how > long we could make the timeout, but 1 hour is definitely too long under > our current system. Another possibility is to do not wait at all. SUSE livepatch packages load the livepatch module, remove not longer used livepatch modules and are done with it. Note that the module is loaded quickly. The transition is finished asynchronously using workqueues. Of course, there is a risk that the transition will never finish. It would prevent loading any newer livepatch. But it might be handled when the the newer livepatch is loaded. It might revert the pending transition, ... Of course, it would be great to make the transition more reliable. It would be nice to add the hook into the scheduler as discussed in another branch of this thread. But it might bring another problems, for example, affect the system performance. Well, it probably can be optimized or ratelimited. Anyway, I wanted to say that there is a way to get rid of the timeout completely. Best Regards, Petr ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Fri 2023-01-27 11:37:02, Peter Zijlstra wrote: > On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote: > > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote: > > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > > > We've fairly regularaly seen liveptches which cannot transition > > > > > within kpatch's > > > > > timeout period due to busy vhost worker kthreads. > > > > > > > > I have missed this detail. Miroslav told me that we have solved > > > > something similar some time ago, see > > > > https://lore.kernel.org/all/20220507174628.2086373-1-s...@kernel.org/ > > > > > > Interesting thread. I had thought about something along the lines of the > > > original patch, but there are some ideas in there that I hadn't > > > considered. > > > > Here's another idea, have we considered this? Have livepatch set > > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then > > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is > > set. > > > > Not sure how scheduler folks would feel about that ;-) > > So, let me try and page all that back in :-) > > KLP needs to unwind the stack to see if any of the patched functions are > active, if not, flip task to new set. > > Unwinding the stack of a task can be done when: > > - task is inactive (stable reg and stack) -- provided it stays inactive >while unwinding etc.. > > - task is current (guarantees stack doesn't dip below where we started >due to being busy on top etc..) > > Can NOT be done from interrupt context, because can hit in the middle of > setting up stack frames etc.. All the above seems correct. > The issue at hand is that some tasks run for a long time without passing > through an explicit check. There might actually be two possibilities why the transition fails too often: 1. The task might be in the running state most of the time. Therefore the backtrace is not reliable most of the time. In this case, some cooperation with the scheduler would really help. We would need to stop the task and check the stack when it is stopped. Something like the patch you proposed. 2. The task might be sleeping but almost always in a livepatched function. Therefore it could not be transitioned. It might be the case with vhost_worker(). The main loop is "tiny". The kthread probaly spends most of the time with processing a vhost_work. And if the "works" are livepatched... In this case, it would help to call klp_try_switch_task(current) in the main loop in vhost_worker(). It would always succeed when vhost_worker() is not livepatched on its own. Note that even this would not help with kPatch when a single vhost_work might need more than the 1 minute timout to get proceed. > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c > index f1b25ec581e0..06746095a724 100644 > --- a/kernel/livepatch/transition.c > +++ b/kernel/livepatch/transition.c > @@ -9,6 +9,7 @@ > > #include > #include > +#include > #include "core.h" > #include "patch.h" > #include "transition.h" > @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task) > return !ret; > } > > +static int __stop_try_switch(void *arg) > +{ > + return klp_try_switch_task(arg) ? 0 : -EBUSY; > +} > + > +static bool klp_try_switch_task_harder(struct task_struct *task) > +{ > + return !stop_one_cpu(task_cpu(task), __stop_try_switch, task); > +} > + > /* > * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. > * Kthreads with TIF_PATCH_PENDING set are woken up. Nice. I am surprised that it can be implemented so easily. Best Regards, Petr ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Thu 2023-01-26 15:12:35, Seth Forshee (DigitalOcean) wrote: > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > We've fairly regularaly seen liveptches which cannot transition within > > > kpatch's > > > timeout period due to busy vhost worker kthreads. > > > > I have missed this detail. Miroslav told me that we have solved > > something similar some time ago, see > > https://lore.kernel.org/all/20220507174628.2086373-1-s...@kernel.org/ > > Interesting thread. I had thought about something along the lines of the > original patch, but there are some ideas in there that I hadn't > considered. Could you please provide some more details about the test system? Is there anything important to make it reproducible? The following aspects come to my mind. It might require: + more workers running on the same system + have a dedicated CPU for the worker + livepatching the function called by work->fn() + running the same work again and again + huge and overloaded system > > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note > > that the transition is tried only once per minute. It means that there > > are "only" 60 attempts. > > > > Just by chance, does it help you to increase the timeout, please? > > To be honest my test setup reproduces the problem well enough to make > KLP wait significant time due to vhost threads, but it seldom causes it > to hit kpatch's timeout. > > Our system management software will try to load a patch tens of times in > a day, and we've seen real-world cases where patches couldn't load > within kpatch's timeout for multiple days. But I don't have such an > environment readily accessible for my own testing. I can try to refine > my test case and see if I can get it to that point. My understanding is that you try to load the patch repeatedly but it always fails after the 1 minute timeout. It means that it always starts from the beginning (no livepatched process). Is there any chance to try it with a longer timeout, for example, one hour? It should increase the chance if there are more problematic kthreads. > > This low timeout might be useful for testing. But in practice, it does > > not matter when the transition is lasting one hour or even longer. > > It takes much longer time to prepare the livepatch. > > Agreed. And to be clear, we cope with the fact that patches may take > hours or even days to get applied in some cases. The patches I sent are > just about improving the only case I've identified which has lead to > kpatch failing to load a patch for a day or longer. If it is acceptable to wait hours or even days then the 1 minute timeout is quite contra-productive. We actually do not use any timeout at all in livepatches provided by SUSE. Best Regards, Petr ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote: > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote: > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > > We've fairly regularaly seen liveptches which cannot transition within > > > > kpatch's > > > > timeout period due to busy vhost worker kthreads. > > > > > > I have missed this detail. Miroslav told me that we have solved > > > something similar some time ago, see > > > https://lore.kernel.org/all/20220507174628.2086373-1-s...@kernel.org/ > > > > Interesting thread. I had thought about something along the lines of the > > original patch, but there are some ideas in there that I hadn't > > considered. > > Here's another idea, have we considered this? Have livepatch set > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is > set. > > Not sure how scheduler folks would feel about that ;-) So, let me try and page all that back in :-) KLP needs to unwind the stack to see if any of the patched functions are active, if not, flip task to new set. Unwinding the stack of a task can be done when: - task is inactive (stable reg and stack) -- provided it stays inactive while unwinding etc.. - task is current (guarantees stack doesn't dip below where we started due to being busy on top etc..) Can NOT be done from interrupt context, because can hit in the middle of setting up stack frames etc.. The issue at hand is that some tasks run for a long time without passing through an explicit check. The thread above tried sticking something in cond_resched() which is a problem for PREEMPT=y since cond_resched() is a no-op. Preempt notifiers were raised, and those would actually be nice, except you can only install a notifier on current and you need some memory allocated per task, which makes it less than ideal. Plus ... ... putting something in finish_task_switch() wouldn't be the end of the world I suppose, but then you still need to force schedule the task -- imagine it being the only runnable task on the CPU, there's nothing going to make it actually switch. Which then leads me to suggest something daft like this.. does that help? diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c index f1b25ec581e0..06746095a724 100644 --- a/kernel/livepatch/transition.c +++ b/kernel/livepatch/transition.c @@ -9,6 +9,7 @@ #include #include +#include #include "core.h" #include "patch.h" #include "transition.h" @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task) return !ret; } +static int __stop_try_switch(void *arg) +{ + return klp_try_switch_task(arg) ? 0 : -EBUSY; +} + +static bool klp_try_switch_task_harder(struct task_struct *task) +{ + return !stop_one_cpu(task_cpu(task), __stop_try_switch, task); +} + /* * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. * Kthreads with TIF_PATCH_PENDING set are woken up. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > We've fairly regularaly seen liveptches which cannot transition within > kpatch's > timeout period due to busy vhost worker kthreads. I have missed this detail. Miroslav told me that we have solved something similar some time ago, see https://lore.kernel.org/all/20220507174628.2086373-1-s...@kernel.org/ Honestly, kpatch's timeout 1 minute looks incredible low to me. Note that the transition is tried only once per minute. It means that there are "only" 60 attempts. Just by chance, does it help you to increase the timeout, please? This low timeout might be useful for testing. But in practice, it does not matter when the transition is lasting one hour or even longer. It takes much longer time to prepare the livepatch. Best Regards, Petr ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
On Fri, Jan 20, 2023 at 04:12:20PM -0600, Seth Forshee (DigitalOcean) wrote: > We've fairly regularaly seen liveptches which cannot transition within > kpatch's > timeout period due to busy vhost worker kthreads. In looking for a solution > the > only answer I found was to call klp_update_patch_state() from a safe location. > I tried adding this call to vhost_worker(), and it works, but this creates the > potential for problems if a livepatch attempted to patch vhost_worker(). > Without a call to klp_update_patch_state() fully loaded vhost kthreads can > never switch because vhost_worker() will always appear on the stack, but with > the call these kthreads can switch but will still be running the old version > of > vhost_worker(). > > To avoid this situation I've added a new function, klp_switch_current(), which > switches the current task only if its stack does not include any function > being > patched. This allows kthreads to safely attempt switching themselves if a > patch > is pending. There is at least one downside, however. Since there's no way for > the kthread to track whether it has already tried to switch for a pending > patch > it can end up calling klp_switch_current() repeatedly when it can never be > safely switched. > > I don't know whether this is the right solution, and I'm happy to try out > other > suggestions. But in my testing these patches proved effective in consistently > switching heavily loaded vhost kthreads almost immediately. > > To: Josh Poimboeuf > To: Jiri Kosina > To: Miroslav Benes > To: Petr Mladek > To: Joe Lawrence > To: "Michael S. Tsirkin" > To: Jason Wang > Cc: live-patch...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Cc: k...@vger.kernel.org > Cc: virtualization@lists.linux-foundation.org > Cc: net...@vger.kernel.org > Signed-off-by: Seth Forshee (DigitalOcean) Don't know enough about live patching to judge this. I'll let livepatch maintainers judge this, and merge through the livepatch tree if appropriate. For that: Acked-by: Michael S. Tsirkin but pls underestand this is more a 'looks ok superficially and I don't have better ideas' than 'I have reviewed this thoroughly'. > > --- > Seth Forshee (DigitalOcean) (2): > livepatch: add an interface for safely switching kthreads > vhost: check for pending livepatches from vhost worker kthreads > > drivers/vhost/vhost.c | 4 > include/linux/livepatch.h | 2 ++ > kernel/livepatch/transition.c | 11 +++ > 3 files changed, 17 insertions(+) > --- > base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4 > change-id: 20230120-vhost-klp-switching-ba9a3ae38b8a > > Best regards, > -- > Seth Forshee (DigitalOcean) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization