Re: [RFC v2 00/18] kthread: Use kthread worker API more widely
On Fri, Oct 02, 2015 at 02:00:14PM +0200, Petr Mladek wrote:
> On Thu 2015-10-01 10:00:53, Paul E. McKenney wrote:
> > On Thu, Oct 01, 2015 at 05:59:43PM +0200, Petr Mladek wrote:
> > > On Tue 2015-09-29 22:08:33, Paul E. McKenney wrote:
> > > > On Mon, Sep 21, 2015 at 03:03:41PM +0200, Petr Mladek wrote:
> > > > > My intention is to make it easier to manipulate kthreads. This RFC
> > > > > tries to use the kthread worker API. It is based on comments from
> > > > > the first attempt. See https://lkml.org/lkml/2015/7/28/648 and
> > > > > the list of changes below.
> >
> > If the point of these patches was simply to test your API, and if you
> > are not looking to get them upstream, we are OK.
>
> I would like to eventually transform all kthreads into an API that
> will better define the kthread workflow. It need not be this one,
> though. I am still looking for a good API that will be acceptable[*].
>
> One of the reasons that I played with the RCU, khugepaged, and ring
> buffer kthreads is that they are maintained by core developers. I hope
> that this will help to build consensus.
>
> > If you want them upstream, you need to explain to me why the patches
> > help something.
>
> As I said, the RCU kthreads do not show a big win because they ignore
> the freezer, are never parked, never stop, and do not handle signals.
> But the change will allow them to be live patched, because they leave
> the main function at a safe place.
>
> The ring buffer benchmark is a much better example. It reduced the
> main function of the consumer kthread to two lines. It removed some
> error-prone code that modified the task state, called the scheduler,
> and handled kthread_should_stop(). IMHO, the workflow is better and
> safer now.
>
> I am going to prepare and send more examples where the change makes
> the workflow easier.
>
> > And also how the patches avoid breaking things.
>
> I do my best to keep the original functionality. If we decide to use
> the kthread worker API, my first attempt is much safer, see
> https://lkml.org/lkml/2015/7/28/650. It basically replaces the
> top-level for cycle with one self-queuing work. There are some more
> instructions needed to go back to the cycle, but they define a common
> safe point that is maintained in a single location for all kthread
> workers.
>
> [*] I have played with two APIs so far. Both define a safe point for
> freezing, parking, stopping, signal handling, and live patching. Also,
> some non-trivial logic of the main cycle is maintained in a single
> location.
>
> Here are some details:
>
> 1. iterant API
> --------------
>
> It allows defining three callbacks that are called the following way:
>
>	init();
>	while (!stop)
>		func();
>	destroy();
>
> See also https://lkml.org/lkml/2015/6/5/556.
>
> Advantages:
>   + simple and clear workflow
>   + simple use
>   + simple conversion from the current kthreads API
>
> Disadvantages:
>   + problematic solution of sleeping between events
>   + completely new API
>
> 2. kthread worker API
> ---------------------
>
> It is similar to workqueues. The difference is that the works have a
> dedicated kthread, so we can better control its resources, e.g.
> priority, scheduling policy, ...
>
> Advantages:
>   + already in use
>   + design proven to work (workqueues)
>   + natural way to wait for work in the common code (worker),
>     using event-driven works and delayed works
>   + easy to convert to/from the workqueues API
>
> Disadvantages:
>   + more code needed to define, initialize, and queue works
>   + more complicated conversion from the current API
>     if we want to do it cleanly (event driven)
>   + might need more synchronization in some cases[**]
>
> Questionable:
>   + event-driven vs. procedural programming style
>   + allows a finer-grained split of the functionality into
>     separate units (works) that might be queued as needed
>
> [**] wake_up() is a nop on an empty waitqueue, but queuing a work
> into a non-existing worker might cause a crash. Well, this is
> usually already synchronized.
>
> Any thoughts or preferences are highly appreciated.

For the RCU grace-period kthreads, I am not seeing the advantage of
either API over the current approach.

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC v2 00/18] kthread: Use kthread worker API more widely
On Thu 2015-10-01 10:00:53, Paul E. McKenney wrote:
> On Thu, Oct 01, 2015 at 05:59:43PM +0200, Petr Mladek wrote:
> > On Tue 2015-09-29 22:08:33, Paul E. McKenney wrote:
> > > On Mon, Sep 21, 2015 at 03:03:41PM +0200, Petr Mladek wrote:
> > > > My intention is to make it easier to manipulate kthreads. This RFC
> > > > tries to use the kthread worker API. It is based on comments from
> > > > the first attempt. See https://lkml.org/lkml/2015/7/28/648 and
> > > > the list of changes below.
>
> If the point of these patches was simply to test your API, and if you
> are not looking to get them upstream, we are OK.

I would like to eventually transform all kthreads into an API that
will better define the kthread workflow. It need not be this one,
though. I am still looking for a good API that will be acceptable[*].

One of the reasons that I played with the RCU, khugepaged, and ring
buffer kthreads is that they are maintained by core developers. I hope
that this will help to build consensus.

> If you want them upstream, you need to explain to me why the patches
> help something.

As I said, the RCU kthreads do not show a big win because they ignore
the freezer, are never parked, never stop, and do not handle signals.
But the change will allow them to be live patched, because they leave
the main function at a safe place.

The ring buffer benchmark is a much better example. It reduced the
main function of the consumer kthread to two lines. It removed some
error-prone code that modified the task state, called the scheduler,
and handled kthread_should_stop(). IMHO, the workflow is better and
safer now.

I am going to prepare and send more examples where the change makes
the workflow easier.

> And also how the patches avoid breaking things.

I do my best to keep the original functionality. If we decide to use
the kthread worker API, my first attempt is much safer, see
https://lkml.org/lkml/2015/7/28/650. It basically replaces the
top-level for cycle with one self-queuing work. There are some more
instructions needed to go back to the cycle, but they define a common
safe point that is maintained in a single location for all kthread
workers.

[*] I have played with two APIs so far. Both define a safe point for
freezing, parking, stopping, signal handling, and live patching. Also,
some non-trivial logic of the main cycle is maintained in a single
location.

Here are some details:

1. iterant API
--------------

It allows defining three callbacks that are called the following way:

	init();
	while (!stop)
		func();
	destroy();

See also https://lkml.org/lkml/2015/6/5/556.

Advantages:
  + simple and clear workflow
  + simple use
  + simple conversion from the current kthreads API

Disadvantages:
  + problematic solution of sleeping between events
  + completely new API

2. kthread worker API
---------------------

It is similar to workqueues. The difference is that the works have a
dedicated kthread, so we can better control its resources, e.g.
priority, scheduling policy, ...

Advantages:
  + already in use
  + design proven to work (workqueues)
  + natural way to wait for work in the common code (worker),
    using event-driven works and delayed works
  + easy to convert to/from the workqueues API

Disadvantages:
  + more code needed to define, initialize, and queue works
  + more complicated conversion from the current API
    if we want to do it cleanly (event driven)
  + might need more synchronization in some cases[**]

Questionable:
  + event-driven vs. procedural programming style
  + allows a finer-grained split of the functionality into
    separate units (works) that might be queued as needed

[**] wake_up() is a nop on an empty waitqueue, but queuing a work
into a non-existing worker might cause a crash. Well, this is
usually already synchronized.

Any thoughts or preferences are highly appreciated.

Best Regards,
Petr
Re: [RFC v2 00/18] kthread: Use kthread worker API more widely
On Thu, Oct 01, 2015 at 05:59:43PM +0200, Petr Mladek wrote:
> On Tue 2015-09-29 22:08:33, Paul E. McKenney wrote:
> > On Mon, Sep 21, 2015 at 03:03:41PM +0200, Petr Mladek wrote:
> > > My intention is to make it easier to manipulate kthreads. This RFC
> > > tries to use the kthread worker API. It is based on comments from
> > > the first attempt. See https://lkml.org/lkml/2015/7/28/648 and
> > > the list of changes below.
> > >
> > > 1st..8th patches: improve the existing kthread worker API
> > >
> > > 9th, 12th, 17th patches: convert three kthreads into the new API,
> > >     namely: khugepaged, ring buffer benchmark, RCU gp kthreads[*]
> > >
> > > 10th, 11th patches: fix potential problems in the ring buffer
> > >     benchmark; also sent separately
> > >
> > > 13th patch: small fix for RCU kthread; also sent separately;
> > >     being tested by Paul
> > >
> > > 14th..16th patches: preparation steps for the RCU threads
> > >     conversion; they are needed _only_ if we split GP start
> > >     and QS handling into separate works[*]
> > >
> > > 18th patch: a possible improvement of the kthread worker API;
> > >     it adds an extra parameter to the create*() functions, so I
> > >     would rather put it into this draft
> > >
> > > [*] IMPORTANT: I tried to split RCU GP start and QS state handling
> > >     into separate works this time. But there is a problem with
> > >     a race in rcu_gp_kthread_worker_poke(). It might queue
> > >     the wrong work. It can be detected and fixed by the work
> > >     itself, but it is a bit ugly. An alternative solution is to
> > >     do both operations in one work. But then we sleep too much
> > >     in the work, which is ugly as well. Any idea is appreciated.
> >
> > I think that the kernel is trying really hard to tell you that
> > splitting up the RCU grace-period kthreads in this manner is not
> > such a good idea.
>
> Yup, I guess that it would be better to stay with the approach taken
> in the previous RFC. I mean to start the grace period and handle
> the quiescent state in a single work. See
> https://lkml.org/lkml/2015/7/28/650. It basically keeps the
> functionality. The only difference is that we regularly leave
> the RCU-specific function, so it will be possible to patch it.
>
> The RCU kthreads are very special because they basically ignore
> the freezer and they never stop. They do not show well the advantage
> of any new API. I tried to convert them primarily because they were
> so sensitive. I thought that it was good for testing the limits
> of the API.

OK, I am all for stress-testing new APIs. But I don't yet see a reason
to push this RCU kthread stress-test of your new API upstream. ;-)

And yes, trying to freeze the RCU grace-period kthreads mid-way through
a grace period does not appear to me to be a strategy to win. I suggest
trying that at the end of a grace period. Even then, halting grace
periods at pretty much any time risks hanging the system.

> > So what are we really trying to accomplish here? I am guessing
> > something like the following:
> >
> > 1. Get each grace-period kthread to a known safe state within a
> >    short time of having requested a safe state. If I recall
> >    correctly, the point of this is to allow no-downtime kernel
> >    patches to the functions executed by the grace-period kthreads.
> >
> > 2. At the same time, if someone suddenly needs a grace period
> >    at some point in this process, the grace-period kthreads are
> >    going to have to wake back up and handle the grace period.
> >    Or do you have some tricky way to guarantee that no one is
> >    going to need a grace period beyond the time you freeze
> >    the grace-period kthreads?
> >
> > 3. The boost kthreads should not be a big problem because failing
> >    to boost simply lets the grace period run longer.
> >
> > 4. The callback-offload kthreads are likely to be a big problem,
> >    because in systems configured with them, they need to be running
> >    to invoke the callbacks, and if the callbacks are not invoked,
> >    the grace period might just as well have failed to end.
> >
> > 5. The per-CPU kthreads are in the same boat as the callback-offload
> >    kthreads. One approach is to offline all the CPUs but one, and
> >    that will park all but the last per-CPU kthread. But handling
> >    that last per-CPU kthread would likely be "good clean fun"...
> >
> > 6. Other requirements?
> >
> > One approach would be to simply say that the top-level
> > rcu_gp_kthread() function cannot be patched, and arrange for the
> > grace-period kthreads to park at some point within this function.
> > Or is there some requirement that I am missing?
>
> I am a bit confused by the above paragraphs because they mix patching,
> stopping, and parking. Note that we do not need to stop any process
> when live patching.
>
> I hope that it is more clear after my response in the other mail about
> freezing. Or maybe, I am missing something.
Re: [RFC v2 00/18] kthread: Use kthread worker API more widely
On Tue 2015-09-29 22:08:33, Paul E. McKenney wrote:
> On Mon, Sep 21, 2015 at 03:03:41PM +0200, Petr Mladek wrote:
> > My intention is to make it easier to manipulate kthreads. This RFC
> > tries to use the kthread worker API. It is based on comments from
> > the first attempt. See https://lkml.org/lkml/2015/7/28/648 and
> > the list of changes below.
> >
> > 1st..8th patches: improve the existing kthread worker API
> >
> > 9th, 12th, 17th patches: convert three kthreads into the new API,
> >     namely: khugepaged, ring buffer benchmark, RCU gp kthreads[*]
> >
> > 10th, 11th patches: fix potential problems in the ring buffer
> >     benchmark; also sent separately
> >
> > 13th patch: small fix for RCU kthread; also sent separately;
> >     being tested by Paul
> >
> > 14th..16th patches: preparation steps for the RCU threads
> >     conversion; they are needed _only_ if we split GP start
> >     and QS handling into separate works[*]
> >
> > 18th patch: a possible improvement of the kthread worker API;
> >     it adds an extra parameter to the create*() functions, so I
> >     would rather put it into this draft
> >
> > [*] IMPORTANT: I tried to split RCU GP start and QS state handling
> >     into separate works this time. But there is a problem with
> >     a race in rcu_gp_kthread_worker_poke(). It might queue
> >     the wrong work. It can be detected and fixed by the work
> >     itself, but it is a bit ugly. An alternative solution is to
> >     do both operations in one work. But then we sleep too much
> >     in the work, which is ugly as well. Any idea is appreciated.
>
> I think that the kernel is trying really hard to tell you that
> splitting up the RCU grace-period kthreads in this manner is not
> such a good idea.

Yup, I guess that it would be better to stay with the approach taken
in the previous RFC. I mean to start the grace period and handle
the quiescent state in a single work. See
https://lkml.org/lkml/2015/7/28/650. It basically keeps the
functionality. The only difference is that we regularly leave
the RCU-specific function, so it will be possible to patch it.

The RCU kthreads are very special because they basically ignore
the freezer and they never stop. They do not show well the advantage
of any new API. I tried to convert them primarily because they were
so sensitive. I thought that it was good for testing the limits
of the API.

> So what are we really trying to accomplish here? I am guessing
> something like the following:
>
> 1. Get each grace-period kthread to a known safe state within a
>    short time of having requested a safe state. If I recall
>    correctly, the point of this is to allow no-downtime kernel
>    patches to the functions executed by the grace-period kthreads.
>
> 2. At the same time, if someone suddenly needs a grace period
>    at some point in this process, the grace-period kthreads are
>    going to have to wake back up and handle the grace period.
>    Or do you have some tricky way to guarantee that no one is
>    going to need a grace period beyond the time you freeze
>    the grace-period kthreads?
>
> 3. The boost kthreads should not be a big problem because failing
>    to boost simply lets the grace period run longer.
>
> 4. The callback-offload kthreads are likely to be a big problem,
>    because in systems configured with them, they need to be running
>    to invoke the callbacks, and if the callbacks are not invoked,
>    the grace period might just as well have failed to end.
>
> 5. The per-CPU kthreads are in the same boat as the callback-offload
>    kthreads. One approach is to offline all the CPUs but one, and
>    that will park all but the last per-CPU kthread. But handling
>    that last per-CPU kthread would likely be "good clean fun"...
>
> 6. Other requirements?
>
> One approach would be to simply say that the top-level
> rcu_gp_kthread() function cannot be patched, and arrange for the
> grace-period kthreads to park at some point within this function.
> Or is there some requirement that I am missing?

I am a bit confused by the above paragraphs because they mix patching,
stopping, and parking. Note that we do not need to stop any process
when live patching.

I hope that it is more clear after my response in the other mail about
freezing. Or maybe, I am missing something.

Anyway, thanks a lot for looking at the patches and feedback.

Best Regards,
Petr
Re: [RFC v2 00/18] kthread: Use kthread worker API more widely
On Mon, Sep 21, 2015 at 03:03:41PM +0200, Petr Mladek wrote:
> My intention is to make it easier to manipulate kthreads. This RFC
> tries to use the kthread worker API. It is based on comments from
> the first attempt. See https://lkml.org/lkml/2015/7/28/648 and
> the list of changes below.
>
> 1st..8th patches: improve the existing kthread worker API
>
> 9th, 12th, 17th patches: convert three kthreads into the new API,
>     namely: khugepaged, ring buffer benchmark, RCU gp kthreads[*]
>
> 10th, 11th patches: fix potential problems in the ring buffer
>     benchmark; also sent separately
>
> 13th patch: small fix for RCU kthread; also sent separately;
>     being tested by Paul
>
> 14th..16th patches: preparation steps for the RCU threads
>     conversion; they are needed _only_ if we split GP start
>     and QS handling into separate works[*]
>
> 18th patch: a possible improvement of the kthread worker API;
>     it adds an extra parameter to the create*() functions, so I
>     would rather put it into this draft
>
> [*] IMPORTANT: I tried to split RCU GP start and QS state handling
>     into separate works this time. But there is a problem with
>     a race in rcu_gp_kthread_worker_poke(). It might queue
>     the wrong work. It can be detected and fixed by the work
>     itself, but it is a bit ugly. An alternative solution is to
>     do both operations in one work. But then we sleep too much
>     in the work, which is ugly as well. Any idea is appreciated.

I think that the kernel is trying really hard to tell you that splitting
up the RCU grace-period kthreads in this manner is not such a good idea.

So what are we really trying to accomplish here? I am guessing something
like the following:

1. Get each grace-period kthread to a known safe state within a
   short time of having requested a safe state. If I recall
   correctly, the point of this is to allow no-downtime kernel
   patches to the functions executed by the grace-period kthreads.

2. At the same time, if someone suddenly needs a grace period
   at some point in this process, the grace-period kthreads are
   going to have to wake back up and handle the grace period.
   Or do you have some tricky way to guarantee that no one is
   going to need a grace period beyond the time you freeze
   the grace-period kthreads?

3. The boost kthreads should not be a big problem because failing
   to boost simply lets the grace period run longer.

4. The callback-offload kthreads are likely to be a big problem,
   because in systems configured with them, they need to be running
   to invoke the callbacks, and if the callbacks are not invoked,
   the grace period might just as well have failed to end.

5. The per-CPU kthreads are in the same boat as the callback-offload
   kthreads. One approach is to offline all the CPUs but one, and
   that will park all but the last per-CPU kthread. But handling
   that last per-CPU kthread would likely be "good clean fun"...

6. Other requirements?

One approach would be to simply say that the top-level rcu_gp_kthread()
function cannot be patched, and arrange for the grace-period kthreads
to park at some point within this function. Or is there some
requirement that I am missing?

							Thanx, Paul

> Changes against v1:
>
> + remove wrappers to manipulate the scheduling policy and priority
>
> + remove questionable wakeup_and_destroy_kthread_worker() variant
>
> + do not check for chained work when draining the queue
>
> + allocate struct kthread_worker in create_kthread_worker() and
>   use simpler checks for a running worker
>
> + add support for delayed kthread works and use them instead
>   of waiting inside the works
>
> + rework the "unrelated" fixes for the ring buffer benchmark
>   as discussed in the 1st RFC; also sent separately
>
> + also convert the consumer in the ring buffer benchmark
>
> I have tested this patch set against the stable Linus tree
> for 4.3-rc2.
>
> Petr Mladek (18):
>   kthread: Allow to call __kthread_create_on_node() with va_list args
>   kthread: Add create_kthread_worker*()
>   kthread: Add drain_kthread_worker()
>   kthread: Add destroy_kthread_worker()
>   kthread: Add pending flag to kthread work
>   kthread: Initial support for delayed kthread work
>   kthread: Allow to cancel kthread work
>   kthread: Allow to modify delayed kthread work
>   mm/huge_page: Convert khugepaged() into kthread worker API
>   ring_buffer: Do no not complete benchmark reader too early
>   ring_buffer: Fix more races when terminating the producer in the
>     benchmark
>   ring_buffer: Convert benchmark kthreads into kthread worker API
>   rcu: Finish folding ->fqs_state into ->gp_state
>   rcu: Store first_gp_fqs into struct rcu_state
>   rcu: Clean up timeouts for forcing the quiescent state
>   rcu: Check actual RCU_GP_FLAG_FQS when handling quiescent state
>   rcu: Convert RCU gp kthreads into kthread worker API
>   kthread: Better support freezable kthread workers
Re: [RFC v2 00/18] kthread: Use kthread worker API more widely
Hello, Petr.

On Mon, Sep 21, 2015 at 03:03:41PM +0200, Petr Mladek wrote:
> 9th, 12th, 17th patches: convert three kthreads into the new API,
>     namely: khugepaged, ring buffer benchmark, RCU gp kthreads[*]

I haven't gone through each conversion in detail but they generally
look good to me. Thanks a lot for doing this!

--
tejun
[RFC v2 00/18] kthread: Use kthread worker API more widely
My intention is to make it easier to manipulate kthreads. This RFC
tries to use the kthread worker API. It is based on comments from
the first attempt. See https://lkml.org/lkml/2015/7/28/648 and
the list of changes below.

1st..8th patches: improve the existing kthread worker API

9th, 12th, 17th patches: convert three kthreads into the new API,
    namely: khugepaged, ring buffer benchmark, RCU gp kthreads[*]

10th, 11th patches: fix potential problems in the ring buffer
    benchmark; also sent separately

13th patch: small fix for RCU kthread; also sent separately;
    being tested by Paul

14th..16th patches: preparation steps for the RCU threads
    conversion; they are needed _only_ if we split GP start
    and QS handling into separate works[*]

18th patch: a possible improvement of the kthread worker API;
    it adds an extra parameter to the create*() functions, so I
    would rather put it into this draft

[*] IMPORTANT: I tried to split RCU GP start and QS state handling
    into separate works this time. But there is a problem with
    a race in rcu_gp_kthread_worker_poke(). It might queue
    the wrong work. It can be detected and fixed by the work
    itself, but it is a bit ugly. An alternative solution is to
    do both operations in one work. But then we sleep too much
    in the work, which is ugly as well. Any idea is appreciated.

Changes against v1:

+ remove wrappers to manipulate the scheduling policy and priority

+ remove questionable wakeup_and_destroy_kthread_worker() variant

+ do not check for chained work when draining the queue

+ allocate struct kthread_worker in create_kthread_worker() and
  use simpler checks for a running worker

+ add support for delayed kthread works and use them instead
  of waiting inside the works

+ rework the "unrelated" fixes for the ring buffer benchmark
  as discussed in the 1st RFC; also sent separately

+ also convert the consumer in the ring buffer benchmark

I have tested this patch set against the stable Linus tree
for 4.3-rc2.

Petr Mladek (18):
  kthread: Allow to call __kthread_create_on_node() with va_list args
  kthread: Add create_kthread_worker*()
  kthread: Add drain_kthread_worker()
  kthread: Add destroy_kthread_worker()
  kthread: Add pending flag to kthread work
  kthread: Initial support for delayed kthread work
  kthread: Allow to cancel kthread work
  kthread: Allow to modify delayed kthread work
  mm/huge_page: Convert khugepaged() into kthread worker API
  ring_buffer: Do no not complete benchmark reader too early
  ring_buffer: Fix more races when terminating the producer in the
    benchmark
  ring_buffer: Convert benchmark kthreads into kthread worker API
  rcu: Finish folding ->fqs_state into ->gp_state
  rcu: Store first_gp_fqs into struct rcu_state
  rcu: Clean up timeouts for forcing the quiescent state
  rcu: Check actual RCU_GP_FLAG_FQS when handling quiescent state
  rcu: Convert RCU gp kthreads into kthread worker API
  kthread: Better support freezable kthread workers

 include/linux/kthread.h              |  67 +-
 kernel/kthread.c                     | 544 +-
 kernel/rcu/tree.c                    | 407 +-
 kernel/rcu/tree.h                    |  24 +-
 kernel/rcu/tree_plugin.h             |  16 +-
 kernel/rcu/tree_trace.c              |   2 +-
 kernel/trace/ring_buffer_benchmark.c | 194 +-
 mm/huge_memory.c                     | 116 +-
 8 files changed, 1017 insertions(+), 353 deletions(-)

--
1.8.5.6