Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
On Sat, Aug 8, 2020 at 5:26 AM Zhihao Cheng wrote: > 在 2020/8/8 3:29, Richard Weinberger 写道: > > On Fri, Aug 7, 2020 at 4:18 AM Zhihao Cheng wrote: > > > Maybe it's just me being dense and in need for a vacation. ;-) Applied to fixes. :-) -- Thanks, //richard
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
在 2020/8/8 3:29, Richard Weinberger 写道: On Fri, Aug 7, 2020 at 4:18 AM Zhihao Cheng wrote: Maybe it's just me being dense and in need for a vacation. ;-) I have quite a few ubi/ubifs patches in pending list, may you comment/check them before 5.9 ending please? thanks. \( ̄▽ ̄) For example: https://patchwork.ozlabs.org/project/linux-mtd/patch/20200601091037.3794172-2-chengzhih...@huawei.com/ https://patchwork.ozlabs.org/project/linux-mtd/patch/20200602112410.660785-1-chengzhih...@huawei.com/ https://patchwork.ozlabs.org/project/linux-mtd/cover/20200616071146.2607061-1-chengzhih...@huawei.com/
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
On Fri, Aug 7, 2020 at 4:18 AM Zhihao Cheng wrote: > That's where we hold different views. I have 3 viewpoints(You can point > out which one you disagree.): > > 1. If kthread_stop() happens at line 12, ubi thread is *marked* with > stop flag, it will stop at kthread_should_stop() as long as it can reach > the next iteration. > > 2. If task A is on runqueue and its state is TASK_RUNNING, task A will > be scheduled to execute. > > 3. If kthread_stop() happens at line 12, after program counter going to > line 14, ubi thead is on runqueue and its state is TASK_RUNNING. I have > explained this in situation 1 in last session. > > > I mean ubi thread is on runqueue with TASK_RUNNING state & stop flag > after the process you described. > > Line 12 kthread_stop() > > set_bit(mark stop flag) && wake_up_process(enqueue && > set TASK_RUNNING )=> TASK_RUNNING & stop flag & on runqueue > > Line 13 schedule() > > Do nothing but pick next task to execute You are perfectly right! I failed to concentrate on the state changes. Now all makes sense, also your comment before the if statement. So I don't know how to make this more clear in the code. Maybe it's just me being dense and in need for a vacation. ;-) -- Thanks, //richard
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
在 2020/8/7 4:15, Richard Weinberger 写道: On Wed, Aug 5, 2020 at 4:23 AM Zhihao Cheng wrote: Er, I can't get the point. I can list two possible situations, did I miss other situations? Yes. You keep ignoring the case I brought up. Let's start from scratch, maybe I miss something. So I'm sorry for being persistent. Never mind, we're all trying to figure it out. :-) . Besides, I'm not good at expressing question in English. (In Practicing...) The ubi thread can be reduced to a loop like this one: 1. for (;;) { 2. if (kthread_should_stop()) 3. break; 4. 5. if ( /* no work pending*/ ){ 6. set_current_state(TASK_INTERRUPTIBLE); 7. schedule(); 8. continue; 9. } 10. 11. do_work(); 12. } syzcaller found a case where stopping the thread did not work. If another task tries to stop the thread while no work is pending and the program counter in the thread is between lines 5 and 6, the kthread_stop() instruction has no effect. It has no effect because the thread sets the thread state to interruptible sleep and then schedules away. This is a common anti-pattern in the Linux kernel, sadly. Yes, but UBIFS is the exception, my solution looks like UBIFS. int ubifs_bg_thread(void *info) { while(1) { if (kthread_should_stop()) break; set_current_state(TASK_INTERRUPTIBLE); if (!c->need_bgt) { /* * Nothing prevents us from going sleep now and * be never woken up and block the task which * could wait in 'kthread_stop()' forever. */ if (kthread_should_stop()) break; schedule(); continue; } } } Do you agree with me so far or do you think syzcaller found a different issue? Yes, I agree. Your patch changes the loop as follows: 1. for (;;) { 2. if (kthread_should_stop()) 3. break; 4. 5. if ( /* no work pending*/ ){ 6. set_current_state(TASK_INTERRUPTIBLE); 7. 8. if (kthread_should_stop()) { 9. set_current_state(TASK_RUNNING); 10. break; 11. } 12. 13. schedule(); 14. continue; 15. } 16. 17. do_work(); 18. } That way there is a higher chance that the thread sees the stop flag and gracefully terminates, I fully agree on that. There's no disagreement so far. But it does not completely solve the problem. If kthread_stop() happens while the program counter of the ubi thread is at line 12, the stop flag is still missed and we end up in interruptible sleep just like before. That's where we hold different views. I have 3 viewpoints(You can point out which one you disagree.): 1. If kthread_stop() happens at line 12, ubi thread is *marked* with stop flag, it will stop at kthread_should_stop() as long as it can reach the next iteration. 2. If task A is on runqueue and its state is TASK_RUNNING, task A will be scheduled to execute. 3. If kthread_stop() happens at line 12, after program counter going to line 14, ubi thead is on runqueue and its state is TASK_RUNNING. I have explained this in situation 1 in last session. I mean ubi thread is on runqueue with TASK_RUNNING state & stop flag after the process you described. Line 12 kthread_stop() set_bit(mark stop flag) && wake_up_process(enqueue && set TASK_RUNNING ) => TASK_RUNNING & stop flag & on runqueue Line 13 schedule() Do nothing but pick next task to execute So, to solve the problem entirely I suggest changing schedule() to schedule_timeout() and let the thread wake up periodically.
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
On Wed, Aug 5, 2020 at 4:23 AM Zhihao Cheng wrote: > Er, I can't get the point. I can list two possible situations, did I > miss other situations? Yes. You keep ignoring the case I brought up. Let's start from scratch, maybe I miss something. So I'm sorry for being persistent. The ubi thread can be reduced to a loop like this one: 1. for (;;) { 2. if (kthread_should_stop()) 3. break; 4. 5. if ( /* no work pending*/ ){ 6. set_current_state(TASK_INTERRUPTIBLE); 7. schedule(); 8. continue; 9. } 10. 11. do_work(); 12. } syzcaller found a case where stopping the thread did not work. If another task tries to stop the thread while no work is pending and the program counter in the thread is between lines 5 and 6, the kthread_stop() instruction has no effect. It has no effect because the thread sets the thread state to interruptible sleep and then schedules away. This is a common anti-pattern in the Linux kernel, sadly. Do you agree with me so far or do you think syzcaller found a different issue? Your patch changes the loop as follows: 1. for (;;) { 2. if (kthread_should_stop()) 3. break; 4. 5. if ( /* no work pending*/ ){ 6. set_current_state(TASK_INTERRUPTIBLE); 7. 8. if (kthread_should_stop()) { 9. set_current_state(TASK_RUNNING); 10. break; 11. } 12. 13. schedule(); 14. continue; 15. } 16. 17. do_work(); 18. } That way there is a higher chance that the thread sees the stop flag and gracefully terminates, I fully agree on that. But it does not completely solve the problem. If kthread_stop() happens while the program counter of the ubi thread is at line 12, the stop flag is still missed and we end up in interruptible sleep just like before. So, to solve the problem entirely I suggest changing schedule() to schedule_timeout() and let the thread wake up periodically. -- Thanks, //richard
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
在 2020/8/5 5:56, Richard Weinberger 写道: On Tue, Aug 4, 2020 at 4:58 AM Zhihao Cheng wrote: Oh, you're thinking about influence by schedule(), I get it. But I think it still works. Because the ubi_thread is still on runqueue, it will be scheduled to execute later anyway. It will not get woken. This is the problem. opstate of ubi_thread on runqueue set_current_state(TASK_INTERRUPTIBLE) TASK_INTERRUPTIBLE Yes if (kthread_should_stop()) // not satisfy TASK_INTERRUPTIBLE Yes kthread_stop: wake_up_process ttwu_queue ttwu_do_activate ttwu_do_wakeup TASK_RUNNING Yes schedule __schedule(false) // prev->state is TASK_RUNNING, so we cannot move it from runqueue by deactivate_task(). So just pick next task to execute, ubi_thread is still on runqueue and will be scheduled to execute later. It will be in state TASK_RUNNING only if your check is reached. If kthread_stop() is called *before* your code: + if (kthread_should_stop()) { + set_current_state(TASK_RUNNING); + break; + } ...everything is fine. But there is still a race window between your if (kthread_should_stop()) and schedule() in the next line. So if kthread_stop() is called right *after* the if and *before* schedule(), the task state is still TASK_INTERRUPTIBLE --> schedule() will not return unless the task is explicitly woken, which does not happen. Er, I can't get the point. I can list two possible situations, did I miss other situations? P1:ubi_thread set_current_state(TASK_INTERRUPTIBLE) if (kthread_should_stop()) { set_current_state(TASK_RUNNING) break } schedule() -> don't *remove* task from runqueue if *TASK_RUNNING*, removing operation is protected by rq_lock P2:kthread_stop set_bit(KTHREAD_SHOULD_STOP, >flags) wake_up_process(k) -> enqueue task & set *TASK_RUNNING*, these two operations are protected by rq_lock wait_for_completion(>exited) Situation 1: P1_set_current_state on-rq, TASK_RUNNING -> TASK_INTERRUPTIBLE P1_kthread_should_stop on-rq, TASK_INTERRUPTIBLE P2_set_bit on-rq, TASK_INTERRUPTIBLE , KTHREAD_SHOULD_STOP P2_wake_up_process on-rq, TASK_INTERRUPTIBLE -> TASK_RUNNING , KTHREAD_SHOULD_STOP P1_schedule on-rq, TASK_RUNNING , KTHREAD_SHOULD_STOP P2_wait_for_completion // wait for P1 exit Situation 2: P1_set_current_state on-rq, TASK_RUNNING -> TASK_INTERRUPTIBLE P1_kthread_should_stop on-rq, TASK_INTERRUPTIBLE P2_set_bit on-rq, TASK_INTERRUPTIBLE , KTHREAD_SHOULD_STOP P1_schedule off-rq, TASK_INTERRUPTIBLE , KTHREAD_SHOULD_STOP P2_wake_up_process on-rq, TASK_INTERRUPTIBLE -> TASK_RUNNING , KTHREAD_SHOULD_STOP P2_wait_for_completion // wait for P1 exit Before your patch, the race window was much larger, I fully agree, but your patch does not cure the problem it just makes it harder to hit. And using mdelay() to verify such a thing is also tricky because mdelay() will influence the task state.
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
On Tue, Aug 4, 2020 at 4:58 AM Zhihao Cheng wrote: > Oh, you're thinking about influence by schedule(), I get it. But I think > it still works. Because the ubi_thread is still on runqueue, it will be > scheduled to execute later anyway. It will not get woken. This is the problem. > > opstate of > ubi_thread on runqueue > set_current_state(TASK_INTERRUPTIBLE) TASK_INTERRUPTIBLE Yes > if (kthread_should_stop()) // not satisfy > TASK_INTERRUPTIBLE Yes > kthread_stop: >wake_up_process > ttwu_queue >ttwu_do_activate > ttwu_do_wakeup TASK_RUNNING Yes > schedule >__schedule(false) > > // prev->state is TASK_RUNNING, so we cannot move it from runqueue by > deactivate_task(). So just pick next task to execute, ubi_thread is > still on runqueue and will be scheduled to execute later. It will be in state TASK_RUNNING only if your check is reached. If kthread_stop() is called *before* your code: + if (kthread_should_stop()) { + set_current_state(TASK_RUNNING); + break; + } ...everything is fine. But there is still a race window between your if (kthread_should_stop()) and schedule() in the next line. So if kthread_stop() is called right *after* the if and *before* schedule(), the task state is still TASK_INTERRUPTIBLE --> schedule() will not return unless the task is explicitly woken, which does not happen. Before your patch, the race window was much larger, I fully agree, but your patch does not cure the problem it just makes it harder to hit. And using mdelay() to verify such a thing is also tricky because mdelay() will influence the task state. -- Thanks, //richard
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
在 2020/8/4 6:11, Richard Weinberger 写道: On Mon, Aug 3, 2020 at 4:01 AM Zhihao Cheng wrote: Hmm, I see the problem but I fear this patch does not cure the race completely. It just lowers the chance to hit it. What if KTHREAD_SHOULD_STOP is set right after you checked for it? The patch can handle this case. ubi_thread will exit at kthread_should_stop() in next iteration. How can it reach the next iteration? Maybe I didn't fully get your explanation. As far as I understand the problem correctly, the following happens: 1. ubi_thread is running and the program counter is somewhere between "if (kthread_should_stop())" and schedule() 2. While detaching kthread_stop() is called 3. Since the program counter in the thread is right before schedule(), it does not check KTHREAD_SHOULD_STOP and blindly calls into schedule() 4. The thread goes to sleep and nothing wakes it anymore -> endless wait. Is this correct so far? Oh, you're thinking about influence by schedule(), I get it. But I think it still works. Because the ubi_thread is still on runqueue, it will be scheduled to execute later anyway. op state of ubi_thread on runqueue set_current_state(TASK_INTERRUPTIBLE) TASK_INTERRUPTIBLE Yes if (kthread_should_stop()) // not satisfy TASK_INTERRUPTIBLE Yes kthread_stop: wake_up_process ttwu_queue ttwu_do_activate ttwu_do_wakeup TASK_RUNNING Yes schedule __schedule(false) // prev->state is TASK_RUNNING, so we cannot move it from runqueue by deactivate_task(). So just pick next task to execute, ubi_thread is still on runqueue and will be scheduled to execute later. The test patch added mdelay(5000) before schedule(), which can make sure kthread_stop()->wake_up_process() executed before schedule(). Previous analysis can be proved through test. @@ -1638,6 +1641,15 @@ int ubi_thread(void *u) !ubi->thread_enabled || ubi_dbg_is_bgt_disabled(ubi)) { set_current_state(TASK_INTERRUPTIBLE); spin_unlock(>wl_lock); + if (kthread_should_stop()) { + set_current_state(TASK_RUNNING); + break; + } + + pr_err("Check should stop B\n"); + mdelay(5000); + pr_err("delay 5000ms \n"); + schedule(); continue; } Your solution is putting another check for KTHREAD_SHOULD_STOP before schedule(). I argue that this will just reduce the chance to hit the race window because it can still happen that kthread_stop() is being called right after the second check and again before schedule(). Then we end up with the same situation.
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
On Mon, Aug 3, 2020 at 4:01 AM Zhihao Cheng wrote: > > Hmm, I see the problem but I fear this patch does not cure the race > > completely. > > It just lowers the chance to hit it. > > What if KTHREAD_SHOULD_STOP is set right after you checked for it? > > The patch can handle this case. ubi_thread will exit at > kthread_should_stop() in next iteration. How can it reach the next iteration? Maybe I didn't fully get your explanation. As far as I understand the problem correctly, the following happens: 1. ubi_thread is running and the program counter is somewhere between "if (kthread_should_stop())" and schedule() 2. While detaching kthread_stop() is called 3. Since the program counter in the thread is right before schedule(), it does not check KTHREAD_SHOULD_STOP and blindly calls into schedule() 4. The thread goes to sleep and nothing wakes it anymore -> endless wait. Is this correct so far? Your solution is putting another check for KTHREAD_SHOULD_STOP before schedule(). I argue that this will just reduce the chance to hit the race window because it can still happen that kthread_stop() is being called right after the second check and again before schedule(). Then we end up with the same situation. -- Thanks, //richard
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
在 2020/8/3 5:25, Richard Weinberger 写道: On Mon, Jun 1, 2020 at 11:13 AM Zhihao Cheng wrote: A detach hung is possible when a race occurs between the detach process and the ubi background thread. The following sequences outline the race: ubi thread: if (list_empty(>works)... ubi detach: set_bit(KTHREAD_SHOULD_STOP, >flags) => by kthread_stop() wake_up_process() => ubi thread is still running, so 0 is returned ubi thread: set_current_state(TASK_INTERRUPTIBLE) schedule() => ubi thread will never be scheduled again ubi detach: wait_for_completion() => hung task! To fix that, we need to check kthread_should_stop() after we set the task state, so the ubi thread will either see the stop bit and exit or the task state is reset to runnable such that it isn't scheduled out indefinitely. Signed-off-by: Zhihao Cheng Cc: Fixes: 801c135ce73d5df1ca ("UBI: Unsorted Block Images") Reported-by: syzbot+853639d0cb16c31c7...@syzkaller.appspotmail.com --- drivers/mtd/ubi/wl.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 5146cce5fe32..a4d4343053d7 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -1628,6 +1628,19 @@ int ubi_thread(void *u) !ubi->thread_enabled || ubi_dbg_is_bgt_disabled(ubi)) { set_current_state(TASK_INTERRUPTIBLE); spin_unlock(>wl_lock); + + /* +* Check kthread_should_stop() after we set the task +* state to guarantee that we either see the stop bit +* and exit or the task state is reset to runnable such +* that it's not scheduled out indefinitely and detects +* the stop bit at kthread_should_stop(). +*/ + if (kthread_should_stop()) { + set_current_state(TASK_RUNNING); + break; + } + Hmm, I see the problem but I fear this patch does not cure the race completely. It just lowers the chance to hit it. What if KTHREAD_SHOULD_STOP is set right after you checked for it? The patch can handle this case. ubi_thread will exit at kthread_should_stop() in next iteration. You can apply following patch to verify it. (You may set 'kernel.hung_task_timeout_secs = 10' by sysctl) diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 27636063ed1b..44047861c2a1 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -91,6 +91,7 @@ #include #include "ubi.h" #include "wl.h" +#include /* Number of physical eraseblocks reserved for wear-leveling purposes */ #define WL_RESERVED_PEBS 1 @@ -1627,8 +1628,10 @@ int ubi_thread(void *u) for (;;) { int err; - if (kthread_should_stop()) + if (kthread_should_stop()) { + pr_err("Exit at stop A\n"); break; + } if (try_to_freeze()) continue; @@ -1638,6 +1641,15 @@ int ubi_thread(void *u) !ubi->thread_enabled || ubi_dbg_is_bgt_disabled(ubi)) { set_current_state(TASK_INTERRUPTIBLE); spin_unlock(>wl_lock); + if (kthread_should_stop()) { + set_current_state(TASK_RUNNING); + break; + } + + pr_err("Check should stop B\n"); + mdelay(5000); + pr_err("delay 5000ms \n"); + schedule(); continue; } diff --git a/kernel/kthread.c b/kernel/kthread.c index 132f84a5fde3..18627acb9573 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -590,6 +590,10 @@ int kthread_stop(struct task_struct *k) get_task_struct(k); kthread = to_kthread(k); set_bit(KTHREAD_SHOULD_STOP, >flags); + + if (k->comm && !strncmp(k->comm, "ubi_bgt0d", strlen("ubi_bgt0d"))) + pr_err("Stop flag has been set for task %s\n", k->comm); + kthread_unpark(k); wake_up_process(k); wait_for_completion(>exited); kernel msg: [ 210.602005] Check should stop B # Please execute ubi_detach manually when you see this [ 211.347638] Stop flag has been set for task ubi_bgt0d [ 215.603026] delay 5000ms [ 215.605728] Exit at stop A schedule(); continue; }
Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state
On Mon, Jun 1, 2020 at 11:13 AM Zhihao Cheng wrote: > > A detach hung is possible when a race occurs between the detach process > and the ubi background thread. The following sequences outline the race: > > ubi thread: if (list_empty(>works)... > > ubi detach: set_bit(KTHREAD_SHOULD_STOP, >flags) > => by kthread_stop() > wake_up_process() > => ubi thread is still running, so 0 is returned > > ubi thread: set_current_state(TASK_INTERRUPTIBLE) > schedule() > => ubi thread will never be scheduled again > > ubi detach: wait_for_completion() > => hung task! > > To fix that, we need to check kthread_should_stop() after we set the > task state, so the ubi thread will either see the stop bit and exit or > the task state is reset to runnable such that it isn't scheduled out > indefinitely. > > Signed-off-by: Zhihao Cheng > Cc: > Fixes: 801c135ce73d5df1ca ("UBI: Unsorted Block Images") > Reported-by: syzbot+853639d0cb16c31c7...@syzkaller.appspotmail.com > --- > drivers/mtd/ubi/wl.c | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c > index 5146cce5fe32..a4d4343053d7 100644 > --- a/drivers/mtd/ubi/wl.c > +++ b/drivers/mtd/ubi/wl.c > @@ -1628,6 +1628,19 @@ int ubi_thread(void *u) > !ubi->thread_enabled || ubi_dbg_is_bgt_disabled(ubi)) { > set_current_state(TASK_INTERRUPTIBLE); > spin_unlock(>wl_lock); > + > + /* > +* Check kthread_should_stop() after we set the task > +* state to guarantee that we either see the stop bit > +* and exit or the task state is reset to runnable > such > +* that it's not scheduled out indefinitely and > detects > +* the stop bit at kthread_should_stop(). > +*/ > + if (kthread_should_stop()) { > + set_current_state(TASK_RUNNING); > + break; > + } > + Hmm, I see the problem but I fear this patch does not cure the race completely. It just lowers the chance to hit it. What if KTHREAD_SHOULD_STOP is set right after you checked for it? > schedule(); > continue; > } -- Thanks, //richard
[PATCH] ubi: check kthread_should_stop() after the setting of task state
A detach hung is possible when a race occurs between the detach process and the ubi background thread. The following sequences outline the race: ubi thread: if (list_empty(>works)... ubi detach: set_bit(KTHREAD_SHOULD_STOP, >flags) => by kthread_stop() wake_up_process() => ubi thread is still running, so 0 is returned ubi thread: set_current_state(TASK_INTERRUPTIBLE) schedule() => ubi thread will never be scheduled again ubi detach: wait_for_completion() => hung task! To fix that, we need to check kthread_should_stop() after we set the task state, so the ubi thread will either see the stop bit and exit or the task state is reset to runnable such that it isn't scheduled out indefinitely. Signed-off-by: Zhihao Cheng Cc: Fixes: 801c135ce73d5df1ca ("UBI: Unsorted Block Images") Reported-by: syzbot+853639d0cb16c31c7...@syzkaller.appspotmail.com --- drivers/mtd/ubi/wl.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 5146cce5fe32..a4d4343053d7 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -1628,6 +1628,19 @@ int ubi_thread(void *u) !ubi->thread_enabled || ubi_dbg_is_bgt_disabled(ubi)) { set_current_state(TASK_INTERRUPTIBLE); spin_unlock(>wl_lock); + + /* +* Check kthread_should_stop() after we set the task +* state to guarantee that we either see the stop bit +* and exit or the task state is reset to runnable such +* that it's not scheduled out indefinitely and detects +* the stop bit at kthread_should_stop(). +*/ + if (kthread_should_stop()) { + set_current_state(TASK_RUNNING); + break; + } + schedule(); continue; } -- 2.25.4