Re: rt ptracer can monopolize CPU (was: Cpu-Hotplug and Real-Time)
On Thu, Aug 09, 2007 at 09:03:53PM +0400, Oleg Nesterov wrote:
> On 08/07, Oleg Nesterov wrote:
> >
> > On 08/07, Gautham R Shenoy wrote:
> > >
> > > A will now call kthread_bind(B, cpu1).
> > > kthread_bind() calls wait_task_inactive(B) to ensure that
> > > B has scheduled itself out.
> > >
> > > B is still on the runqueue, so A calls yield() in wait_task_inactive().
> > > But since A is the task with the highest prio, the scheduler schedules
> > > it back again.
> > >
> > > Thus B never gets to run to schedule itself out.
> > > A loops waiting for B to schedule out, leading to a system hang.
> >
> > But I think we have another case. An RT ptracer can share the same CPU
> > with ptracee. The latter sets TASK_STOPPED, unlocks ->siglock, and takes
> > a preemption. Ptracer does ptrace_check_attach(), sees TASK_STOPPED, and
> > yields in wait_task_inactive.
>
> Even simpler.
>
>       #include <stdio.h>
>       #include <signal.h>
>       #include <unistd.h>
>       #include <sys/ptrace.h>
>       #include <sys/wait.h>
>       #define __USE_GNU
>       #include <sched.h>
>
>       void die(const char *msg)
>       {
>               printf("ERR!! %s: %m\n", msg);
>               kill(0, SIGKILL);
>       }
>
>       void set_cpu(int cpu)
>       {
>               unsigned cpuval = 1 << cpu;
>               if (sched_setaffinity(0, 4, (void*)&cpuval) < 0)
>                       die("setaffinity");
>       }
>
>       // __wake_up_parent() does SYNC wake up, we need a handler to provoke
>       // signal_wake_up().
>       // otherwise ptrace_stop() is not preempted after read_unlock(tasklist).
>       static void sigchld(int sig)
>       {
>       }
>
>       int main(void)
>       {
>               set_cpu(0);
>
>               int pid = fork();
>               if (!pid)
>                       for (;;)
>                               ;
>
>               struct sched_param sp = { 99 };
>               if (sched_setscheduler(0, SCHED_FIFO, &sp))
>                       die("setscheduler");
>
>               signal(SIGCHLD, sigchld);
>
>               if (ptrace(PTRACE_ATTACH, pid, NULL, NULL))
>                       die("attach");
>
>               wait(NULL);
>
>               if (ptrace(PTRACE_DETACH, pid, NULL, NULL))
>                       die("detach");
>
>               kill(pid, SIGKILL);
>
>               return 0;
>       }
>
> Locks CPU 0. Not a security problem, needs CAP_SYS_NICE and the task
> could be reniced and killed, but still not good.
>
> ptracee does ptrace_stop()->do_notify_parent_cldstop(), ptracer preempts
> the child before it calls schedule(), ptrace(PTRACE_DETACH) goes to
> wait_task_inactive() and yields forever.
>
> Can we just replace yield() with schedule_timeout_uninterruptible(1) ?
> wait_task_inactive() has no time-critical callers, and as it is currently
> used, the "on_rq" case is really unlikely.

schedule_timeout_uninterruptible(1) works fine in my case.
It makes sense to have it there instead of yield().
As you pointed out, it gets called only in the "unlikely" case.

Patch below.

Thanks and Regards
gautham.
-->
yield() in wait_task_inactive() can cause a high-priority thread to be
scheduled back in, and thereby loop forever while it is waiting for some
lower-priority thread which is unfortunately still on the runqueue.

Use schedule_timeout_uninterruptible(1) instead.

Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
Credit: Oleg Nesterov <[EMAIL PROTECTED]>
---
 kernel/sched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc2/kernel/sched.c
===
--- linux-2.6.23-rc2.orig/kernel/sched.c
+++ linux-2.6.23-rc2/kernel/sched.c
@@ -1106,7 +1106,7 @@ repeat:
         * yield - it could be a while.
         */
        if (unlikely(on_rq)) {
-               yield();
+               schedule_timeout_uninterruptible(1);
                goto repeat;
        }

--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
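The difference between yield() and a one-tick sleep in this loop can be illustrated with a toy model. What follows is a minimal sketch in plain C, not kernel code. It assumes a single CPU, a strict-priority scheduler, and a low-priority task B that needs one tick of CPU time to take itself off the runqueue. All names (toy_task, wait_policy, ticks_until_inactive) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: larger prio value means higher priority here. */
struct toy_task {
    int prio;
    bool runnable;   /* true while the task is on the runqueue */
    int wake_tick;   /* tick at which a sleeping task becomes runnable */
};

enum wait_policy { WAIT_YIELD, WAIT_SLEEP_ONE_TICK };

/*
 * A (high prio) polls until B (low prio) has left the runqueue.
 * Returns the tick at which A sees B inactive, or -1 if that does
 * not happen within max_ticks.
 */
int ticks_until_inactive(enum wait_policy policy, int max_ticks)
{
    struct toy_task a = { .prio = 99, .runnable = true, .wake_tick = 0 };
    struct toy_task b = { .prio = 0,  .runnable = true, .wake_tick = 0 };

    assert(max_ticks > 0);

    for (int tick = 0; tick < max_ticks; tick++) {
        struct toy_task *pick = NULL;

        /* Timer: wake A once its timeout has expired. */
        if (!a.runnable && tick >= a.wake_tick)
            a.runnable = true;

        /* Strict priority: the highest-prio runnable task gets the tick. */
        if (a.runnable)
            pick = &a;
        if (b.runnable && (pick == NULL || b.prio > pick->prio))
            pick = &b;

        if (pick == &a) {
            if (!b.runnable)
                return tick;          /* B is off the runqueue, done */
            if (policy == WAIT_SLEEP_ONE_TICK) {
                a.runnable = false;   /* leave the runqueue ... */
                a.wake_tick = tick + 2; /* ... and skip exactly one tick */
            }
            /* WAIT_YIELD: A stays runnable and is picked again next tick. */
        } else if (pick == &b) {
            b.runnable = false;       /* B finally schedules itself out */
        }
    }
    return -1;
}
```

In this model, ticks_until_inactive(WAIT_YIELD, 1000) returns -1 because A is always the highest-priority runnable task, while ticks_until_inactive(WAIT_SLEEP_ONE_TICK, 1000) returns 2: A sleeps through tick 1, B uses that tick to leave the runqueue, and A sees it inactive on tick 2.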
rt ptracer can monopolize CPU (was: Cpu-Hotplug and Real-Time)
On 08/07, Oleg Nesterov wrote:
>
> On 08/07, Gautham R Shenoy wrote:
> >
> > A will now call kthread_bind(B, cpu1).
> > kthread_bind() calls wait_task_inactive(B) to ensure that
> > B has scheduled itself out.
> >
> > B is still on the runqueue, so A calls yield() in wait_task_inactive().
> > But since A is the task with the highest prio, the scheduler schedules
> > it back again.
> >
> > Thus B never gets to run to schedule itself out.
> > A loops waiting for B to schedule out, leading to a system hang.
>
> But I think we have another case. An RT ptracer can share the same CPU
> with ptracee. The latter sets TASK_STOPPED, unlocks ->siglock, and takes
> a preemption. Ptracer does ptrace_check_attach(), sees TASK_STOPPED, and
> yields in wait_task_inactive.

Even simpler.

        #include <stdio.h>
        #include <signal.h>
        #include <unistd.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
        #define __USE_GNU
        #include <sched.h>

        void die(const char *msg)
        {
                printf("ERR!! %s: %m\n", msg);
                kill(0, SIGKILL);
        }

        void set_cpu(int cpu)
        {
                unsigned cpuval = 1 << cpu;
                if (sched_setaffinity(0, 4, (void*)&cpuval) < 0)
                        die("setaffinity");
        }

        // __wake_up_parent() does SYNC wake up, we need a handler to provoke
        // signal_wake_up().
        // otherwise ptrace_stop() is not preempted after read_unlock(tasklist).
        static void sigchld(int sig)
        {
        }

        int main(void)
        {
                set_cpu(0);

                int pid = fork();
                if (!pid)
                        for (;;)
                                ;

                struct sched_param sp = { 99 };
                if (sched_setscheduler(0, SCHED_FIFO, &sp))
                        die("setscheduler");

                signal(SIGCHLD, sigchld);

                if (ptrace(PTRACE_ATTACH, pid, NULL, NULL))
                        die("attach");

                wait(NULL);

                if (ptrace(PTRACE_DETACH, pid, NULL, NULL))
                        die("detach");

                kill(pid, SIGKILL);

                return 0;
        }

Locks CPU 0. Not a security problem, needs CAP_SYS_NICE and the task
could be reniced and killed, but still not good.

ptracee does ptrace_stop()->do_notify_parent_cldstop(), ptracer preempts
the child before it calls schedule(), ptrace(PTRACE_DETACH) goes to
wait_task_inactive() and yields forever.

Can we just replace yield() with schedule_timeout_uninterruptible(1) ?
wait_task_inactive() has no time-critical callers, and as it is currently
used, the "on_rq" case is really unlikely.

Oleg.
Re: Cpu-Hotplug and Real-Time
On 08/07, Venki Pallipadi wrote:
>
> On Tue, Aug 07, 2007 at 07:13:36PM +0400, Oleg Nesterov wrote:
> >
> > As for kthread_bind(), I think wait_task_inactive+set_task_cpu is just
> > an optimization, and easy to "fix":
> >
> > --- kernel/kthread.c    2007-07-28 16:58:17.0 +0400
> > +++ /proc/self/fd/0     2007-08-07 18:56:54.248073547 +0400
> > @@ -166,10 +166,7 @@ void kthread_bind(struct task_struct *k,
> >                 WARN_ON(1);
> >                 return;
> >         }
> > -       /* Must have done schedule() in kthread() before we set_task_cpu */
> > -       wait_task_inactive(k);
> > -       set_task_cpu(k, cpu);
> > -       k->cpus_allowed = cpumask_of_cpu(cpu);
> > +       set_cpus_allowed(current, cpumask_of_cpu(cpu));
> >  }
> >  EXPORT_SYMBOL(kthread_bind);
> >
>
> Not sure whether set_cpus_allowed() will work here. Looks like it needs the
> CPU to be online during the call, and in the kthread_bind() case the CPU
> may be offline.

Aah, you are right, of course.

Thanks,

Oleg.
Re: Cpu-Hotplug and Real-Time
On Tue, Aug 07, 2007 at 07:13:36PM +0400, Oleg Nesterov wrote:
> On 08/07, Gautham R Shenoy wrote:
> >
> > After some debugging, I saw that the hang occurred because
> > the high prio process was stuck in a loop doing yield() inside
> > wait_task_inactive(). Description follows:
> >
> > Say a high-prio task (A) does a kthread_create(B),
> > followed by a kthread_bind(B, cpu1). At this moment,
> > only cpu0 is online.
> >
> > Now, immediately after being created, B would
> > do a
> > complete(&create->started) [kernel/kthread.c: kthread()],
> > before scheduling itself out.
> >
> > This complete() will wake up kthreadd, which had spawned B.
> > It is possible that during the wakeup, kthreadd might preempt B.
> > Thus, B is still on the runqueue, and has not yet called schedule().
> >
> > kthreadd will in turn do a
> > complete(&create->done); [kernel/kthread.c: create_kthread()]
> > which will wake up the thread which had called kthread_create().
> > In our case it's task A, which will run immediately, since its priority
> > is higher.
> >
> > A will now call kthread_bind(B, cpu1).
> > kthread_bind() calls wait_task_inactive(B) to ensure that
> > B has scheduled itself out.
> >
> > B is still on the runqueue, so A calls yield() in wait_task_inactive().
> > But since A is the task with the highest prio, the scheduler schedules
> > it back again.
> >
> > Thus B never gets to run to schedule itself out.
> > A loops waiting for B to schedule out, leading to a system hang.
>
> As for kthread_bind(), I think wait_task_inactive+set_task_cpu is just
> an optimization, and easy to "fix":
>
> --- kernel/kthread.c    2007-07-28 16:58:17.0 +0400
> +++ /proc/self/fd/0     2007-08-07 18:56:54.248073547 +0400
> @@ -166,10 +166,7 @@ void kthread_bind(struct task_struct *k,
>                 WARN_ON(1);
>                 return;
>         }
> -       /* Must have done schedule() in kthread() before we set_task_cpu */
> -       wait_task_inactive(k);
> -       set_task_cpu(k, cpu);
> -       k->cpus_allowed = cpumask_of_cpu(cpu);
> +       set_cpus_allowed(current, cpumask_of_cpu(cpu));
>  }
>  EXPORT_SYMBOL(kthread_bind);
>

Not sure whether set_cpus_allowed() will work here. Looks like it needs the
CPU to be online during the call, and in the kthread_bind() case the CPU
may be offline.

Thanks,
Venki
Re: Cpu-Hotplug and Real-Time
On 08/07, Gautham R Shenoy wrote:
>
> After some debugging, I saw that the hang occurred because
> the high prio process was stuck in a loop doing yield() inside
> wait_task_inactive(). Description follows:
>
> Say a high-prio task (A) does a kthread_create(B),
> followed by a kthread_bind(B, cpu1). At this moment,
> only cpu0 is online.
>
> Now, immediately after being created, B would
> do a
> complete(&create->started) [kernel/kthread.c: kthread()],
> before scheduling itself out.
>
> This complete() will wake up kthreadd, which had spawned B.
> It is possible that during the wakeup, kthreadd might preempt B.
> Thus, B is still on the runqueue, and has not yet called schedule().
>
> kthreadd will in turn do a
> complete(&create->done); [kernel/kthread.c: create_kthread()]
> which will wake up the thread which had called kthread_create().
> In our case it's task A, which will run immediately, since its priority
> is higher.
>
> A will now call kthread_bind(B, cpu1).
> kthread_bind() calls wait_task_inactive(B) to ensure that
> B has scheduled itself out.
>
> B is still on the runqueue, so A calls yield() in wait_task_inactive().
> But since A is the task with the highest prio, the scheduler schedules
> it back again.
>
> Thus B never gets to run to schedule itself out.
> A loops waiting for B to schedule out, leading to a system hang.

As for kthread_bind(), I think wait_task_inactive+set_task_cpu is just
an optimization, and easy to "fix":

--- kernel/kthread.c    2007-07-28 16:58:17.0 +0400
+++ /proc/self/fd/0     2007-08-07 18:56:54.248073547 +0400
@@ -166,10 +166,7 @@ void kthread_bind(struct task_struct *k,
                WARN_ON(1);
                return;
        }
-       /* Must have done schedule() in kthread() before we set_task_cpu */
-       wait_task_inactive(k);
-       set_task_cpu(k, cpu);
-       k->cpus_allowed = cpumask_of_cpu(cpu);
+       set_cpus_allowed(current, cpumask_of_cpu(cpu));
 }
 EXPORT_SYMBOL(kthread_bind);

But I think we have another case. An RT ptracer can share the same CPU
with ptracee. The latter sets TASK_STOPPED, unlocks ->siglock, and takes
a preemption. Ptracer does ptrace_check_attach(), sees TASK_STOPPED, and
yields in wait_task_inactive.

Oleg.
Cpu-Hotplug and Real-Time
Hi,

While running a cpu-hotplug test involving a high priority process
(SCHED_RR, prio=94) trying to periodically offline and online cpu1 on a
2-processor machine, I noticed that the system was becoming unresponsive
after a few iterations.

However, when the same test was repeated with more than 2 processors,
it worked fine. Also, if the hotplugging process was not of rt-prio,
it worked fine on a 2-processor machine.

After some debugging, I saw that the hang occurred because
the high prio process was stuck in a loop doing yield() inside
wait_task_inactive(). Description follows:

Say a high-prio task (A) does a kthread_create(B),
followed by a kthread_bind(B, cpu1). At this moment,
only cpu0 is online.

Now, immediately after being created, B would
do a
complete(&create->started) [kernel/kthread.c: kthread()],
before scheduling itself out.

This complete() will wake up kthreadd, which had spawned B.
It is possible that during the wakeup, kthreadd might preempt B.
Thus, B is still on the runqueue, and has not yet called schedule().

kthreadd will in turn do a
complete(&create->done); [kernel/kthread.c: create_kthread()]
which will wake up the thread which had called kthread_create().
In our case it's task A, which will run immediately, since its priority
is higher.

A will now call kthread_bind(B, cpu1).
kthread_bind() calls wait_task_inactive(B) to ensure that
B has scheduled itself out.

B is still on the runqueue, so A calls yield() in wait_task_inactive().
But since A is the task with the highest prio, the scheduler schedules it
back again.

Thus B never gets to run to schedule itself out.
A loops waiting for B to schedule out, leading to a system hang.

In my case, A was the high priority process trying to bring up cpu1, and
thus doing a kthread_create/kthread_bind in migration_call(): CPU_UP_PREPARE.
B was the migration thread for cpu1.

The above problem occurs when only one cpu is online.

Possible solutions to this problem:

a) Let the newly spawned kernel threads inherit their parent's prio and
   policy.

b) Instead of using yield() in wait_task_inactive(), we could use something
   like a yield_to(p):

        void yield_to(struct task_struct *p)
        {
                int old_prio = p->prio;

                /* Temporarily boost p's priority at least to that of current task */
                if (current->prio > old_prio)
                        set_prio(p, current->prio);

                yield();

                /* Reset priority back to the original value */
                set_prio(p, old_prio);
        }

Thoughts?

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
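The effect of proposal (b) can be sketched with a small user-space model. This is only an illustration under stated assumptions, not kernel code: it assumes that a yielding task hands over the CPU only to a runnable task of equal or higher priority, and it treats a larger number as a higher priority, which is how the comparison in the yield_to() pseudocode reads. The names toy_task, toy_yield and toy_yield_to are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: larger prio value means higher priority here. */
struct toy_task {
    int prio;
    bool on_rq;   /* true while the task is on the runqueue */
};

/*
 * Model of yield(): the caller gives up the CPU only if `other` is
 * runnable and has at least the caller's priority.
 * Returns true if `other` got to run.
 */
bool toy_yield(const struct toy_task *self, const struct toy_task *other)
{
    assert(self != other);
    return other->on_rq && other->prio >= self->prio;
}

/*
 * Model of the proposed yield_to(p): boost p to the caller's priority,
 * yield, then restore p's original priority.
 * Returns true if p got to run.
 */
bool toy_yield_to(const struct toy_task *self, struct toy_task *p)
{
    int old_prio = p->prio;
    bool ran;

    /* Temporarily boost p at least to the caller's priority. */
    if (self->prio > old_prio)
        p->prio = self->prio;

    ran = toy_yield(self, p);

    /* Reset priority back to the original value. */
    p->prio = old_prio;
    return ran;
}
```

With A at priority 94 and B at priority 0, toy_yield(&a, &b) reports that B does not run, while toy_yield_to(&a, &b) reports that it does and leaves B's priority unchanged afterwards. Note that the kernel's internal prio field orders values the other way round (a lower value is a higher priority), so a real implementation would need the comparison adjusted accordingly.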