Re: rt ptracer can monopolize CPU (was: Cpu-Hotplug and Real-Time)

2007-08-16 Thread Gautham R Shenoy
On Thu, Aug 09, 2007 at 09:03:53PM +0400, Oleg Nesterov wrote:
> On 08/07, Oleg Nesterov wrote:
> >
> > On 08/07, Gautham R Shenoy wrote:
> > >
> > > A will now call kthread_bind(B, cpu1).
> > > kthread_bind() calls wait_task_inactive(B) to ensure that
> > > B has scheduled itself out.
> > > 
> > > B is still on the runqueue, so A calls yield() in wait_task_inactive().
> > > But since A is the task with the highest prio, the scheduler schedules it
> > > back again.
> > > 
> > > Thus B never gets to run to schedule itself out.
> > > A loops waiting for B to schedule out, leading to a system hang.
> > 
> > But I think we have another case. An RT ptracer can share the same CPU
> > with ptracee. The latter sets TASK_STOPPED, unlocks ->siglock, and takes
> > a preemption. Ptracer does ptrace_check_attach(), sees TASK_STOPPED, and
> > yields in wait_task_inactive.
> 
> Even simpler.
> 
> #include <stdio.h>
> #include <signal.h>
> #include <unistd.h>
> #include <sys/ptrace.h>
> #include <sys/wait.h>
> #define   __USE_GNU
> #include <sched.h>
> 
> void die(const char *msg)
> {
> 	printf("ERR!! %s: %m\n", msg);
> 	kill(0, SIGKILL);
> }
> 
> void set_cpu(int cpu)
> {
> 	unsigned cpuval = 1 << cpu;
> 	if (sched_setaffinity(0, 4, (void*)&cpuval) < 0)
> 		die("setaffinity");
> }
> 
> // __wake_up_parent() does SYNC wake up, we need a handler to provoke
> // signal_wake_up().
> // otherwise ptrace_stop() is not preempted after read_unlock(tasklist).
> static void sigchld(int sig)
> {
> }
> 
> int main(void)
> {
> 	set_cpu(0);
> 
> 	int pid = fork();
> 	if (!pid)
> 		for (;;)
> 			;
> 
> 	struct sched_param sp = { 99 };
> 	if (sched_setscheduler(0, SCHED_FIFO, &sp))
> 		die("setscheduler");
> 
> 	signal(SIGCHLD, sigchld);
> 
> 	if (ptrace(PTRACE_ATTACH, pid, NULL, NULL))
> 		die("attach");
> 
> 	wait(NULL);
> 
> 	if (ptrace(PTRACE_DETACH, pid, NULL, NULL))
> 		die("detach");
> 
> 	kill(pid, SIGKILL);
> 
> 	return 0;
> }
> 
> Locks CPU 0. Not a security problem, needs CAP_SYS_NICE and the task
> could be reniced and killed, but still not good.
> 
> ptracee does ptrace_stop()->do_notify_parent_cldstop(), ptracer preempts
> the child before it calls schedule(), ptrace(PTRACE_DETACH) goes to
> wait_task_inactive() and yields forever.
> 
> Can we just replace yield() with schedule_timeout_uninterruptible(1)?
> wait_task_inactive() has no time-critical callers, and as it is currently
> used, the "on_rq" case is really unlikely.

schedule_timeout_uninterruptible(1) works fine in my case.
It makes sense to have it there instead of yield(). As you pointed out,
it only gets called in the "unlikely" case.

Patch below.
Thanks and Regards
gautham.

-->
yield() in wait_task_inactive() can cause a high-priority thread to be
scheduled back in, and thereby loop forever while it waits for some
lower-priority thread that is unfortunately still on the runqueue.

Use schedule_timeout_uninterruptible(1) instead.

Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
Credit: Oleg Nesterov <[EMAIL PROTECTED]>

---
 kernel/sched.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc2/kernel/sched.c
===
--- linux-2.6.23-rc2.orig/kernel/sched.c
+++ linux-2.6.23-rc2/kernel/sched.c
@@ -1106,7 +1106,7 @@ repeat:
 	 * yield - it could be a while.
 	 */
 	if (unlikely(on_rq)) {
-		yield();
+		schedule_timeout_uninterruptible(1);
 		goto repeat;
 	}
 

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


rt ptracer can monopolize CPU (was: Cpu-Hotplug and Real-Time)

2007-08-09 Thread Oleg Nesterov
On 08/07, Oleg Nesterov wrote:
>
> On 08/07, Gautham R Shenoy wrote:
> >
> > A will now call kthread_bind(B, cpu1).
> > kthread_bind() calls wait_task_inactive(B) to ensure that
> > B has scheduled itself out.
> > 
> > B is still on the runqueue, so A calls yield() in wait_task_inactive().
> > But since A is the task with the highest prio, the scheduler schedules it
> > back again.
> > 
> > Thus B never gets to run to schedule itself out.
> > A loops waiting for B to schedule out, leading to a system hang.
> 
> But I think we have another case. An RT ptracer can share the same CPU
> with ptracee. The latter sets TASK_STOPPED, unlocks ->siglock, and takes
> a preemption. Ptracer does ptrace_check_attach(), sees TASK_STOPPED, and
> yields in wait_task_inactive.

Even simpler.

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#define __USE_GNU
#include <sched.h>

void die(const char *msg)
{
	printf("ERR!! %s: %m\n", msg);
	kill(0, SIGKILL);
}

void set_cpu(int cpu)
{
	unsigned cpuval = 1 << cpu;
	if (sched_setaffinity(0, 4, (void*)&cpuval) < 0)
		die("setaffinity");
}

// __wake_up_parent() does SYNC wake up, we need a handler to provoke
// signal_wake_up().
// otherwise ptrace_stop() is not preempted after read_unlock(tasklist).
static void sigchld(int sig)
{
}

int main(void)
{
	set_cpu(0);

	int pid = fork();
	if (!pid)
		for (;;)
			;

	struct sched_param sp = { 99 };
	if (sched_setscheduler(0, SCHED_FIFO, &sp))
		die("setscheduler");

	signal(SIGCHLD, sigchld);

	if (ptrace(PTRACE_ATTACH, pid, NULL, NULL))
		die("attach");

	wait(NULL);

	if (ptrace(PTRACE_DETACH, pid, NULL, NULL))
		die("detach");

	kill(pid, SIGKILL);

	return 0;
}

Locks CPU 0. Not a security problem, needs CAP_SYS_NICE and the task
could be reniced and killed, but still not good.

ptracee does ptrace_stop()->do_notify_parent_cldstop(), ptracer preempts
the child before it calls schedule(), ptrace(PTRACE_DETACH) goes to
wait_task_inactive() and yields forever.

Can we just replace yield() with schedule_timeout_uninterruptible(1)?
wait_task_inactive() has no time-critical callers, and as it is currently
used, the "on_rq" case is really unlikely.

Oleg.



Re: Cpu-Hotplug and Real-Time

2007-08-07 Thread Oleg Nesterov
On 08/07, Venki Pallipadi wrote:
>
> On Tue, Aug 07, 2007 at 07:13:36PM +0400, Oleg Nesterov wrote:
> > 
> > As for kthread_bind(), I think wait_task_inactive+set_task_cpu is just
> > an optimization, and easy to "fix":
> > 
> > --- kernel/kthread.c	2007-07-28 16:58:17.0 +0400
> > +++ /proc/self/fd/0	2007-08-07 18:56:54.248073547 +0400
> > @@ -166,10 +166,7 @@ void kthread_bind(struct task_struct *k,
> >  		WARN_ON(1);
> >  		return;
> >  	}
> > -	/* Must have done schedule() in kthread() before we set_task_cpu */
> > -	wait_task_inactive(k);
> > -	set_task_cpu(k, cpu);
> > -	k->cpus_allowed = cpumask_of_cpu(cpu);
> > +	set_cpus_allowed(current, cpumask_of_cpu(cpu));
> >  }
> >  EXPORT_SYMBOL(kthread_bind);
> > 
> 
> Not sure whether set_cpus_allowed() will work here. It looks like it needs
> the CPU to be online during the call, and in the kthread_bind() case the CPU
> may be offline.

Aah, you are right, of course.

Thanks,

Oleg.



Re: Cpu-Hotplug and Real-Time

2007-08-07 Thread Venki Pallipadi
On Tue, Aug 07, 2007 at 07:13:36PM +0400, Oleg Nesterov wrote:
> On 08/07, Gautham R Shenoy wrote:
> >
> > After some debugging, I saw that the hang occurred because
> > the high prio process was stuck in a loop doing yield() inside
> > wait_task_inactive(). Description follows:
> > 
> > Say a high-prio task (A) does a kthread_create(B),
> > followed by a kthread_bind(B, cpu1). At this moment,
> > only cpu0 is online.
> > 
> > Now, immediately after being created, B would do a
> > complete(&create->started) [kernel/kthread.c: kthread()]
> > before scheduling itself out.
> > 
> > This complete() will wake up kthreadd, which had spawned B.
> > It is possible that during the wakeup, kthreadd might preempt B.
> > Thus, B is still on the runqueue and has not yet called schedule().
> > 
> > kthreadd will in turn do a
> > complete(&create->done); [kernel/kthread.c: create_kthread()]
> > which will wake up the thread that had called kthread_create().
> > In our case it's task A, which will run immediately, since its priority
> > is higher.
> > 
> > A will now call kthread_bind(B, cpu1).
> > kthread_bind() calls wait_task_inactive(B) to ensure that
> > B has scheduled itself out.
> > 
> > B is still on the runqueue, so A calls yield() in wait_task_inactive().
> > But since A is the task with the highest prio, the scheduler schedules it
> > back again.
> > 
> > Thus B never gets to run to schedule itself out.
> > A loops waiting for B to schedule out, leading to a system hang.
> 
> As for kthread_bind(), I think wait_task_inactive+set_task_cpu is just
> an optimization, and easy to "fix":
> 
> --- kernel/kthread.c	2007-07-28 16:58:17.0 +0400
> +++ /proc/self/fd/0	2007-08-07 18:56:54.248073547 +0400
> @@ -166,10 +166,7 @@ void kthread_bind(struct task_struct *k,
>  		WARN_ON(1);
>  		return;
>  	}
> -	/* Must have done schedule() in kthread() before we set_task_cpu */
> -	wait_task_inactive(k);
> -	set_task_cpu(k, cpu);
> -	k->cpus_allowed = cpumask_of_cpu(cpu);
> +	set_cpus_allowed(current, cpumask_of_cpu(cpu));
>  }
>  EXPORT_SYMBOL(kthread_bind);
> 

Not sure whether set_cpus_allowed() will work here. It looks like it needs the
CPU to be online during the call, and in the kthread_bind() case the CPU may be offline.

Thanks,
Venki


Re: Cpu-Hotplug and Real-Time

2007-08-07 Thread Oleg Nesterov
On 08/07, Gautham R Shenoy wrote:
>
> After some debugging, I saw that the hang occurred because
> the high prio process was stuck in a loop doing yield() inside
> wait_task_inactive(). Description follows:
> 
> Say a high-prio task (A) does a kthread_create(B),
> followed by a kthread_bind(B, cpu1). At this moment,
> only cpu0 is online.
> 
> Now, immediately after being created, B would do a
> complete(&create->started) [kernel/kthread.c: kthread()]
> before scheduling itself out.
> 
> This complete() will wake up kthreadd, which had spawned B.
> It is possible that during the wakeup, kthreadd might preempt B.
> Thus, B is still on the runqueue and has not yet called schedule().
> 
> kthreadd will in turn do a
> complete(&create->done); [kernel/kthread.c: create_kthread()]
> which will wake up the thread that had called kthread_create().
> In our case it's task A, which will run immediately, since its priority
> is higher.
> 
> A will now call kthread_bind(B, cpu1).
> kthread_bind() calls wait_task_inactive(B) to ensure that
> B has scheduled itself out.
> 
> B is still on the runqueue, so A calls yield() in wait_task_inactive().
> But since A is the task with the highest prio, the scheduler schedules it
> back again.
> 
> Thus B never gets to run to schedule itself out.
> A loops waiting for B to schedule out, leading to a system hang.

As for kthread_bind(), I think wait_task_inactive+set_task_cpu is just
an optimization, and easy to "fix":

--- kernel/kthread.c	2007-07-28 16:58:17.0 +0400
+++ /proc/self/fd/0	2007-08-07 18:56:54.248073547 +0400
@@ -166,10 +166,7 @@ void kthread_bind(struct task_struct *k,
 		WARN_ON(1);
 		return;
 	}
-	/* Must have done schedule() in kthread() before we set_task_cpu */
-	wait_task_inactive(k);
-	set_task_cpu(k, cpu);
-	k->cpus_allowed = cpumask_of_cpu(cpu);
+	set_cpus_allowed(current, cpumask_of_cpu(cpu));
 }
 EXPORT_SYMBOL(kthread_bind);

But I think we have another case. An RT ptracer can share the same CPU
with ptracee. The latter sets TASK_STOPPED, unlocks ->siglock, and takes
a preemption. Ptracer does ptrace_check_attach(), sees TASK_STOPPED, and
yields in wait_task_inactive.

Oleg.



Cpu-Hotplug and Real-Time

2007-08-07 Thread Gautham R Shenoy
Hi, 

While running a cpu-hotplug test involving a high priority
process (SCHED_RR, prio=94) trying to periodically offline and
online cpu1 on a 2-processor machine, I noticed that the system was
becoming unresponsive after a few iterations.

However, when the same test was repeated with more than 2 processors,
it worked fine.
Also, if the hotplugging process was not of rt-prio, it
worked fine on a 2-processor machine.

After some debugging, I saw that the hang occurred because
the high prio process was stuck in a loop doing yield() inside
wait_task_inactive(). Description follows:

Say a high-prio task (A) does a kthread_create(B),
followed by a kthread_bind(B, cpu1). At this moment,
only cpu0 is online.

Now, immediately after being created, B would do a
complete(&create->started) [kernel/kthread.c: kthread()]
before scheduling itself out.

This complete() will wake up kthreadd, which had spawned B.
It is possible that during the wakeup, kthreadd might preempt B.
Thus, B is still on the runqueue and has not yet called schedule().

kthreadd will in turn do a
complete(&create->done); [kernel/kthread.c: create_kthread()]
which will wake up the thread that had called kthread_create().
In our case it's task A, which will run immediately, since its priority
is higher.

A will now call kthread_bind(B, cpu1).
kthread_bind() calls wait_task_inactive(B) to ensure that
B has scheduled itself out.

B is still on the runqueue, so A calls yield() in wait_task_inactive().
But since A is the task with the highest prio, the scheduler schedules it
back again.

Thus B never gets to run to schedule itself out.
A loops waiting for B to schedule out, leading to a system hang.

In my case,
A was the high priority process trying to bring up cpu1, and
thus doing a kthread_create/kthread_bind in 
migration_call(): CPU_UP_PREPARE.

B was the migration thread for cpu1.

And the above problem occurs when only one cpu is online.

Possible solutions to this problem:
a) Let the newly spawned kernel threads inherit
   their parent's prio and policy. 

b) Instead of using yield() in wait_task_inactive(), we could use
   something like a yield_to(p):

static void yield_to(struct task_struct *p)
{
	int old_prio = p->prio;
	/* Temporarily boost p's priority at least to that of current task.
	 * (Lower ->prio means higher priority; set_prio() is a placeholder.) */
	if (current->prio < old_prio)
		set_prio(p, current->prio);
	yield();
	/* Reset priority back to the original value */
	set_prio(p, old_prio);
}


Thoughts?

Thanks and Regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

