Re: [patch RT 1/3] cpu/rt: Rework cpu down for PREEMPT_RT

2012-07-16 Thread Steven Rostedt
On Mon, 2012-07-16 at 08:07 +, Thomas Gleixner wrote:

I know you are on vacation (hope you are enjoying yourself ;-)


> ---
>  include/linux/sched.h |7 ++
>  kernel/cpu.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++------
>  kernel/sched/core.c   |   82 +-
>  3 files changed, 285 insertions(+), 40 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7fc8321..777f7bb 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1973,6 +1973,10 @@ extern void do_set_cpus_allowed(struct task_struct *p,
>  
>  extern int set_cpus_allowed_ptr(struct task_struct *p,
>   const struct cpumask *new_mask);
> +int migrate_me(void);
> +void tell_sched_cpu_down_begin(int cpu);
> +void tell_sched_cpu_down_done(int cpu);
> +
>  #else
>  static inline void do_set_cpus_allowed(struct task_struct *p,
> const struct cpumask *new_mask)
> @@ -1985,6 +1989,9 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p,
>   return -EINVAL;
>   return 0;
>  }
> +static inline int migrate_me(void) { return 0; }
> +static inline void tell_sched_cpu_down_begin(int cpu) { }
> +static inline void tell_sched_cpu_down_done(int cpu) { }
>  #endif
>  
>  #ifndef CONFIG_CPUMASK_OFFSTACK
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index d79d33a..c5b3273 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -46,12 +46,7 @@ static int cpu_hotplug_disabled;
>  
>  static struct {
>   struct task_struct *active_writer;
> -#ifdef CONFIG_PREEMPT_RT_FULL
> - /* Makes the lock keep the task's state */
> - spinlock_t lock;
> -#else
>   struct mutex lock; /* Synchronizes accesses to refcount, */
> -#endif
>   /*
>* Also blocks the new readers during
>* an ongoing cpu hotplug operation.
> @@ -67,20 +62,42 @@ static struct {

As I was backporting this to 3.0-rt, I noticed that the following is
needed too:

diff --git a/kernel/cpu.c b/kernel/cpu.c
index c5b3273..3e722c0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -54,11 +54,7 @@ static struct {
 	int refcount;
 } cpu_hotplug = {
 	.active_writer = NULL,
-#ifdef CONFIG_PREEMPT_RT_FULL
-	.lock = __SPIN_LOCK_UNLOCKED(cpu_hotplug.lock),
-#else
 	.lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
-#endif
 	.refcount = 0,
 };

This goes with the hunk of the applied patch quoted above.
 
I'll add this on top, if no one objects.

-- Steve




[patch RT 1/3] cpu/rt: Rework cpu down for PREEMPT_RT

2012-07-16 Thread Thomas Gleixner
Bringing a CPU down is a pain with the PREEMPT_RT kernel because
tasks can be preempted in many more places than in non-RT. In
order to handle per_cpu variables, tasks may be pinned to a CPU
for a while, and even sleep. But these tasks need to be off the CPU
if that CPU is going down.

Several synchronization methods have been tried, but when stressed
they failed. This is a new approach.
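
For reference, the pinned sections in question follow roughly this
pattern (a simplified sketch, not the exact -rt code; the per-CPU
variable is made up for illustration):

static DEFINE_PER_CPU(int, my_counter);	/* made-up example */

static void touch_percpu_data(void)
{
	/* pin to this CPU; on -rt this calls pin_current_cpu() */
	migrate_disable();
	/* safe: we cannot change CPUs while pinned */
	__this_cpu_inc(my_counter);
	/* the task may be preempted or sleep here without losing the pin */
	migrate_enable();	/* unpin: unpin_current_cpu() */
}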

A sync_tsk thread is still created and tasks may still block on a
lock when the CPU is going down, but how that works is a bit different.
When cpu_down() starts, it creates the sync_tsk and waits for it to
report that the tasks currently pinned to the CPU are no longer pinned.
New tasks that are about to pin themselves are still allowed to do so
at this time.
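
Conceptually, the ordering on the cpu_down() side is then (pseudocode
of the sequence just described, not the patch verbatim; hp is the
CPU's hotplug_pcp descriptor and wait_for_pinned_cpus() is an assumed
helper name):

	hp->unplug = sync_tsk;		/* mark the CPU as going down */
	wait_for_pinned_cpus(hp);	/* drain currently pinned tasks */
	/* CPU_DOWN_PREPARE notifiers run here; new pins still allowed */
	hp->grab_lock = 1;		/* from now on, new pins must wait */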

Then the notifiers are called. Several notifiers bring down tasks that
enter these pinned sections. Some of those tasks take locks held by
other tasks on the CPU. If we don't let those other tasks continue, but
instead block them until CPU down is done, the tasks the notifiers are
waiting on will never complete, because they are waiting for locks held
by the tasks that are blocked.

Thus we still let tasks pin the CPU until the notifiers are done.
After the notifiers run, new tasks entering the pinned CPU sections
must grab a mutex and wait. This mutex is now a per-CPU mutex in the
hotplug_pcp descriptor.
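
The entry path for a task pinning itself then behaves roughly as
follows (a simplified rendering of the logic described above, not the
patch's exact code, which loops and retries):

static void pin_current_cpu_sketch(struct hotplug_pcp *hp)
{
	if (!hp->unplug || hp->refcount || hp->unplug == current) {
		/* CPU not going down, already pinned, or we are sync_tsk */
		hp->refcount++;
		return;
	}
	if (hp->grab_lock) {
		/* notifiers are done: block until CPU down completes */
		mutex_lock(&hp->mutex);
		mutex_unlock(&hp->mutex);
	} else {
		/* notifiers not done yet: try to get out of the way */
		migrate_me();
	}
	hp->refcount++;
}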

To help things along, a new function called migrate_me() is added to
the scheduler code. It tries to migrate the current task off the CPU
that is going down, if possible. Once the sync_tsk is created, all
tasks will try to migrate off the CPU going down. There are several
cases where this won't work, but it helps in most cases.
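
The idea behind migrate_me() is roughly the following (an illustrative
sketch, not the patch's implementation; push_current_to() is a made-up
stand-in for the scheduler's migration machinery, and the real code
avoids the on-stack cpumask):

static int migrate_me_sketch(void)
{
	struct cpumask mask;	/* real code uses a per-CPU scratch mask */

	if (current->flags & PF_THREAD_BOUND)
		return 0;	/* per-CPU kthreads cannot be moved */

	/* candidate CPUs: the allowed mask minus the CPU going down */
	cpumask_andnot(&mask, &current->cpus_allowed,
		       cpumask_of(smp_processor_id()));
	if (!cpumask_intersects(&mask, cpu_active_mask))
		return 0;	/* this CPU was the only place we could run */

	return push_current_to(cpumask_any_and(&mask, cpu_active_mask));
}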

After the notifiers have been called, a task that can't migrate off
but enters the pinned CPU sections is forced to wait on the hotplug_pcp
mutex until the CPU down is complete. At that point the scheduler
forces the migration anyway.

Also, I found that THREAD_BOUND tasks also need to be accounted for in
the pinned-CPU counting, and migrate_disable() no longer treats them
specially. This helps fix issues with ksoftirqd and workqueues, which
unbind on CPU down.

Signed-off-by: Steven Rostedt 
Signed-off-by: Thomas Gleixner 
---
 include/linux/sched.h |7 ++
 kernel/cpu.c  |  236 ++++++++++++++++++++++++++++++++++++++++++++------
 kernel/sched/core.c   |   82 +-
 3 files changed, 285 insertions(+), 40 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7fc8321..777f7bb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1973,6 +1973,10 @@ extern void do_set_cpus_allowed(struct task_struct *p,
 
 extern int set_cpus_allowed_ptr(struct task_struct *p,
const struct cpumask *new_mask);
+int migrate_me(void);
+void tell_sched_cpu_down_begin(int cpu);
+void tell_sched_cpu_down_done(int cpu);
+
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p,
  const struct cpumask *new_mask)
@@ -1985,6 +1989,9 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p,
return -EINVAL;
return 0;
 }
+static inline int migrate_me(void) { return 0; }
+static inline void tell_sched_cpu_down_begin(int cpu) { }
+static inline void tell_sched_cpu_down_done(int cpu) { }
 #endif
 
 #ifndef CONFIG_CPUMASK_OFFSTACK
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d79d33a..c5b3273 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -46,12 +46,7 @@ static int cpu_hotplug_disabled;
 
 static struct {
struct task_struct *active_writer;
-#ifdef CONFIG_PREEMPT_RT_FULL
-   /* Makes the lock keep the task's state */
-   spinlock_t lock;
-#else
struct mutex lock; /* Synchronizes accesses to refcount, */
-#endif
/*
 * Also blocks the new readers during
 * an ongoing cpu hotplug operation.
@@ -67,20 +62,42 @@ static struct {
.refcount = 0,
 };
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-# define hotplug_lock() rt_spin_lock(&cpu_hotplug.lock)
-# define hotplug_unlock() rt_spin_unlock(&cpu_hotplug.lock)
-#else
-# define hotplug_lock() mutex_lock(&cpu_hotplug.lock)
-# define hotplug_unlock() mutex_unlock(&cpu_hotplug.lock)
-#endif
-
+/**
+ * hotplug_pcp - per cpu hotplug descriptor
+ * @unplug:     set when pin_current_cpu() needs to sync tasks
+ * @sync_tsk:   the task that waits for tasks to finish pinned sections
+ * @refcount:   counter of tasks in pinned sections
+ * @grab_lock:  set when the tasks entering pinned sections should wait
+ * @synced:     notifier for @sync_tsk to tell cpu_down it's finished
+ * @mutex:      the mutex to make tasks wait (used when @grab_lock is true)
+ * @mutex_init: zero if the mutex hasn't been initialized yet.
+ *
+ * Although @unplug and @sync_tsk may point to the same task, the @unplug
+ * is used as a flag and still exists after @sync_tsk has exited and
+ * @sync_tsk set to NULL.
+ */
 struct hotp