Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> Ok. Sounds like disabling cache_reaper is a better option for now. Like you
> said
> it's unlikely that slabs will grow much if that cpu is not heavily used by the
> kernel.

Running for prolonged times without cache_reaper is no good.

What we are talking about here is to disable the cache_reaper during cpu 
shutdown. The slab cpu shutdown will clean the per cpu caches anyways so 
we really do not need the slab_reaper running during cpu shutdown.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky

Christoph Lameter wrote:

On Tue, 20 Feb 2007, Max Krasnyansky wrote:


I guess I kind of hijacked the thread. The second part of my first email was
dropped. Basically I was saying that I'm working on CPU isolation extensions.
Where an isolated CPU is not supposed to do much kernel work. In which case
you'd want to run slab cache reaper on some other CPU on behalf of the
isolated
one. Hence the proposal to explicitly pass cpu_id to the reaper. I guess now
that you guys fixed the hotplug case it does not help in that scenario.


A cpu must have a per cpu cache in order to do slab allocations. The 
locking in the slab allocator depends on it.


If the isolated cpus have no need for slab allocations then you will also 
not need to run the slab_reaper().


Ok. Sounds like disabling cache_reaper is a better option for now. Like you said
it's unlikely that slabs will grow much if that cpu is not heavily used by the
kernel.

Thanks
Max


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> I guess I kind of hijacked the thread. The second part of my first email was
> dropped. Basically I was saying that I'm working on CPU isolation extensions.
> Where an isolated CPU is not supposed to do much kernel work. In which case
> you'd want to run slab cache reaper on some other CPU on behalf of the
> isolated
> one. Hence the proposal to explicitly pass cpu_id to the reaper. I guess now
> that you guys fixed the hotplug case it does not help in that scenario.

A cpu must have a per cpu cache in order to do slab allocations. The 
locking in the slab allocator depends on it.

If the isolated cpus have no need for slab allocations then you will also 
not need to run the slab_reaper().



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> I agree that running the reaper on the wrong CPU is not the best way to go
> about it. But it seems like disabling it is even worse, unless I'm missing
> something, i.e. wasting memory.

Disabling during shutdown is no problem because the per cpu caches are 
going to be drained anyways by the cpu down code.


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky

Oleg Nesterov wrote:

On 02/20, Christoph Lameter wrote:

On Tue, 20 Feb 2007, Max Krasnyansky wrote:


Well seems that we have a set of unresolved issues with workqueues and cpu
hotplug.

How about storing 'cpu' explicitly in the work queue instead of relying on the
smp_processor_id() and friends ? That way there is no ambiguity when
threads/timers get
moved around.

The slab functionality is designed to work on the processor with the 
queue. These tricks will only cause more trouble in the future. The 
cache_reaper needs to be either disabled or run on the right processor. It 
should never run on the wrong processor.


I personally agree. Besides, cache_reaper is not alone. Note the comment
in debug_smp_processor_id() about cpu-bound processes. The slab does correct
thing right now, stops the timer on CPU_DEAD. Other problems imho should be
solved by fixing cpu-hotplug. Gautham and Srivatsa are working on that.


I guess I kind of hijacked the thread. The second part of my first email was
dropped. Basically I was saying that I'm working on CPU isolation extensions.
Where an isolated CPU is not supposed to do much kernel work. In which case
you'd want to run slab cache reaper on some other CPU on behalf of the isolated
one. Hence the proposal to explicitly pass cpu_id to the reaper. I guess now
that you guys fixed the hotplug case it does not help in that scenario.

Max


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky

Christoph Lameter wrote:

On Tue, 20 Feb 2007, Max Krasnyansky wrote:


Well seems that we have a set of unresolved issues with workqueues and cpu
hotplug.

How about storing 'cpu' explicitly in the work queue instead of relying on the
smp_processor_id() and friends ? That way there is no ambiguity when
threads/timers get
moved around.


The slab functionality is designed to work on the processor with the 
queue. These tricks will only cause more trouble in the future. The 
cache_reaper needs to be either disabled or run on the right processor. It 
should never run on the wrong processor.

The cache_reaper() is of no importance to hotplug. You just need to make 
sure that it is not in the way (disable it and if its running wait 
until the cache_reaper has finished).


I agree that running the reaper on the wrong CPU is not the best way to go
about it. But it seems like disabling it is even worse, unless I'm missing
something, i.e. wasting memory.

btw What kind of troubles were you talking about ? Performance or robustness ?
As I said, performance-wise it does not make sense to run the reaper on the
wrong CPU, but it does seem to work just fine from the correctness (locking,
etc.) perspective. Again, it's totally possible that I'm missing something
here :).

Thanks
Max


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Oleg Nesterov
On 02/20, Christoph Lameter wrote:
>
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> 
> > > > Well seems that we have a set of unresolved issues with workqueues and 
> > > > cpu
> > > > hotplug.
> > 
> > How about storing 'cpu' explicitly in the work queue instead of relying on 
> > the
> > smp_processor_id() and friends ? That way there is no ambiguity when
> > threads/timers get
> > moved around.
> 
> The slab functionality is designed to work on the processor with the 
> queue. These tricks will only cause more trouble in the future. The 
> cache_reaper needs to be either disabled or run on the right processor. It 
> should never run on the wrong processor.

I personally agree. Besides, cache_reaper is not alone. Note the comment
in debug_smp_processor_id() about cpu-bound processes. The slab does correct
thing right now, stops the timer on CPU_DEAD. Other problems imho should be
solved by fixing cpu-hotplug. Gautham and Srivatsa are working on that.

Oleg.



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky

Folks,

Oleg Nesterov wrote:

Even if smp_processor_id() was stable during the execution of cache_reap(),
this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
avoid this, and this is correct.

Uhh This may not be correct in terms of how the slab operates.


But this is practically impossible to avoid. We can't delay CPU_DOWN until all
workqueues flush their cwq->worklist. This is livelockable, the work can 
re-queue
itself, and new works can be added since the dying CPU is still on 
cpu_online_map.
This means that some pending works will be processed on another CPU.

delayed_work is even worse, the timer can migrate as well.

The first problem (smp_processor_id() is not stable) could be solved if we
use freezer or with the help of not-yet-implemented scalable lock_cpu_hotplug.


This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
This is absolutely harmless right now, but may be it's better to use
container_of(unused, struct delayed_work, work).

Well seems that we have a set of unresolved issues with workqueues and cpu 
hotplug.


How about storing 'cpu' explicitly in the work queue instead of relying on the
smp_processor_id() and friends ? That way there is no ambiguity when 
threads/timers get
moved around.

I'm cooking a set of patches to extend cpu isolation concept a bit. In which 
case I'd
like one CPU to run cache_reap timer on behalf of another cpu. See the patch 
below.

diff --git a/mm/slab.c b/mm/slab.c
index c610062..0f46d11 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -766,7 +766,17 @@ int slab_is_available(void)
return g_cpucache_up == FULL;
 }

-static DEFINE_PER_CPU(struct delayed_work, reap_work);
+struct slab_work {
+   struct delayed_work dw;
+   unsigned int   cpu;
+};
+
+static DEFINE_PER_CPU(struct slab_work, reap_work);
+
+static inline struct array_cache *cpu_cache_get_on(struct kmem_cache *cachep, 
unsigned int cpu)
+{
+   return cachep->array[cpu];
+}

 static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 {
@@ -915,9 +925,9 @@ static void init_reap_node(int cpu)
per_cpu(reap_node, cpu) = node;
 }

-static void next_reap_node(void)
+static void next_reap_node(unsigned int cpu)
 {
-   int node = __get_cpu_var(reap_node);
+   int node = per_cpu(reap_node, cpu);

/*
 * Also drain per cpu pages on remote zones
@@ -928,12 +938,12 @@ static void next_reap_node(void)
node = next_node(node, node_online_map);
if (unlikely(node >= MAX_NUMNODES))
node = first_node(node_online_map);
-   __get_cpu_var(reap_node) = node;
+   per_cpu(reap_node, cpu) = node;
 }

 #else
 #define init_reap_node(cpu) do { } while (0)
-#define next_reap_node(void) do { } while (0)
+#define next_reap_node(cpu) do { } while (0)
 #endif

 /*
@@ -945,17 +955,18 @@ static void next_reap_node(void)
  */
 static void __devinit start_cpu_timer(int cpu)
 {
-   struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
+   struct slab_work *reap_work = &per_cpu(reap_work, cpu);

/*
 * When this gets called from do_initcalls via cpucache_init(),
 * init_workqueues() has already run, so keventd will be setup
 * at that time.
 */
-   if (keventd_up() && reap_work->work.func == NULL) {
+   if (keventd_up() && reap_work->dw.work.func == NULL) {
init_reap_node(cpu);
-   INIT_DELAYED_WORK(reap_work, cache_reap);
-   schedule_delayed_work_on(cpu, reap_work,
+   INIT_DELAYED_WORK(&reap_work->dw, cache_reap);
+   reap_work->cpu = cpu;
+   schedule_delayed_work_on(cpu, &reap_work->dw,
__round_jiffies_relative(HZ, cpu));
}
 }
@@ -1004,7 +1015,7 @@ static int transfer_objects(struct array_cache *to,
 #ifndef CONFIG_NUMA

 #define drain_alien_cache(cachep, alien) do { } while (0)
-#define reap_alien(cachep, l3) do { } while (0)
+#define reap_alien(cachep, l3, cpu) do { } while (0)

 static inline struct array_cache **alloc_alien_cache(int node, int limit)
 {
@@ -1099,9 +1110,9 @@ static void __drain_alien_cache(struct kmem_cache *cachep,
 /*
  * Called from cache_reap() to regularly drain alien caches round robin.
  */
-static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3)
+static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3, 
unsigned int cpu)
 {
-   int node = __get_cpu_var(reap_node);
+   int node = per_cpu(reap_node, cpu);

if (l3->alien) {
struct array_cache *ac = l3->alien[node];
@@ -4017,16 +4028,17 @@ void drain_array(struct kmem_cache *cachep, struct 
kmem_list3 *l3,
  * If we cannot acquire the cache chain mutex then just give up - we'll try
  * again on the next iteration.
  */
-static void cache_reap(struct work_struct *unused)
+static void cache_reap(struct work_struct *_work)
 {
struct kmem_cache *searchp;
struct 

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> > > Well seems that we have a set of unresolved issues with workqueues and cpu
> > > hotplug.
> 
> How about storing 'cpu' explicitly in the work queue instead of relying on the
> smp_processor_id() and friends ? That way there is no ambiguity when
> threads/timers get
> moved around.

The slab functionality is designed to work on the processor with the 
queue. These tricks will only cause more trouble in the future. The 
cache_reaper needs to be either disabled or run on the right processor. It 
should never run on the wrong processor.

The cache_reaper() is of no importance to hotplug. You just need to make 
sure that it is not in the way (disable it and if its running wait 
until the cache_reaper has finished).



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:
>
> On Tue, 30 Jan 2007, Oleg Nesterov wrote:
> 
> > Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
> > so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
> > but this is not related.
> 
> Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX 
> anymore, right? Which brings us to this form of the patch:

Yes, if CPU_LOCK_ACQUIRE takes cache_chain_mutex, we don't need to do so
on CPU_DOWN_PREPARE. I didn't know cpuup_callback() was already converted,
sorry for the confusion!

Note: with the patch below we are doing cancel_rearming_delayed_work()
under cache_chain_mutex. This is ok since cache_reap() does mutex_trylock(),
so deadlock is not possible. However, this means that mutex_trylock() becomes
mandatory for cache_reap(), probably a little comment will be good.

> Shutdown cache_reaper when cpu goes down
> 
> Shutdown the cache_reaper in slab.c if the cpu is brought down
> and set the cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.20-rc6-mm2/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm2.orig/mm/slab.c   2007-01-29 14:27:34.199229828 
> -0600
> +++ linux-2.6.20-rc6-mm2/mm/slab.c2007-01-29 15:47:18.293962726 -0600
> @@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
>   start_cpu_timer(cpu);
>   break;
>  #ifdef CONFIG_HOTPLUG_CPU
> + case CPU_DOWN_PREPARE:
> + /* Shutdown cache reaper */
> + cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> + per_cpu(reap_work, cpu).work.func = NULL;
> + break;
> + case CPU_DOWN_FAILED:
> + start_cpu_timer(cpu);
> + break;
>   case CPU_DEAD:
>   /*
>* Even if all the cpus of a node are down, we don't free the



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Tue, 30 Jan 2007, Oleg Nesterov wrote:

> Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
> so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
> but this is not related.

Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX 
anymore, right? Which brings us to this form of the patch:


Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6-mm2/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm2.orig/mm/slab.c 2007-01-29 14:27:34.199229828 -0600
+++ linux-2.6.20-rc6-mm2/mm/slab.c  2007-01-29 15:47:18.293962726 -0600
@@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
start_cpu_timer(cpu);
break;
 #ifdef CONFIG_HOTPLUG_CPU
+   case CPU_DOWN_PREPARE:
+   /* Shutdown cache reaper */
+   cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+   per_cpu(reap_work, cpu).work.func = NULL;
+   break;
+   case CPU_DOWN_FAILED:
+   start_cpu_timer(cpu);
+   break;
case CPU_DEAD:
/*
 * Even if all the cpus of a node are down, we don't free the


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:
>
> Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
> CPU_DOWN_FAILED somehow vanished in mm?

No, no, they are still in place, so I believe your patch is good.

Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
but this is not related.

> Shutdown cache_reaper when cpu goes down
> 
> Shutdown the cache_reaper in slab.c if the cpu is brought down
> and set the cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.20-rc6-mm1/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.000000000 -0600
> +++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
> @@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +
> +		mutex_lock(&cache_chain_mutex);
> +		break;
> +	case CPU_DOWN_FAILED:
> +		mutex_unlock(&cache_chain_mutex);
> +		start_cpu_timer(cpu);
> +		break;
>   case CPU_DEAD:
>   /*
>* Even if all the cpus of a node are down, we don't free the



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > But we could delay CPU_DOWN in the handler for the slab until we know that 
> > the cache_reaper is no longer running?
> 
> Hmm... I don't understand this. We can delay CPU_DOWN if we cancel cache_reaper
> like you did in the previous patch. Did you mean this? If yes - then yes :)

Yes sure.

> Worse, we can have 2 handlers running in parallel on the same CPU. But this
> is fixed by your previous patch, I believe.

Good.

Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
CPU_DOWN_FAILED somehow vanished in mm?





Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6-mm1/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.000000000 -0600
+++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
@@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+
+		mutex_lock(&cache_chain_mutex);
+		break;
+	case CPU_DOWN_FAILED:
+		mutex_unlock(&cache_chain_mutex);
+   start_cpu_timer(cpu);
+   break;
case CPU_DEAD:
/*
 * Even if all the cpus of a node are down, we don't free the


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > > > Even if smp_processor_id() was stable during the execution of 
> > > > cache_reap(),
> > > > this work_struct can be moved to another CPU if CPU_DEAD happens. We 
> > > > can't
> > > > avoid this, and this is correct.
> > > 
> > > Uhh This may not be correct in terms of how the slab operates.
> > 
> > But this is practically impossible to avoid. We can't delay CPU_DOWN until 
> > all
> > workqueues flush their cwq->worklist. This is livelockable, the work can 
> > re-queue
> > itself, and new works can be added since the dying CPU is still on 
> > cpu_online_map.
> > This means that some pending works will be processed on another CPU.
> 
> But we could delay CPU_DOWN in the handler for the slab until we know that 
> the cache_reaper is no longer running?

Hmm... I don't understand this. We can delay CPU_DOWN if we cancel cache_reaper
like you did in the previous patch. Did you mean this? If yes - then yes :)

> > > > This means that __get_cpu_var(reap_work) returns a "wrong" struct 
> > > > delayed_work.
> > > > This is absolutely harmless right now, but may be it's better to use
> > > > container_of(unused, struct delayed_work, work).
> 
> There is more where that is coming from. cache_reap determines the current 
> cpu in order to find the correct per cpu cache and also determines the 
> current node. If you move cache_reap to another processor / node then it 
> will just clean that one and not do anything for the processor that you 
> wanted it to run for.

Worse, we can have 2 handlers running in parallel on the same CPU. But this
is fixed by your previous patch, I believe.

> If we change processors in the middle of the run
> then it may do some actions on one cpu and some on another.

Yep. For example, next_reap_node() will not be happy if we change CPU in
the middle. But this is _extremely_ unlikely, can only happen on CPU_DOWN,
and cache_reap() should not care about this.

> -static void cache_reap(struct work_struct *unused)
> +static void cache_reap(struct work_struct *w)
>  {
>   struct kmem_cache *searchp;
>   struct kmem_list3 *l3;
>   int node = numa_node_id();
> + struct delayed_work *work =
> + container_of(w, struct delayed_work, work);
>
> ...
>
> + schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));

Actually, I was wrong. Yes, this should work, but only with your previous
patch. Otherwise, if the handler runs on the "wrong" CPU (this is not possible
since you added cancel_delayed_work()), we are in fact starting a second reaper
on the same CPU, not good.

Oleg.



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > >  The slab would need a notification 
> > > that the workqueue for a processor was shutdown in order to set work.func 
> > > = NULL.
> > 
> > The slab has a notification: CPU_XXX events. It should cancel a pending per
> > cpu "reap_work timer".
> 
> Ahh. Okay then the following patch would fix it?
> 
> Shutdown cache_reaper when cpu goes down
> 
> Shutdown the cache_reaper in slab.c if the cpu is brought down
> and set the cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.20-rc6/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-24 20:19:28.000000000 -0600
> +++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:08:05.773928988 -0600
> @@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
>  	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +
>  		mutex_lock(&cache_chain_mutex);
>   break;
>   case CPU_DOWN_FAILED:

Then CPU_DOWN_FAILED should do start_cpu_timer(cpu).

Oleg.



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > > Even if smp_processor_id() was stable during the execution of 
> > > cache_reap(),
> > > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
> > > avoid this, and this is correct.
> > 
> > Uhh This may not be correct in terms of how the slab operates.
> 
> But this is practically impossible to avoid. We can't delay CPU_DOWN until all
> workqueues flush their cwq->worklist. This is livelockable, the work can 
> re-queue
> itself, and new works can be added since the dying CPU is still on 
> cpu_online_map.
> This means that some pending works will be processed on another CPU.

But we could delay CPU_DOWN in the handler for the slab until we know that 
the cache_reaper is no longer running?

> > > This means that __get_cpu_var(reap_work) returns a "wrong" struct 
> > > delayed_work.
> > > This is absolutely harmless right now, but may be it's better to use
> > > container_of(unused, struct delayed_work, work).

There is more where that is coming from. cache_reap determines the current 
cpu in order to find the correct per cpu cache and also determines the 
current node. If you move cache_reap to another processor / node then it 
will just clean that one and not do anything for the processor that you 
wanted it to run for. If we change processors in the middle of the run
then it may do some actions on one cpu and some on another.

Having said that, here is a patch that ensures cache_reap no longer
uses per_cpu to access the work structure. This may be a cleanup on
its own, but it will not address your issues.



Use parameter passed to cache_reap to determine pointer to work structure

Use the pointer passed to cache_reap to determine the work
pointer and consolidate exit paths.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6/mm/slab.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-29 13:08:05.000000000 -0600
+++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:17:01.695680898 -0600
@@ -4021,18 +4021,17 @@ void drain_array(struct kmem_cache *cach
  * If we cannot acquire the cache chain mutex then just give up - we'll try
  * again on the next iteration.
  */
-static void cache_reap(struct work_struct *unused)
+static void cache_reap(struct work_struct *w)
 {
 	struct kmem_cache *searchp;
 	struct kmem_list3 *l3;
 	int node = numa_node_id();
+	struct delayed_work *work =
+		container_of(w, struct delayed_work, work);
 
-	if (!mutex_trylock(&cache_chain_mutex)) {
+	if (!mutex_trylock(&cache_chain_mutex))
 		/* Give up. Setup the next iteration. */
-		schedule_delayed_work(&__get_cpu_var(reap_work),
-				      round_jiffies_relative(REAPTIMEOUT_CPUC));
-		return;
-	}
+		goto out;
 
 	list_for_each_entry(searchp, &cache_chain, next) {
check_irq_on();
@@ -4075,9 +4074,9 @@ next:
mutex_unlock(_chain_mutex);
next_reap_node();
refresh_cpu_vm_stats(smp_processor_id());
+out:
/* Set up the next iteration */
-   schedule_delayed_work(&__get_cpu_var(reap_work),
-   round_jiffies_relative(REAPTIMEOUT_CPUC));
+   schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));
 }
 
 #ifdef CONFIG_PROC_FS


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> >  The slab would need a notification 
> > that the workqueue for a processor was shutdown in order to set work.func 
> > = NULL.
> 
> The slab has a notification: CPU_XXX events. It should cancel a pending per
> cpu "reap_work timer".

Ahh. Okay then the following patch would fix it?

Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6/mm/slab.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-24 20:19:28.000000000 -0600
+++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:08:05.773928988 -0600
@@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+
 		mutex_lock(&cache_chain_mutex);
break;
case CPU_DOWN_FAILED:


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The 
> > > > last
> > > > CPU_UP will not restart a per-cpu "cache_reap timer".
> > > 
> > > Why?
> > 
> > Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != 
> > NULL
> > we don't do schedule_delayed_work_on(). I think (if I am right) this is a 
> > slab's
> > bug.
> 
> The CPU_DOWN would need to set work.func == NULL for this to work. But 
> then the slab does not shut down the work queues for the processor. Isn't 
> this another issue with workqueues?

I think no, please see below. Actually, this is more related to timers for
this particular case.

>  The slab would need a notification 
> that the workqueue for a processor was shutdown in order to set work.func 
> = NULL.

The slab has a notification: CPU_XXX events. It should cancel a pending per
cpu "reap_work timer".

> > I think cache_reap() is not alone, and this is not its fault.
> > 
> > But please note another minor problem,
> > 
> > void cache_reap(struct work_struct *unused)
> > {
> > ...
> > 
> > schedule_delayed_work(&__get_cpu_var(reap_work), ...);
> > }
> > 
> > Even if smp_processor_id() was stable during the execution of cache_reap(),
> > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
> > avoid this, and this is correct.
> 
> Uhh This may not be correct in terms of how the slab operates.

But this is practically impossible to avoid. We can't delay CPU_DOWN until all
workqueues flush their cwq->worklist. This is livelockable, the work can 
re-queue
itself, and new works can be added since the dying CPU is still on 
cpu_online_map.
This means that some pending works will be processed on another CPU.

delayed_work is even worse, the timer can migrate as well.

The first problem (smp_processor_id() is not stable) could be solved if we
use freezer or with the help of not-yet-implemented scalable lock_cpu_hotplug.

> > This means that __get_cpu_var(reap_work) returns a "wrong" struct 
> > delayed_work.
> > This is absolutely harmless right now, but may be it's better to use
> > container_of(unused, struct delayed_work, work).
> 
> Well seems that we have a set of unresolved issues with workqueues and cpu 
> hotplug.

Yes.

Oleg.




Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
> > > CPU_UP will not restart a per-cpu "cache_reap timer".
> > 
> > Why?
> 
> Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL
> we don't do schedule_delayed_work_on(). I think (if I am right) this is a 
> slab's
> bug.

The CPU_DOWN would need to set work.func == NULL for this to work. But 
then the slab does not shut down the work queues for the processor. Isn't 
this another issue with workqueues? The slab would need a notification 
that the workqueue for a processor was shut down in order to set work.func 
= NULL.

> I think cache_reap() is not alone, and this is not its fault.
> 
> But please note another minor problem,
> 
>   void cache_reap(struct work_struct *unused)
>   {
>   ...
> 
>   schedule_delayed_work(&__get_cpu_var(reap_work), ...);
>   }
> 
> Even if smp_processor_id() was stable during the execution of cache_reap(),
> this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
> avoid this, and this is correct.

Uhh... This may not be correct in terms of how the slab operates.
 
> This means that __get_cpu_var(reap_work) returns a "wrong" struct 
> delayed_work.
> This is absolutely harmless right now, but may be it's better to use
> container_of(unused, struct delayed_work, work).

Well seems that we have a set of unresolved issues with workqueues and cpu 
hotplug.



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > Now,
> > 	static void __devinit start_cpu_timer(int cpu)
> > 	{
> > 		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
> > 
> > 		if (keventd_up() && reap_work->work.func == NULL) {
> > 			init_reap_node(cpu);
> > 			INIT_DELAYED_WORK(reap_work, cache_reap);
> > 			schedule_delayed_work_on(cpu, reap_work,
> > 				__round_jiffies_relative(HZ, cpu));
> > 		}
> > 	}
> > 
> > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
> > CPU_UP will not restart a per-cpu "cache_reap timer".
> 
> Why?

Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL
we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's
bug.

> > With or without recent changes, it is possible that work->func() will run on
> > another CPU (not that to which it was submitted) if CPU goes down. In fact,
> > this can happen while work->func() is running, so even smp_processor_id()
> > is not safe to use in work->func().
> 
> But the work func was scheduled by schedule_delayed_work_on(). Isnt that a 
> general problem with schedule_delayed_work_on() and keventd?

I think this is yet another problem with workqueues/cpu-hotplug interaction.
Probably, the problem is more general. With CONFIG_CPU_HOTPLUG, we can't
garantee that smp_processor_id() is stable even if the thread is pinned to
specific processor.

> > However, cache_reap() seems to wrongly assume that smp_processor_id() is 
> > stable,
> > this is the second problem.
> >
> > Is my understanding correct?
>
> cache_reap assumes that the processor id is stable based on the kevent 
> thread being pinned to a processor.

I think cache_reap() is not alone, and this is not its fault.

But please note another minor problem,

void cache_reap(struct work_struct *unused)
{
...

schedule_delayed_work(&__get_cpu_var(reap_work), ...);
}

Even if smp_processor_id() was stable during the execution of cache_reap(),
this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
avoid this, and this is correct.

This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
This is absolutely harmless right now, but maybe it's better to use
container_of(unused, struct delayed_work, work).

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> Now,
> 	static void __devinit start_cpu_timer(int cpu)
> 	{
> 		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
> 
> 		if (keventd_up() && reap_work->work.func == NULL) {
> 			init_reap_node(cpu);
> 			INIT_DELAYED_WORK(reap_work, cache_reap);
> 			schedule_delayed_work_on(cpu, reap_work,
> 				__round_jiffies_relative(HZ, cpu));
> 		}
> 	}
> 
> This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
> CPU_UP will not restart a per-cpu "cache_reap timer".

Why?

> With or without recent changes, it is possible that work->func() will run on
> another CPU (not that to which it was submitted) if CPU goes down. In fact,
> this can happen while work->func() is running, so even smp_processor_id()
> is not safe to use in work->func().

But the work func was scheduled by schedule_delayed_work_on(). Isnt that a 
general problem with schedule_delayed_work_on() and keventd?

> However, cache_reap() seems to wrongly assume that smp_processor_id() is 
> stable,
> this is the second problem.
> 
> Is my understanding correct?

cache_reap assumes that the processor id is stable based on the kevent 
thread being pinned to a processor.


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

 Now,
   static void __devinit start_cpu_timer(int cpu)
   {
   struct delayed_work *reap_work = per_cpu(reap_work, cpu);
 
   if (keventd_up()  reap_work-work.func == NULL) {
   init_reap_node(cpu);
   INIT_DELAYED_WORK(reap_work, cache_reap);
   schedule_delayed_work_on(cpu, reap_work,
   __round_jiffies_relative(HZ, 
 cpu));
   }
   }
 
 This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
 CPU_UP will not restart a per-cpu cache_reap timer.

Why?

 With or without recent changes, it is possible that work-func() will run on
 another CPU (not that to which it was submitted) if CPU goes down. In fact,
 this can happen while work-func() is running, so even smp_processor_id()
 is not safe to use in work-func().

But the work func was scheduled by schedule_delayed_work_on(). Isnt that a 
general problem with schedule_delayed_work_on() and keventd?

 However, cache_reap() seems to wrongly assume that smp_processor_id() is 
 stable,
 this is the second problem.
 
 Is my understanding correct?

cache_reap assumes that the processor id is stable based on the kevent 
thread being pinned to a processor.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:

 On Mon, 29 Jan 2007, Oleg Nesterov wrote:
 
  Now,
  static void __devinit start_cpu_timer(int cpu)
  {
  struct delayed_work *reap_work = per_cpu(reap_work, cpu);
  
  if (keventd_up()  reap_work-work.func == NULL) {
  init_reap_node(cpu);
  INIT_DELAYED_WORK(reap_work, cache_reap);
  schedule_delayed_work_on(cpu, reap_work,
  __round_jiffies_relative(HZ, 
  cpu));
  }
  }
  
  This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
  CPU_UP will not restart a per-cpu cache_reap timer.
 
 Why?

Because the last CPU_UP calls start_cpu_timer(), but since -work.func != NULL
we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's
bug.

  With or without recent changes, it is possible that work-func() will run on
  another CPU (not that to which it was submitted) if CPU goes down. In fact,
  this can happen while work-func() is running, so even smp_processor_id()
  is not safe to use in work-func().
 
 But the work func was scheduled by schedule_delayed_work_on(). Isnt that a 
 general problem with schedule_delayed_work_on() and keventd?

I think this is yet another problem with workqueues/cpu-hotplug interaction.
Probably, the problem is more general. With CONFIG_CPU_HOTPLUG, we can't
garantee that smp_processor_id() is stable even if the thread is pinned to
specific processor.

  However, cache_reap() seems to wrongly assume that smp_processor_id() is 
  stable,
  this is the second problem.
 
  Is my understanding correct?

 cache_reap assumes that the processor id is stable based on the kevent 
 thread being pinned to a processor.

I think cache_reap() is not alone, and this is not its fault.

But please note another minor problem,

void cache_reap(struct work_struct *unused)
{
...

schedule_delayed_work(__get_cpu_var(reap_work), ...);
}

Even if smp_processor_id() was stable during the execution of cache_reap(),
this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
avoid this, and this is correct.

This means that __get_cpu_var(reap_work) returns a wrong struct delayed_work.
This is absolutely harmless right now, but may be it's better to use
container_of(unused, struct delayed_work, work).

Oleg.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

   This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
   CPU_UP will not restart a per-cpu cache_reap timer.
  
  Why?
 
 Because the last CPU_UP calls start_cpu_timer(), but since -work.func != NULL
 we don't do schedule_delayed_work_on(). I think (if I am right) this is a 
 slab's
 bug.

The CPU_DOWN would need to set work.func == NULL for this to work. But 
then the slab does not shut down the work queues for the processor. Isnt 
this another issue with workqueues? The slab would need a notification 
that the workqueue for a processor was shutdown in order to set work.func 
= NULL.

 I think cache_reap() is not alone, and this is not its fault.
 
 But please note another minor problem,
 
   void cache_reap(struct work_struct *unused)
   {
   ...
 
   schedule_delayed_work(__get_cpu_var(reap_work), ...);
   }
 
 Even if smp_processor_id() was stable during the execution of cache_reap(),
 this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
 avoid this, and this is correct.

Uhh This may not be correct in terms of how the slab operates.
 
 This means that __get_cpu_var(reap_work) returns a wrong struct 
 delayed_work.
 This is absolutely harmless right now, but may be it's better to use
 container_of(unused, struct delayed_work, work).

Well seems that we have a set of unresolved issues with workqueues and cpu 
hotplug.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:

 On Mon, 29 Jan 2007, Oleg Nesterov wrote:
 
This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The 
last
CPU_UP will not restart a per-cpu cache_reap timer.
   
   Why?
  
  Because the last CPU_UP calls start_cpu_timer(), but since -work.func != 
  NULL
  we don't do schedule_delayed_work_on(). I think (if I am right) this is a 
  slab's
  bug.
 
 The CPU_DOWN would need to set work.func == NULL for this to work. But 
 then the slab does not shut down the work queues for the processor. Isnt 
 this another issue with workqueues?

I think no, please see below. Actually, this is more related to timers for
this particular case.

  The slab would need a notification 
 that the workqueue for a processor was shutdown in order to set work.func 
 = NULL.

The slab has a notification: CPU_XXX events. It should cancel a pending per
cpu reap_work timer.

  I think cache_reap() is not alone, and this is not its fault.
  
  But please note another minor problem,
  
  void cache_reap(struct work_struct *unused)
  {
  ...
  
  schedule_delayed_work(__get_cpu_var(reap_work), ...);
  }
  
  Even if smp_processor_id() was stable during the execution of cache_reap(),
  this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
  avoid this, and this is correct.
 
 Uhh This may not be correct in terms of how the slab operates.

But this is practically impossible to avoid. We can't delay CPU_DOWN until all
workqueues flush their cwq-worklist. This is livelockable, the work can 
re-queue
itself, and new works can be added since the dying CPU is still on 
cpu_online_map.
This means that some pending works will be processed on another CPU.

delayed_work is even worse, the timer can migrate as well.

The first problem (smp_processor_id() is not stable) could be solved if we
use freezer or with the help of not-yet-implemented scalable lock_cpu_hotplug.

  This means that __get_cpu_var(reap_work) returns a wrong struct 
  delayed_work.
  This is absolutely harmless right now, but may be it's better to use
  container_of(unused, struct delayed_work, work).
 
 Well seems that we have a set of unresolved issues with workqueues and cpu 
 hotplug.

Yes.

Oleg.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

   The slab would need a notification 
  that the workqueue for a processor was shutdown in order to set work.func 
  = NULL.
 
 The slab has a notification: CPU_XXX events. It should cancel a pending per
 cpu reap_work timer.

Ahh. Okay then the following patch would fix it?

Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

Index: linux-2.6.20-rc6/mm/slab.c
===
--- linux-2.6.20-rc6.orig/mm/slab.c 2007-01-24 20:19:28.0 -0600
+++ linux-2.6.20-rc6/mm/slab.c  2007-01-29 13:08:05.773928988 -0600
@@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru
break;
 #ifdef CONFIG_HOTPLUG_CPU
case CPU_DOWN_PREPARE:
+   /* Shutdown cache reaper */
+   cancel_rearming_delayed_work(per_cpu(reap_work, cpu));
+   per_cpu(reap_work, cpu).work.func = NULL;
+
mutex_lock(cache_chain_mutex);
break;
case CPU_DOWN_FAILED:
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

   Even if smp_processor_id() was stable during the execution of 
   cache_reap(),
   this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
   avoid this, and this is correct.
  
  Uhh This may not be correct in terms of how the slab operates.
 
 But this is practically impossible to avoid. We can't delay CPU_DOWN until all
 workqueues flush their cwq->worklist. This is livelockable, the work can 
 re-queue
 itself, and new works can be added since the dying CPU is still on 
 cpu_online_map.
 This means that some pending works will be processed on another CPU.

But we could delay CPU_DOWN in the handler for the slab until we know that 
the cache_reaper is no longer running?

   This means that __get_cpu_var(reap_work) returns a wrong struct 
   delayed_work.
   This is absolutely harmless right now, but maybe it's better to use
   container_of(unused, struct delayed_work, work).

There is more where that is coming from. cache_reap determines the current 
cpu in order to find the correct per cpu cache and also determines the 
current node. If you move cache_reap to another processor / node then it 
will just clean that one and not do anything for the processor that you 
wanted it to run for. If we change processors in the middle of the run
then it may do some actions on one cpu and some on another.

Having said that here is the patch that insures that makes cache_reap no 
longer use per_cpu to access the work structure. This may be a cleanup on
its own but it will not address your issues.



Use parameter passed to cache_reap to determine pointer to work structure

Use the pointer passed to cache_reap to determine the work
pointer and consolidate exit paths.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

Index: linux-2.6.20-rc6/mm/slab.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/slab.c 2007-01-29 13:08:05.0 -0600
+++ linux-2.6.20-rc6/mm/slab.c  2007-01-29 13:17:01.695680898 -0600
@@ -4021,18 +4021,17 @@ void drain_array(struct kmem_cache *cach
  * If we cannot acquire the cache chain mutex then just give up - we'll try
  * again on the next iteration.
  */
-static void cache_reap(struct work_struct *unused)
+static void cache_reap(struct work_struct *w)
 {
struct kmem_cache *searchp;
struct kmem_list3 *l3;
int node = numa_node_id();
+   struct delayed_work *work =
+   container_of(w, struct delayed_work, work);
 
-   if (!mutex_trylock(&cache_chain_mutex)) {
+   if (!mutex_trylock(&cache_chain_mutex))
/* Give up. Setup the next iteration. */
-   schedule_delayed_work(&__get_cpu_var(reap_work),
- round_jiffies_relative(REAPTIMEOUT_CPUC));
-   return;
-   }
+   goto out;
 
list_for_each_entry(searchp, &cache_chain, next) {
check_irq_on();
@@ -4075,9 +4074,9 @@ next:
mutex_unlock(&cache_chain_mutex);
next_reap_node();
refresh_cpu_vm_stats(smp_processor_id());
+out:
/* Set up the next iteration */
-   schedule_delayed_work(&__get_cpu_var(reap_work),
-   round_jiffies_relative(REAPTIMEOUT_CPUC));
+   schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));
 }
 
 #ifdef CONFIG_PROC_FS


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:

 On Mon, 29 Jan 2007, Oleg Nesterov wrote:
 
The slab would need a notification 
   that the workqueue for a processor was shutdown in order to set work.func 
   = NULL.
  
  The slab has a notification: CPU_XXX events. It should cancel a pending per
  cpu reap_work timer.
 
 Ahh. Okay then the following patch would fix it?
 
 Shutdown cache_reaper when cpu goes down
 
 Shutdown the cache_reaper in slab.c if the cpu is brought down
 and set the cache_reap.func to NULL. Otherwise hotplug shuts
 down the reaper for good.
 
 Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
 
 Index: linux-2.6.20-rc6/mm/slab.c
 ===================================================================
 --- linux-2.6.20-rc6.orig/mm/slab.c   2007-01-24 20:19:28.0 -0600
 +++ linux-2.6.20-rc6/mm/slab.c2007-01-29 13:08:05.773928988 -0600
 @@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru
   break;
  #ifdef CONFIG_HOTPLUG_CPU
   case CPU_DOWN_PREPARE:
 + /* Shutdown cache reaper */
 + cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
 + per_cpu(reap_work, cpu).work.func = NULL;
 +
  mutex_lock(&cache_chain_mutex);
   break;
   case CPU_DOWN_FAILED:

Then CPU_DOWN_FAILED should do start_cpu_timer(cpu).

Oleg.



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:

 On Mon, 29 Jan 2007, Oleg Nesterov wrote:
 
Even if smp_processor_id() was stable during the execution of 
cache_reap(),
this work_struct can be moved to another CPU if CPU_DEAD happens. We 
can't
avoid this, and this is correct.
   
   Uhh This may not be correct in terms of how the slab operates.
  
  But this is practically impossible to avoid. We can't delay CPU_DOWN until 
  all
  workqueues flush their cwq->worklist. This is livelockable, the work can 
  re-queue
  itself, and new works can be added since the dying CPU is still on 
  cpu_online_map.
  This means that some pending works will be processed on another CPU.
 
 But we could delay CPU_DOWN in the handler for the slab until we know that 
 the cache_reaper is no longer running?

Hmm... I don't understand this. We can delay CPU_DOWN if we cancel cache_reaper
like you did in the previous patch. Did you mean this? If yes - then yes :)

This means that __get_cpu_var(reap_work) returns a wrong struct 
delayed_work.
This is absolutely harmless right now, but maybe it's better to use
container_of(unused, struct delayed_work, work).
 
 There is more where that is coming from. cache_reap determines the current 
 cpu in order to find the correct per cpu cache and also determines the 
 current node. If you move cache_reap to another processor / node then it 
 will just clean that one and not do anything for the processor that you 
 wanted it to run for.

Worse, we can have 2 handlers running in parallel on the same CPU. But this
is fixed by your previous patch, I believe.

If we change processors in the middle of the run
 then it may do some actions on one cpu and some on another

Yep. For example, next_reap_node() will not be happy if we change CPU in
the middle. But this is _extremely_ unlikely, can only happen on CPU_DOWN,
and cache_reap() should not care about this.

 -static void cache_reap(struct work_struct *unused)
 +static void cache_reap(struct work_struct *w)
  {
   struct kmem_cache *searchp;
   struct kmem_list3 *l3;
   int node = numa_node_id();
 + struct delayed_work *work =
 + container_of(w, struct delayed_work, work);

 ...

 + schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));

Actually, I was wrong. Yes, this should work, but only with your previous
patch. Otherwise, if the handler runs on the wrong CPU (this is not possible
since you added cancel_delayed_work()), we are in fact starting a second reaper
on the same CPU, not good.

Oleg.



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

  But we could delay CPU_DOWN in the handler for the slab until we know that 
  the cache_reaper is no longer running?
 
 Hmm... I don't understand this. We can delay CPU_DOWN if we cancel cache_reaper
 like you did in the previous patch. Did you mean this? If yes - then yes :)

Yes sure.

 Worse, we can have 2 handlers running in parallel on the same CPU. But this
 is fixed by your previous patch, I believe.

Good.

Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
CPU_DOWN_FAILED somehow vanished in mm?





Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

Index: linux-2.6.20-rc6-mm1/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm1.orig/mm/slab.c 2007-01-29 14:18:37.0 -0600
+++ linux-2.6.20-rc6-mm1/mm/slab.c  2007-01-29 14:21:18.119155877 -0600
@@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
start_cpu_timer(cpu);
break;
 #ifdef CONFIG_HOTPLUG_CPU
+   case CPU_DOWN_PREPARE:
+   /* Shutdown cache reaper */
+   cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+   per_cpu(reap_work, cpu).work.func = NULL;
+
+   mutex_lock(&cache_chain_mutex);
+   break;
+   case CPU_DOWN_FAILED:
+   mutex_unlock(&cache_chain_mutex);
+   start_cpu_timer(cpu);
+   break;
case CPU_DEAD:
/*
 * Even if all the cpus of a node are down, we don't free the


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:

 Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
 CPU_DOWN_FAILED somehow vanished in mm?

No, no, they are still in place, so I believe your patch is good.

Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
but this is not related.

 Shutdown cache_reaper when cpu goes down
 
 Shutdown the cache_reaper in slab.c if the cpu is brought down
 and set the cache_reap.func to NULL. Otherwise hotplug shuts
 down the reaper for good.
 
 Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
 
 Index: linux-2.6.20-rc6-mm1/mm/slab.c
 ===================================================================
 --- linux-2.6.20-rc6-mm1.orig/mm/slab.c   2007-01-29 14:18:37.0 -0600
 +++ linux-2.6.20-rc6-mm1/mm/slab.c2007-01-29 14:21:18.119155877 -0600
 @@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
   start_cpu_timer(cpu);
   break;
  #ifdef CONFIG_HOTPLUG_CPU
 + case CPU_DOWN_PREPARE:
 + /* Shutdown cache reaper */
 + cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
 + per_cpu(reap_work, cpu).work.func = NULL;
 +
 + mutex_lock(&cache_chain_mutex);
 + break;
 + case CPU_DOWN_FAILED:
 + mutex_unlock(&cache_chain_mutex);
 + start_cpu_timer(cpu);
 + break;
   case CPU_DEAD:
   /*
* Even if all the cpus of a node are down, we don't free the



Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Tue, 30 Jan 2007, Oleg Nesterov wrote:

 Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
 so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
 but this is not related.

Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX 
anymore, right? Which brings us to this form of the patch:


Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

Index: linux-2.6.20-rc6-mm2/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm2.orig/mm/slab.c 2007-01-29 14:27:34.199229828 -0600
+++ linux-2.6.20-rc6-mm2/mm/slab.c  2007-01-29 15:47:18.293962726 -0600
@@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
start_cpu_timer(cpu);
break;
 #ifdef CONFIG_HOTPLUG_CPU
+   case CPU_DOWN_PREPARE:
+   /* Shutdown cache reaper */
+   cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+   per_cpu(reap_work, cpu).work.func = NULL;
+   break;
+   case CPU_DOWN_FAILED:
+   start_cpu_timer(cpu);
+   break;
case CPU_DEAD:
/*
 * Even if all the cpus of a node are down, we don't free the


Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote:

 On Tue, 30 Jan 2007, Oleg Nesterov wrote:
 
  Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
  so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
  but this is not related.
 
 Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX 
 anymore, right? Which brings us to this form of the patch:

Yes, if CPU_LOCK_ACQUIRE takes cache_chain_mutex, we don't need to do so
on CPU_DOWN_PREPARE. I didn't know cpuup_callback() was already converted,
sorry for the confusion!

Note: with the patch below we are doing cancel_rearming_delayed_work()
under cache_chain_mutex. This is ok since cache_reap() does mutex_trylock(),
so deadlock is not possible. However, this means that mutex_trylock() becomes
mandatory for cache_reap(), probably a little comment will be good.

 Shutdown cache_reaper when cpu goes down
 
 Shutdown the cache_reaper in slab.c if the cpu is brought down
 and set the cache_reap.func to NULL. Otherwise hotplug shuts
 down the reaper for good.
 
 Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
 
 Index: linux-2.6.20-rc6-mm2/mm/slab.c
 ===================================================================
 --- linux-2.6.20-rc6-mm2.orig/mm/slab.c   2007-01-29 14:27:34.199229828 -0600
 +++ linux-2.6.20-rc6-mm2/mm/slab.c2007-01-29 15:47:18.293962726 -0600
 @@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
   start_cpu_timer(cpu);
   break;
  #ifdef CONFIG_HOTPLUG_CPU
 + case CPU_DOWN_PREPARE:
 + /* Shutdown cache reaper */
 + cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
 + per_cpu(reap_work, cpu).work.func = NULL;
 + break;
 + case CPU_DOWN_FAILED:
 + start_cpu_timer(cpu);
 + break;
   case CPU_DEAD:
   /*
* Even if all the cpus of a node are down, we don't free the



slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-28 Thread Oleg Nesterov
For the beginning, about another (but related) minor problem,

debug_smp_processor_id:

/*
 * Kernel threads bound to a single CPU can safely use
 * smp_processor_id():
 */

This is only true without CONFIG_HOTPLUG_CPU. Otherwise CPU can go away when
the task takes a preemption or sleeps. I think we need #ifndef here.


Now,
static void __devinit start_cpu_timer(int cpu)
{
struct delayed_work *reap_work = &per_cpu(reap_work, cpu);

if (keventd_up() && reap_work->work.func == NULL) {
init_reap_node(cpu);
INIT_DELAYED_WORK(reap_work, cache_reap);
schedule_delayed_work_on(cpu, reap_work,
__round_jiffies_relative(HZ, cpu));
}
}

This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
CPU_UP will not restart a per-cpu "cache_reap timer".


With or without recent changes, it is possible that work->func() will run on
another CPU (not that to which it was submitted) if CPU goes down. In fact,
this can happen while work->func() is running, so even smp_processor_id()
is not safe to use in work->func().

However, cache_reap() seems to wrongly assume that smp_processor_id() is stable,
this is the second problem.

Is my understanding correct?

Oleg.


