Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> Ok. Sounds like disabling cache_reaper is a better option for now. Like you
> said it's unlikely that slabs will grow much if that cpu is not heavily used
> by the kernel.

Running for prolonged times without the cache_reaper is no good. What we are talking about here is disabling the cache_reaper during cpu shutdown. The slab cpu shutdown will clean the per cpu caches anyways, so we really do not need the cache_reaper running during cpu shutdown.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> > I guess I kind of hijacked the thread. The second part of my first email was
> > dropped. Basically I was saying that I'm working on CPU isolation extensions.
> > Where an isolated CPU is not supposed to do much kernel work. In which case
> > you'd want to run slab cache reaper on some other CPU on behalf of the
> > isolated one. Hence the proposal to explicitly pass cpu_id to the reaper.
> > I guess now that you guys fixed the hotplug case it does not help in that
> > scenario.
>
> A cpu must have a per cpu cache in order to do slab allocations. The locking
> in the slab allocator depends on it. If the isolated cpus have no need for
> slab allocations then you will also not need to run the slab_reaper().

Ok. Sounds like disabling cache_reaper is a better option for now. Like you said it's unlikely that slabs will grow much if that cpu is not heavily used by the kernel.

Thanks
Max
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> I guess I kind of hijacked the thread. The second part of my first email was
> dropped. Basically I was saying that I'm working on CPU isolation extensions.
> Where an isolated CPU is not supposed to do much kernel work. In which case
> you'd want to run slab cache reaper on some other CPU on behalf of the
> isolated one. Hence the proposal to explicitly pass cpu_id to the reaper.
> I guess now that you guys fixed the hotplug case it does not help in that
> scenario.

A cpu must have a per cpu cache in order to do slab allocations. The locking in the slab allocator depends on it. If the isolated cpus have no need for slab allocations then you will also not need to run the slab_reaper().
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> I agree that running the reaper on the wrong CPU is not the best way to go
> about it. But it seems like disabling it is even worse, unless I missing
> something. ie wasting memory.

Disabling during shutdown is no problem because the per cpu caches are going to be drained anyways by the cpu down code.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
Oleg Nesterov wrote:
> On 02/20, Christoph Lameter wrote:
> > On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> > > Well seems that we have a set of unresolved issues with workqueues and
> > > cpu hotplug. How about storing 'cpu' explicitly in the work queue instead
> > > of relying on the smp_processor_id() and friends ? That way there is no
> > > ambiguity when threads/timers get moved around.
> >
> > The slab functionality is designed to work on the processor with the
> > queue. These tricks will only cause more trouble in the future. The
> > cache_reaper needs to be either disabled or run on the right processor. It
> > should never run on the wrong processor.
>
> I personally agree. Besides, cache_reaper is not alone. Note the comment
> in debug_smp_processor_id() about cpu-bound processes.
>
> The slab does correct thing right now, stops the timer on CPU_DEAD. Other
> problems imho should be solved by fixing cpu-hotplug. Gautham and Srivatsa
> are working on that.

I guess I kind of hijacked the thread. The second part of my first email was dropped. Basically I was saying that I'm working on CPU isolation extensions, where an isolated CPU is not supposed to do much kernel work. In which case you'd want to run the slab cache reaper on some other CPU on behalf of the isolated one. Hence the proposal to explicitly pass cpu_id to the reaper. I guess now that you guys fixed the hotplug case it does not help in that scenario.

Max
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> > Well seems that we have a set of unresolved issues with workqueues and
> > cpu hotplug. How about storing 'cpu' explicitly in the work queue instead
> > of relying on the smp_processor_id() and friends ? That way there is no
> > ambiguity when threads/timers get moved around.
>
> The slab functionality is designed to work on the processor with the
> queue. These tricks will only cause more trouble in the future. The
> cache_reaper needs to be either disabled or run on the right processor. It
> should never run on the wrong processor.
>
> The cache_reaper() is of no importance to hotplug. You just need to make
> sure that it is not in the way (disable it and if its running wait until
> the cache_reaper has finished).

I agree that running the reaper on the wrong CPU is not the best way to go about it. But it seems like disabling it is even worse, unless I'm missing something, i.e. wasting memory.

btw What kind of troubles were you talking about? Performance or robustness? As I said, performance wise it does not make sense to run the reaper on the wrong CPU, but it does seem to work just fine from the correctness (locking, etc) perspective. Again, it's totally possible that I'm missing something here :).

Thanks
Max
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 02/20, Christoph Lameter wrote:
>
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
>
> > > Well seems that we have a set of unresolved issues with workqueues and
> > > cpu hotplug.
> >
> > How about storing 'cpu' explicitly in the work queue instead of relying on
> > the smp_processor_id() and friends ? That way there is no ambiguity when
> > threads/timers get moved around.
>
> The slab functionality is designed to work on the processor with the
> queue. These tricks will only cause more trouble in the future. The
> cache_reaper needs to be either disabled or run on the right processor. It
> should never run on the wrong processor.

I personally agree. Besides, cache_reaper is not alone. Note the comment in debug_smp_processor_id() about cpu-bound processes.

The slab does correct thing right now, stops the timer on CPU_DEAD. Other problems imho should be solved by fixing cpu-hotplug. Gautham and Srivatsa are working on that.

Oleg.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
Folks,

Oleg Nesterov wrote:
> > > Even if smp_processor_id() was stable during the execution of cache_reap(),
> > > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
> > > avoid this, and this is correct.
> >
> > Uhh This may not be correct in terms of how the slab operates.
>
> But this is practically impossible to avoid. We can't delay CPU_DOWN until all
> workqueues flush their cwq->worklist. This is livelockable, the work can
> re-queue itself, and new works can be added since the dying CPU is still on
> cpu_online_map. This means that some pending works will be processed on
> another CPU. delayed_work is even worse, the timer can migrate as well.
>
> The first problem (smp_processor_id() is not stable) could be solved if we
> use freezer or with the help of not-yet-implemented scalable lock_cpu_hotplug.
>
> This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
> This is absolutely harmless right now, but may be it's better to use
> container_of(unused, struct delayed_work, work).

Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. How about storing 'cpu' explicitly in the work queue instead of relying on smp_processor_id() and friends? That way there is no ambiguity when threads/timers get moved around.

I'm cooking a set of patches to extend the cpu isolation concept a bit. In which case I'd like one CPU to run the cache_reap timer on behalf of another cpu. See the patch below.
diff --git a/mm/slab.c b/mm/slab.c
index c610062..0f46d11 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -766,7 +766,17 @@ int slab_is_available(void)
 	return g_cpucache_up == FULL;
 }
 
-static DEFINE_PER_CPU(struct delayed_work, reap_work);
+struct slab_work {
+	struct delayed_work dw;
+	unsigned int cpu;
+};
+
+static DEFINE_PER_CPU(struct slab_work, reap_work);
+
+static inline struct array_cache *cpu_cache_get_on(struct kmem_cache *cachep, unsigned int cpu)
+{
+	return cachep->array[cpu];
+}
 
 static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 {
@@ -915,9 +925,9 @@ static void init_reap_node(int cpu)
 	per_cpu(reap_node, cpu) = node;
 }
 
-static void next_reap_node(void)
+static void next_reap_node(unsigned int cpu)
 {
-	int node = __get_cpu_var(reap_node);
+	int node = per_cpu(reap_node, cpu);
 
 	/*
 	 * Also drain per cpu pages on remote zones
@@ -928,12 +938,12 @@ static void next_reap_node(void)
 	node = next_node(node, node_online_map);
 	if (unlikely(node >= MAX_NUMNODES))
 		node = first_node(node_online_map);
-	__get_cpu_var(reap_node) = node;
+	per_cpu(reap_node, cpu) = node;
 }
 
 #else
 #define init_reap_node(cpu) do { } while (0)
-#define next_reap_node(void) do { } while (0)
+#define next_reap_node(cpu) do { } while (0)
 #endif
 
 /*
@@ -945,17 +955,18 @@ static void next_reap_node(void)
  */
 static void __devinit start_cpu_timer(int cpu)
 {
-	struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
+	struct slab_work *reap_work = &per_cpu(reap_work, cpu);
 
 	/*
 	 * When this gets called from do_initcalls via cpucache_init(),
 	 * init_workqueues() has already run, so keventd will be setup
 	 * at that time.
 	 */
-	if (keventd_up() && reap_work->work.func == NULL) {
+	if (keventd_up() && reap_work->dw.work.func == NULL) {
 		init_reap_node(cpu);
-		INIT_DELAYED_WORK(reap_work, cache_reap);
-		schedule_delayed_work_on(cpu, reap_work,
+		INIT_DELAYED_WORK(&reap_work->dw, cache_reap);
+		reap_work->cpu = cpu;
+		schedule_delayed_work_on(cpu, &reap_work->dw,
 					__round_jiffies_relative(HZ, cpu));
 	}
 }
@@ -1004,7 +1015,7 @@ static int transfer_objects(struct array_cache *to,
 #ifndef CONFIG_NUMA
 
 #define drain_alien_cache(cachep, alien) do { } while (0)
-#define reap_alien(cachep, l3) do { } while (0)
+#define reap_alien(cachep, l3, cpu) do { } while (0)
 
 static inline struct array_cache **alloc_alien_cache(int node, int limit)
 {
@@ -1099,9 +1110,9 @@ static void __drain_alien_cache(struct kmem_cache *cachep,
 /*
  * Called from cache_reap() to regularly drain alien caches round robin.
  */
-static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3)
+static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3, unsigned int cpu)
 {
-	int node = __get_cpu_var(reap_node);
+	int node = per_cpu(reap_node, cpu);
 
 	if (l3->alien) {
 		struct array_cache *ac = l3->alien[node];
@@ -4017,16 +4028,17 @@ void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
  * If we cannot acquire the cache chain mutex then just give up - we'll try
  * again on the next iteration.
  */
-static void cache_reap(struct work_struct *unused)
+static void cache_reap(struct work_struct *_work)
 {
 	struct kmem_cache *searchp;
 	struct
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> > > Well seems that we have a set of unresolved issues with workqueues and
> > > cpu hotplug.
> >
> > How about storing 'cpu' explicitly in the work queue instead of relying on
> > the smp_processor_id() and friends ? That way there is no ambiguity when
> > threads/timers get moved around.

The slab functionality is designed to work on the processor with the queue. These tricks will only cause more trouble in the future. The cache_reaper needs to be either disabled or run on the right processor. It should never run on the wrong processor.

The cache_reaper() is of no importance to hotplug. You just need to make sure that it is not in the way (disable it and if its running wait until the cache_reaper has finished).
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote:
>
> On Tue, 30 Jan 2007, Oleg Nesterov wrote:
>
> > Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
> > so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
> > but this is not related.
>
> Then we wont need to do the mutex_lock/unlock in CPU_DOWN_XX
> anymore, right? Which brings us to this form of the patch:

Yes, if CPU_LOCK_ACQUIRE takes cache_chain_mutex, we don't need to do so on CPU_DOWN_PREPARE. I didn't know cpuup_callback() was already converted, sorry for the confusion!

Note: with the patch below we are doing cancel_rearming_delayed_work() under cache_chain_mutex. This is ok since cache_reap() does mutex_trylock(), so deadlock is not possible. However, this means that mutex_trylock() becomes mandatory for cache_reap(), probably a little comment will be good.

> Shutdown cache_reaper when cpu goes down
>
> Shutdown the cache_reaper in slab.c if the cpu is brought down
> and set the cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
>
> Index: linux-2.6.20-rc6-mm2/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm2.orig/mm/slab.c	2007-01-29 14:27:34.199229828 -0600
> +++ linux-2.6.20-rc6-mm2/mm/slab.c	2007-01-29 15:47:18.293962726 -0600
> @@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +		break;
> +	case CPU_DOWN_FAILED:
> +		start_cpu_timer(cpu);
> +		break;
>  	case CPU_DEAD:
>  		/*
>  		 * Even if all the cpus of a node are down, we don't free the
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Tue, 30 Jan 2007, Oleg Nesterov wrote:
> Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
> so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
> but this is not related.

Then we wont need to do the mutex_lock/unlock in CPU_DOWN_XX anymore, right? Which brings us to this form of the patch:

Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6-mm2/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm2.orig/mm/slab.c	2007-01-29 14:27:34.199229828 -0600
+++ linux-2.6.20-rc6-mm2/mm/slab.c	2007-01-29 15:47:18.293962726 -0600
@@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+		break;
+	case CPU_DOWN_FAILED:
+		start_cpu_timer(cpu);
+		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote:
>
> Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
> CPU_DOWN_FAILED somehow vanished in mm?

No, no, they are still in place, so I believe your patch is good.

Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so cpuup_callback() can use them to lock/unlock cache_chain_mutex, but this is not related.

> Shutdown cache_reaper when cpu goes down
>
> Shutdown the cache_reaper in slab.c if the cpu is brought down
> and set the cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
>
> Index: linux-2.6.20-rc6-mm1/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.000000000 -0600
> +++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
> @@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +
> +		mutex_lock(&cache_chain_mutex);
> +		break;
> +	case CPU_DOWN_FAILED:
> +		mutex_unlock(&cache_chain_mutex);
> +		start_cpu_timer(cpu);
> +		break;
>  	case CPU_DEAD:
>  		/*
>  		 * Even if all the cpus of a node are down, we don't free the
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > But we could delay CPU_DOWN in the handler for the slab until we know that > > the cache_reaper is no longer running? > > Hmm... I don't understand this. We can delay CPU_DOWN if we cancel cache_reaper > like you did in the previous patch. Did you mean this? If yes - then yes :) Yes sure. > Worse, we can have 2 handlers running in parallel on the same CPU. But this > is fixed by your previous patch, I believe. Good. Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and CPU_DOWN_FAILED somehow vanished in mm? Shutdown cache_reaper when cpu goes down Shutdown the cache_reaper in slab.c if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.20-rc6-mm1/mm/slab.c === --- linux-2.6.20-rc6-mm1.orig/mm/slab.c 2007-01-29 14:18:37.0 -0600 +++ linux-2.6.20-rc6-mm1/mm/slab.c 2007-01-29 14:21:18.119155877 -0600 @@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru start_cpu_timer(cpu); break; #ifdef CONFIG_HOTPLUG_CPU + case CPU_DOWN_PREPARE: + /* Shutdown cache reaper */ + cancel_rearming_delayed_work(&per_cpu(reap_work, cpu)); + per_cpu(reap_work, cpu).work.func = NULL; + + mutex_lock(&cache_chain_mutex); + break; + case CPU_DOWN_FAILED: + mutex_unlock(&cache_chain_mutex); + start_cpu_timer(cpu); + break; case CPU_DEAD: /* * Even if all the cpus of a node are down, we don't free the
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote: > > On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > > > Even if smp_processor_id() was stable during the execution of > > > > cache_reap(), > > > > this work_struct can be moved to another CPU if CPU_DEAD happens. We > > > > can't > > > > avoid this, and this is correct. > > > > > > Uhh... This may not be correct in terms of how the slab operates. > > > > But this is practically impossible to avoid. We can't delay CPU_DOWN until > > all > > workqueues flush their cwq->worklist. This is livelockable, the work can > > re-queue > > itself, and new works can be added since the dying CPU is still on > > cpu_online_map. > > This means that some pending works will be processed on another CPU. > > But we could delay CPU_DOWN in the handler for the slab until we know that > the cache_reaper is no longer running? Hmm... I don't understand this. We can delay CPU_DOWN if we cancel cache_reaper like you did in the previous patch. Did you mean this? If yes - then yes :) > > > > This means that __get_cpu_var(reap_work) returns a "wrong" struct > > > > delayed_work. > > > > This is absolutely harmless right now, but may be it's better to use > > > > container_of(unused, struct delayed_work, work). There is more where that is coming from. cache_reap determines the current cpu in order to find the correct per cpu cache and also determines the current node. If you move cache_reap to another processor / node then it will just clean that one and not do anything for the processor that you wanted it to run for. Worse, we can have 2 handlers running in parallel on the same CPU. But this is fixed by your previous patch, I believe. > If we change processors in the middle of the run > then it may do some actions on one cpu and some on another. Yep. For example, next_reap_node() will not be happy if we change CPU in the middle. But this is _extremely_ unlikely, can only happen on CPU_DOWN, and cache_reap() should not care about this. 
> -static void cache_reap(struct work_struct *unused) > +static void cache_reap(struct work_struct *w) > { > struct kmem_cache *searchp; > struct kmem_list3 *l3; > int node = numa_node_id(); > + struct delayed_work *work = > + container_of(w, struct delayed_work, work); > > ... > > + schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC)); Actually, I was wrong. Yes, this should work, but only with your previous patch. Otherwise, if the handler runs on the "wrong" CPU (this is not possible since you added cancel_delayed_work()), we are in fact starting a second reaper on the same CPU, not good. Oleg.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote: > > On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > > The slab would need a notification > > > that the workqueue for a processor was shutdown in order to set work.func > > > = NULL. > > > > The slab has a notification: CPU_XXX events. It should cancel a pending per > > cpu "reap_work timer". > > Ahh. Okay then the following patch would fix it? > > Shutdown cache_reaper when cpu goes down > > Shutdown the cache_reaper in slab.c if the cpu is brought down > and set the cache_reap.func to NULL. Otherwise hotplug shuts > down the reaper for good. > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > Index: linux-2.6.20-rc6/mm/slab.c > === > --- linux-2.6.20-rc6.orig/mm/slab.c 2007-01-24 20:19:28.0 -0600 > +++ linux-2.6.20-rc6/mm/slab.c 2007-01-29 13:08:05.773928988 -0600 > @@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru > break; > #ifdef CONFIG_HOTPLUG_CPU > case CPU_DOWN_PREPARE: > + /* Shutdown cache reaper */ > + cancel_rearming_delayed_work(&per_cpu(reap_work, cpu)); > + per_cpu(reap_work, cpu).work.func = NULL; > + > mutex_lock(&cache_chain_mutex); > break; > case CPU_DOWN_FAILED: Then CPU_DOWN_FAILED should do start_cpu_timer(cpu). Oleg.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > Even if smp_processor_id() was stable during the execution of > > > cache_reap(), > > > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't > > > avoid this, and this is correct. > > > > Uhh... This may not be correct in terms of how the slab operates. > > But this is practically impossible to avoid. We can't delay CPU_DOWN until all > workqueues flush their cwq->worklist. This is livelockable, the work can > re-queue > itself, and new works can be added since the dying CPU is still on > cpu_online_map. > This means that some pending works will be processed on another CPU. But we could delay CPU_DOWN in the handler for the slab until we know that the cache_reaper is no longer running? > > > This means that __get_cpu_var(reap_work) returns a "wrong" struct > > > delayed_work. > > > This is absolutely harmless right now, but may be it's better to use > > > container_of(unused, struct delayed_work, work). There is more where that is coming from. cache_reap determines the current cpu in order to find the correct per cpu cache and also determines the current node. If you move cache_reap to another processor / node then it will just clean that one and not do anything for the processor that you wanted it to run for. If we change processors in the middle of the run then it may do some actions on one cpu and some on another. Having said that, here is the patch that ensures cache_reap no longer uses per_cpu to access the work structure. This may be a cleanup on its own but it will not address your issues. Use parameter passed to cache_reap to determine pointer to work structure Use the pointer passed to cache_reap to determine the work pointer and consolidate exit paths. 
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.20-rc6/mm/slab.c === --- linux-2.6.20-rc6.orig/mm/slab.c 2007-01-29 13:08:05.0 -0600 +++ linux-2.6.20-rc6/mm/slab.c 2007-01-29 13:17:01.695680898 -0600 @@ -4021,18 +4021,17 @@ void drain_array(struct kmem_cache *cach * If we cannot acquire the cache chain mutex then just give up - we'll try * again on the next iteration. */ -static void cache_reap(struct work_struct *unused) +static void cache_reap(struct work_struct *w) { struct kmem_cache *searchp; struct kmem_list3 *l3; int node = numa_node_id(); + struct delayed_work *work = + container_of(w, struct delayed_work, work); - if (!mutex_trylock(&cache_chain_mutex)) { + if (!mutex_trylock(&cache_chain_mutex)) /* Give up. Setup the next iteration. */ - schedule_delayed_work(&__get_cpu_var(reap_work), - round_jiffies_relative(REAPTIMEOUT_CPUC)); - return; - } + goto out; list_for_each_entry(searchp, &cache_chain, next) { check_irq_on(); @@ -4075,9 +4074,9 @@ next: mutex_unlock(&cache_chain_mutex); next_reap_node(); refresh_cpu_vm_stats(smp_processor_id()); +out: /* Set up the next iteration */ - schedule_delayed_work(&__get_cpu_var(reap_work), - round_jiffies_relative(REAPTIMEOUT_CPUC)); + schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC)); } #ifdef CONFIG_PROC_FS
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > The slab would need a notification > > that the workqueue for a processor was shutdown in order to set work.func > > = NULL. > > The slab has a notification: CPU_XXX events. It should cancel a pending per > cpu "reap_work timer". Ahh. Okay then the following patch would fix it? Shutdown cache_reaper when cpu goes down Shutdown the cache_reaper in slab.c if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.20-rc6/mm/slab.c === --- linux-2.6.20-rc6.orig/mm/slab.c 2007-01-24 20:19:28.0 -0600 +++ linux-2.6.20-rc6/mm/slab.c 2007-01-29 13:08:05.773928988 -0600 @@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru break; #ifdef CONFIG_HOTPLUG_CPU case CPU_DOWN_PREPARE: + /* Shutdown cache reaper */ + cancel_rearming_delayed_work(&per_cpu(reap_work, cpu)); + per_cpu(reap_work, cpu).work.func = NULL; + mutex_lock(&cache_chain_mutex); break; case CPU_DOWN_FAILED:
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote: > > On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The > > > > last > > > > CPU_UP will not restart a per-cpu "cache_reap timer". > > > > > > Why? > > > > Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != > > NULL > > we don't do schedule_delayed_work_on(). I think (if I am right) this is a > > slab's > > bug. > > The CPU_DOWN would need to set work.func == NULL for this to work. But > then the slab does not shut down the work queues for the processor. Isn't > this another issue with workqueues? I think no, please see below. Actually, this is more related to timers for this particular case. > The slab would need a notification > that the workqueue for a processor was shutdown in order to set work.func > = NULL. The slab has a notification: CPU_XXX events. It should cancel a pending per cpu "reap_work timer". > > I think cache_reap() is not alone, and this is not its fault. > > > > But please note another minor problem, > > > > void cache_reap(struct work_struct *unused) > > { > > ... > > > > schedule_delayed_work(&__get_cpu_var(reap_work), ...); > > } > > > > Even if smp_processor_id() was stable during the execution of cache_reap(), > > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't > > avoid this, and this is correct. > > Uhh... This may not be correct in terms of how the slab operates. But this is practically impossible to avoid. We can't delay CPU_DOWN until all workqueues flush their cwq->worklist. This is livelockable, the work can re-queue itself, and new works can be added since the dying CPU is still on cpu_online_map. This means that some pending works will be processed on another CPU. delayed_work is even worse, the timer can migrate as well. The first problem (smp_processor_id() is not stable) could be solved if we use freezer or with the help of not-yet-implemented scalable lock_cpu_hotplug. 
> > This means that __get_cpu_var(reap_work) returns a "wrong" struct > > delayed_work. > > This is absolutely harmless right now, but may be it's better to use > > container_of(unused, struct delayed_work, work). > > Well seems that we have a set of unresolved issues with workqueues and cpu > hotplug. Yes. Oleg.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last > > > CPU_UP will not restart a per-cpu "cache_reap timer". > > > > Why? > > Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL > we don't do schedule_delayed_work_on(). I think (if I am right) this is a > slab's > bug. The CPU_DOWN would need to set work.func == NULL for this to work. But then the slab does not shut down the work queues for the processor. Isn't this another issue with workqueues? The slab would need a notification that the workqueue for a processor was shutdown in order to set work.func = NULL. > I think cache_reap() is not alone, and this is not its fault. > > But please note another minor problem, > > void cache_reap(struct work_struct *unused) > { > ... > > schedule_delayed_work(&__get_cpu_var(reap_work), ...); > } > > Even if smp_processor_id() was stable during the execution of cache_reap(), > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't > avoid this, and this is correct. Uhh... This may not be correct in terms of how the slab operates. > This means that __get_cpu_var(reap_work) returns a "wrong" struct > delayed_work. > This is absolutely harmless right now, but may be it's better to use > container_of(unused, struct delayed_work, work). Well seems that we have a set of unresolved issues with workqueues and cpu hotplug.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote: > > On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > Now, > > static void __devinit start_cpu_timer(int cpu) > > { > > struct delayed_work *reap_work = &per_cpu(reap_work, cpu); > > > > if (keventd_up() && reap_work->work.func == NULL) { > > init_reap_node(cpu); > > INIT_DELAYED_WORK(reap_work, cache_reap); > > schedule_delayed_work_on(cpu, reap_work, > > __round_jiffies_relative(HZ, > > cpu)); > > } > > } > > > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last > > CPU_UP will not restart a per-cpu "cache_reap timer". > > Why? Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's bug. > > With or without recent changes, it is possible that work->func() will run on > > another CPU (not that to which it was submitted) if CPU goes down. In fact, > > this can happen while work->func() is running, so even smp_processor_id() > > is not safe to use in work->func(). > > But the work func was scheduled by schedule_delayed_work_on(). Isn't that a > general problem with schedule_delayed_work_on() and keventd? I think this is yet another problem with workqueues/cpu-hotplug interaction. Probably, the problem is more general. With CONFIG_CPU_HOTPLUG, we can't guarantee that smp_processor_id() is stable even if the thread is pinned to specific processor. > > However, cache_reap() seems to wrongly assume that smp_processor_id() is > > stable, > > this is the second problem. > > > > Is my understanding correct? > > cache_reap assumes that the processor id is stable based on the kevent > thread being pinned to a processor. I think cache_reap() is not alone, and this is not its fault. But please note another minor problem, void cache_reap(struct work_struct *unused) { ... 
schedule_delayed_work(&__get_cpu_var(reap_work), ...); } Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct. This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work. This is absolutely harmless right now, but may be it's better to use container_of(unused, struct delayed_work, work). Oleg.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > Now, > static void __devinit start_cpu_timer(int cpu) > { > struct delayed_work *reap_work = &per_cpu(reap_work, cpu); > > if (keventd_up() && reap_work->work.func == NULL) { > init_reap_node(cpu); > INIT_DELAYED_WORK(reap_work, cache_reap); > schedule_delayed_work_on(cpu, reap_work, > __round_jiffies_relative(HZ, > cpu)); > } > } > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last > CPU_UP will not restart a per-cpu "cache_reap timer". Why? > With or without recent changes, it is possible that work->func() will run on > another CPU (not that to which it was submitted) if CPU goes down. In fact, > this can happen while work->func() is running, so even smp_processor_id() > is not safe to use in work->func(). But the work func was scheduled by schedule_delayed_work_on(). Isn't that a general problem with schedule_delayed_work_on() and keventd? > However, cache_reap() seems to wrongly assume that smp_processor_id() is > stable, > this is the second problem. > > Is my understanding correct? cache_reap assumes that the processor id is stable based on the kevent thread being pinned to a processor.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: Now, static void __devinit start_cpu_timer(int cpu) { struct delayed_work *reap_work = per_cpu(reap_work, cpu); if (keventd_up() reap_work-work.func == NULL) { init_reap_node(cpu); INIT_DELAYED_WORK(reap_work, cache_reap); schedule_delayed_work_on(cpu, reap_work, __round_jiffies_relative(HZ, cpu)); } } This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last CPU_UP will not restart a per-cpu cache_reap timer. Why? With or without recent changes, it is possible that work-func() will run on another CPU (not that to which it was submitted) if CPU goes down. In fact, this can happen while work-func() is running, so even smp_processor_id() is not safe to use in work-func(). But the work func was scheduled by schedule_delayed_work_on(). Isnt that a general problem with schedule_delayed_work_on() and keventd? However, cache_reap() seems to wrongly assume that smp_processor_id() is stable, this is the second problem. Is my understanding correct? cache_reap assumes that the processor id is stable based on the kevent thread being pinned to a processor. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote: On Mon, 29 Jan 2007, Oleg Nesterov wrote: Now, static void __devinit start_cpu_timer(int cpu) { struct delayed_work *reap_work = per_cpu(reap_work, cpu); if (keventd_up() reap_work-work.func == NULL) { init_reap_node(cpu); INIT_DELAYED_WORK(reap_work, cache_reap); schedule_delayed_work_on(cpu, reap_work, __round_jiffies_relative(HZ, cpu)); } } This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last CPU_UP will not restart a per-cpu cache_reap timer. Why? Because the last CPU_UP calls start_cpu_timer(), but since -work.func != NULL we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's bug. With or without recent changes, it is possible that work-func() will run on another CPU (not that to which it was submitted) if CPU goes down. In fact, this can happen while work-func() is running, so even smp_processor_id() is not safe to use in work-func(). But the work func was scheduled by schedule_delayed_work_on(). Isnt that a general problem with schedule_delayed_work_on() and keventd? I think this is yet another problem with workqueues/cpu-hotplug interaction. Probably, the problem is more general. With CONFIG_CPU_HOTPLUG, we can't garantee that smp_processor_id() is stable even if the thread is pinned to specific processor. However, cache_reap() seems to wrongly assume that smp_processor_id() is stable, this is the second problem. Is my understanding correct? cache_reap assumes that the processor id is stable based on the kevent thread being pinned to a processor. I think cache_reap() is not alone, and this is not its fault. But please note another minor problem, void cache_reap(struct work_struct *unused) { ... schedule_delayed_work(__get_cpu_var(reap_work), ...); } Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct. 
This means that __get_cpu_var(reap_work) returns a wrong struct delayed_work. This is absolutely harmless right now, but may be it's better to use container_of(unused, struct delayed_work, work). Oleg. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last CPU_UP will not restart a per-cpu cache_reap timer. Why? Because the last CPU_UP calls start_cpu_timer(), but since -work.func != NULL we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's bug. The CPU_DOWN would need to set work.func == NULL for this to work. But then the slab does not shut down the work queues for the processor. Isnt this another issue with workqueues? The slab would need a notification that the workqueue for a processor was shutdown in order to set work.func = NULL. I think cache_reap() is not alone, and this is not its fault. But please note another minor problem, void cache_reap(struct work_struct *unused) { ... schedule_delayed_work(__get_cpu_var(reap_work), ...); } Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct. Uhh This may not be correct in terms of how the slab operates. This means that __get_cpu_var(reap_work) returns a wrong struct delayed_work. This is absolutely harmless right now, but may be it's better to use container_of(unused, struct delayed_work, work). Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote: On Mon, 29 Jan 2007, Oleg Nesterov wrote: This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last CPU_UP will not restart a per-cpu cache_reap timer. Why? Because the last CPU_UP calls start_cpu_timer(), but since -work.func != NULL we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's bug. The CPU_DOWN would need to set work.func == NULL for this to work. But then the slab does not shut down the work queues for the processor. Isnt this another issue with workqueues? I think no, please see below. Actually, this is more related to timers for this particular case. The slab would need a notification that the workqueue for a processor was shutdown in order to set work.func = NULL. The slab has a notification: CPU_XXX events. It should cancel a pending per cpu reap_work timer. I think cache_reap() is not alone, and this is not its fault. But please note another minor problem, void cache_reap(struct work_struct *unused) { ... schedule_delayed_work(__get_cpu_var(reap_work), ...); } Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct. Uhh This may not be correct in terms of how the slab operates. But this is practically impossible to avoid. We can't delay CPU_DOWN until all workqueues flush their cwq-worklist. This is livelockable, the work can re-queue itself, and new works can be added since the dying CPU is still on cpu_online_map. This means that some pending works will be processed on another CPU. delayed_work is even worse, the timer can migrate as well. The first problem (smp_processor_id() is not stable) could be solved if we use freezer or with the help of not-yet-implemented scalable lock_cpu_hotplug. This means that __get_cpu_var(reap_work) returns a wrong struct delayed_work. 
This is absolutely harmless right now, but may be it's better to use container_of(unused, struct delayed_work, work). Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. Yes. Oleg. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: The slab would need a notification that the workqueue for a processor was shutdown in order to set work.func = NULL. The slab has a notification: CPU_XXX events. It should cancel a pending per cpu reap_work timer. Ahh. Okay then the following patch would fix it? Shutdown cache_reaper when cpu goes down Shutdown the cache_reaper in slab.c if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] Index: linux-2.6.20-rc6/mm/slab.c === --- linux-2.6.20-rc6.orig/mm/slab.c 2007-01-24 20:19:28.0 -0600 +++ linux-2.6.20-rc6/mm/slab.c 2007-01-29 13:08:05.773928988 -0600 @@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru break; #ifdef CONFIG_HOTPLUG_CPU case CPU_DOWN_PREPARE: + /* Shutdown cache reaper */ + cancel_rearming_delayed_work(per_cpu(reap_work, cpu)); + per_cpu(reap_work, cpu).work.func = NULL; + mutex_lock(cache_chain_mutex); break; case CPU_DOWN_FAILED: - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote: Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct. Uhh This may not be correct in terms of how the slab operates. But this is practically impossible to avoid. We can't delay CPU_DOWN until all workqueues flush their cwq-worklist. This is livelockable, the work can re-queue itself, and new works can be added since the dying CPU is still on cpu_online_map. This means that some pending works will be processed on another CPU. But we could delay CPU_DOWN in the handler for the slab until we know that the cache_reaper is no longer running? This means that __get_cpu_var(reap_work) returns a wrong struct delayed_work. This is absolutely harmless right now, but may be it's better to use container_of(unused, struct delayed_work, work). There is more where that is coming from. cache_reap determines the current cpu in order to find the correct per cpu cache and also determines the current node. If you move cache_reap to another processor / node then it will just clean that one and not do anything for the processor that you wanted it to run for. If we change processors in the middle of the run then it may do some actions on one cpu and some on another Having said that here is the patch that insures that makes cache_reap no longer use per_cpu to access the work structure. This may be a cleanup on its own but it will not address your issues. Use parameter passed to cache_reap to determine pointer to work structure Use the pointer passed to cache_reap to determine the work pointer and consolidate exit paths. 
Signed-off-by: Christoph Lameter [EMAIL PROTECTED] Index: linux-2.6.20-rc6/mm/slab.c === --- linux-2.6.20-rc6.orig/mm/slab.c 2007-01-29 13:08:05.0 -0600 +++ linux-2.6.20-rc6/mm/slab.c 2007-01-29 13:17:01.695680898 -0600 @@ -4021,18 +4021,17 @@ void drain_array(struct kmem_cache *cach * If we cannot acquire the cache chain mutex then just give up - we'll try * again on the next iteration. */ -static void cache_reap(struct work_struct *unused) +static void cache_reap(struct work_struct *w) { struct kmem_cache *searchp; struct kmem_list3 *l3; int node = numa_node_id(); + struct delayed_work *work = + container_of(w, struct delayed_work, work); - if (!mutex_trylock(cache_chain_mutex)) { + if (!mutex_trylock(cache_chain_mutex)) /* Give up. Setup the next iteration. */ - schedule_delayed_work(__get_cpu_var(reap_work), - round_jiffies_relative(REAPTIMEOUT_CPUC)); - return; - } + goto out; list_for_each_entry(searchp, cache_chain, next) { check_irq_on(); @@ -4075,9 +4074,9 @@ next: mutex_unlock(cache_chain_mutex); next_reap_node(); refresh_cpu_vm_stats(smp_processor_id()); +out: /* Set up the next iteration */ - schedule_delayed_work(__get_cpu_var(reap_work), - round_jiffies_relative(REAPTIMEOUT_CPUC)); + schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC)); } #ifdef CONFIG_PROC_FS - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote: On Mon, 29 Jan 2007, Oleg Nesterov wrote: The slab would need a notification that the workqueue for a processor was shutdown in order to set work.func = NULL. The slab has a notification: CPU_XXX events. It should cancel a pending per cpu reap_work timer. Ahh. Okay then the following patch would fix it? Shutdown cache_reaper when cpu goes down Shutdown the cache_reaper in slab.c if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] Index: linux-2.6.20-rc6/mm/slab.c === --- linux-2.6.20-rc6.orig/mm/slab.c 2007-01-24 20:19:28.0 -0600 +++ linux-2.6.20-rc6/mm/slab.c2007-01-29 13:08:05.773928988 -0600 @@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru break; #ifdef CONFIG_HOTPLUG_CPU case CPU_DOWN_PREPARE: + /* Shutdown cache reaper */ + cancel_rearming_delayed_work(per_cpu(reap_work, cpu)); + per_cpu(reap_work, cpu).work.func = NULL; + mutex_lock(cache_chain_mutex); break; case CPU_DOWN_FAILED: Then CPU_DOWN_FAILED should do start_cpu_timer(cpu). Oleg. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
>
> > Even if smp_processor_id() was stable during the execution of
> > cache_reap(), this work_struct can be moved to another CPU if
> > CPU_DEAD happens. We can't avoid this, and this is correct.
>
> Uhh... This may not be correct in terms of how the slab operates.

But this is practically impossible to avoid. We can't delay CPU_DOWN
until all workqueues flush their cwq->worklist. This is livelockable,
the work can re-queue itself, and new works can be added since the dying
CPU is still on cpu_online_map. This means that some pending works will
be processed on another CPU.

> But we could delay CPU_DOWN in the handler for the slab until we know
> that the cache_reaper is no longer running?

Hmm... I don't understand this. We can delay CPU_DOWN if we cancel
cache_reaper like you did in the previous patch. Did you mean this?
If yes - then yes :)

> > This means that __get_cpu_var(reap_work) returns a wrong struct
> > delayed_work. This is absolutely harmless right now, but maybe it's
> > better to use container_of(unused, struct delayed_work, work).
>
> There is more where that is coming from. cache_reap determines the
> current cpu in order to find the correct per cpu cache and also
> determines the current node. If you move cache_reap to another
> processor / node then it will just clean that one and not do anything
> for the processor that you wanted it to run for.

Worse, we can have 2 handlers running in parallel on the same CPU. But
this is fixed by your previous patch, I believe.

> If we change processors in the middle of the run then it may do some
> actions on one cpu and some on another

Yep. For example, next_reap_node() will not be happy if we change CPU in
the middle. But this is _extremely_ unlikely, can only happen on
CPU_DOWN, and cache_reap() should not care about this.
> -static void cache_reap(struct work_struct *unused)
> +static void cache_reap(struct work_struct *w)
>  {
>  	struct kmem_cache *searchp;
>  	struct kmem_list3 *l3;
>  	int node = numa_node_id();
> +	struct delayed_work *work =
> +		container_of(w, struct delayed_work, work);
> ...
> +	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));

Actually, I was wrong. Yes, this should work, but only with your
previous patch. Otherwise, if the handler runs on the wrong CPU (this is
not possible since you added cancel_delayed_work()), we are in fact
starting a second reaper on the same CPU, not good.

Oleg.
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > But we could delay CPU_DOWN in the handler for the slab until we know
> > that the cache_reaper is no longer running?
>
> Hmm... I don't understand this. We can delay CPU_DOWN if we cancel
> cache_reaper like you did in the previous patch. Did you mean this?
> If yes - then yes :)

Yes sure.

> Worse, we can have 2 handlers running in parallel on the same CPU. But
> this is fixed by your previous patch, I believe.

Good. Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
CPU_DOWN_FAILED somehow vanished in mm?

Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down and set
the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper
for good.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

Index: linux-2.6.20-rc6-mm1/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.0 -0600
+++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
@@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+
+		mutex_lock(&cache_chain_mutex);
+		break;
+	case CPU_DOWN_FAILED:
+		mutex_unlock(&cache_chain_mutex);
+		start_cpu_timer(cpu);
+		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote:
>
> Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
> CPU_DOWN_FAILED somehow vanished in mm?

No, no, they are still in place, so I believe your patch is good.

Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so
cpuup_callback() can use them to lock/unlock cache_chain_mutex, but this
is not related.

> Shutdown cache_reaper when cpu goes down
>
> Shutdown the cache_reaper in slab.c if the cpu is brought down and set
> the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper
> for good.
>
> Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
>
> Index: linux-2.6.20-rc6-mm1/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.0 -0600
> +++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
> @@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +
> +		mutex_lock(&cache_chain_mutex);
> +		break;
> +	case CPU_DOWN_FAILED:
> +		mutex_unlock(&cache_chain_mutex);
> +		start_cpu_timer(cpu);
> +		break;
> 	case CPU_DEAD:
> 		/*
> 		 * Even if all the cpus of a node are down, we don't free the
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On Tue, 30 Jan 2007, Oleg Nesterov wrote:

> Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so
> cpuup_callback() can use them to lock/unlock cache_chain_mutex, but
> this is not related.

Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX anymore,
right? Which brings us to this form of the patch:

Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down and set
the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper
for good.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

Index: linux-2.6.20-rc6-mm2/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm2.orig/mm/slab.c	2007-01-29 14:27:34.199229828 -0600
+++ linux-2.6.20-rc6-mm2/mm/slab.c	2007-01-29 15:47:18.293962726 -0600
@@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+		break;
+	case CPU_DOWN_FAILED:
+		start_cpu_timer(cpu);
+		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the
Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
On 01/29, Christoph Lameter wrote:
>
> On Tue, 30 Jan 2007, Oleg Nesterov wrote:
>
> > Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
> > so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
> > but this is not related.
>
> Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX anymore,
> right? Which brings us to this form of the patch:

Yes, if CPU_LOCK_ACQUIRE takes cache_chain_mutex, we don't need to do so
on CPU_DOWN_PREPARE. I didn't know cpuup_callback() was already
converted, sorry for the confusion!

Note: with the patch below we are doing cancel_rearming_delayed_work()
under cache_chain_mutex. This is ok since cache_reap() does
mutex_trylock(), so deadlock is not possible. However, this means that
mutex_trylock() becomes mandatory for cache_reap(), probably a little
comment would be good.

> Shutdown cache_reaper when cpu goes down
>
> Shutdown the cache_reaper in slab.c if the cpu is brought down and set
> the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper
> for good.
>
> Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
>
> Index: linux-2.6.20-rc6-mm2/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm2.orig/mm/slab.c	2007-01-29 14:27:34.199229828 -0600
> +++ linux-2.6.20-rc6-mm2/mm/slab.c	2007-01-29 15:47:18.293962726 -0600
> @@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +		break;
> +	case CPU_DOWN_FAILED:
> +		start_cpu_timer(cpu);
> +		break;
> 	case CPU_DEAD:
> 		/*
> 		 * Even if all the cpus of a node are down, we don't free the
slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
For the beginning, about another (but related) minor problem,
debug_smp_processor_id:

	/*
	 * Kernel threads bound to a single CPU can safely use
	 * smp_processor_id():
	 */

This is only true without CONFIG_HOTPLUG_CPU. Otherwise the CPU can go
away when the task takes a preemption or sleeps. I think we need
#ifndef here.

Now,

	static void __devinit start_cpu_timer(int cpu)
	{
		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);

		if (keventd_up() && reap_work->work.func == NULL) {
			init_reap_node(cpu);
			INIT_DELAYED_WORK(reap_work, cache_reap);
			schedule_delayed_work_on(cpu, reap_work,
				__round_jiffies_relative(HZ, cpu));
		}
	}

This is wrong. Suppose we have a CPU_UP, CPU_DOWN, CPU_UP sequence. The
last CPU_UP will not restart a per-cpu "cache_reap timer".

With or without recent changes, it is possible that work->func() will
run on another CPU (not the one to which it was submitted) if the CPU
goes down. In fact, this can happen while work->func() is running, so
even smp_processor_id() is not safe to use in work->func().

However, cache_reap() seems to wrongly assume that smp_processor_id()
is stable; this is the second problem.

Is my understanding correct?

Oleg.