[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
Pavel Emelyanov wrote:
> This allows us two things basically:

Pavel,

Do you have any further updates on this? I think we need a way to
implement reclaim per hierarchy, as mentioned earlier. Do you want me
to take a look at it?

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
___
Containers mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/containers
___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
YAMAMOTO Takashi wrote:
>> @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
>>  {
>>  	int ret;
>>  	unsigned long flags;
>> +	struct res_counter *c, *unroll_c;
>> +
>> +	local_irq_save(flags);
>> +	for (c = counter; c != NULL; c = c->parent) {
>> +		spin_lock(&c->lock);
>> +		ret = res_counter_charge_locked(c, val);
>> +		spin_unlock(&c->lock);
>> +		if (ret < 0)
>> +			goto unroll;
>> +	}
>> +	local_irq_restore(flags);
>> +	return 0;
>>
>> -	spin_lock_irqsave(&counter->lock, flags);
>> -	ret = res_counter_charge_locked(counter, val);
>> -	spin_unlock_irqrestore(&counter->lock, flags);
>> +unroll:
>> +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
>> +		spin_lock(&unroll_c->lock);
>> +		res_counter_uncharge_locked(unroll_c, val);
>> +		spin_unlock(&unroll_c->lock);
>> +	}
>> +	local_irq_restore(flags);
>>  	return ret;
>>  }
>
> what prevents the topology (in particular, ->parent pointers) from
> changing behind us?

The res_counter client must provide this. Currently cgroup subsystem
does this.

> YAMAMOTO Takashi
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
> @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
>  {
>  	int ret;
>  	unsigned long flags;
> +	struct res_counter *c, *unroll_c;
> +
> +	local_irq_save(flags);
> +	for (c = counter; c != NULL; c = c->parent) {
> +		spin_lock(&c->lock);
> +		ret = res_counter_charge_locked(c, val);
> +		spin_unlock(&c->lock);
> +		if (ret < 0)
> +			goto unroll;
> +	}
> +	local_irq_restore(flags);
> +	return 0;
>
> -	spin_lock_irqsave(&counter->lock, flags);
> -	ret = res_counter_charge_locked(counter, val);
> -	spin_unlock_irqrestore(&counter->lock, flags);
> +unroll:
> +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
> +		spin_lock(&unroll_c->lock);
> +		res_counter_uncharge_locked(unroll_c, val);
> +		spin_unlock(&unroll_c->lock);
> +	}
> +	local_irq_restore(flags);
>  	return ret;
>  }

what prevents the topology (in particular, ->parent pointers) from
changing behind us?

YAMAMOTO Takashi
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
>> @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
>>  {
>>  	int ret;
>>  	unsigned long flags;
>> +	struct res_counter *c, *unroll_c;
>> +
>> +	local_irq_save(flags);
>> +	for (c = counter; c != NULL; c = c->parent) {
>> +		spin_lock(&c->lock);
>> +		ret = res_counter_charge_locked(c, val);
>> +		spin_unlock(&c->lock);
>> +		if (ret < 0)
>> +			goto unroll;
>> +	}
>> +	local_irq_restore(flags);
>> +	return 0;
>>
>> -	spin_lock_irqsave(&counter->lock, flags);
>> -	ret = res_counter_charge_locked(counter, val);
>> -	spin_unlock_irqrestore(&counter->lock, flags);
>> +unroll:
>> +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
>> +		spin_lock(&unroll_c->lock);
>> +		res_counter_uncharge_locked(unroll_c, val);
>> +		spin_unlock(&unroll_c->lock);
>> +	}
>> +	local_irq_restore(flags);
>>  	return ret;
>>  }
>
> what prevents the topology (in particular, ->parent pointers) from
> changing behind us?
>
> YAMAMOTO Takashi

to answer myself: cgroupfs rename doesn't allow topological changes
in the first place.

btw, i think you need to do the same for res_counter_limit_check_locked
as well.

i'm skeptical about doing these complicated stuffs in kernel,
esp. in this potentially performance critical code.

YAMAMOTO Takashi
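The walk-and-unroll logic in the quoted hunk, together with the suggestion to apply the same walk to the limit check, can be sketched in plain userspace C. This is only a minimal sketch under stated assumptions, not the kernel code: `struct counter`, `counter_init()`, `charge_one()`, `counter_charge()` and `counter_under_limit()` are hypothetical names, spinlocks and IRQ handling are left out, and the parent pointer is assumed to be set once and never changed (the guarantee that, per the thread, the res_counter client has to provide).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct res_counter (no lock). */
struct counter {
	unsigned long long usage;
	unsigned long long limit;
	struct counter *parent;	/* assumed fixed after init; NULL at the root */
};

static void counter_init(struct counter *c, unsigned long long limit,
			 struct counter *parent)
{
	c->usage = 0;
	c->limit = limit;
	c->parent = parent;
}

/* Charge a single level, in the spirit of res_counter_charge_locked(). */
static int charge_one(struct counter *c, unsigned long long val)
{
	if (c->usage + val > c->limit)
		return -ENOMEM;
	c->usage += val;
	return 0;
}

/* Charge every level from counter up to the root, or none of them. */
static int counter_charge(struct counter *counter, unsigned long long val)
{
	struct counter *c, *u;
	int ret = 0;

	for (c = counter; c != NULL; c = c->parent) {
		ret = charge_one(c, val);
		if (ret < 0)
			goto unroll;
	}
	return 0;

unroll:
	/* Undo only the levels below the one that refused the charge. */
	for (u = counter; u != c; u = u->parent) {
		assert(u->usage >= val);
		u->usage -= val;
	}
	return ret;
}

/* Hierarchical limit check: true only if every level is below its limit. */
static bool counter_under_limit(const struct counter *counter)
{
	const struct counter *c;

	for (c = counter; c != NULL; c = c->parent)
		if (c->usage >= c->limit)
			return false;
	return true;
}
```

In this sketch a child that is far below its own limit still reports "not under limit" once any ancestor is full, which is the behaviour the hierarchical check would need. The cost is one extra step per ancestor on every charge, which is the performance concern raised above.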
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Pavel Emelyanov wrote:
>>> This allows us two things basically:
>>>
>>> 1. If the subgroup has the limit higher than its parent has
>>>    then the one will get more memory than allowed.
>>
>> But should we allow such configuration? I suspect that we should catch such
>> things at the time of writing the limit.
>
> We cannot catch this at the limit-set-time. See, if you have a cgroup A
> with a 1GB limit and the usage is 999MB, then creating a subgroup B with
> even 500MB limit will cause the A group to consume 1.5GB of memory
> effectively.

No... If you propagate the charge of the child up to the parent, then it
won't. If each page charged to a child is also charged to the parent, this
cannot happen. The code you have below does that, right?

>>> 2. When we will need to account for a resource in more than
>>>    one place, we'll be able to use this technique.
>>>
>>>    Look, consider we have a memory limit and swap limit. The
>>>    memory limit is the limit for the sum of RSS, page cache
>>>    and swap usage. To account for this gracefully, we'll set
>>>    two counters:
>>>
>>> 	res_counter mem_counter;
>>> 	res_counter swap_counter;
>>>
>>>    attach mm to the swap one
>>>
>>> 	mm->mem_cnt = &swap_counter;
>>>
>>>    and make the swap_counter be mem's child. That's it. If we
>>>    want hierarchical support, then the tree will look like this:
>>>
>>>    mem_counter_top
>>>     swap_counter_top <- mm_struct living at top
>>>      mem_counter_sub
>>>       swap_counter_sub <- mm_struct living at sub
>>
>> Hmm... not sure about this one. What I want to see is a resource counter
>> hierarchy to mimic the container hierarchy. Then ensure that all limits are
>> set sanely. I am planning to implement shares support on top of resource
>> counters.
>>
>>> Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
>>>
>>> ---
>>>  include/linux/res_counter.h | 11 ++-
>>>  kernel/res_counter.c        | 36 +---
>>>  mm/memcontrol.c             |  9 ++---
>>>  3 files changed, 45 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>>> index 2c4deb5..a27105e 100644
>>> --- a/include/linux/res_counter.h
>>> +++ b/include/linux/res_counter.h
>>> @@ -41,6 +41,10 @@ struct res_counter {
>>>  	 * the routines below consider this to be IRQ-safe
>>>  	 */
>>>  	spinlock_t lock;
>>> +	/*
>>> +	 * the parent counter. used for hierarchical resource accounting
>>> +	 */
>>> +	struct res_counter *parent;
>>>  };
>>>
>>>  /**
>>> @@ -80,7 +84,12 @@ enum {
>>>   * helpers for accounting
>>>   */
>>>
>>> -void res_counter_init(struct res_counter *counter);
>>> +/*
>>> + * the parent pointer is set only once - during the counter
>>> + * initialization. caller then must itself provide that this
>>> + * pointer is valid during the new counter lifetime
>>> + */
>>> +void res_counter_init(struct res_counter *counter, struct res_counter *parent);
>>>
>>>  /*
>>>   * charge - try to consume more resource.
>>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>>> index f1f20c2..046f6f4 100644
>>> --- a/kernel/res_counter.c
>>> +++ b/kernel/res_counter.c
>>> @@ -13,10 +13,11 @@
>>>  #include <linux/res_counter.h>
>>>  #include <linux/uaccess.h>
>>>
>>> -void res_counter_init(struct res_counter *counter)
>>> +void res_counter_init(struct res_counter *counter, struct res_counter *parent)
>>>  {
>>>  	spin_lock_init(&counter->lock);
>>>  	counter->limit = (unsigned long long)LLONG_MAX;
>>> +	counter->parent = parent;
>>>  }
>>>
>>>  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
>>> @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
>>>  {
>>>  	int ret;
>>>  	unsigned long flags;
>>> +	struct res_counter *c, *unroll_c;
>>> +
>>> +	local_irq_save(flags);
>>> +	for (c = counter; c != NULL; c = c->parent) {
>>> +		spin_lock(&c->lock);
>>> +		ret = res_counter_charge_locked(c, val);
>>> +		spin_unlock(&c->lock);
>>> +		if (ret < 0)
>>> +			goto unroll;
>>
>> We'd like to know which resource counter failed to allow charging, so that
>> we can reclaim from that mem_res_cgroup.

This is also important, so that we can reclaim from the nodes that go over
their limit.

>>> +	}
>>> +	local_irq_restore(flags);
>>> +	return 0;
>>>
>>> -	spin_lock_irqsave(&counter->lock, flags);
>>> -	ret = res_counter_charge_locked(counter, val);
>>> -	spin_unlock_irqrestore(&counter->lock, flags);
>>> +unroll:
>>> +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
>>> +		spin_lock(&unroll_c->lock);
>>> +		res_counter_uncharge_locked(unroll_c, val);
>>> +		spin_unlock(&unroll_c->lock);
>>> +	}
>>> +	local_irq_restore(flags);
>>>  	return ret;
>>>  }
>>>
>>> @@ -54,10 +71,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
>>>  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>>>  {
>>>  	unsigned long flags;
>>> +	struct res_counter *c;
>>>
>>> -	spin_lock_irqsave(&counter->lock, flags);
>>> -	res_counter_uncharge_locked(counter, val);
>>> -	spin_unlock_irqrestore(&counter->lock, flags);
>>> +	local_irq_save(flags);
>>> +
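The disagreement over the "A at 999MB of 1GB, new subgroup B with a 500MB limit" case comes down to whether B's charges are also charged to A. A small worked sketch can make the arithmetic concrete. Everything here is hypothetical (`struct acct`, `try_charge()`, `total_in_subtree_of_A()`), 1GB is taken as 1024MB, and the charges go in 1MB steps purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-group accounting record, sizes in MB. */
struct acct {
	unsigned long usage_mb;
	unsigned long limit_mb;
	struct acct *parent;
};

/*
 * Charge `mb` to `a`. With hierarchical == true the charge is also
 * applied to every ancestor and rolled back if any ancestor refuses it;
 * with hierarchical == false only `a` itself is checked and charged.
 */
static bool try_charge(struct acct *a, unsigned long mb, bool hierarchical)
{
	struct acct *c, *u;

	for (c = a; c != NULL; c = hierarchical ? c->parent : NULL) {
		if (c->usage_mb + mb > c->limit_mb) {
			for (u = a; u != c; u = u->parent)
				u->usage_mb -= mb;
			return false;
		}
		c->usage_mb += mb;
	}
	return true;
}

/* Memory consumed by A and its subgroup B once B has grown as far as it can. */
static unsigned long total_in_subtree_of_A(bool hierarchical)
{
	struct acct A = { 999, 1024, NULL };	/* 999MB used of a 1GB limit */
	struct acct B = { 0, 500, &A };		/* new subgroup, 500MB limit */

	while (try_charge(&B, 1, hierarchical))
		;
	/* With propagation A.usage_mb already includes B's pages. */
	return hierarchical ? A.usage_mb : A.usage_mb + B.usage_mb;
}
```

Without propagation the subtree ends up at 999 + 500 = 1499MB, roughly the 1.5GB figure from the quoted reply. With propagation B is stopped after 25MB and the subtree stays at A's 1024MB limit.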
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
Balbir Singh wrote:
> Pavel Emelyanov wrote:
>> This allows us two things basically:
>>
>> 1. If the subgroup has the limit higher than its parent has
>>    then the one will get more memory than allowed.
>
> But should we allow such configuration? I suspect that we should catch such
> things at the time of writing the limit.

We cannot catch this at the limit-set-time. See, if you have a cgroup A
with a 1GB limit and the usage is 999MB, then creating a subgroup B with
even 500MB limit will cause the A group to consume 1.5GB of memory
effectively.

>> 2. When we will need to account for a resource in more than
>>    one place, we'll be able to use this technique.
>>
>>    Look, consider we have a memory limit and swap limit. The
>>    memory limit is the limit for the sum of RSS, page cache
>>    and swap usage. To account for this gracefully, we'll set
>>    two counters:
>>
>> 	res_counter mem_counter;
>> 	res_counter swap_counter;
>>
>>    attach mm to the swap one
>>
>> 	mm->mem_cnt = &swap_counter;
>>
>>    and make the swap_counter be mem's child. That's it. If we
>>    want hierarchical support, then the tree will look like this:
>>
>>    mem_counter_top
>>     swap_counter_top <- mm_struct living at top
>>      mem_counter_sub
>>       swap_counter_sub <- mm_struct living at sub
>
> Hmm... not sure about this one. What I want to see is a resource counter
> hierarchy to mimic the container hierarchy. Then ensure that all limits are
> set sanely. I am planning to implement shares support on top of resource
> counters.
>
>> Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
>>
>> ---
>>  include/linux/res_counter.h | 11 ++-
>>  kernel/res_counter.c        | 36 +---
>>  mm/memcontrol.c             |  9 ++---
>>  3 files changed, 45 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>> index 2c4deb5..a27105e 100644
>> --- a/include/linux/res_counter.h
>> +++ b/include/linux/res_counter.h
>> @@ -41,6 +41,10 @@ struct res_counter {
>>  	 * the routines below consider this to be IRQ-safe
>>  	 */
>>  	spinlock_t lock;
>> +	/*
>> +	 * the parent counter. used for hierarchical resource accounting
>> +	 */
>> +	struct res_counter *parent;
>>  };
>>
>>  /**
>> @@ -80,7 +84,12 @@ enum {
>>   * helpers for accounting
>>   */
>>
>> -void res_counter_init(struct res_counter *counter);
>> +/*
>> + * the parent pointer is set only once - during the counter
>> + * initialization. caller then must itself provide that this
>> + * pointer is valid during the new counter lifetime
>> + */
>> +void res_counter_init(struct res_counter *counter, struct res_counter *parent);
>>
>>  /*
>>   * charge - try to consume more resource.
>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>> index f1f20c2..046f6f4 100644
>> --- a/kernel/res_counter.c
>> +++ b/kernel/res_counter.c
>> @@ -13,10 +13,11 @@
>>  #include <linux/res_counter.h>
>>  #include <linux/uaccess.h>
>>
>> -void res_counter_init(struct res_counter *counter)
>> +void res_counter_init(struct res_counter *counter, struct res_counter *parent)
>>  {
>>  	spin_lock_init(&counter->lock);
>>  	counter->limit = (unsigned long long)LLONG_MAX;
>> +	counter->parent = parent;
>>  }
>>
>>  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
>> @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
>>  {
>>  	int ret;
>>  	unsigned long flags;
>> +	struct res_counter *c, *unroll_c;
>> +
>> +	local_irq_save(flags);
>> +	for (c = counter; c != NULL; c = c->parent) {
>> +		spin_lock(&c->lock);
>> +		ret = res_counter_charge_locked(c, val);
>> +		spin_unlock(&c->lock);
>> +		if (ret < 0)
>> +			goto unroll;
>
> We'd like to know which resource counter failed to allow charging, so that
> we can reclaim from that mem_res_cgroup.
>
>> +	}
>> +	local_irq_restore(flags);
>> +	return 0;
>>
>> -	spin_lock_irqsave(&counter->lock, flags);
>> -	ret = res_counter_charge_locked(counter, val);
>> -	spin_unlock_irqrestore(&counter->lock, flags);
>> +unroll:
>> +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
>> +		spin_lock(&unroll_c->lock);
>> +		res_counter_uncharge_locked(unroll_c, val);
>> +		spin_unlock(&unroll_c->lock);
>> +	}
>> +	local_irq_restore(flags);
>>  	return ret;
>>  }
>>
>> @@ -54,10 +71,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
>>  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>>  {
>>  	unsigned long flags;
>> +	struct res_counter *c;
>>
>> -	spin_lock_irqsave(&counter->lock, flags);
>> -	res_counter_uncharge_locked(counter, val);
>> -	spin_unlock_irqrestore(&counter->lock, flags);
>> +	local_irq_save(flags);
>> +	for (c = counter; c != NULL; c = c->parent) {
>> +		spin_lock(&c->lock);
>> +		res_counter_uncharge_locked(c, val);
>> +		spin_unlock(&c->lock);
>> +	}
>> +	local_irq_restore(flags);
>>  }
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
Balbir Singh wrote:
> Pavel Emelyanov wrote:
>> Balbir Singh wrote:
>>> Pavel Emelyanov wrote:
>>>> This allows us two things basically:
>>>>
>>>> 1. If the subgroup has the limit higher than its parent has
>>>>    then the one will get more memory than allowed.
>>>
>>> But should we allow such configuration? I suspect that we should catch such
>>> things at the time of writing the limit.
>>
>> We cannot catch this at the limit-set-time. See, if you have a cgroup A
>> with a 1GB limit and the usage is 999MB, then creating a subgroup B with
>> even 500MB limit will cause the A group to consume 1.5GB of memory
>> effectively.
>
> No... If you propagate the charge of the child up to the parent, then it
> won't. If each page charged to a child is also charged to the parent, this
> cannot happen. The code you have below does that, right?

Yup! What you described is available with this patch only.

>>>> 2. When we will need to account for a resource in more than
>>>>    one place, we'll be able to use this technique.
>>>>
>>>>    Look, consider we have a memory limit and swap limit. The
>>>>    memory limit is the limit for the sum of RSS, page cache
>>>>    and swap usage. To account for this gracefully, we'll set
>>>>    two counters:
>>>>
>>>> 	res_counter mem_counter;
>>>> 	res_counter swap_counter;
>>>>
>>>>    attach mm to the swap one
>>>>
>>>> 	mm->mem_cnt = &swap_counter;
>>>>
>>>>    and make the swap_counter be mem's child. That's it. If we
>>>>    want hierarchical support, then the tree will look like this:
>>>>
>>>>    mem_counter_top
>>>>     swap_counter_top <- mm_struct living at top
>>>>      mem_counter_sub
>>>>       swap_counter_sub <- mm_struct living at sub
>>>
>>> Hmm... not sure about this one. What I want to see is a resource counter
>>> hierarchy to mimic the container hierarchy. Then ensure that all limits are
>>> set sanely. I am planning to implement shares support on top of resource
>>> counters.
>>>
>>>> Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
>>>>
>>>> ---
>>>>  include/linux/res_counter.h | 11 ++-
>>>>  kernel/res_counter.c        | 36 +---
>>>>  mm/memcontrol.c             |  9 ++---
>>>>  3 files changed, 45 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>>>> index 2c4deb5..a27105e 100644
>>>> --- a/include/linux/res_counter.h
>>>> +++ b/include/linux/res_counter.h
>>>> @@ -41,6 +41,10 @@ struct res_counter {
>>>>  	 * the routines below consider this to be IRQ-safe
>>>>  	 */
>>>>  	spinlock_t lock;
>>>> +	/*
>>>> +	 * the parent counter. used for hierarchical resource accounting
>>>> +	 */
>>>> +	struct res_counter *parent;
>>>>  };
>>>>
>>>>  /**
>>>> @@ -80,7 +84,12 @@ enum {
>>>>   * helpers for accounting
>>>>   */
>>>>
>>>> -void res_counter_init(struct res_counter *counter);
>>>> +/*
>>>> + * the parent pointer is set only once - during the counter
>>>> + * initialization. caller then must itself provide that this
>>>> + * pointer is valid during the new counter lifetime
>>>> + */
>>>> +void res_counter_init(struct res_counter *counter, struct res_counter *parent);
>>>>
>>>>  /*
>>>>   * charge - try to consume more resource.
>>>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>>>> index f1f20c2..046f6f4 100644
>>>> --- a/kernel/res_counter.c
>>>> +++ b/kernel/res_counter.c
>>>> @@ -13,10 +13,11 @@
>>>>  #include <linux/res_counter.h>
>>>>  #include <linux/uaccess.h>
>>>>
>>>> -void res_counter_init(struct res_counter *counter)
>>>> +void res_counter_init(struct res_counter *counter, struct res_counter *parent)
>>>>  {
>>>>  	spin_lock_init(&counter->lock);
>>>>  	counter->limit = (unsigned long long)LLONG_MAX;
>>>> +	counter->parent = parent;
>>>>  }
>>>>
>>>>  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
>>>> @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
>>>>  {
>>>>  	int ret;
>>>>  	unsigned long flags;
>>>> +	struct res_counter *c, *unroll_c;
>>>> +
>>>> +	local_irq_save(flags);
>>>> +	for (c = counter; c != NULL; c = c->parent) {
>>>> +		spin_lock(&c->lock);
>>>> +		ret = res_counter_charge_locked(c, val);
>>>> +		spin_unlock(&c->lock);
>>>> +		if (ret < 0)
>>>> +			goto unroll;
>>>
>>> We'd like to know which resource counter failed to allow charging, so that
>>> we can reclaim from that mem_res_cgroup.
>
> This is also important, so that we can reclaim from the nodes that go over
> their limit.

Agree. I'll think over how to provide this facility.

>>>> +	}
>>>> +	local_irq_restore(flags);
>>>> +	return 0;
>>>>
>>>> -	spin_lock_irqsave(&counter->lock, flags);
>>>> -	ret = res_counter_charge_locked(counter, val);
>>>> -	spin_unlock_irqrestore(&counter->lock, flags);
>>>> +unroll:
>>>> +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
>>>> +		spin_lock(&unroll_c->lock);
>>>> +		res_counter_uncharge_locked(unroll_c, val);
>>>> +		spin_unlock(&unroll_c->lock);
>>>> +	}
>>>> +	local_irq_restore(flags);
>>>>  	return ret;
>>>>  }
>>>>
>>>> @@ -54,10 +71,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
>>>>  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>>>>  {
>>>>  	unsigned long flags;
>>>> +	struct res_counter *c;
>>>>
>>>> -	spin_lock_irqsave(&counter->lock, flags);
>>>> -
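One possible shape for the facility requested above, reporting which counter in the chain refused the charge so that reclaim can target that group, is an extra out-parameter on the charge function. This is a sketch under assumptions, not what the patch in this thread does: `struct rcounter`, `rcounter_charge()` and the `fail_at` parameter are hypothetical names, and locking is again omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct res_counter (no lock). */
struct rcounter {
	unsigned long long usage;
	unsigned long long limit;
	struct rcounter *parent;
};

/*
 * Charge `val` to `counter` and all of its ancestors.
 * On failure nothing stays charged and *fail_at points at the level
 * whose limit would have been exceeded; on success *fail_at is NULL.
 */
static int rcounter_charge(struct rcounter *counter, unsigned long long val,
			   struct rcounter **fail_at)
{
	struct rcounter *c, *u;

	*fail_at = NULL;
	for (c = counter; c != NULL; c = c->parent) {
		if (c->usage + val > c->limit)
			goto unroll;
		c->usage += val;
	}
	return 0;

unroll:
	/* Roll back the levels charged before the failing one. */
	for (u = counter; u != c; u = u->parent)
		u->usage -= val;
	*fail_at = c;
	return -ENOMEM;
}
```

A caller could then map `*fail_at` back to its owning group (for example with `container_of()` in kernel code) and start reclaim there rather than at the group that issued the charge.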
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
On Tue, Mar 11, 2008 at 1:15 AM, Pavel Emelyanov [EMAIL PROTECTED] wrote:
>
> mem_counter_0
>  + -- swap_counter_0
>  + -- mem_counter_1
>  |     + -- swap_counter_1
>  |     + -- mem_counter_11
>  |     |     + -- swap_counter_11
>  |     + -- mem_counter_12
>  |           + -- swap_counter_12
>  + -- mem_counter_2
>  |     + -- swap_counter_2
>  |     + -- mem_counter_21
>  |     |     + -- swap_counter_21
>  |     + -- mem_counter_22
>  |           + -- swap_counter_22
>  + -- mem_counter_N
>        + -- swap_counter_N
>        + -- mem_counter_N1
>        |     + -- swap_counter_N1
>        + -- mem_counter_N2
>              + -- swap_counter_N2

The idea of hierarchy is good, but I don't think this particular
hierarchy works for memory. Main memory and swap space are very
different resources, with very different performance characteristics.

Suppose you have a 2GB machine, and you want to guarantee each job 1GB
of main memory, plus give them the option of 1GB of swap for when they
go over the 1GB main memory limit. With the hierarchy given above,
you'd need to give each job a 2GB mem.limit and a 1GB swap.limit, and
so there would be no main memory isolation.

My feeling is that people are going to want to limit swap and main
memory usage as two independent resource hierarchies more often than
they're going to want to limit overall virtual memory. But assuming
that there are people who need to do the latter, then you should make
it configurable how the hierarchies fit together.

Alternatively, you could make it possible for a res_counter to have
multiple parents (each of which constrains the overall usage of it and
its siblings), and have three counters for each cgroup:

- vm_counter: overall virtual memory limit for group, parent =
  parent_mem_cgroup->vm_counter

- mem_counter: main memory limit for group, parents = vm_counter,
  parent_mem_cgroup->mem_counter

- swap_counter: swap limit for group, parents = vm_counter,
  parent_mem_cgroup->swap_counter

Paul
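The multiple-parent idea can be sketched as follows. The names (`struct mp_counter`, `mp_collect()`, `mp_charge()`, `MP_MAX_PARENTS`, `MP_MAX_REACH`) are hypothetical, locking is omitted, and sizes are in MB. One detail the sketch has to deal with is that the counters then form a DAG rather than a tree: a group's mem_counter reaches the parent group's vm_counter both through its own vm_counter and through the parent's mem_counter, so the walk collects each reachable counter once before checking and charging.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MP_MAX_PARENTS	2
#define MP_MAX_REACH	32	/* arbitrary bound on reachable counters */

/* Hypothetical counter with more than one parent (no lock). */
struct mp_counter {
	unsigned long long usage;
	unsigned long long limit;
	struct mp_counter *parents[MP_MAX_PARENTS];	/* unused slots are NULL */
};

/*
 * Append `c` and everything reachable through its parents to out[],
 * skipping counters already present. Returns the new count, or -1 if
 * more than MP_MAX_REACH counters are reachable.
 */
static int mp_collect(struct mp_counter *c, struct mp_counter **out, int n)
{
	int i;

	if (c == NULL || n < 0)
		return n;
	for (i = 0; i < n; i++)
		if (out[i] == c)
			return n;	/* reached earlier via another path */
	if (n == MP_MAX_REACH)
		return -1;
	out[n++] = c;
	for (i = 0; i < MP_MAX_PARENTS; i++)
		n = mp_collect(c->parents[i], out, n);
	return n;
}

/* Charge `val` once to every counter reachable from `c`, or to none. */
static int mp_charge(struct mp_counter *c, unsigned long long val)
{
	struct mp_counter *set[MP_MAX_REACH];
	int i, n;

	n = mp_collect(c, set, 0);
	if (n < 0)
		return -E2BIG;
	/* Check first, then commit, so no rollback is needed here. */
	for (i = 0; i < n; i++)
		if (set[i]->usage + val > set[i]->limit)
			return -ENOMEM;
	for (i = 0; i < n; i++)
		set[i]->usage += val;
	return 0;
}
```

With a job configured as vm = 2048, mem = 1024 and swap = 1024, main memory is capped at 1GB independently of swap while the combined total is still capped at 2GB, which is the isolation the nested mem/swap hierarchy cannot express. A real implementation would need locking across the whole reachable set, which this sketch does not attempt.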
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
On Tue, 11 Mar 2008 14:46:58 +0530
Balbir Singh [EMAIL PROTECTED] wrote:

> Paul Menage wrote:
>> On Tue, Mar 11, 2008 at 2:13 AM, KAMEZAWA Hiroyuki
>> [EMAIL PROTECTED] wrote:
>>> or remove all relationship among counters of *different* type of
>>> resources. user-land-daemon will do enough jobs.
>>
>> Yes, that would be my preferred choice, if people agree that
>> hierarchically limiting overall virtual memory isn't useful. (I don't
>> think I have a use for it myself).
>
> Virtual limits are very useful. I have a patch ready to send out.
> They limit the amount of paging a cgroup can do (virtual limit - RSS
> limit). Sometimes end users want to set virtual limit == RSS limit, so
> that the cgroup OOMs on crossing the RSS limit.

I have no objection to adding virtual limit itself. (It can be considered
as an extended ulimit.) But if you'd like to add a relationship between
virtual-limit and memory-usage-limit, please take care to make it clear
that the relationship is reasonable.

- memory-usage includes page-cache.
- memory-usage doesn't include hugepages.
- How to treat MAP_NORESERVE depends on the over-commit-memory type.
  How does cgroup handle it?
- shared memory will be counted per mmap.

Thanks,
-Kame
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
Paul Menage wrote:
> On Tue, Mar 11, 2008 at 2:16 AM, Balbir Singh [EMAIL PROTECTED] wrote:
>> Paul Menage wrote:
>>> On Tue, Mar 11, 2008 at 2:13 AM, KAMEZAWA Hiroyuki
>>> [EMAIL PROTECTED] wrote:
>>>> or remove all relationship among counters of *different* type of
>>>> resources. user-land-daemon will do enough jobs.
>>>
>>> Yes, that would be my preferred choice, if people agree that
>>> hierarchically limiting overall virtual memory isn't useful. (I don't
>>> think I have a use for it myself).
>>
>> Virtual limits are very useful. I have a patch ready to send out.
>> They limit the amount of paging a cgroup can do (virtual limit - RSS
>> limit).
>
> Ah, from this should I assume that you're talking about virtual
> address space limits, not virtual memory limits?
>
> My comment above was referring to Pavel's proposal to limit total
> virtual memory (RAM + swap) for a cgroup, and then limit swap as a
> subset of that, which basically makes it impossible to limit the RAM
> usage of cgroups properly if you also want to allow swap usage.
>
> Virtual address space limits are somewhat orthogonal to that.

Yes, I was referring to Virtual address limits (along the lines of
RLIMIT_AS). I guess it's just confusing terminology. I have patches for
Virtual address limits. I should send them out soon.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
[Devel] Re: [PATCH 2/2] Make res_counter hierarchical
Pavel Emelyanov wrote:
> This allows us two things basically:
>
> 1. If the subgroup has the limit higher than its parent has
>    then the one will get more memory than allowed.

But should we allow such configuration? I suspect that we should catch such
things at the time of writing the limit.

> 2. When we will need to account for a resource in more than
>    one place, we'll be able to use this technique.
>
>    Look, consider we have a memory limit and swap limit. The
>    memory limit is the limit for the sum of RSS, page cache
>    and swap usage. To account for this gracefully, we'll set
>    two counters:
>
> 	res_counter mem_counter;
> 	res_counter swap_counter;
>
>    attach mm to the swap one
>
> 	mm->mem_cnt = &swap_counter;
>
>    and make the swap_counter be mem's child. That's it. If we
>    want hierarchical support, then the tree will look like this:
>
>    mem_counter_top
>     swap_counter_top <- mm_struct living at top
>      mem_counter_sub
>       swap_counter_sub <- mm_struct living at sub

Hmm... not sure about this one. What I want to see is a resource counter
hierarchy to mimic the container hierarchy. Then ensure that all limits are
set sanely. I am planning to implement shares support on top of resource
counters.

> Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
>
> ---
>  include/linux/res_counter.h | 11 ++-
>  kernel/res_counter.c        | 36 +---
>  mm/memcontrol.c             |  9 ++---
>  3 files changed, 45 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index 2c4deb5..a27105e 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -41,6 +41,10 @@ struct res_counter {
>  	 * the routines below consider this to be IRQ-safe
>  	 */
>  	spinlock_t lock;
> +	/*
> +	 * the parent counter. used for hierarchical resource accounting
> +	 */
> +	struct res_counter *parent;
>  };
>
>  /**
> @@ -80,7 +84,12 @@ enum {
>   * helpers for accounting
>   */
>
> -void res_counter_init(struct res_counter *counter);
> +/*
> + * the parent pointer is set only once - during the counter
> + * initialization. caller then must itself provide that this
> + * pointer is valid during the new counter lifetime
> + */
> +void res_counter_init(struct res_counter *counter, struct res_counter *parent);
>
>  /*
>   * charge - try to consume more resource.
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index f1f20c2..046f6f4 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -13,10 +13,11 @@
>  #include <linux/res_counter.h>
>  #include <linux/uaccess.h>
>
> -void res_counter_init(struct res_counter *counter)
> +void res_counter_init(struct res_counter *counter, struct res_counter *parent)
>  {
>  	spin_lock_init(&counter->lock);
>  	counter->limit = (unsigned long long)LLONG_MAX;
> +	counter->parent = parent;
>  }
>
>  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
>  {
>  	int ret;
>  	unsigned long flags;
> +	struct res_counter *c, *unroll_c;
> +
> +	local_irq_save(flags);
> +	for (c = counter; c != NULL; c = c->parent) {
> +		spin_lock(&c->lock);
> +		ret = res_counter_charge_locked(c, val);
> +		spin_unlock(&c->lock);
> +		if (ret < 0)
> +			goto unroll;

We'd like to know which resource counter failed to allow charging, so that
we can reclaim from that mem_res_cgroup.

> +	}
> +	local_irq_restore(flags);
> +	return 0;
>
> -	spin_lock_irqsave(&counter->lock, flags);
> -	ret = res_counter_charge_locked(counter, val);
> -	spin_unlock_irqrestore(&counter->lock, flags);
> +unroll:
> +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
> +		spin_lock(&unroll_c->lock);
> +		res_counter_uncharge_locked(unroll_c, val);
> +		spin_unlock(&unroll_c->lock);
> +	}
> +	local_irq_restore(flags);
>  	return ret;
>  }
>
> @@ -54,10 +71,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
>  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>  {
>  	unsigned long flags;
> +	struct res_counter *c;
>
> -	spin_lock_irqsave(&counter->lock, flags);
> -	res_counter_uncharge_locked(counter, val);
> -	spin_unlock_irqrestore(&counter->lock, flags);
> +	local_irq_save(flags);
> +	for (c = counter; c != NULL; c = c->parent) {
> +		spin_lock(&c->lock);
> +		res_counter_uncharge_locked(c, val);
> +		spin_unlock(&c->lock);
> +	}
> +	local_irq_restore(flags);
>  }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e5c741a..61db79c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -976,19 +976,22 @@ static void free_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
>
>  static struct
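The "swap_counter is mem's child" arrangement from point 2 of the quoted description can be illustrated with a small userspace sketch. It assumes, as one reading of that description, that RSS and page-cache charges go directly to the mem counter while swap charges go to the swap counter and propagate to its parent. `struct cnt`, `cnt_charge()` and `cnt_uncharge()` are hypothetical names, sizes are in MB, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical counter, sizes in MB (no lock). */
struct cnt {
	unsigned long usage_mb;
	unsigned long limit_mb;
	struct cnt *parent;
};

/* Charge `mb` to `counter` and every ancestor, or to none of them. */
static int cnt_charge(struct cnt *counter, unsigned long mb)
{
	struct cnt *c, *u;

	for (c = counter; c != NULL; c = c->parent) {
		if (c->usage_mb + mb > c->limit_mb) {
			for (u = counter; u != c; u = u->parent)
				u->usage_mb -= mb;
			return -ENOMEM;
		}
		c->usage_mb += mb;
	}
	return 0;
}

/* Release `mb` from `counter` and every ancestor. */
static void cnt_uncharge(struct cnt *counter, unsigned long mb)
{
	struct cnt *c;

	for (c = counter; c != NULL; c = c->parent) {
		assert(c->usage_mb >= mb);
		c->usage_mb -= mb;
	}
}
```

With mem at 2048 and swap at 1024 as its child, swap usage is capped at 1024 and the sum of RAM and swap at 2048. Nothing in this arrangement caps RAM-type charges below 2048 on their own: if swap is unused, charges made directly to mem can fill the whole limit.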