[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-04-02 Thread Balbir Singh
Pavel Emelyanov wrote:
 This allows us two things basically:
 

Pavel,

Do you have any further updates on this? I think we need a way to implement
per-hierarchy reclaim, as mentioned earlier. Do you want me to take a look
at it?

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
___
Containers mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-13 Thread Pavel Emelyanov
YAMAMOTO Takashi wrote:
 @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
  {
  	int ret;
  	unsigned long flags;
 +	struct res_counter *c, *unroll_c;
 +
 +	local_irq_save(flags);
 +	for (c = counter; c != NULL; c = c->parent) {
 +		spin_lock(&c->lock);
 +		ret = res_counter_charge_locked(c, val);
 +		spin_unlock(&c->lock);
 +		if (ret < 0)
 +			goto unroll;
 +	}
 +	local_irq_restore(flags);
 +	return 0;
  
 -	spin_lock_irqsave(&counter->lock, flags);
 -	ret = res_counter_charge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +unroll:
 +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
 +		spin_lock(&unroll_c->lock);
 +		res_counter_uncharge_locked(unroll_c, val);
 +		spin_unlock(&unroll_c->lock);
 +	}
 +	local_irq_restore(flags);
  	return ret;
  }
 
 what prevents the topology (in particular, ->parent pointers) from
 changing behind us?

The res_counter client must guarantee this. Currently the cgroup subsystem does.
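
To experiment with the charge-and-unroll walk from the quoted hunk outside the
kernel, a minimal user-space sketch could look like the following. It assumes,
as stated above, that the client never changes a parent pointer after
initialisation. It drops the spinlocks and IRQ handling entirely, and the
model_* names are hypothetical, not part of the res_counter API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct res_counter (no locking). */
struct model_counter {
	unsigned long long usage;
	unsigned long long limit;
	struct model_counter *parent;	/* set once in model_init(), never changed */
};

void model_init(struct model_counter *c, unsigned long long limit,
		struct model_counter *parent)
{
	c->usage = 0;
	c->limit = limit;
	c->parent = parent;
}

/* Charge val to counter and every ancestor; undo partial charges on failure. */
int model_charge(struct model_counter *counter, unsigned long long val)
{
	struct model_counter *c, *unroll_c;

	for (c = counter; c != NULL; c = c->parent) {
		if (c->usage + val > c->limit)
			goto unroll;
		c->usage += val;
	}
	return 0;

unroll:
	/* c is the counter that refused; every counter below it was charged. */
	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent)
		unroll_c->usage -= val;
	return -ENOMEM;
}

/* Uncharge val from counter and every ancestor. */
void model_uncharge(struct model_counter *counter, unsigned long long val)
{
	struct model_counter *c;

	for (c = counter; c != NULL; c = c->parent) {
		assert(c->usage >= val);	/* never uncharge more than was charged */
		c->usage -= val;
	}
}
```

A failed charge leaves every counter on the path exactly as it was, which is
the property the unroll loop exists to provide.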

 YAMAMOTO Takashi
 



[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-12 Thread YAMAMOTO Takashi
 @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
  {
  	int ret;
  	unsigned long flags;
 +	struct res_counter *c, *unroll_c;
 +
 +	local_irq_save(flags);
 +	for (c = counter; c != NULL; c = c->parent) {
 +		spin_lock(&c->lock);
 +		ret = res_counter_charge_locked(c, val);
 +		spin_unlock(&c->lock);
 +		if (ret < 0)
 +			goto unroll;
 +	}
 +	local_irq_restore(flags);
 +	return 0;
  
 -	spin_lock_irqsave(&counter->lock, flags);
 -	ret = res_counter_charge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +unroll:
 +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
 +		spin_lock(&unroll_c->lock);
 +		res_counter_uncharge_locked(unroll_c, val);
 +		spin_unlock(&unroll_c->lock);
 +	}
 +	local_irq_restore(flags);
  	return ret;
  }

what prevents the topology (in particular, ->parent pointers) from
changing behind us?

YAMAMOTO Takashi


[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-12 Thread YAMAMOTO Takashi
  @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
   {
   	int ret;
   	unsigned long flags;
  +	struct res_counter *c, *unroll_c;
  +
  +	local_irq_save(flags);
  +	for (c = counter; c != NULL; c = c->parent) {
  +		spin_lock(&c->lock);
  +		ret = res_counter_charge_locked(c, val);
  +		spin_unlock(&c->lock);
  +		if (ret < 0)
  +			goto unroll;
  +	}
  +	local_irq_restore(flags);
  +	return 0;
   
  -	spin_lock_irqsave(&counter->lock, flags);
  -	ret = res_counter_charge_locked(counter, val);
  -	spin_unlock_irqrestore(&counter->lock, flags);
  +unroll:
  +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
  +		spin_lock(&unroll_c->lock);
  +		res_counter_uncharge_locked(unroll_c, val);
  +		spin_unlock(&unroll_c->lock);
  +	}
  +	local_irq_restore(flags);
   	return ret;
   }
 
 what prevents the topology (in particular, ->parent pointers) from
 changing behind us?
 
 YAMAMOTO Takashi

to answer myself: cgroupfs rename doesn't allow topological changes
in the first place.

btw, i think you need to do the same for res_counter_limit_check_locked
as well.  i'm skeptical about doing such complicated things in the kernel,
esp. in this potentially performance-critical code.
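
A hierarchical limit check could follow the same upward walk as the charge
path. The sketch below is a user-space illustration only: model_counter,
model_limit_check and model_limit_check_hier are hypothetical names, the
per-counter test is assumed to be a plain usage < limit comparison, and
locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct res_counter (no locking). */
struct model_counter {
	unsigned long long usage;
	unsigned long long limit;
	struct model_counter *parent;
};

/* Single-level check: is this one counter still below its limit? */
bool model_limit_check(const struct model_counter *c)
{
	return c->usage < c->limit;
}

/* Hierarchical variant: below limit only if every ancestor is as well. */
bool model_limit_check_hier(const struct model_counter *counter)
{
	const struct model_counter *c;

	for (c = counter; c != NULL; c = c->parent) {
		assert(c->parent != c);	/* a self-loop would never terminate */
		if (!model_limit_check(c))
			return false;
	}
	return true;
}
```

The cost is one comparison per level of the hierarchy, which is the kind of
overhead the performance concern above is about.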

YAMAMOTO Takashi


[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-11 Thread Balbir Singh
Pavel Emelyanov wrote:
 Balbir Singh wrote:
 Pavel Emelyanov wrote:
 This allows us two things basically:

 1. If the subgroup has the limit higher than its parent has
then the one will get more memory than allowed.
 But should we allow such a configuration? I suspect that we should catch such
 things at the time of writing the limit.
 
 We cannot catch this at limit-set time. See, if you have a cgroup A
 with a 1GB limit and its usage is 999MB, then creating a subgroup B with
 even a 500MB limit will let group A effectively consume 1.5GB of
 memory.
 

No... If you propagate the charge of the child up to the parent, then it won't.
If each page charged to a child is also charged to the parent, this cannot
happen. The code you have below does that, right?

 2. When we need to account for a resource in more than
    one place, we'll be able to use this technique.

    Look, consider we have a memory limit and swap limit. The
    memory limit is the limit for the sum of RSS, page cache
    and swap usage. To account for this gracefully, we'll set
    two counters:

    	res_counter mem_counter;
    	res_counter swap_counter;

    attach mm to the swap one

    	mm->mem_cnt = &swap_counter;

    and make the swap_counter be mem's child. That's it. If we
    want hierarchical support, then the tree will look like this:

    mem_counter_top
     swap_counter_top <- mm_struct living at top
      mem_counter_sub
       swap_counter_sub <- mm_struct living at sub

 Hmm... not sure about this one. What I want to see is a resource counter
 hierarchy to mimic the container hierarchy. Then ensure that all limits are
 set sanely. I am planning to implement shares support on top of resource
 counters.


 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

 ---
  include/linux/res_counter.h |   11 ++-
  kernel/res_counter.c|   36 +---
  mm/memcontrol.c |9 ++---
  3 files changed, 45 insertions(+), 11 deletions(-)

 diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
 index 2c4deb5..a27105e 100644
 --- a/include/linux/res_counter.h
 +++ b/include/linux/res_counter.h
 @@ -41,6 +41,10 @@ struct res_counter {
  	 * the routines below consider this to be IRQ-safe
  	 */
  	spinlock_t lock;
 +	/*
 +	 * the parent counter. used for hierarchical resource accounting
 +	 */
 +	struct res_counter *parent;
  };

  /**
 @@ -80,7 +84,12 @@ enum {
   * helpers for accounting
   */

 -void res_counter_init(struct res_counter *counter);
 +/*
 + * the parent pointer is set only once - during the counter
 + * initialization. caller then must itself provide that this
 + * pointer is valid during the new counter lifetime
 + */
 +void res_counter_init(struct res_counter *counter, struct res_counter *parent);

  /*
   * charge - try to consume more resource.
 diff --git a/kernel/res_counter.c b/kernel/res_counter.c
 index f1f20c2..046f6f4 100644
 --- a/kernel/res_counter.c
 +++ b/kernel/res_counter.c
 @@ -13,10 +13,11 @@
  #include <linux/res_counter.h>
  #include <linux/uaccess.h>

 -void res_counter_init(struct res_counter *counter)
 +void res_counter_init(struct res_counter *counter, struct res_counter *parent)
  {
  	spin_lock_init(&counter->lock);
  	counter->limit = (unsigned long long)LLONG_MAX;
 +	counter->parent = parent;
  }

  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
  {
  	int ret;
  	unsigned long flags;
 +	struct res_counter *c, *unroll_c;
 +
 +	local_irq_save(flags);
 +	for (c = counter; c != NULL; c = c->parent) {
 +		spin_lock(&c->lock);
 +		ret = res_counter_charge_locked(c, val);
 +		spin_unlock(&c->lock);
 +		if (ret < 0)
 +			goto unroll;
 We'd like to know which resource counter failed to allow charging, so that we
 can reclaim from that mem_res_cgroup.


This is also important, so that we can reclaim from the nodes that go over their
limit.

 +	}
 +	local_irq_restore(flags);
 +	return 0;

 -	spin_lock_irqsave(&counter->lock, flags);
 -	ret = res_counter_charge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +unroll:
 +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
 +		spin_lock(&unroll_c->lock);
 +		res_counter_uncharge_locked(unroll_c, val);
 +		spin_unlock(&unroll_c->lock);
 +	}
 +	local_irq_restore(flags);
  	return ret;
  }

 @@ -54,10 +71,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
  {
  	unsigned long flags;
 +	struct res_counter *c;

 -	spin_lock_irqsave(&counter->lock, flags);
 -	res_counter_uncharge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +	local_irq_save(flags);
 + 

[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-11 Thread Pavel Emelyanov
Balbir Singh wrote:
 Pavel Emelyanov wrote:
 This allows us two things basically:

 1. If the subgroup has the limit higher than its parent has
then the one will get more memory than allowed.
 
 But should we allow such a configuration? I suspect that we should catch such
 things at the time of writing the limit.

We cannot catch this at limit-set time. See, if you have a cgroup A
with a 1GB limit and its usage is 999MB, then creating a subgroup B with
even a 500MB limit will let group A effectively consume 1.5GB of
memory.

 2. When we need to account for a resource in more than
    one place, we'll be able to use this technique.

    Look, consider we have a memory limit and swap limit. The
    memory limit is the limit for the sum of RSS, page cache
    and swap usage. To account for this gracefully, we'll set
    two counters:

    	res_counter mem_counter;
    	res_counter swap_counter;

    attach mm to the swap one

    	mm->mem_cnt = &swap_counter;

    and make the swap_counter be mem's child. That's it. If we
    want hierarchical support, then the tree will look like this:

    mem_counter_top
     swap_counter_top <- mm_struct living at top
      mem_counter_sub
       swap_counter_sub <- mm_struct living at sub

 
 Hmm... not sure about this one. What I want to see is a resource counter
 hierarchy to mimic the container hierarchy. Then ensure that all limits are
 set sanely. I am planning to implement shares support on top of resource
 counters.
 
 
 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

 ---
  include/linux/res_counter.h |   11 ++-
  kernel/res_counter.c|   36 +---
  mm/memcontrol.c |9 ++---
  3 files changed, 45 insertions(+), 11 deletions(-)

 diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
 index 2c4deb5..a27105e 100644
 --- a/include/linux/res_counter.h
 +++ b/include/linux/res_counter.h
 @@ -41,6 +41,10 @@ struct res_counter {
  	 * the routines below consider this to be IRQ-safe
  	 */
  	spinlock_t lock;
 +	/*
 +	 * the parent counter. used for hierarchical resource accounting
 +	 */
 +	struct res_counter *parent;
  };

  /**
 @@ -80,7 +84,12 @@ enum {
   * helpers for accounting
   */

 -void res_counter_init(struct res_counter *counter);
 +/*
 + * the parent pointer is set only once - during the counter
 + * initialization. caller then must itself provide that this
 + * pointer is valid during the new counter lifetime
 + */
 +void res_counter_init(struct res_counter *counter, struct res_counter *parent);

  /*
   * charge - try to consume more resource.
 diff --git a/kernel/res_counter.c b/kernel/res_counter.c
 index f1f20c2..046f6f4 100644
 --- a/kernel/res_counter.c
 +++ b/kernel/res_counter.c
 @@ -13,10 +13,11 @@
  #include <linux/res_counter.h>
  #include <linux/uaccess.h>

 -void res_counter_init(struct res_counter *counter)
 +void res_counter_init(struct res_counter *counter, struct res_counter *parent)
  {
  	spin_lock_init(&counter->lock);
  	counter->limit = (unsigned long long)LLONG_MAX;
 +	counter->parent = parent;
  }

  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
  {
  	int ret;
  	unsigned long flags;
 +	struct res_counter *c, *unroll_c;
 +
 +	local_irq_save(flags);
 +	for (c = counter; c != NULL; c = c->parent) {
 +		spin_lock(&c->lock);
 +		ret = res_counter_charge_locked(c, val);
 +		spin_unlock(&c->lock);
 +		if (ret < 0)
 +			goto unroll;
 
 We'd like to know which resource counter failed to allow charging, so that we
 can reclaim from that mem_res_cgroup.
 
 +	}
 +	local_irq_restore(flags);
 +	return 0;

 -	spin_lock_irqsave(&counter->lock, flags);
 -	ret = res_counter_charge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +unroll:
 +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
 +		spin_lock(&unroll_c->lock);
 +		res_counter_uncharge_locked(unroll_c, val);
 +		spin_unlock(&unroll_c->lock);
 +	}
 +	local_irq_restore(flags);
  	return ret;
  }

 @@ -54,10 +71,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
  {
  	unsigned long flags;
 +	struct res_counter *c;

 -	spin_lock_irqsave(&counter->lock, flags);
 -	res_counter_uncharge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +	local_irq_save(flags);
 +	for (c = counter; c != NULL; c = c->parent) {
 +		spin_lock(&c->lock);
 +		res_counter_uncharge_locked(c, val);
 +		spin_unlock(&c->lock);
 +	}
 +	local_irq_restore(flags);
  }


 diff --git a/mm/memcontrol.c b/mm/memcontrol.c
 index 

[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-11 Thread Pavel Emelyanov
Balbir Singh wrote:
 Pavel Emelyanov wrote:
 Balbir Singh wrote:
 Pavel Emelyanov wrote:
 This allows us two things basically:

 1. If the subgroup has the limit higher than its parent has
then the one will get more memory than allowed.
 But should we allow such a configuration? I suspect that we should catch such
 things at the time of writing the limit.
 We cannot catch this at limit-set time. See, if you have a cgroup A
 with a 1GB limit and its usage is 999MB, then creating a subgroup B with
 even a 500MB limit will let group A effectively consume 1.5GB of
 memory.

 
 No... If you propagate the charge of the child up to the parent, then it won't.
 If each page charged to a child is also charged to the parent, this cannot
 happen. The code you have below does that, right?

Yup! What you described is possible only with this patch.
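
As a worked example of the 1GB / 999MB / 500MB case quoted above: when every
charge is propagated upwards, the room left for B is the minimum of the room
left in B and in each of its ancestors. The sketch below computes that figure
in user space. model_counter and model_headroom are hypothetical names, 1GB is
taken as 1024MB, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical user-space stand-in for struct res_counter (no locking). */
struct model_counter {
	unsigned long long usage;
	unsigned long long limit;
	struct model_counter *parent;
};

/* Largest charge that counter and all of its ancestors would still accept. */
unsigned long long model_headroom(const struct model_counter *counter)
{
	const struct model_counter *c;
	unsigned long long room = ~0ULL;

	for (c = counter; c != NULL; c = c->parent) {
		assert(c->usage <= c->limit);	/* invariant kept by charging */
		if (c->limit - c->usage < room)
			room = c->limit - c->usage;
	}
	return room;
}
```

With A at 999MB of 1024MB and B limited to 500MB, model_headroom() for B
yields 25MB, so B's own 500MB limit never lets A exceed 1GB.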

 2. When we need to account for a resource in more than
    one place, we'll be able to use this technique.

    Look, consider we have a memory limit and swap limit. The
    memory limit is the limit for the sum of RSS, page cache
    and swap usage. To account for this gracefully, we'll set
    two counters:

    	res_counter mem_counter;
    	res_counter swap_counter;

    attach mm to the swap one

    	mm->mem_cnt = &swap_counter;

    and make the swap_counter be mem's child. That's it. If we
    want hierarchical support, then the tree will look like this:

    mem_counter_top
     swap_counter_top <- mm_struct living at top
      mem_counter_sub
       swap_counter_sub <- mm_struct living at sub

 Hmm... not sure about this one. What I want to see is a resource counter
 hierarchy to mimic the container hierarchy. Then ensure that all limits are
 set sanely. I am planning to implement shares support on top of resource
 counters.


 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

 ---
  include/linux/res_counter.h |   11 ++-
  kernel/res_counter.c|   36 +---
  mm/memcontrol.c |9 ++---
  3 files changed, 45 insertions(+), 11 deletions(-)

 diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
 index 2c4deb5..a27105e 100644
 --- a/include/linux/res_counter.h
 +++ b/include/linux/res_counter.h
 @@ -41,6 +41,10 @@ struct res_counter {
  	 * the routines below consider this to be IRQ-safe
  	 */
  	spinlock_t lock;
 +	/*
 +	 * the parent counter. used for hierarchical resource accounting
 +	 */
 +	struct res_counter *parent;
  };

  /**
 @@ -80,7 +84,12 @@ enum {
   * helpers for accounting
   */

 -void res_counter_init(struct res_counter *counter);
 +/*
 + * the parent pointer is set only once - during the counter
 + * initialization. caller then must itself provide that this
 + * pointer is valid during the new counter lifetime
 + */
 +void res_counter_init(struct res_counter *counter, struct res_counter *parent);

  /*
   * charge - try to consume more resource.
 diff --git a/kernel/res_counter.c b/kernel/res_counter.c
 index f1f20c2..046f6f4 100644
 --- a/kernel/res_counter.c
 +++ b/kernel/res_counter.c
 @@ -13,10 +13,11 @@
  #include <linux/res_counter.h>
  #include <linux/uaccess.h>

 -void res_counter_init(struct res_counter *counter)
 +void res_counter_init(struct res_counter *counter, struct res_counter *parent)
  {
  	spin_lock_init(&counter->lock);
  	counter->limit = (unsigned long long)LLONG_MAX;
 +	counter->parent = parent;
  }

  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
  {
  	int ret;
  	unsigned long flags;
 +	struct res_counter *c, *unroll_c;
 +
 +	local_irq_save(flags);
 +	for (c = counter; c != NULL; c = c->parent) {
 +		spin_lock(&c->lock);
 +		ret = res_counter_charge_locked(c, val);
 +		spin_unlock(&c->lock);
 +		if (ret < 0)
 +			goto unroll;
 We'd like to know which resource counter failed to allow charging, so that
 we can reclaim from that mem_res_cgroup.

 
 This is also important, so that we can reclaim from the nodes that go over
 their limit.

Agreed. I'll think about how to provide this facility.
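
One possible shape for such a facility is an extra out-parameter that receives
the counter whose limit was hit. This is a user-space sketch under
assumptions, not the interface of the patch: model_charge_report and its
fail_at argument are hypothetical, and locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct res_counter (no locking). */
struct model_counter {
	unsigned long long usage;
	unsigned long long limit;
	struct model_counter *parent;
};

/*
 * Charge val to counter and all ancestors. On failure, undo the partial
 * charges and store the refusing counter in *fail_at so that the caller
 * knows which group to reclaim from.
 */
int model_charge_report(struct model_counter *counter, unsigned long long val,
			struct model_counter **fail_at)
{
	struct model_counter *c, *unroll_c;

	assert(fail_at != NULL);
	*fail_at = NULL;

	for (c = counter; c != NULL; c = c->parent) {
		if (c->usage + val > c->limit)
			goto unroll;
		c->usage += val;
	}
	return 0;

unroll:
	*fail_at = c;
	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent)
		unroll_c->usage -= val;
	return -ENOMEM;
}
```

The caller can then start reclaim at the group owning *fail_at instead of the
group the charge was originally issued against.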

 +	}
 +	local_irq_restore(flags);
 +	return 0;

 -	spin_lock_irqsave(&counter->lock, flags);
 -	ret = res_counter_charge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +unroll:
 +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
 +		spin_lock(&unroll_c->lock);
 +		res_counter_uncharge_locked(unroll_c, val);
 +		spin_unlock(&unroll_c->lock);
 +	}
 +	local_irq_restore(flags);
  	return ret;
  }

 @@ -54,10 +71,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
  {
  	unsigned long flags;
 +	struct res_counter *c;

 -	spin_lock_irqsave(&counter->lock, flags);
 -  

[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-11 Thread Paul Menage
On Tue, Mar 11, 2008 at 1:15 AM, Pavel Emelyanov [EMAIL PROTECTED] wrote:

  mem_counter_0
   + -- swap_counter_0
   + -- mem_counter_1
   | + -- swap_counter_1
   | + -- mem_counter_11
   | | + -- swap_counter_11
   | + -- mem_counter_12
   |   + -- swap_counter_12
   + -- mem_counter_2
   | + -- swap_counter_2
   | + -- mem_counter_21
   | | + -- swap_counter_21
   | + -- mem_counter_22
   |   + -- swap_counter_22
   + -- mem_counter_N
     + -- swap_counter_N
     + -- mem_counter_N1
     | + -- swap_counter_N1
     + -- mem_counter_N2
       + -- swap_counter_N2


The idea of hierarchy is good, but I don't think this particular
hierarchy works for memory.

Main memory and swap space are very different resources, with very
different performance characteristics. Suppose you have a 2GB machine,
and you want to guarantee each job 1GB of main memory, plus give them
the option of 1GB of swap for when they go over the 1GB main memory
limit. With the hierarchy given above, you'd need to give each job a
2GB mem.limit and a 1GB swap.limit, and so there would be no main
memory isolation.
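
The arithmetic can be checked with a small user-space model. It assumes one
reading of the nested scheme: RAM is charged only to the combined (RAM + swap)
counter, while swap is charged to the swap counter and then to the combined
one. All names (nested_job, nested_charge_ram, nested_charge_swap) are
hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define GB (1024ULL * 1024ULL * 1024ULL)

struct model_counter {
	unsigned long long usage;
	unsigned long long limit;
};

/* One job under the nested scheme: swap is a child of the combined counter. */
struct nested_job {
	struct model_counter total;	/* RAM + swap (the parent) */
	struct model_counter swap;	/* swap only (the child) */
};

static int try_charge(struct model_counter *c, unsigned long long val)
{
	assert(c->usage <= c->limit);
	if (val > c->limit - c->usage)
		return -ENOMEM;
	c->usage += val;
	return 0;
}

/* RAM is limited only by the combined counter. */
int nested_charge_ram(struct nested_job *j, unsigned long long val)
{
	return try_charge(&j->total, val);
}

/* Swap is charged to the child first, then propagated to the parent. */
int nested_charge_swap(struct nested_job *j, unsigned long long val)
{
	int ret = try_charge(&j->swap, val);

	if (ret < 0)
		return ret;
	ret = try_charge(&j->total, val);
	if (ret < 0)
		j->swap.usage -= val;	/* roll back the child */
	return ret;
}
```

With a 2GB combined limit and a 1GB swap limit, nested_charge_ram() accepts a
full 2GB of RAM for a single job, which is the missing isolation described
above.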

My feeling is that people are going to want to limit swap and main
memory usage as two independent resource hierarchies more often than
they're going to want to limit overall virtual memory. But assuming
that there are people who need to do the latter, then you should make
it configurable how the hierarchies fit together.

Alternatively, you could make it possible for a res_counter to have
multiple parents (each of which constrains the overall usage of it and
its siblings), and have three counters for each cgroup:

- vm_counter: overall virtual memory limit for group, parent =
  parent_mem_cgroup->vm_counter

- mem_counter: main memory limit for group, parents = vm_counter,
  parent_mem_cgroup->mem_counter

- swap_counter: swap limit for group, parents = vm_counter,
  parent_mem_cgroup->swap_counter
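
A rough user-space sketch of the multiple-parent idea follows. With several
parents the counters form a DAG rather than a tree, so the parent group's
vm_counter can be reached along two paths. The sketch therefore gathers each
ancestor once, checks all limits, and only then commits the charge.
multi_counter, MAX_PARENTS, MAX_ANCESTORS and multi_charge are hypothetical
names, and locking is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PARENTS	2
#define MAX_ANCESTORS	16

struct multi_counter {
	unsigned long long usage;
	unsigned long long limit;
	struct multi_counter *parents[MAX_PARENTS];	/* unused slots are NULL */
};

/* Add c and its ancestors to set[] once each; return the new count or -1. */
static int collect(struct multi_counter *c, struct multi_counter **set, int n)
{
	int i;

	if (c == NULL)
		return n;
	for (i = 0; i < n; i++)
		if (set[i] == c)
			return n;	/* already reached along another path */
	if (n == MAX_ANCESTORS)
		return -1;
	set[n++] = c;
	for (i = 0; i < MAX_PARENTS && n >= 0; i++)
		n = collect(c->parents[i], set, n);
	return n;
}

/* Charge val to counter and every distinct ancestor, all or nothing. */
int multi_charge(struct multi_counter *counter, unsigned long long val)
{
	struct multi_counter *set[MAX_ANCESTORS];
	int i, n;

	assert(counter != NULL);
	n = collect(counter, set, 0);
	if (n < 0)
		return -EINVAL;		/* hierarchy too large for this sketch */

	for (i = 0; i < n; i++)
		if (set[i]->usage + val > set[i]->limit)
			return -ENOMEM;
	for (i = 0; i < n; i++)
		set[i]->usage += val;
	return 0;
}
```

Checking before committing avoids the unroll step, at the price of a second
pass over the ancestors.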

Paul


[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-11 Thread KAMEZAWA Hiroyuki
On Tue, 11 Mar 2008 14:46:58 +0530
Balbir Singh [EMAIL PROTECTED] wrote:

 Paul Menage wrote:
  On Tue, Mar 11, 2008 at 2:13 AM, KAMEZAWA Hiroyuki
  [EMAIL PROTECTED] wrote:
   or remove all relationship among counters of *different* type of resources.
   user-land-daemon will do enough jobs.
 
  
  Yes, that would be my preferred choice, if people agree that
  hierarchically limiting overall virtual memory isn't useful. (I don't
  think I have a use for it myself).
  
 
 Virtual limits are very useful. I have a patch ready to send out.
 They limit the amount of paging a cgroup can do (virtual limit - RSS limit).
 Sometimes end users want to set virtual limit == RSS limit, so that the
 cgroup OOMs on crossing the RSS limit.
 
I have no objection to adding a virtual limit itself.
(It can be considered an extended ulimit.)

But if you'd like to add a relationship between the virtual limit and the
memory-usage limit, please take care to make it clear that the relationship
is reasonable.

- memory-usage includes page-cache.
- memory-usage doesn't include hugepages.
- How MAP_NORESERVE is treated depends on the overcommit-memory type.
  What will the cgroup do?
- shared memory will be counted per mmap.

Thanks,
-Kame



[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-11 Thread Balbir Singh
Paul Menage wrote:
 On Tue, Mar 11, 2008 at 2:16 AM, Balbir Singh [EMAIL PROTECTED] wrote:
 Paul Menage wrote:
   On Tue, Mar 11, 2008 at 2:13 AM, KAMEZAWA Hiroyuki
   [EMAIL PROTECTED] wrote:
    or remove all relationship among counters of *different* type of resources.
    user-land-daemon will do enough jobs.
  
  
   Yes, that would be my preferred choice, if people agree that
   hierarchically limiting overall virtual memory isn't useful. (I don't
   think I have a use for it myself).
  

  Virtual limits are very useful. I have a patch ready to send out.
  They limit the amount of paging a cgroup can do (virtual limit - RSS limit).
 
 Ah, from this should I assume that you're talking about virtual
 address space limits, not virtual memory limits?
 
 My comment above was referring to Pavel's proposal to limit total
 virtual memory (RAM + swap) for a cgroup, and then limit swap as a
 subset of that, which basically makes it impossible to limit the RAM
 usage of cgroups properly if you also want to allow swap usage.
 
 Virtual address space limits are somewhat orthogonal to that.
 


Yes, I was referring to virtual address limits (along the lines of RLIMIT_AS). I
guess it's just confusing terminology. I have patches for virtual address
limits and should send them out soon.


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


[Devel] Re: [PATCH 2/2] Make res_counter hierarchical

2008-03-08 Thread Balbir Singh
Pavel Emelyanov wrote:
 This allows us two things basically:
 
 1. If the subgroup has the limit higher than its parent has
then the one will get more memory than allowed.

But should we allow such a configuration? I suspect that we should catch such
things at the time of writing the limit.

 2. When we need to account for a resource in more than
    one place, we'll be able to use this technique.
 
    Look, consider we have a memory limit and swap limit. The
    memory limit is the limit for the sum of RSS, page cache
    and swap usage. To account for this gracefully, we'll set
    two counters:
 
    	res_counter mem_counter;
    	res_counter swap_counter;
 
    attach mm to the swap one
 
    	mm->mem_cnt = &swap_counter;
 
    and make the swap_counter be mem's child. That's it. If we
    want hierarchical support, then the tree will look like this:
 
    mem_counter_top
     swap_counter_top <- mm_struct living at top
      mem_counter_sub
       swap_counter_sub <- mm_struct living at sub
 

Hmm... not sure about this one. What I want to see is a resource counter
hierarchy to mimic the container hierarchy. Then ensure that all limits are set
sanely. I am planning to implement shares support on top of resource counters.


 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
 
 ---
  include/linux/res_counter.h |   11 ++-
  kernel/res_counter.c|   36 +---
  mm/memcontrol.c |9 ++---
  3 files changed, 45 insertions(+), 11 deletions(-)
 
 diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
 index 2c4deb5..a27105e 100644
 --- a/include/linux/res_counter.h
 +++ b/include/linux/res_counter.h
 @@ -41,6 +41,10 @@ struct res_counter {
  	 * the routines below consider this to be IRQ-safe
  	 */
  	spinlock_t lock;
 +	/*
 +	 * the parent counter. used for hierarchical resource accounting
 +	 */
 +	struct res_counter *parent;
  };
 
  /**
 @@ -80,7 +84,12 @@ enum {
   * helpers for accounting
   */
 
 -void res_counter_init(struct res_counter *counter);
 +/*
 + * the parent pointer is set only once - during the counter
 + * initialization. caller then must itself provide that this
 + * pointer is valid during the new counter lifetime
 + */
 +void res_counter_init(struct res_counter *counter, struct res_counter *parent);
 
  /*
   * charge - try to consume more resource.
 diff --git a/kernel/res_counter.c b/kernel/res_counter.c
 index f1f20c2..046f6f4 100644
 --- a/kernel/res_counter.c
 +++ b/kernel/res_counter.c
 @@ -13,10 +13,11 @@
  #include <linux/res_counter.h>
  #include <linux/uaccess.h>
 
 -void res_counter_init(struct res_counter *counter)
 +void res_counter_init(struct res_counter *counter, struct res_counter *parent)
  {
  	spin_lock_init(&counter->lock);
  	counter->limit = (unsigned long long)LLONG_MAX;
 +	counter->parent = parent;
  }
 
  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 @@ -36,10 +37,26 @@ int res_counter_charge(struct res_counter *counter, unsigned long val)
  {
  	int ret;
  	unsigned long flags;
 +	struct res_counter *c, *unroll_c;
 +
 +	local_irq_save(flags);
 +	for (c = counter; c != NULL; c = c->parent) {
 +		spin_lock(&c->lock);
 +		ret = res_counter_charge_locked(c, val);
 +		spin_unlock(&c->lock);
 +		if (ret < 0)
 +			goto unroll;

We'd like to know which resource counter failed to allow charging, so that we
can reclaim from that mem_res_cgroup.

 +	}
 +	local_irq_restore(flags);
 +	return 0;
 
 -	spin_lock_irqsave(&counter->lock, flags);
 -	ret = res_counter_charge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +unroll:
 +	for (unroll_c = counter; unroll_c != c; unroll_c = unroll_c->parent) {
 +		spin_lock(&unroll_c->lock);
 +		res_counter_uncharge_locked(unroll_c, val);
 +		spin_unlock(&unroll_c->lock);
 +	}
 +	local_irq_restore(flags);
  	return ret;
  }
 
 @@ -54,10 +71,15 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
  void res_counter_uncharge(struct res_counter *counter, unsigned long val)
  {
  	unsigned long flags;
 +	struct res_counter *c;
 
 -	spin_lock_irqsave(&counter->lock, flags);
 -	res_counter_uncharge_locked(counter, val);
 -	spin_unlock_irqrestore(&counter->lock, flags);
 +	local_irq_save(flags);
 +	for (c = counter; c != NULL; c = c->parent) {
 +		spin_lock(&c->lock);
 +		res_counter_uncharge_locked(c, val);
 +		spin_unlock(&c->lock);
 +	}
 +	local_irq_restore(flags);
  }
 
 
 diff --git a/mm/memcontrol.c b/mm/memcontrol.c
 index e5c741a..61db79c 100644
 --- a/mm/memcontrol.c
 +++ b/mm/memcontrol.c
 @@ -976,19 +976,22 @@ static void free_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
  static struct