[RFC][PATCH][0/4] Memory controller (RSS Control) (v2)

2007-02-25 Thread Balbir Singh
This is a repost of the patches at
http://lkml.org/lkml/2007/2/24/65
The previous post had a misleading subject which ended with a "(".


This patch applies on top of Paul Menage's container patches (V7) posted at

http://lkml.org/lkml/2007/2/12/88

It implements a controller within the containers framework for limiting
memory usage (RSS usage).

The memory controller was discussed at length in the RFC posted to lkml
http://lkml.org/lkml/2006/10/30/51

This is version 2 of the patch, version 1 was posted at
http://lkml.org/lkml/2007/2/19/10

I have tried to incorporate all comments; more details can be found
in the changelogs of individual patches. Any remaining mistakes are
all my fault.

The next question could be: why release version 2?

1. It serves as a decision point on whether we should move to a per-container
   LRU list. Walking the global LRU is slow; in this patchset I've tried to
   address the LRU churning issue. The patch memcontrol-reclaim-on-limit has
   more details.
2. I've included fixes for several of the comments/issues raised in version 1

Steps to use the controller
--
0. Download the patches, apply the patches
1. Turn on CONFIG_CONTAINER_MEMCONTROL in kernel config, build the kernel
   and boot into the new kernel
2. mount -t container container -o memcontrol /
3. cd /
   optionally do (mkdir <dir>; cd <dir>) under /
4. echo $$ > tasks (attaches the current shell to the container)
5. echo -n (limit value) > memcontrol_limit
6. cat memcontrol_usage
7. Run tasks, check the usage of the controller and the reclaim behaviour
8. Report bugs, get bug fixes and iterate (goto step 0).

Advantages of the patchset
--
1. Zero overhead in struct page (struct page is not expanded)
2. Minimal changes to the core-mm code
3. Shared pages are not reclaimed unless all mappings belong to overlimit
   containers (a rough sketch of this rule follows the list).
4. It can be used to debug drivers/applications/kernel components in a
   constrained memory environment (similar to mem=XXX option), except that
   several containers can be created simultaneously without rebooting and
   the limits can be changed. NOTE: There is presently no support for limiting
   kernel memory allocations or for page cache control.
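
To make point 3 above concrete, here is a rough sketch (purely illustrative;
none of these names come from the patches) of how the "reclaim a shared page
only if every mapper is over its limit" rule could be expressed:

#include <stdbool.h>

struct container_sketch {
	unsigned long usage;		/* pages currently charged */
	unsigned long limit;		/* pages allowed */
};

struct mapping_sketch {			/* one process mapping of the page */
	struct container_sketch *container;
	struct mapping_sketch *next;
};

/* Reclaim a shared page only when every container mapping it is over limit. */
static bool should_reclaim_shared_page(const struct mapping_sketch *mappings)
{
	const struct mapping_sketch *m;

	for (m = mappings; m; m = m->next)
		if (m->container->usage <= m->container->limit)
			return false;	/* at least one mapper is within its limit */
	return true;
}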

Testing
---
Created containers, attached tasks to containers with lower limits than
the memory the tasks require (memory hog tests) and ran some basic tests on
them.
Tested the patches on UML and PowerPC. On UML, I tried the patches with the
config enabled and disabled (sanity check), and with containers enabled
but the memory controller disabled.

TODOs and improvement areas
--
1. Come up with cool page replacement algorithms for containers - still holds
   good (if possible without any changes to struct page)
2. Add page cache control
3. Add kernel memory allocator control
4. Extract benchmark numbers and overhead data

Comments & criticism are welcome.

Series
--
memcontrol-setup.patch
memcontrol-acct.patch
memcontrol-reclaim-on-limit.patch
memcontrol-doc.patch

-- 
Warm Regards,
Balbir Singh


Memcontrol patchset (was Re: [RFC][PATCH][0/4] Memory controller (RSS Control) ()

2007-02-24 Thread Balbir Singh

Hi,

My script could not parse the (#2) suffix and posted the patches with the
subject followed by "(" instead.

I apologize,
Balbir Singh


[RFC][PATCH][0/4] Memory controller (RSS Control) (

2007-02-24 Thread Balbir Singh
This patch applies on top of Paul Menage's container patches (V7) posted at

http://lkml.org/lkml/2007/2/12/88

It implements a controller within the containers framework for limiting
memory usage (RSS usage).

The memory controller was discussed at length in the RFC posted to lkml
http://lkml.org/lkml/2006/10/30/51

This is version 2 of the patch, version 1 was posted at
http://lkml.org/lkml/2007/2/19/10

I have tried to incorporate all comments; more details can be found
in the changelogs of individual patches. Any remaining mistakes are
all my fault.

The next question could be: why release version 2?

1. It serves as a decision point on whether we should move to a per-container
   LRU list. Walking the global LRU is slow; in this patchset I've tried to
   address the LRU churning issue. The patch memcontrol-reclaim-on-limit has
   more details.
2. I've included fixes for several of the comments/issues raised in version 1

Steps to use the controller
--
0. Download the patches, apply the patches
1. Turn on CONFIG_CONTAINER_MEMCONTROL in kernel config, build the kernel
   and boot into the new kernel
2. mount -t container container -o memcontrol /
3. cd /
   optionally do (mkdir <dir>; cd <dir>) under /
4. echo $$ > tasks (attaches the current shell to the container)
5. echo -n (limit value) > memcontrol_limit
6. cat memcontrol_usage
7. Run tasks, check the usage of the controller and the reclaim behaviour
8. Report bugs, get bug fixes and iterate (goto step 0).

Advantages of the patchset
--
1. Zero overhead in struct page (struct page is not expanded)
2. Minimal changes to the core-mm code
3. Shared pages are not reclaimed unless all mappings belong to overlimit
   containers.
4. It can be used to debug drivers/applications/kernel components in a
   constrained memory environment (similar to mem=XXX option), except that
   several containers can be created simultaneously without rebooting and
   the limits can be changed. NOTE: There is presently no support for limiting
   kernel memory allocations or for page cache control.

Testing
---
Created containers, attached tasks to containers with lower limits than
the memory the tasks require (memory hog tests) and ran some basic tests on
them.
Tested the patches on UML and PowerPC. On UML, I tried the patches with the
config enabled and disabled (sanity check), and with containers enabled
but the memory controller disabled.

TODOs and improvement areas
--
1. Come up with cool page replacement algorithms for containers - still holds
   good (if possible without any changes to struct page)
2. Add page cache control
3. Add kernel memory allocator control
4. Extract benchmark numbers and overhead data

Comments & criticism are welcome.

Series
--
memcontrol-setup.patch
memcontrol-acct.patch
memcontrol-reclaim-on-limit.patch
memcontrol-doc.patch

-- 
Warm Regards,
Balbir Singh


Re: [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Balbir Singh

Magnus Damm wrote:

On 2/19/07, Balbir Singh <[EMAIL PROTECTED]> wrote:

Magnus Damm wrote:
> On 2/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
>> On Mon, 19 Feb 2007 12:20:19 +0530 Balbir Singh <[EMAIL PROTECTED]>
>> wrote:
>>
>> > This patch applies on top of Paul Menage's container patches (V7)
>> > posted at
>> >
>> >   http://lkml.org/lkml/2007/2/12/88
>> >
>> > It implements a controller within the containers framework for limiting
>> > memory usage (RSS usage).
>
>> The key part of this patchset is the reclaim algorithm:
>>
>> Alas, I fear this might have quite bad worst-case behaviour.  One small
>> container which is under constant memory pressure will churn the
>> system-wide LRUs like mad, and will consume rather a lot of system time.

>> So it's a point at which container A can deleteriously affect things which
>> are running in other containers, which is exactly what we're supposed to
>> not do.
>
> Nice with a simple memory controller. The downside seems to be that it
> doesn't scale very well when it comes to reclaim, but maybe that just
> comes with being simple. Step by step, and maybe this is a good first
> step?
>

Thanks, I totally agree.

> Ideally I'd like to see unmapped pages handled on a per-container LRU
> with a fallback to the system-wide LRUs. Shared/mapped pages could be
> handled using PTE ageing/unmapping instead of page ageing, but that
> may consume too much resources to be practical.
>
> / magnus

Keeping unmapped pages per container sounds interesting. I am not quite
sure what PTE ageing is; I will look it up.


You will most likely have no luck looking it up, so here is what I
mean by PTE ageing:

The most common unit for memory resource control seems to be physical
pages. Keeping track of pages is simple in the case of a single user
per page, but for shared pages tracking the owner becomes more
complex.

I consider unmapped pages to only have a single user at a time, so the
unit for unmapped memory resource control is physical pages. Apart
from implementation details such as fun with struct page and
scalability, handling this case is not so complicated.

Mapped or shared pages should be handled in a different way IMO. PTEs
should be used instead of using physical pages as unit for resource
control and reclaim. For the user this looks pretty much the same as
physical pages, apart from memory overcommit.

So instead of using a global page reclaim policy and reserving
physical pages per container I propose that resource controlled shared
pages should be handled using a PTE replacement policy. This policy is
used to keep the most active PTEs in the container backed by physical
pages. Inactive PTEs get unmapped in favour of newer PTEs.

One way to implement this could be by populating the address space of
resource controlled processes with multiple smaller LRU2Qs. The
compact data structure that I have in mind is basically an array of
256 bytes, one byte per PTE. Associated with this data structure are
start indexes and lengths for two lists. The indexes are used in a
FAT-type chain to form singly linked lists. So we create active and
inactive lists here - and we move PTEs between the lists when we check
the young bits from the page reclaim and when we apply memory
pressure. Unmapping is done through the normal page reclaimer but
using information from the PTE LRUs.

In my mind this should lead to more fair resource control of mapped
pages, but if it is possible to implement with low overhead, that's
another question. =)

Thanks for listening.

/ magnus



Thanks for explaining PTE aging.

--
Warm Regards,
Balbir Singh


Re: [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Magnus Damm

On 2/19/07, Balbir Singh <[EMAIL PROTECTED]> wrote:

Magnus Damm wrote:
> On 2/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
>> On Mon, 19 Feb 2007 12:20:19 +0530 Balbir Singh <[EMAIL PROTECTED]>
>> wrote:
>>
>> > This patch applies on top of Paul Menage's container patches (V7)
>> > posted at
>> >
>> >   http://lkml.org/lkml/2007/2/12/88
>> >
>> > It implements a controller within the containers framework for limiting
>> > memory usage (RSS usage).
>
>> The key part of this patchset is the reclaim algorithm:
>>
>> Alas, I fear this might have quite bad worst-case behaviour.  One small
>> container which is under constant memory pressure will churn the
>> system-wide LRUs like mad, and will consume rather a lot of system time.
>> So it's a point at which container A can deleteriously affect things
>> which
>> are running in other containers, which is exactly what we're supposed to
>> not do.
>
> Nice with a simple memory controller. The downside seems to be that it
> doesn't scale very well when it comes to reclaim, but maybe that just
> comes with being simple. Step by step, and maybe this is a good first
> step?
>

Thanks, I totally agree.

> Ideally I'd like to see unmapped pages handled on a per-container LRU
> with a fallback to the system-wide LRUs. Shared/mapped pages could be
> handled using PTE ageing/unmapping instead of page ageing, but that
> may consume too much resources to be practical.
>
> / magnus

Keeping unmapped pages per container sounds interesting. I am not quite
sure what PTE ageing is; I will look it up.


You will most likely have no luck looking it up, so here is what I
mean by PTE ageing:

The most common unit for memory resource control seems to be physical
pages. Keeping track of pages is simple in the case of a single user
per page, but for shared pages tracking the owner becomes more
complex.

I consider unmapped pages to only have a single user at a time, so the
unit for unmapped memory resource control is physical pages. Apart
from implementation details such as fun with struct page and
scalability, handling this case is not so complicated.

Mapped or shared pages should be handled in a different way IMO. PTEs
should be used instead of using physical pages as unit for resource
control and reclaim. For the user this looks pretty much the same as
physical pages, apart from memory overcommit.

So instead of using a global page reclaim policy and reserving
physical pages per container I propose that resource controlled shared
pages should be handled using a PTE replacement policy. This policy is
used to keep the most active PTEs in the container backed by physical
pages. Inactive PTEs get unmapped in favour of newer PTEs.

One way to implement this could be by populating the address space of
resource controlled processes with multiple smaller LRU2Qs. The
compact data structure that I have in mind is basically an array of
256 bytes, one byte per PTE. Associated with this data structure are
start indexes and lengths for two lists. The indexes are used in a
FAT-type chain to form singly linked lists. So we create active and
inactive lists here - and we move PTEs between the lists when we check
the young bits from the page reclaim and when we apply memory
pressure. Unmapping is done through the normal page reclaimer but
using information from the PTE LRUs.
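
As a purely illustrative sketch (not code from any posted patch; all names
are made up and the exact layout is only an assumption based on the
description above), the 256-byte structure with its FAT-style index chains
might look roughly like this:

#include <stdint.h>

#define PTE_LRU2Q_ENTRIES 256

struct pte_lru2q {
	uint8_t  next[PTE_LRU2Q_ENTRIES]; /* next[i]: index of the PTE following i */
	uint8_t  active_head;             /* start index of the active chain */
	uint8_t  inactive_head;           /* start index of the inactive chain */
	uint16_t active_len;              /* chain lengths (0..256) */
	uint16_t inactive_len;
};

/*
 * Promote the PTE at the head of the inactive chain to the head of the
 * active chain, e.g. when its young bit was found set during a scan.
 * The caller must ensure the inactive chain is non-empty.
 */
static void pte_lru2q_activate_head(struct pte_lru2q *q)
{
	uint8_t i = q->inactive_head;

	q->inactive_head = q->next[i];    /* unlink from the inactive chain */
	q->inactive_len--;

	q->next[i] = q->active_head;      /* push onto the active chain */
	q->active_head = i;
	q->active_len++;
}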

In my mind this should lead to more fair resource control of mapped
pages, but if it is possible to implement with low overhead, that's
another question. =)

Thanks for listening.

/ magnus


Re: [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Balbir Singh

Magnus Damm wrote:

On 2/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
On Mon, 19 Feb 2007 12:20:19 +0530 Balbir Singh <[EMAIL PROTECTED]> 
wrote:


> This patch applies on top of Paul Menage's container patches (V7)
> posted at

>
>   http://lkml.org/lkml/2007/2/12/88
>
> It implements a controller within the containers framework for limiting
> memory usage (RSS usage).



The key part of this patchset is the reclaim algorithm:

Alas, I fear this might have quite bad worst-case behaviour.  One small
container which is under constant memory pressure will churn the
system-wide LRUs like mad, and will consume rather a lot of system time.
So it's a point at which container A can deleteriously affect things which
are running in other containers, which is exactly what we're supposed to
not do.


Nice with a simple memory controller. The downside seems to be that it
doesn't scale very well when it comes to reclaim, but maybe that just
comes with being simple. Step by step, and maybe this is a good first
step?



Thanks, I totally agree.


Ideally I'd like to see unmapped pages handled on a per-container LRU
with a fallback to the system-wide LRUs. Shared/mapped pages could be
handled using PTE ageing/unmapping instead of page ageing, but that
may consume too much resources to be practical.

/ magnus


Keeping unmapped pages per container sounds interesting. I am not quite
sure what PTE ageing is; I will look it up.


--
Warm Regards,
Balbir Singh


Re: [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Balbir Singh

Paul Menage wrote:

On 2/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:


Alas, I fear this might have quite bad worst-case behaviour.  One small
container which is under constant memory pressure will churn the
system-wide LRUs like mad, and will consume rather a lot of system time.
So it's a point at which container A can deleteriously affect things which
are running in other containers, which is exactly what we're supposed to
not do.


I think it's OK for a container to consume lots of system time during
reclaim, as long as we can account that time to the container involved
(i.e. if it's done during direct reclaim rather than by something like
kswapd).

Churning the LRU could well be bad though, I agree.



I completely agree with you on reclaim consuming time.

Churning the LRU can be avoided by the means I mentioned before (a rough
sketch of the first two points follows the list):

1. Add a container pointer (per page struct), it is also
   useful for the page cache controller
2. Check if the page belongs to a particular container before
   the list_del(&page->lru), so that those pages can be skipped.
3. Use a double LRU list by overloading the lru list_head of
   struct page.
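
As a rough sketch of points 1 and 2 (hypothetical names, not code from the
posted patches): with a container back-pointer in struct page, the
membership test becomes a plain pointer comparison that can be done before
list_del(&page->lru) in the reclaim scan.

#include <linux/list.h>

struct container;			/* provided by the container framework */

struct page_like {			/* stand-in for an extended struct page */
	struct list_head lru;
	struct container *container;	/* point 1: owning container */
	/* ... */
};

/* Point 2: skip pages that do not belong to the container under reclaim. */
static inline int page_in_container_sketch(struct page_like *page,
					   struct container *cont)
{
	return page->container == cont;
}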






--
Warm Regards,
Balbir Singh


Re: [ckrm-tech] [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Balbir Singh

Kirill Korotaev wrote:

On 2/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:


Alas, I fear this might have quite bad worst-case behaviour.  One small
container which is under constant memory pressure will churn the
system-wide LRUs like mad, and will consume rather a lot of system time.
So it's a point at which container A can deleteriously affect things which
are running in other containers, which is exactly what we're supposed to
not do.


I think it's OK for a container to consume lots of system time during
reclaim, as long as we can account that time to the container involved
(i.e. if it's done during direct reclaim rather than by something like
kswapd).

hmm, is it ok to scan 100GB of RAM for a 10MB RAM container?
In the UBC patch set we used page beancounters to track container pages.
This allows for efficient scanning and reclamation in the controller.

Thanks,
Kirill


Hi, Kirill,

Yes, that's a problem, but I think it's a problem that can be solved
in steps. First step, add reclaim. Second step, optimize reclaim.

--
Warm Regards,
Balbir Singh


Re: [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Balbir Singh

Andrew Morton wrote:

On Mon, 19 Feb 2007 12:20:19 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:


This patch applies on top of Paul Menage's container patches (V7) posted at

http://lkml.org/lkml/2007/2/12/88

It implements a controller within the containers framework for limiting
memory usage (RSS usage).


It's good to see someone building on someone else's work for once, rather
than everyone going off in different directions.  It makes one hope that we
might actually achieve something at last.



Thanks! It's good to know we are headed in the right direction.



The key part of this patchset is the reclaim algorithm:


@@ -636,6 +642,15 @@ static unsigned long isolate_lru_pages(u
 
 		list_del(&page->lru);
 		target = src;
+		/*
+		 * For containers, do not scan the page unless it
+		 * belongs to the container we are reclaiming for
+		 */
+		if (container && !page_in_container(page, zone, container)) {
+			scan--;
+			goto done;
+		}


Alas, I fear this might have quite bad worst-case behaviour.  One small
container which is under constant memory pressure will churn the
system-wide LRUs like mad, and will consume rather a lot of system time. 
So it's a point at which container A can deleteriously affect things which
are running in other containers, which is exactly what we're supposed to
not do.



Hmm.. I guess it's space vs time then :-) A CPU controller could
control how much time is spent reclaiming ;)

Coming back, I see the problem you mentioned and we have been thinking
of several possible solutions. In my introduction I pointed out

   "Come up with cool page replacement algorithms for containers
   (if possible without any changes to struct page)"


The solutions we have looked at are

1. Overload the LRU list_head in struct page to have a global
   LRU + a per container LRU

   [diagram: Global LRU -- pages 0 and 1 linked to each other via the
    prev/next pointers of their lru list_heads]

   [diagram: Global LRU + Container LRU -- pages 0, 1 and n on the
    global list, with pages 1 and n additionally chained together
    within the same container]

Page 1 and n belong to the same container; to get to page 0, you need
two de-references.


2. Modify struct page to point to a container and allow each container to
   have a per-container LRU along with the global LRU


For efficiency we need the container LRU and we don't want to split
the global LRU either.
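
For illustration, a rough sketch of what the per-container LRU side of
option 2 might look like (hypothetical names and layout, not code from the
posted patches): each container keeps its own list of charged pages, so
reclaim for an over-limit container walks only that list instead of
filtering the global LRU.

#include <linux/list.h>
#include <linux/spinlock.h>

struct memcontrol_sketch {
	struct list_head lru;		/* per-container LRU of charged pages */
	spinlock_t       lru_lock;
	unsigned long    usage;		/* pages currently charged */
	unsigned long    limit;		/* pages allowed */
};

struct page_meta_sketch {		/* per-page bookkeeping */
	struct list_head container_lru;	/* links the page into its container's LRU */
	struct memcontrol_sketch *container;
};

/* Charge a page to a container and put it on the container's LRU tail. */
static void container_add_page_sketch(struct memcontrol_sketch *cont,
				      struct page_meta_sketch *pm)
{
	spin_lock(&cont->lru_lock);
	pm->container = cont;
	list_add_tail(&pm->container_lru, &cont->lru);
	cont->usage++;
	spin_unlock(&cont->lru_lock);
}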

We need to optimize the reclaim path, but I thought of that as a secondary
problem. Once we all agree that the controller looks simple, accounts well
and works, we can/should definitely optimize the reclaim path.


--
Warm Regards,
Balbir Singh


Re: [ckrm-tech] [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Paul Menage

On 2/19/07, Kirill Korotaev <[EMAIL PROTECTED]> wrote:

> > I think it's OK for a container to consume lots of system time during
> > reclaim, as long as we can account that time to the container involved
> > (i.e. if it's done during direct reclaim rather than by something like
> > kswapd).
>
> hmm, is it ok to scan 100GB of RAM for a 10MB RAM container?
> In the UBC patch set we used page beancounters to track container pages.
> This allows for efficient scanning and reclamation in the controller.


I don't mean that we shouldn't go for the most efficient method that's
practical. If we can do reclaim without spinning across so much of the
LRU, then that's obviously better.

But if the best approach in the general case results in a process in
the container spending lots of CPU time trying to do the reclaim,
that's probably OK as long as we can account for that time and (once
we have a CPU controller) throttle back the container in that case. So
then, a container can only hurt itself by thrashing/reclaiming, rather
than hurting other containers. (LRU churn notwithstanding ...)

Paul


Re: [ckrm-tech] [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Kirill Korotaev
> On 2/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
>>Alas, I fear this might have quite bad worst-case behaviour.  One small
>>container which is under constant memory pressure will churn the
>>system-wide LRUs like mad, and will consume rather a lot of system time.
>>So it's a point at which container A can deleteriously affect things which
>>are running in other containers, which is exactly what we're supposed to
>>not do.
> 
> 
> I think it's OK for a container to consume lots of system time during
> reclaim, as long as we can account that time to the container involved
> (i.e. if it's done during direct reclaim rather than by something like
> kswapd).
hmm, is it ok to scan 100GB of RAM for a 10MB RAM container?
In the UBC patch set we used page beancounters to track container pages.
This allows for efficient scanning and reclamation in the controller.

Thanks,
Kirill


Re: [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Magnus Damm

On 2/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

On Mon, 19 Feb 2007 12:20:19 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:

> This patch applies on top of Paul Menage's container patches (V7) posted at
>
>   http://lkml.org/lkml/2007/2/12/88
>
> It implements a controller within the containers framework for limiting
> memory usage (RSS usage).



The key part of this patchset is the reclaim algorithm:

Alas, I fear this might have quite bad worst-case behaviour.  One small
container which is under constant memory pressure will churn the
system-wide LRUs like mad, and will consume rather a lot of system time.
So it's a point at which container A can deleteriously affect things which
are running in other containers, which is exactly what we're supposed to
not do.


Nice with a simple memory controller. The downside seems to be that it
doesn't scale very well when it comes to reclaim, but maybe that just
comes with being simple. Step by step, and maybe this is a good first
step?

Ideally I'd like to see unmapped pages handled on a per-container LRU
with a fallback to the system-wide LRUs. Shared/mapped pages could be
handled using PTE ageing/unmapping instead of page ageing, but that
may consume too much resources to be practical.

/ magnus


Re: [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Paul Menage

On 2/19/07, Andrew Morton <[EMAIL PROTECTED]> wrote:


Alas, I fear this might have quite bad worst-case behaviour.  One small
container which is under constant memory pressure will churn the
system-wide LRUs like mad, and will consume rather a lot of system time.
So it's a point at which container A can deleteriously affect things which
are running in other containers, which is exactly what we're supposed to
not do.


I think it's OK for a container to consume lots of system time during
reclaim, as long as we can account that time to the container involved
(i.e. if it's done during direct reclaim rather than by something like
kswapd).

Churning the LRU could well be bad though, I agree.

Paul


Re: [RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-19 Thread Andrew Morton
On Mon, 19 Feb 2007 12:20:19 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:

> This patch applies on top of Paul Menage's container patches (V7) posted at
> 
>   http://lkml.org/lkml/2007/2/12/88
> 
> It implements a controller within the containers framework for limiting
> memory usage (RSS usage).

It's good to see someone building on someone else's work for once, rather
than everyone going off in different directions.  It makes one hope that we
might actually achieve something at last.


The key part of this patchset is the reclaim algorithm:

> @@ -636,6 +642,15 @@ static unsigned long isolate_lru_pages(u
>  
>  		list_del(&page->lru);
>  		target = src;
> +		/*
> +		 * For containers, do not scan the page unless it
> +		 * belongs to the container we are reclaiming for
> +		 */
> +		if (container && !page_in_container(page, zone, container)) {
> +			scan--;
> +			goto done;
> +		}

Alas, I fear this might have quite bad worst-case behaviour.  One small
container which is under constant memory pressure will churn the
system-wide LRUs like mad, and will consume rather a lot of system time. 
So it's a point at which container A can deleteriously affect things which
are running in other containers, which is exactly what we're supposed to
not do.



[RFC][PATCH][0/4] Memory controller (RSS Control)

2007-02-18 Thread Balbir Singh
This patch applies on top of Paul Menage's container patches (V7) posted at

http://lkml.org/lkml/2007/2/12/88

It implements a controller within the containers framework for limiting
memory usage (RSS usage).

The memory controller was discussed at length in the RFC posted to lkml
http://lkml.org/lkml/2006/10/30/51

Steps to use the controller
--


0. Download the patches, apply the patches
1. Turn on CONFIG_CONTAINER_MEMCTLR in kernel config, build the kernel
   and boot into the new kernel
2. mount -t container container -o memctlr /
3. cd /
   optionally do (mkdir <dir>; cd <dir>) under /
4. echo $$ > tasks (attaches the current shell to the container)
5. echo -n (limit value) > memctlr_limit
6. cat memctlr_usage
7. Run tasks, check the usage of the controller and the reclaim behaviour
8. Report bugs, get bug fixes and iterate (goto step 0).

Advantages of the patchset
--

1. Zero overhead in struct page (struct page is not expanded)
2. Minimal changes to the core-mm code
3. Shared pages are not reclaimed unless all mappings belong to overlimit
   containers.
4. It can be used to debug drivers/applications/kernel components in a
   constrained memory environment (similar to mem=XXX option), except that
   several containers can be created simultaneously without rebooting and
   the limits can be changed. NOTE: There is presently no support for limiting
   kernel memory allocations or for page cache control.

Testing
---
Ran kernbench and lmbench with containers enabled (container filesystem not
mounted); they seemed to run fine.
Created containers, attached tasks to containers with lower limits than
the memory the tasks require (memory hog tests) and ran some basic tests on
them.

TODOs and improvement areas
--
1. Come up with cool page replacement algorithms for containers
   (if possible without any changes to struct page)
2. Add page cache control
3. Add kernel memory allocator control
4. Extract benchmark numbers and overhead data

Comments & criticism are welcome.

Series
--
memctlr-setup.patch
memctlr-acct.patch
memctlr-reclaim-on-limit.patch
memctlr-doc.patch

-- 
Warm Regards,
Balbir Singh