Re: [patch 05/19] split LRU lists into anon & file sets

2008-02-06 Thread Rik van Riel
On Thu, 07 Feb 2008 10:20:39 +0900
KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:

> looks good.
> thank you clean up code.

Yeah, it looks good.

Too bad it still does not work :)

Oh well, I'll look at that tomorrow.  Jet lag is catching up
with me, so I should get some rest first...

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-02-06 Thread KOSAKI Motohiro
Hi Rik

Welcome back :)

> > I found number of scan pages calculation bug.
> 
> My latest version of get_scan_ratio() works differently, with the
> percentages always adding up to 100.  However, your patch gave me
> the inspiration to (hopefully) find the bug in my version of the
> code.

OK.


> > 2. wrong fraction omission
> > 
> > nr[l] = zone->nr_scan[l] * percent[file] / 100;
> > 
> > when percent is very small,
> > nr[l] become 0.
> 
> This is probably where the problem is.  Kind of.
> 
> I believe that the problem is that we scale nr[l] by the percentage,
> instead of scaling the amount we add to zone->nr_scan[l] by the
> percentage!

Aahh,
you are right.


> Index: linux-2.6.24-rc6-mm1/mm/vmscan.c
> ===
> --- linux-2.6.24-rc6-mm1.orig/mm/vmscan.c 2008-02-06 19:23:16.0 
> -0500
> +++ linux-2.6.24-rc6-mm1/mm/vmscan.c  2008-02-06 19:22:55.0 -0500
> @@ -1275,13 +1275,17 @@ static unsigned long shrink_zone(int pri
>   for_each_lru(l) {
>   if (scan_global_lru(sc)) {
>   int file = is_file_lru(l);
> + int scan;
>   /*
>* Add one to nr_to_scan just to make sure that the
> -  * kernel will slowly sift through the active list.
> +  * kernel will slowly sift through each list.
>*/
> - zone->nr_scan[l] += (zone_page_state(zone,
> - NR_INACTIVE_ANON + l) >> priority) + 1;
> - nr[l] = zone->nr_scan[l] * percent[file] / 100;
> + scan = zone_page_state(zone, NR_INACTIVE_ANON + l);
> + scan >>= priority;
> + scan = (scan * percent[file]) / 100;
> +
> + zone->nr_scan[l] += scan + 1;
> + nr[l] = zone->nr_scan[l];
>   if (nr[l] >= sc->swap_cluster_max)
>   zone->nr_scan[l] = 0;
>   else

looks good.
thank you clean up code.


- kosaki


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-02-06 Thread Rik van Riel
On Wed, 30 Jan 2008 17:57:54 +0900
KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:

> I found number of scan pages calculation bug.

My latest version of get_scan_ratio() works differently, with the
percentages always adding up to 100.  However, your patch gave me
the inspiration to (hopefully) find the bug in my version of the
code.
 
> 1. wrong calculation order

I do not believe my new code has this problem.  Of course, this
is purely due to luck :)

> 2. wrong fraction omission
> 
>   nr[l] = zone->nr_scan[l] * percent[file] / 100;
> 
>   when percent is very small,
>   nr[l] become 0.

This is probably where the problem is.  Kind of.

I believe that the problem is that we scale nr[l] by the percentage,
instead of scaling the amount we add to zone->nr_scan[l] by the
percentage!

> @@ -1409,7 +1410,11 @@ static unsigned long shrink_zone(int pri
>*/
>   zone->nr_scan[l] += (zone_page_state(zone,
>   NR_INACTIVE_ANON + l) >> priority) + 1;
> - nr[l] = zone->nr_scan[l] * percent[file] / 100;
> + nr[l] = (zone->nr_scan[l] * percent[file] / 100) + 1;
> + nr_max_scan = zone_page_state(zone, NR_INACTIVE_ANON+l);
> + if (nr[l] > nr_max_scan)
> + nr[l] = nr_max_scan;
> +
>   if (nr[l] >= sc->swap_cluster_max)
>   zone->nr_scan[l] = 0;
>   else

With the fix below (against my latest tree), we always add at least one
to zone->nr_scan[l] and always make that increment count later on!

I am still recovering from my trip home (thanks to the airline companies
I spent 25 hours travelling, from door to door), so I may not get around
to actually testing this today:

Index: linux-2.6.24-rc6-mm1/mm/vmscan.c
===
--- linux-2.6.24-rc6-mm1.orig/mm/vmscan.c   2008-02-06 19:23:16.0 
-0500
+++ linux-2.6.24-rc6-mm1/mm/vmscan.c2008-02-06 19:22:55.0 -0500
@@ -1275,13 +1275,17 @@ static unsigned long shrink_zone(int pri
for_each_lru(l) {
if (scan_global_lru(sc)) {
int file = is_file_lru(l);
+   int scan;
/*
 * Add one to nr_to_scan just to make sure that the
-* kernel will slowly sift through the active list.
+* kernel will slowly sift through each list.
 */
-   zone->nr_scan[l] += (zone_page_state(zone,
-   NR_INACTIVE_ANON + l) >> priority) + 1;
-   nr[l] = zone->nr_scan[l] * percent[file] / 100;
+   scan = zone_page_state(zone, NR_INACTIVE_ANON + l);
+   scan >>= priority;
+   scan = (scan * percent[file]) / 100;
+
+   zone->nr_scan[l] += scan + 1;
+   nr[l] = zone->nr_scan[l];
if (nr[l] >= sc->swap_cluster_max)
zone->nr_scan[l] = 0;
else

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-02-06 Thread Rik van Riel
On Wed, 30 Jan 2008 17:57:54 +0900
KOSAKI Motohiro [EMAIL PROTECTED] wrote:

 I found number of scan pages calculation bug.

My latest version of get_scan_ratio() works differently, with the
percentages always adding up to 100.  However, your patch gave me
the inspiration to (hopefully) find the bug in my version of the
code.
 
 1. wrong calculation order

I do not believe my new code has this problem.  Of course, this
is purely due to luck :)

 2. wrong fraction omission
 
   nr[l] = zone-nr_scan[l] * percent[file] / 100;
 
   when percent is very small,
   nr[l] become 0.

This is probably where the problem is.  Kind of.

I believe that the problem is that we scale nr[l] by the percentage,
instead of scaling the amount we add to zone-nr_scan[l] by the
percentage!

 @@ -1409,7 +1410,11 @@ static unsigned long shrink_zone(int pri
*/
   zone-nr_scan[l] += (zone_page_state(zone,
   NR_INACTIVE_ANON + l)  priority) + 1;
 - nr[l] = zone-nr_scan[l] * percent[file] / 100;
 + nr[l] = (zone-nr_scan[l] * percent[file] / 100) + 1;
 + nr_max_scan = zone_page_state(zone, NR_INACTIVE_ANON+l);
 + if (nr[l]  nr_max_scan)
 + nr[l] = nr_max_scan;
 +
   if (nr[l] = sc-swap_cluster_max)
   zone-nr_scan[l] = 0;
   else

With the fix below (against my latest tree), we always add at least one
to zone-nr_scan[l] and always make that increment count later on!

I am still recovering from my trip home (thanks to the airline companies
I spent 25 hours travelling, from door to door), so I may not get around
to actually testing this today:

Index: linux-2.6.24-rc6-mm1/mm/vmscan.c
===
--- linux-2.6.24-rc6-mm1.orig/mm/vmscan.c   2008-02-06 19:23:16.0 
-0500
+++ linux-2.6.24-rc6-mm1/mm/vmscan.c2008-02-06 19:22:55.0 -0500
@@ -1275,13 +1275,17 @@ static unsigned long shrink_zone(int pri
for_each_lru(l) {
if (scan_global_lru(sc)) {
int file = is_file_lru(l);
+   int scan;
/*
 * Add one to nr_to_scan just to make sure that the
-* kernel will slowly sift through the active list.
+* kernel will slowly sift through each list.
 */
-   zone-nr_scan[l] += (zone_page_state(zone,
-   NR_INACTIVE_ANON + l)  priority) + 1;
-   nr[l] = zone-nr_scan[l] * percent[file] / 100;
+   scan = zone_page_state(zone, NR_INACTIVE_ANON + l);
+   scan = priority;
+   scan = (scan * percent[file]) / 100;
+
+   zone-nr_scan[l] += scan + 1;
+   nr[l] = zone-nr_scan[l];
if (nr[l] = sc-swap_cluster_max)
zone-nr_scan[l] = 0;
else

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-02-06 Thread KOSAKI Motohiro
Hi Rik

Welcome back :)

  I found number of scan pages calculation bug.
 
 My latest version of get_scan_ratio() works differently, with the
 percentages always adding up to 100.  However, your patch gave me
 the inspiration to (hopefully) find the bug in my version of the
 code.

OK.


  2. wrong fraction omission
  
  nr[l] = zone-nr_scan[l] * percent[file] / 100;
  
  when percent is very small,
  nr[l] become 0.
 
 This is probably where the problem is.  Kind of.
 
 I believe that the problem is that we scale nr[l] by the percentage,
 instead of scaling the amount we add to zone-nr_scan[l] by the
 percentage!

Aahh,
you are right.


 Index: linux-2.6.24-rc6-mm1/mm/vmscan.c
 ===
 --- linux-2.6.24-rc6-mm1.orig/mm/vmscan.c 2008-02-06 19:23:16.0 
 -0500
 +++ linux-2.6.24-rc6-mm1/mm/vmscan.c  2008-02-06 19:22:55.0 -0500
 @@ -1275,13 +1275,17 @@ static unsigned long shrink_zone(int pri
   for_each_lru(l) {
   if (scan_global_lru(sc)) {
   int file = is_file_lru(l);
 + int scan;
   /*
* Add one to nr_to_scan just to make sure that the
 -  * kernel will slowly sift through the active list.
 +  * kernel will slowly sift through each list.
*/
 - zone-nr_scan[l] += (zone_page_state(zone,
 - NR_INACTIVE_ANON + l)  priority) + 1;
 - nr[l] = zone-nr_scan[l] * percent[file] / 100;
 + scan = zone_page_state(zone, NR_INACTIVE_ANON + l);
 + scan = priority;
 + scan = (scan * percent[file]) / 100;
 +
 + zone-nr_scan[l] += scan + 1;
 + nr[l] = zone-nr_scan[l];
   if (nr[l] = sc-swap_cluster_max)
   zone-nr_scan[l] = 0;
   else

looks good.
thank you clean up code.


- kosaki


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-02-06 Thread Rik van Riel
On Thu, 07 Feb 2008 10:20:39 +0900
KOSAKI Motohiro [EMAIL PROTECTED] wrote:

 looks good.
 thank you clean up code.

Yeah, it looks good.

Too bad it still does not work :)

Oh well, I'll look at that tomorrow.  Jet lag is catching up
with me, so I should get some rest first...

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-31 Thread KOSAKI Motohiro
> I will integrate your fixes with my code when I
> get back from holidays.  Then things should work :)
> 
> Thank you for your analysis of the problem.

Thank you.
enjoy good vacation :)

-
kosaki



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-31 Thread Rik van Riel
On Thu, 31 Jan 2008 10:17:48 +0900
KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:

> on my test environment, my patch solve incorrect OOM.
> because, too small reclaim cause OOM.

That makes sense.

The version you two are looking at can return
"percentages" way larger than 100 in get_scan_ratio.

A fixed version of get_scan_ratio, where the
percentages always add up to 100%, makes the
system go OOM before it seriously starts
swapping.

I will integrate your fixes with my code when I
get back from holidays.  Then things should work :)

Thank you for your analysis of the problem.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-31 Thread Rik van Riel
On Thu, 31 Jan 2008 10:17:48 +0900
KOSAKI Motohiro [EMAIL PROTECTED] wrote:

 on my test environment, my patch solve incorrect OOM.
 because, too small reclaim cause OOM.

That makes sense.

The version you two are looking at can return
percentages way larger than 100 in get_scan_ratio.

A fixed version of get_scan_ratio, where the
percentages always add up to 100%, makes the
system go OOM before it seriously starts
swapping.

I will integrate your fixes with my code when I
get back from holidays.  Then things should work :)

Thank you for your analysis of the problem.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-31 Thread KOSAKI Motohiro
 I will integrate your fixes with my code when I
 get back from holidays.  Then things should work :)
 
 Thank you for your analysis of the problem.

Thank you.
enjoy good vacation :)

-
kosaki



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-30 Thread KOSAKI Motohiro
Hi Lee-san

> Rik is currently out on holiday and I've been traveling.  Just getting
> back to rebasing to 24-rc8-mm1.  Thank you for your efforts in testing
> and tracking down the regressions.  I will add your fixes into my tree
> and try them out and let you know.  Rik mentioned to me that he has a
> fix for the "get_scan_ratio()" calculation that is causing us to OOM
> kill prematurely--i.e., when we still have lots of swap space to evict
> swappable anon.  I don't know if it's similar to what you have posted.
> Have to wait and see what he says.  Meantime, we'll try your patches.

thank you for your quick response.

on my test environment, my patch solve incorrect OOM.
because, too small reclaim cause OOM.

Please confirm.


- kosaki


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-30 Thread Lee Schermerhorn
On Wed, 2008-01-30 at 17:57 +0900, KOSAKI Motohiro wrote:
> Hi Rik, Lee
> 
> I found number of scan pages calculation bug.
> 
> 1. wrong calculation order
> 
>   ap *= rotate_sum / (zone->recent_rotated_anon + 1);
> 
>when recent_rotated_anon = 100 and recent_rotated_file = 0,
>
>  rotate_sum / (zone->recent_rotated_anon + 1)
>= 100 / 101
>= 0
> 
>at that time, ap become 0.
> 
> 2. wrong fraction omission
> 
>   nr[l] = zone->nr_scan[l] * percent[file] / 100;
> 
>   when percent is very small,
>   nr[l] become 0.
> 
> Test Result:
> (1) $ ./hackbench 150 process 1000
> (2) # sync; echo 3 > /proc/sys/vm/drop_caches
> $ dd if=tmp10G of=/dev/null
> $ ./hackbench 150 process 1000
> 
> rvr-split-lru + revert patch of previous mail
>   (1) 83.014
>   (2) 717.009
> 
> rvr-split-lru + revert patch of previous mail + below patch
>   (1) 61.965
>   (2) 85.444 !!
> 
> 
> Now, We got 1000% performance improvement against 2.6.24-rc8-mm1 :)
> 
> 
> 
> - kosaki
> 
> 
> Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>



Kosaki-san:

Rik is currently out on holiday and I've been traveling.  Just getting
back to rebasing to 24-rc8-mm1.  Thank you for your efforts in testing
and tracking down the regressions.  I will add your fixes into my tree
and try them out and let you know.  Rik mentioned to me that he has a
fix for the "get_scan_ratio()" calculation that is causing us to OOM
kill prematurely--i.e., when we still have lots of swap space to evict
swappable anon.  I don't know if it's similar to what you have posted.
Have to wait and see what he says.  Meantime, we'll try your patches.

Again, thank you.

Regards,
Lee



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-30 Thread KOSAKI Motohiro
Hi Rik, Lee

I found number of scan pages calculation bug.

1. wrong calculation order

ap *= rotate_sum / (zone->recent_rotated_anon + 1);

   when recent_rotated_anon = 100 and recent_rotated_file = 0,
   
 rotate_sum / (zone->recent_rotated_anon + 1)
   = 100 / 101
   = 0

   at that time, ap become 0.

2. wrong fraction omission

nr[l] = zone->nr_scan[l] * percent[file] / 100;

when percent is very small,
nr[l] become 0.

Test Result:
(1) $ ./hackbench 150 process 1000
(2) # sync; echo 3 > /proc/sys/vm/drop_caches
$ dd if=tmp10G of=/dev/null
$ ./hackbench 150 process 1000

rvr-split-lru + revert patch of previous mail
(1) 83.014
(2) 717.009

rvr-split-lru + revert patch of previous mail + below patch
(1) 61.965
(2) 85.444 !!


Now, We got 1000% performance improvement against 2.6.24-rc8-mm1 :)



- kosaki


Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 mm/vmscan.c |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

Index: b/mm/vmscan.c
===
--- a/mm/vmscan.c   2008-01-30 15:22:10.0 +0900
+++ b/mm/vmscan.c   2008-01-30 16:03:28.0 +0900
@@ -1355,7 +1355,7 @@ static void get_scan_ratio(struct zone *
 *   anon + file   rotate_sum
 */
ap = (anon_prio * anon) / (anon + file + 1);
-   ap *= rotate_sum / (zone->recent_rotated_anon + 1);
+   ap = (ap * rotate_sum) / (zone->recent_rotated_anon + 1);
if (ap == 0)
ap = 1;
else if (ap > 100)
@@ -1363,7 +1363,7 @@ static void get_scan_ratio(struct zone *
percent[0] = ap;
 
fp = (file_prio * file) / (anon + file + 1);
-   fp *= rotate_sum / (zone->recent_rotated_file + 1);
+   fp = (fp * rotate_sum) / (zone->recent_rotated_file + 1);
if (fp == 0)
fp = 1;
else if (fp > 100)
@@ -1402,6 +1402,7 @@ static unsigned long shrink_zone(int pri
 
for_each_reclaimable_lru(l) {
if (scan_global_lru(sc)) {
+   unsigned long nr_max_scan;
int file = is_file_lru(l);
/*
 * Add one to nr_to_scan just to make sure that the
@@ -1409,7 +1410,11 @@ static unsigned long shrink_zone(int pri
 */
zone->nr_scan[l] += (zone_page_state(zone,
NR_INACTIVE_ANON + l) >> priority) + 1;
-   nr[l] = zone->nr_scan[l] * percent[file] / 100;
+   nr[l] = (zone->nr_scan[l] * percent[file] / 100) + 1;
+   nr_max_scan = zone_page_state(zone, NR_INACTIVE_ANON+l);
+   if (nr[l] > nr_max_scan)
+   nr[l] = nr_max_scan;
+
if (nr[l] >= sc->swap_cluster_max)
zone->nr_scan[l] = 0;
else



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-30 Thread KOSAKI Motohiro
Hi Rik, Lee

I found number of scan pages calculation bug.

1. wrong calculation order

ap *= rotate_sum / (zone-recent_rotated_anon + 1);

   when recent_rotated_anon = 100 and recent_rotated_file = 0,
   
 rotate_sum / (zone-recent_rotated_anon + 1)
   = 100 / 101
   = 0

   at that time, ap become 0.

2. wrong fraction omission

nr[l] = zone-nr_scan[l] * percent[file] / 100;

when percent is very small,
nr[l] become 0.

Test Result:
(1) $ ./hackbench 150 process 1000
(2) # sync; echo 3  /proc/sys/vm/drop_caches
$ dd if=tmp10G of=/dev/null
$ ./hackbench 150 process 1000

rvr-split-lru + revert patch of previous mail
(1) 83.014
(2) 717.009

rvr-split-lru + revert patch of previous mail + below patch
(1) 61.965
(2) 85.444 !!


Now, We got 1000% performance improvement against 2.6.24-rc8-mm1 :)



- kosaki


Signed-off-by: KOSAKI Motohiro [EMAIL PROTECTED]

---
 mm/vmscan.c |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

Index: b/mm/vmscan.c
===
--- a/mm/vmscan.c   2008-01-30 15:22:10.0 +0900
+++ b/mm/vmscan.c   2008-01-30 16:03:28.0 +0900
@@ -1355,7 +1355,7 @@ static void get_scan_ratio(struct zone *
 *   anon + file   rotate_sum
 */
ap = (anon_prio * anon) / (anon + file + 1);
-   ap *= rotate_sum / (zone-recent_rotated_anon + 1);
+   ap = (ap * rotate_sum) / (zone-recent_rotated_anon + 1);
if (ap == 0)
ap = 1;
else if (ap  100)
@@ -1363,7 +1363,7 @@ static void get_scan_ratio(struct zone *
percent[0] = ap;
 
fp = (file_prio * file) / (anon + file + 1);
-   fp *= rotate_sum / (zone-recent_rotated_file + 1);
+   fp = (fp * rotate_sum) / (zone-recent_rotated_file + 1);
if (fp == 0)
fp = 1;
else if (fp  100)
@@ -1402,6 +1402,7 @@ static unsigned long shrink_zone(int pri
 
for_each_reclaimable_lru(l) {
if (scan_global_lru(sc)) {
+   unsigned long nr_max_scan;
int file = is_file_lru(l);
/*
 * Add one to nr_to_scan just to make sure that the
@@ -1409,7 +1410,11 @@ static unsigned long shrink_zone(int pri
 */
zone-nr_scan[l] += (zone_page_state(zone,
NR_INACTIVE_ANON + l)  priority) + 1;
-   nr[l] = zone-nr_scan[l] * percent[file] / 100;
+   nr[l] = (zone-nr_scan[l] * percent[file] / 100) + 1;
+   nr_max_scan = zone_page_state(zone, NR_INACTIVE_ANON+l);
+   if (nr[l]  nr_max_scan)
+   nr[l] = nr_max_scan;
+
if (nr[l] = sc-swap_cluster_max)
zone-nr_scan[l] = 0;
else



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-30 Thread Lee Schermerhorn
On Wed, 2008-01-30 at 17:57 +0900, KOSAKI Motohiro wrote:
 Hi Rik, Lee
 
 I found number of scan pages calculation bug.
 
 1. wrong calculation order
 
   ap *= rotate_sum / (zone-recent_rotated_anon + 1);
 
when recent_rotated_anon = 100 and recent_rotated_file = 0,

  rotate_sum / (zone-recent_rotated_anon + 1)
= 100 / 101
= 0
 
at that time, ap become 0.
 
 2. wrong fraction omission
 
   nr[l] = zone-nr_scan[l] * percent[file] / 100;
 
   when percent is very small,
   nr[l] become 0.
 
 Test Result:
 (1) $ ./hackbench 150 process 1000
 (2) # sync; echo 3  /proc/sys/vm/drop_caches
 $ dd if=tmp10G of=/dev/null
 $ ./hackbench 150 process 1000
 
 rvr-split-lru + revert patch of previous mail
   (1) 83.014
   (2) 717.009
 
 rvr-split-lru + revert patch of previous mail + below patch
   (1) 61.965
   (2) 85.444 !!
 
 
 Now, We got 1000% performance improvement against 2.6.24-rc8-mm1 :)
 
 
 
 - kosaki
 
 
 Signed-off-by: KOSAKI Motohiro [EMAIL PROTECTED]

snip

Kosaki-san:

Rik is currently out on holiday and I've been traveling.  Just getting
back to rebasing to 24-rc8-mm1.  Thank you for your efforts in testing
and tracking down the regressions.  I will add your fixes into my tree
and try them out and let you know.  Rik mentioned to me that he has a
fix for the get_scan_ratio() calculation that is causing us to OOM
kill prematurely--i.e., when we still have lots of swap space to evict
swappable anon.  I don't know if it's similar to what you have posted.
Have to wait and see what he says.  Meantime, we'll try your patches.

Again, thank you.

Regards,
Lee



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-30 Thread KOSAKI Motohiro
Hi Lee-san

 Rik is currently out on holiday and I've been traveling.  Just getting
 back to rebasing to 24-rc8-mm1.  Thank you for your efforts in testing
 and tracking down the regressions.  I will add your fixes into my tree
 and try them out and let you know.  Rik mentioned to me that he has a
 fix for the get_scan_ratio() calculation that is causing us to OOM
 kill prematurely--i.e., when we still have lots of swap space to evict
 swappable anon.  I don't know if it's similar to what you have posted.
 Have to wait and see what he says.  Meantime, we'll try your patches.

thank you for your quick response.

on my test environment, my patch solve incorrect OOM.
because, too small reclaim cause OOM.

Please confirm.


- kosaki


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-29 Thread KOSAKI Motohiro
Hi Rik, Lee

I tested new hackbench on rvr split LRU patch.
   http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c

method of test

(1) $ ./hackbench 150 process 1000
(2) # sync; echo 3 > /proc/sys/vm/drop_caches
$ dd if=tmp10G of=/dev/null
$ ./hackbench 150 process 1000

test machine
CPU: Itanium2 x4 (logical 8cpu)
MEM: 8GB

A. vanilla 2.6.24-rc8-mm1
(1) 127.540
(2) 727.548

B. 2.6.24-rc8-mm1 + split-lru-patch-series
(1) 92.730
(2) 758.369

   comment:
(1) active/inactive anon ratio improve performance significant.
(2) incorrect page activation reduce performance.


I investigate reason and found reason is [05/19] change.
I tested a bit porton reverted split-lru-patch-series again.

C. 2.6.24-rc8-mm1 + split-lru-patch-series + my-revert-patch
(1) 83.014
(2) 717.009


Of course, We need reintroduce this portion after new page LRU
(aka LRU for used only page).
but now is too early.

I hope this patch series merge to -mm ASAP.
therefore, I hope remove any corner case regression.

Thanks!


- kosaki



Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 mm/vmscan.c |   26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

Index: b/mm/vmscan.c
===
--- a/mm/vmscan.c   2008-01-29 15:59:17.0 +0900
+++ b/mm/vmscan.c   2008-01-30 11:53:42.0 +0900
@@ -247,6 +247,27 @@
return ret;
 }

+/* Called without lock on whether page is mapped, so answer is unstable */
+static inline int page_mapping_inuse(struct page *page)
+{
+   struct address_space *mapping;
+
+   /* Page is in somebody's page tables. */
+   if (page_mapped(page))
+   return 1;
+
+   /* Be more reluctant to reclaim swapcache than pagecache */
+   if (PageSwapCache(page))
+   return 1;
+
+   mapping = page_mapping(page);
+   if (!mapping)
+   return 0;
+
+   /* File is mmap'd by somebody? */
+   return mapping_mapped(mapping);
+}
+
 static inline int is_page_cache_freeable(struct page *page)
 {
return page_count(page) - !!PagePrivate(page) == 2;
@@ -515,7 +536,8 @@

referenced = page_referenced(page, 1, sc->mem_cgroup);
/* In active use or really unfreeable?  Activate it. */
-   if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced)
+   if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
+   referenced && page_mapping_inuse(page))
goto activate_locked;

 #ifdef CONFIG_SWAP
@@ -550,6 +572,8 @@
}

if (PageDirty(page)) {
+   if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced)
+   goto keep_locked;
if (!may_enter_fs) {
sc->nr_io_pages++;
goto keep_locked;





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-14 Thread KOSAKI Motohiro
Hi

> > Why drop (total_swap_pages == 0 && PageAnon(page)) condition?
> > in embedded sysmtem, 
> > CONFIG_NORECLAIM is OFF (because almost embedded cpu is 32bit) and
> > that anon move to inactive list is meaningless because it doesn't have swap.
> 
> That was a mistake, kind of.  Since all swap backed pages are on their
> own LRU lists, we should not scan those lists at all any more if we are
> out of swap space.
> 
> The patch that fixes get_scan_ratio() adds that test.
> 
> Having said that, with the nr_swap_pages==0 test in get_scan_ratio(),
> we no longer need to test for that condition in shrink_active_list().

Oh I see!
thank you for your kindful lecture.

your implementation is very cute.


> > below code is more good, may be.
> > but I don't understand yet why ignore page_referenced() result at anon page 
> > ;-)
> 
> On modern systems, swapping out anonymous pages is a relatively rare
> event.  All anonymous pages start out as active and referenced, so
> testing for that condition does (1) not add any information and (2)
> mean we need to scan ALL of the anonymous pages, in order to find one
> candidate to swap out (since they are all referenced).
> 
> Simply deactivating a few pages and checking whether they were referenced
> again while on the (smaller) inactive_anon_list means we can find candidates
> to page out with a lot less CPU time used.

thanks, I understand, may be.


- kosaki

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-14 Thread KOSAKI Motohiro
Hi

  Why drop (total_swap_pages == 0  PageAnon(page)) condition?
  in embedded sysmtem, 
  CONFIG_NORECLAIM is OFF (because almost embedded cpu is 32bit) and
  that anon move to inactive list is meaningless because it doesn't have swap.
 
 That was a mistake, kind of.  Since all swap backed pages are on their
 own LRU lists, we should not scan those lists at all any more if we are
 out of swap space.
 
 The patch that fixes get_scan_ratio() adds that test.
 
 Having said that, with the nr_swap_pages==0 test in get_scan_ratio(),
 we no longer need to test for that condition in shrink_active_list().

Oh I see!
thank you for your kindful lecture.

your implementation is very cute.


  below code is more good, may be.
  but I don't understand yet why ignore page_referenced() result at anon page 
  ;-)
 
 On modern systems, swapping out anonymous pages is a relatively rare
 event.  All anonymous pages start out as active and referenced, so
 testing for that condition does (1) not add any information and (2)
 mean we need to scan ALL of the anonymous pages, in order to find one
 candidate to swap out (since they are all referenced).
 
 Simply deactivating a few pages and checking whether they were referenced
 again while on the (smaller) inactive_anon_list means we can find candidates
 to page out with a lot less CPU time used.

thanks, I understand, may be.


- kosaki

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-11 Thread Lee Schermerhorn
On Fri, 2008-01-11 at 11:15 -0500, Rik van Riel wrote:
> On Fri, 11 Jan 2008 10:59:18 -0500
> Lee Schermerhorn <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, 2008-01-11 at 10:42 -0500, Rik van Riel wrote:
> > > On Fri, 11 Jan 2008 15:24:34 +0900
> > > KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:
> > > 
> > > > below patch is a bit cleanup proposal.
> > > > i think LRU_FILE is more clarify than "/2".
> > > > 
> > > > What do you think it?
> > > 
> > > Thank you for the cleanup, your version looks a lot nicer.  
> > > I have applied your patch to my series.
> > > 
> > 
> > Rik:  
> > 
> > I think we also want to do something like:
> > 
> > -   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
> > +   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3 ||
> > +   NR_LRU_LISTS > 6);
> > 
> > Then we'll be warned if future change might break our implicit
> > assumption that any lru_list value with '0x2' set is a file lru.
> 
> Restoring the code to your original version makes things work again.
> 
> OTOH, I almost wonder if we should not simply define it to
> 
>   return (l == LRU_INACTIVE_FILE || l == LRU_ACTIVE_FILE)
> 
> and just deal with it.
> 
> Your version of the code is correct and probably faster, but not as
> easy to read and probably not in a hot path :)

Sure.  Whatever you think will fly...

Lee
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 10:59:18 -0500
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> On Fri, 2008-01-11 at 10:42 -0500, Rik van Riel wrote:
> > On Fri, 11 Jan 2008 15:24:34 +0900
> > KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:
> > 
> > > below patch is a bit cleanup proposal.
> > > i think LRU_FILE is more clarify than "/2".
> > > 
> > > What do you think it?
> > 
> > Thank you for the cleanup, your version looks a lot nicer.  
> > I have applied your patch to my series.
> > 
> 
> Rik:  
> 
> I think we also want to do something like:
> 
> - BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
> + BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3 ||
> + NR_LRU_LISTS > 6);
> 
> Then we'll be warned if future change might break our implicit
> assumption that any lru_list value with '0x2' set is a file lru.

Restoring the code to your original version makes things work again.

OTOH, I almost wonder if we should not simply define it to

return (l == LRU_INACTIVE_FILE || l == LRU_ACTIVE_FILE)

and just deal with it.

Your version of the code is correct and probably faster, but not as
easy to read and probably not in a hot path :)

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 10:50:09 -0500
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> Again, my doing.  I agree that the calculation is a bit strange, but I
> wanted to "future-proof" this function in case we ever get to a value of
> '6' for the lru_list enum.  In that case, the AND will evaluate to
> non-zero for what may not be a file LRU.  Between the build time
> assertion and the division [which could just be a 'l >> 1', I suppose]
> we should be safe.

Good point.  I did not guess that.

I'll restore the code to your original test.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-11 Thread Lee Schermerhorn
On Fri, 2008-01-11 at 10:42 -0500, Rik van Riel wrote:
> On Fri, 11 Jan 2008 15:24:34 +0900
> KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:
> 
> > below patch is a bit cleanup proposal.
> > i think LRU_FILE is more clarify than "/2".
> > 
> > What do you think it?
> 
> Thank you for the cleanup, your version looks a lot nicer.  
> I have applied your patch to my series.
> 

Rik:  

I think we also want to do something like:

-   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
+   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3 ||
+   NR_LRU_LISTS > 6);

Then we'll be warned if future change might break our implicit
assumption that any lru_list value with '0x2' set is a file lru.

Lee

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-11 Thread Lee Schermerhorn
On Fri, 2008-01-11 at 15:24 +0900, KOSAKI Motohiro wrote:
> Hi Rik
> 
> > +static inline int is_file_lru(enum lru_list l)
> > +{
> > +   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
> > +   return (l/2 == 1);
> > +}
> 
> below patch is a bit cleanup proposal.
> i think LRU_FILE is more clarify than "/2".
> 
> What do you think it?
> 
> 
> 
> Index: linux-2.6.24-rc6-mm1-rvr/include/linux/mmzone.h
> ===
> --- linux-2.6.24-rc6-mm1-rvr.orig/include/linux/mmzone.h2008-01-11 
> 11:10:30.0 +0900
> +++ linux-2.6.24-rc6-mm1-rvr/include/linux/mmzone.h 2008-01-11 
> 14:40:31.0 +0900
> @@ -147,7 +147,7 @@
>  static inline int is_file_lru(enum lru_list l)
>  {
> BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
> -   return (l/2 == 1);
> +   return !!(l & LRU_FILE);
>  }
> 
>  struct per_cpu_pages {
> 

Kosaki-san:

Again, my doing.  I agree that the calculation is a bit strange, but I
wanted to "future-proof" this function in case we ever get to a value of
'6' for the lru_list enum.  In that case, the AND will evaluate to
non-zero for what may not be a file LRU.  Between the build time
assertion and the division [which could just be a 'l >> 1', I suppose]
we should be safe.

Thanks,
Lee

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 16:35:24 +0900
KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:

> Why drop (total_swap_pages == 0 && PageAnon(page)) condition?
> in embedded sysmtem, 
> CONFIG_NORECLAIM is OFF (because almost embedded cpu is 32bit) and
> that anon move to inactive list is meaningless because it doesn't have swap.

That was a mistake, kind of.  Since all swap backed pages are on their
own LRU lists, we should not scan those lists at all any more if we are
out of swap space.

The patch that fixes get_scan_ratio() adds that test.

Having said that, with the nr_swap_pages==0 test in get_scan_ratio(),
we no longer need to test for that condition in shrink_active_list().

> below code is more good, may be.
> but I don't understand yet why ignore page_referenced() result at anon page 
> ;-)

On modern systems, swapping out anonymous pages is a relatively rare
event.  All anonymous pages start out as active and referenced, so
testing for that condition does (1) not add any information and (2)
mean we need to scan ALL of the anonymous pages, in order to find one
candidate to swap out (since they are all referenced).

Simply deactivating a few pages and checking whether they were referenced
again while on the (smaller) inactive_anon_list means we can find candidates
to page out with a lot less CPU time used.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 15:24:34 +0900
KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:

> below patch is a bit cleanup proposal.
> i think LRU_FILE is more clarify than "/2".
> 
> What do you think it?

Thank you for the cleanup, your version looks a lot nicer.  
I have applied your patch to my series.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 12:59:31 +0900
KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:

> Hi Rik
> 
> > -static inline long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
> > -   struct zone *zone, int priority)
> > +static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem,
> > +   struct zone *zone, int priority,
> > +   int active, int file)
> >  {
> > return 0;
> >  }
> 
> it can't compile if memcgroup turn off.

Doh!  Good point.

Thank you for pointing out this error.  I applied your fix to my tree,
it will be in the next version.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-11 Thread Lee Schermerhorn
On Fri, 2008-01-11 at 15:24 +0900, KOSAKI Motohiro wrote:
 Hi Rik
 
  +static inline int is_file_lru(enum lru_list l)
  +{
  +   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
  +   return (l/2 == 1);
  +}
 
 below patch is a bit cleanup proposal.
 i think LRU_FILE is more clarify than /2.
 
 What do you think it?
 
 
 
 Index: linux-2.6.24-rc6-mm1-rvr/include/linux/mmzone.h
 ===
 --- linux-2.6.24-rc6-mm1-rvr.orig/include/linux/mmzone.h2008-01-11 
 11:10:30.0 +0900
 +++ linux-2.6.24-rc6-mm1-rvr/include/linux/mmzone.h 2008-01-11 
 14:40:31.0 +0900
 @@ -147,7 +147,7 @@
  static inline int is_file_lru(enum lru_list l)
  {
 BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
 -   return (l/2 == 1);
 +   return !!(l  LRU_FILE);
  }
 
  struct per_cpu_pages {
 

Kosaki-san:

Again, my doing.  I agree that the calculation is a bit strange, but I
wanted to future-proof this function in case we ever get to a value of
'6' for the lru_list enum.  In that case, the AND will evaluate to
non-zero for what may not be a file LRU.  Between the build time
assertion and the division [which could just be a 'l  1', I suppose]
we should be safe.

Thanks,
Lee

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 16:35:24 +0900
KOSAKI Motohiro [EMAIL PROTECTED] wrote:

 Why drop (total_swap_pages == 0  PageAnon(page)) condition?
 in embedded sysmtem, 
 CONFIG_NORECLAIM is OFF (because almost embedded cpu is 32bit) and
 that anon move to inactive list is meaningless because it doesn't have swap.

That was a mistake, kind of.  Since all swap backed pages are on their
own LRU lists, we should not scan those lists at all any more if we are
out of swap space.

The patch that fixes get_scan_ratio() adds that test.

Having said that, with the nr_swap_pages==0 test in get_scan_ratio(),
we no longer need to test for that condition in shrink_active_list().

 below code is more good, may be.
 but I don't understand yet why ignore page_referenced() result at anon page 
 ;-)

On modern systems, swapping out anonymous pages is a relatively rare
event.  All anonymous pages start out as active and referenced, so
testing for that condition does (1) not add any information and (2)
mean we need to scan ALL of the anonymous pages, in order to find one
candidate to swap out (since they are all referenced).

Simply deactivating a few pages and checking whether they were referenced
again while on the (smaller) inactive_anon_list means we can find candidates
to page out with a lot less CPU time used.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 12:59:31 +0900
KOSAKI Motohiro [EMAIL PROTECTED] wrote:

 Hi Rik
 
  -static inline long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
  -   struct zone *zone, int priority)
  +static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem,
  +   struct zone *zone, int priority,
  +   int active, int file)
   {
  return 0;
   }
 
 it can't compile if memcgroup turn off.

Doh!  Good point.

Thank you for pointing out this error.  I applied your fix to my tree,
it will be in the next version.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-11 Thread Lee Schermerhorn
On Fri, 2008-01-11 at 10:42 -0500, Rik van Riel wrote:
 On Fri, 11 Jan 2008 15:24:34 +0900
 KOSAKI Motohiro [EMAIL PROTECTED] wrote:
 
  below patch is a bit cleanup proposal.
  i think LRU_FILE is more clarify than /2.
  
  What do you think it?
 
 Thank you for the cleanup, your version looks a lot nicer.  
 I have applied your patch to my series.
 

Rik:  

I think we also want to do something like:

-   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
+   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3 ||
+   NR_LRU_LISTS  6);

Then we'll be warned if future change might break our implicit
assumption that any lru_list value with '0x2' set is a file lru.

Lee

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 15:24:34 +0900
KOSAKI Motohiro [EMAIL PROTECTED] wrote:

 below patch is a bit cleanup proposal.
 i think LRU_FILE is more clarify than /2.
 
 What do you think it?

Thank you for the cleanup, your version looks a lot nicer.  
I have applied your patch to my series.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 10:59:18 -0500
Lee Schermerhorn [EMAIL PROTECTED] wrote:

 On Fri, 2008-01-11 at 10:42 -0500, Rik van Riel wrote:
  On Fri, 11 Jan 2008 15:24:34 +0900
  KOSAKI Motohiro [EMAIL PROTECTED] wrote:
  
   below patch is a bit cleanup proposal.
   i think LRU_FILE is more clarify than /2.
   
   What do you think it?
  
  Thank you for the cleanup, your version looks a lot nicer.  
  I have applied your patch to my series.
  
 
 Rik:  
 
 I think we also want to do something like:
 
 - BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
 + BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3 ||
 + NR_LRU_LISTS  6);
 
 Then we'll be warned if future change might break our implicit
 assumption that any lru_list value with '0x2' set is a file lru.

Restoring the code to your original version makes things work again.

OTOH, I almost wonder if we should not simply define it to

return (l == LRU_INACTIVE_FILE || l == LRU_ACTIVE_FILE)

and just deal with it.

Your version of the code is correct and probably faster, but not as
easy to read and probably not in a hot path :)

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-11 Thread Rik van Riel
On Fri, 11 Jan 2008 10:50:09 -0500
Lee Schermerhorn [EMAIL PROTECTED] wrote:

 Again, my doing.  I agree that the calculation is a bit strange, but I
 wanted to future-proof this function in case we ever get to a value of
 '6' for the lru_list enum.  In that case, the AND will evaluate to
 non-zero for what may not be a file LRU.  Between the build time
 assertion and the division [which could just be a 'l  1', I suppose]
 we should be safe.

Good point.  I did not guess that.

I'll restore the code to your original test.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-11 Thread Lee Schermerhorn
On Fri, 2008-01-11 at 11:15 -0500, Rik van Riel wrote:
 On Fri, 11 Jan 2008 10:59:18 -0500
 Lee Schermerhorn [EMAIL PROTECTED] wrote:
 
  On Fri, 2008-01-11 at 10:42 -0500, Rik van Riel wrote:
   On Fri, 11 Jan 2008 15:24:34 +0900
   KOSAKI Motohiro [EMAIL PROTECTED] wrote:
   
below patch is a bit cleanup proposal.
i think LRU_FILE is more clarify than /2.

What do you think it?
   
   Thank you for the cleanup, your version looks a lot nicer.  
   I have applied your patch to my series.
   
  
  Rik:  
  
  I think we also want to do something like:
  
  -   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
  +   BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3 ||
  +   NR_LRU_LISTS  6);
  
  Then we'll be warned if future change might break our implicit
  assumption that any lru_list value with '0x2' set is a file lru.
 
 Restoring the code to your original version makes things work again.
 
 OTOH, I almost wonder if we should not simply define it to
 
   return (l == LRU_INACTIVE_FILE || l == LRU_ACTIVE_FILE)
 
 and just deal with it.
 
 Your version of the code is correct and probably faster, but not as
 easy to read and probably not in a hot path :)

Sure.  Whatever you think will fly...

Lee
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-10 Thread KOSAKI Motohiro
Hi Rik

> @@ -1128,64 +1026,65 @@ static void shrink_active_list(unsigned 
  (snip)

> + /*
> +  * For sorting active vs inactive pages, we'll use the 'anon'
> +  * elements of the local list[] array and sort out the file vs
> +  * anon pages below.
> +  */
>   while (!list_empty(_hold)) {
> + lru = LRU_INACTIVE_ANON;
>   cond_resched();
>   page = lru_to_page(_hold);
>   list_del(>lru);
> - if (page_mapped(page)) {
> - if (!reclaim_mapped ||
> - (total_swap_pages == 0 && PageAnon(page)) ||
> - page_referenced(page, 0, sc->mem_cgroup)) {
> - list_add(>lru, [LRU_ACTIVE]);
> - continue;
> - }
> - } else if (TestClearPageReferenced(page)) {
> - list_add(>lru, [LRU_ACTIVE]);
> - continue;
> - }
> - list_add(>lru, [LRU_INACTIVE]);
> + if (page_referenced(page, 0, sc->mem_cgroup))
> + lru = LRU_ACTIVE_ANON;
> + list_add(>lru, [lru]);
>   }

Why drop (total_swap_pages == 0 && PageAnon(page)) condition?
in embedded sysmtem, 
CONFIG_NORECLAIM is OFF (because almost embedded cpu is 32bit) and
that anon move to inactive list is meaningless because it doesn't have swap.

below code is more good, may be.
but I don't understand yet why ignore page_referenced() result at anon page ;-)


- kosaki


---
 mm/vmscan.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.24-rc6-mm1-rvr/mm/vmscan.c
===
--- linux-2.6.24-rc6-mm1-rvr.orig/mm/vmscan.c   2008-01-11 13:59:12.0 
+0900
+++ linux-2.6.24-rc6-mm1-rvr/mm/vmscan.c2008-01-11 16:16:44.0 
+0900
@@ -1147,7 +1147,7 @@ static void shrink_active_list(unsigned
}

if (page_referenced(page, 0, sc->mem_cgroup)) {
-   if (file)
+   if (file || (total_swap_pages == 0))
/* Referenced file pages stay active. */
lru = LRU_ACTIVE_ANON;
else


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-10 Thread KOSAKI Motohiro
Hi Rik

> +static inline int is_file_lru(enum lru_list l)
> +{
> + BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
> + return (l/2 == 1);
> +}

below patch is a bit cleanup proposal.
i think LRU_FILE is more clarify than "/2".

What do you think it?



Index: linux-2.6.24-rc6-mm1-rvr/include/linux/mmzone.h
===
--- linux-2.6.24-rc6-mm1-rvr.orig/include/linux/mmzone.h2008-01-11 
11:10:30.0 +0900
+++ linux-2.6.24-rc6-mm1-rvr/include/linux/mmzone.h 2008-01-11 
14:40:31.0 +0900
@@ -147,7 +147,7 @@
 static inline int is_file_lru(enum lru_list l)
 {
BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
-   return (l/2 == 1);
+   return !!(l & LRU_FILE);
 }

 struct per_cpu_pages {



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-10 Thread KOSAKI Motohiro
Hi Rik

> -static inline long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
> - struct zone *zone, int priority)
> +static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem,
> + struct zone *zone, int priority,
> + int active, int file)
>  {
>   return 0;
>  }

it can't compile if memcgroup turn off.

because current mem_cgroup_calc_reclaim type is below.

long mem_cgroup_calc_reclaim(struct mem_cgroup *mem, struct zone *zone,
int priority, enum lru_list lru)

after patched below, it can compile.
I hope you don't think unpleasant by a trivial point out.

regard.

- kosaki


Index: linux-2.6.24-rc6-mm1-rvr/include/linux/memcontrol.h
===
--- linux-2.6.24-rc6-mm1-rvr.orig/include/linux/memcontrol.h2008-01-11 
11:10:16.0 +0900
+++ linux-2.6.24-rc6-mm1-rvr/include/linux/memcontrol.h 2008-01-11 
12:08:29.0 +0900
@@ -168,9 +168,8 @@
 {
 }

-static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem,
-   struct zone *zone, int priority,
-   int active, int file)
+static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem, struct zone 
*zone,
+   int priority, enum lru_list lru)
 {
return 0;
 }




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-10 Thread KOSAKI Motohiro
Hi Rik

 -static inline long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
 - struct zone *zone, int priority)
 +static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem,
 + struct zone *zone, int priority,
 + int active, int file)
  {
   return 0;
  }

it can't compile if memcgroup turn off.

because current mem_cgroup_calc_reclaim type is below.

long mem_cgroup_calc_reclaim(struct mem_cgroup *mem, struct zone *zone,
int priority, enum lru_list lru)

after patched below, it can compile.
I hope you don't think unpleasant by a trivial point out.

regard.

- kosaki


Index: linux-2.6.24-rc6-mm1-rvr/include/linux/memcontrol.h
===
--- linux-2.6.24-rc6-mm1-rvr.orig/include/linux/memcontrol.h2008-01-11 
11:10:16.0 +0900
+++ linux-2.6.24-rc6-mm1-rvr/include/linux/memcontrol.h 2008-01-11 
12:08:29.0 +0900
@@ -168,9 +168,8 @@
 {
 }

-static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem,
-   struct zone *zone, int priority,
-   int active, int file)
+static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem, struct zone 
*zone,
+   int priority, enum lru_list lru)
 {
return 0;
 }




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-10 Thread KOSAKI Motohiro
Hi Rik

 +static inline int is_file_lru(enum lru_list l)
 +{
 + BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
 + return (l/2 == 1);
 +}

below patch is a bit cleanup proposal.
i think LRU_FILE is more clarify than /2.

What do you think it?



Index: linux-2.6.24-rc6-mm1-rvr/include/linux/mmzone.h
===
--- linux-2.6.24-rc6-mm1-rvr.orig/include/linux/mmzone.h2008-01-11 
11:10:30.0 +0900
+++ linux-2.6.24-rc6-mm1-rvr/include/linux/mmzone.h 2008-01-11 
14:40:31.0 +0900
@@ -147,7 +147,7 @@
 static inline int is_file_lru(enum lru_list l)
 {
BUILD_BUG_ON(LRU_INACTIVE_FILE != 2 || LRU_ACTIVE_FILE != 3);
-   return (l/2 == 1);
+   return !!(l  LRU_FILE);
 }

 struct per_cpu_pages {



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-09 Thread KAMEZAWA Hiroyuki
On Thu, 10 Jan 2008 08:56:31 +0530
Balbir Singh <[EMAIL PROTECTED]> wrote:

> > > The control_type feature is gone. We still have cached page
> > > accounting, but we do not allow control of only RSS pages anymore. We
> > > need to control both RSS+cached pages. I do not understand your
> > > question about new plan? Is it about adding back control_type?
> > > 
> > Ah, just wanted to confirm that we can drop PAGE_CGROUP_FLAG_CACHE
> > if page_file_cache() function and split-LRU is introduced.
> > 
> 
> Earlier we would have had a problem, since we even accounted for swap
> cache with PAGE_CGROUP_FLAG_CACHE and I think page_file_cache() does
> not account swap cache pages with page_file_cache(). Our accounting
> is based on mapped vs unmapped whereas the new code from Rik accounts
> file vs anonymous. I suspect we could live a little while longer
> with PAGE_CGROUP_FLAG_CACHE and then if we do not need it at all,
> we can mark it down for removal. What do you think?

Okay, I have no objection. 

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-09 Thread Balbir Singh
* KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> [2008-01-10 11:36:18]:

> On Thu, 10 Jan 2008 07:51:33 +0530
> Balbir Singh <[EMAIL PROTECTED]> wrote:
> 
> > > >  #define PAGE_CGROUP_FLAG_CACHE (0x1)   /* charged as cache */
> > > >  #define PAGE_CGROUP_FLAG_ACTIVE (0x2)  /* page is active in this 
> > > > cgroup */
> > > > +#define PAGE_CGROUP_FLAG_FILE  (0x4)   /* page is file system backed */
> > > > 
> > > 
> > > Now, we don't have control_type and a feature for accounting only CACHE.
> > > Balbir-san, do you have some new plan ?
> > >
> > 
> > Hi, KAMEZAWA-San,
> > 
> > The control_type feature is gone. We still have cached page
> > accounting, but we do not allow control of only RSS pages anymore. We
> > need to control both RSS+cached pages. I do not understand your
> > question about new plan? Is it about adding back control_type?
> > 
> Ah, just wanted to confirm that we can drop PAGE_CGROUP_FLAG_CACHE
> if page_file_cache() function and split-LRU is introduced.
> 

Earlier we would have had a problem, since we even accounted for swap
cache with PAGE_CGROUP_FLAG_CACHE and I think page_file_cache() does
not account swap cache pages with page_file_cache(). Our accounting
is based on mapped vs unmapped whereas the new code from Rik accounts
file vs anonymous. I suspect we could live a little while longer
with PAGE_CGROUP_FLAG_CACHE and then if we do not need it at all,
we can mark it down for removal. What do you think?


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-09 Thread Rik van Riel
On Thu, 10 Jan 2008 11:28:49 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> Hmm, it seems..
> 
> When a program copies large amount of files, recent_rotated_file increases
> rapidly and 
> 
> rotate_sum
> --
> recent_rotated_anon
> 
> will be very big.
> 
> And %ap will be big regardless of vm_swappiness  if it's not 0.
> 
> I think # of recent_successful_pageout(anon/file) should be took into 
> account...
> 
> I'm sorry if I miss something.

You are right.  I wonder if this, again, is a case of myself or
Lee forward porting old code.  I remember having (had) a very
different version of get_scan_ratio() at some point in the past,
but I cannot remember if we discarded this version for that other
version, or the other way around :(

Lee, would you by any chance still have some alternative versions
of get_scan_ratio() around?  I'm searching through my systems, but
have not found it yet...

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-09 Thread KAMEZAWA Hiroyuki
On Thu, 10 Jan 2008 07:51:33 +0530
Balbir Singh <[EMAIL PROTECTED]> wrote:

> > >  #define PAGE_CGROUP_FLAG_CACHE   (0x1)   /* charged as cache */
> > >  #define PAGE_CGROUP_FLAG_ACTIVE (0x2)/* page is active in this 
> > > cgroup */
> > > +#define PAGE_CGROUP_FLAG_FILE(0x4)   /* page is file system backed */
> > > 
> > 
> > Now, we don't have control_type and a feature for accounting only CACHE.
> > Balbir-san, do you have some new plan ?
> >
> 
> Hi, KAMEZAWA-San,
> 
> The control_type feature is gone. We still have cached page
> accounting, but we do not allow control of only RSS pages anymore. We
> need to control both RSS+cached pages. I do not understand your
> question about new plan? Is it about adding back control_type?
> 
Ah, just wanted to confirm that we can drop PAGE_CGROUP_FLAG_CACHE
if page_file_cache() function and split-LRU is introduced.


Thanks you.

-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-09 Thread KAMEZAWA Hiroyuki
On Tue, 08 Jan 2008 15:59:44 -0500
Rik van Riel <[EMAIL PROTECTED]> wrote:

> + rotate_sum = zone->recent_rotated_file + zone->recent_rotated_anon;
> +
> + /* Keep a floating average of RECENT references. */
> + if (unlikely(rotate_sum > min(anon, file))) {
> + spin_lock_irq(>lru_lock);
> + zone->recent_rotated_file /= 2;
> + zone->recent_rotated_anon /= 2;
> + spin_unlock_irq(>lru_lock);
> + rotate_sum /= 2;
> + }
> +
> + /*
> +  * With swappiness at 100, anonymous and file have the same priority.
> +  * This scanning priority is essentially the inverse of IO cost.
> +  */
> + anon_prio = sc->swappiness;
> + file_prio = 200 - sc->swappiness;
> +
> + /*
> +  *  anon   recent_rotated_anon
> +  * %anon = 100 * --- / --- * IO cost
> +  *   anon + file   rotate_sum
> +  */
> + ap = (anon_prio * anon) / (anon + file + 1);
> + ap *= rotate_sum / (zone->recent_rotated_anon + 1);
> + if (ap == 0)
> + ap = 1;
> + else if (ap > 100)
> + ap = 100;
> + percent[0] = ap;
> +

Hmm, it seems..

When a program copies large amount of files, recent_rotated_file increases
rapidly and 

rotate_sum
--
recent_rotated_anon

will be very big.

And %ap will be big regardless of vm_swappiness  if it's not 0.

I think # of recent_successful_pageout(anon/file) should be took into account...

I'm sorry if I miss something.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-09 Thread Balbir Singh
* KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> [2008-01-09 13:41:32]:

> I like this patch set thank you.
> 
> On Tue, 08 Jan 2008 15:59:44 -0500
> Rik van Riel <[EMAIL PROTECTED]> wrote:
> > Index: linux-2.6.24-rc6-mm1/mm/memcontrol.c
> > ===
> > --- linux-2.6.24-rc6-mm1.orig/mm/memcontrol.c   2008-01-07 
> > 11:55:09.0 -0500
> > +++ linux-2.6.24-rc6-mm1/mm/memcontrol.c2008-01-07 17:32:53.0 
> > -0500
> 
> 
> > -enum mem_cgroup_zstat_index {
> > -   MEM_CGROUP_ZSTAT_ACTIVE,
> > -   MEM_CGROUP_ZSTAT_INACTIVE,
> > -
> > -   NR_MEM_CGROUP_ZSTAT,
> > -};
> > -
> >  struct mem_cgroup_per_zone {
> > /*
> >  * spin_lock to protect the per cgroup LRU
> >  */
> > spinlock_t  lru_lock;
> > -   struct list_headactive_list;
> > -   struct list_headinactive_list;
> > -   unsigned long count[NR_MEM_CGROUP_ZSTAT];
> > +   struct list_headlists[NR_LRU_LISTS];
> > +   unsigned long   count[NR_LRU_LISTS];
> >  };
> >  /* Macro for accessing counter */
> >  #define MEM_CGROUP_ZSTAT(mz, idx)  ((mz)->count[(idx)])
> > @@ -160,6 +152,7 @@ struct page_cgroup {
> >  };
> >  #define PAGE_CGROUP_FLAG_CACHE (0x1)   /* charged as cache */
> >  #define PAGE_CGROUP_FLAG_ACTIVE (0x2)  /* page is active in this 
> > cgroup */
> > +#define PAGE_CGROUP_FLAG_FILE  (0x4)   /* page is file system backed */
> > 
> 
> Now, we don't have control_type and a feature for accounting only CACHE.
> Balbir-san, do you have some new plan ?
>

Hi, KAMEZAWA-San,

The control_type feature is gone. We still have cached page
accounting, but we do not allow control of only RSS pages anymore. We
need to control both RSS+cached pages. I do not understand your
question about new plan? Is it about adding back control_type?

 
> BTW, is it better to use PageSwapBacked(pc->page) rather than adding a new 
> flag
> PAGE_CGROUP_FLAG_FILE ?
> 
> 
> PAGE_CGROUP_FLAG_ACTIVE is used because global reclaim can change
> ACTIVE/INACTIVE attribute without accessing memory cgroup.
> (Then, we cannot trust PageActive(pc->page))
> 

Yes, correct. A page active on the node's zone LRU need not be active
in the memory cgroup.

> ANON <-> FILE attribute can be changed dinamically (after added to LRU) ?
> 
> If no, using page_file_cache(pc->page) will be easy.
> 
> Thanks,
> -Kame
> 

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-09 Thread Balbir Singh
* KAMEZAWA Hiroyuki [EMAIL PROTECTED] [2008-01-09 13:41:32]:

 I like this patch set thank you.
 
 On Tue, 08 Jan 2008 15:59:44 -0500
 Rik van Riel [EMAIL PROTECTED] wrote:
  Index: linux-2.6.24-rc6-mm1/mm/memcontrol.c
  ===
  --- linux-2.6.24-rc6-mm1.orig/mm/memcontrol.c   2008-01-07 
  11:55:09.0 -0500
  +++ linux-2.6.24-rc6-mm1/mm/memcontrol.c2008-01-07 17:32:53.0 
  -0500
 snip
 
  -enum mem_cgroup_zstat_index {
  -   MEM_CGROUP_ZSTAT_ACTIVE,
  -   MEM_CGROUP_ZSTAT_INACTIVE,
  -
  -   NR_MEM_CGROUP_ZSTAT,
  -};
  -
   struct mem_cgroup_per_zone {
  /*
   * spin_lock to protect the per cgroup LRU
   */
  spinlock_t  lru_lock;
  -   struct list_headactive_list;
  -   struct list_headinactive_list;
  -   unsigned long count[NR_MEM_CGROUP_ZSTAT];
  +   struct list_headlists[NR_LRU_LISTS];
  +   unsigned long   count[NR_LRU_LISTS];
   };
   /* Macro for accessing counter */
   #define MEM_CGROUP_ZSTAT(mz, idx)  ((mz)-count[(idx)])
  @@ -160,6 +152,7 @@ struct page_cgroup {
   };
   #define PAGE_CGROUP_FLAG_CACHE (0x1)   /* charged as cache */
   #define PAGE_CGROUP_FLAG_ACTIVE (0x2)  /* page is active in this 
  cgroup */
  +#define PAGE_CGROUP_FLAG_FILE  (0x4)   /* page is file system backed */
  
 
 Now, we don't have control_type and a feature for accounting only CACHE.
 Balbir-san, do you have some new plan ?


Hi, KAMEZAWA-San,

The control_type feature is gone. We still have cached page
accounting, but we do not allow control of only RSS pages anymore. We
need to control both RSS+cached pages. I do not understand your
question about new plan? Is it about adding back control_type?

 
 BTW, is it better to use PageSwapBacked(pc-page) rather than adding a new 
 flag
 PAGE_CGROUP_FLAG_FILE ?
 
 
 PAGE_CGROUP_FLAG_ACTIVE is used because global reclaim can change
 ACTIVE/INACTIVE attribute without accessing memory cgroup.
 (Then, we cannot trust PageActive(pc-page))
 

Yes, correct. A page active on the node's zone LRU need not be active
in the memory cgroup.

 ANON - FILE attribute can be changed dinamically (after added to LRU) ?
 
 If no, using page_file_cache(pc-page) will be easy.
 
 Thanks,
 -Kame
 

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-09 Thread KAMEZAWA Hiroyuki
On Tue, 08 Jan 2008 15:59:44 -0500
Rik van Riel [EMAIL PROTECTED] wrote:

 + rotate_sum = zone-recent_rotated_file + zone-recent_rotated_anon;
 +
 + /* Keep a floating average of RECENT references. */
 + if (unlikely(rotate_sum  min(anon, file))) {
 + spin_lock_irq(zone-lru_lock);
 + zone-recent_rotated_file /= 2;
 + zone-recent_rotated_anon /= 2;
 + spin_unlock_irq(zone-lru_lock);
 + rotate_sum /= 2;
 + }
 +
 + /*
 +  * With swappiness at 100, anonymous and file have the same priority.
 +  * This scanning priority is essentially the inverse of IO cost.
 +  */
 + anon_prio = sc-swappiness;
 + file_prio = 200 - sc-swappiness;
 +
 + /*
 +  *  anon   recent_rotated_anon
 +  * %anon = 100 * --- / --- * IO cost
 +  *   anon + file   rotate_sum
 +  */
 + ap = (anon_prio * anon) / (anon + file + 1);
 + ap *= rotate_sum / (zone-recent_rotated_anon + 1);
 + if (ap == 0)
 + ap = 1;
 + else if (ap  100)
 + ap = 100;
 + percent[0] = ap;
 +

Hmm, it seems..

When a program copies large amount of files, recent_rotated_file increases
rapidly and 

rotate_sum
--
recent_rotated_anon

will be very big.

And %ap will be big regardless of vm_swappiness  if it's not 0.

I think # of recent_successful_pageout(anon/file) should be took into account...

I'm sorry if I miss something.

Thanks,
-Kame

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-09 Thread KAMEZAWA Hiroyuki
On Thu, 10 Jan 2008 07:51:33 +0530
Balbir Singh [EMAIL PROTECTED] wrote:

#define PAGE_CGROUP_FLAG_CACHE   (0x1)   /* charged as cache */
#define PAGE_CGROUP_FLAG_ACTIVE (0x2)/* page is active in this 
   cgroup */
   +#define PAGE_CGROUP_FLAG_FILE(0x4)   /* page is file system backed */
   
  
  Now, we don't have control_type and a feature for accounting only CACHE.
  Balbir-san, do you have some new plan ?
 
 
 Hi, KAMEZAWA-San,
 
 The control_type feature is gone. We still have cached page
 accounting, but we do not allow control of only RSS pages anymore. We
 need to control both RSS+cached pages. I do not understand your
 question about new plan? Is it about adding back control_type?
 
Ah, just wanted to confirm that we can drop PAGE_CGROUP_FLAG_CACHE
if page_file_cache() function and split-LRU is introduced.


Thanks you.

-Kame

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-09 Thread Rik van Riel
On Thu, 10 Jan 2008 11:28:49 +0900
KAMEZAWA Hiroyuki [EMAIL PROTECTED] wrote:

 Hmm, it seems..
 
 When a program copies large amount of files, recent_rotated_file increases
 rapidly and 
 
 rotate_sum
 --
 recent_rotated_anon
 
 will be very big.
 
 And %ap will be big regardless of vm_swappiness  if it's not 0.
 
 I think # of recent_successful_pageout(anon/file) should be took into 
 account...
 
 I'm sorry if I miss something.

You are right.  I wonder if this, again, is a case of myself or
Lee forward porting old code.  I remember having (had) a very
different version of get_scan_ratio() at some point in the past,
but I cannot remember if we discarded this version for that other
version, or the other way around :(

Lee, would you by any chance still have some alternative versions
of get_scan_ratio() around?  I'm searching through my systems, but
have not found it yet...

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-09 Thread Balbir Singh
* KAMEZAWA Hiroyuki [EMAIL PROTECTED] [2008-01-10 11:36:18]:

 On Thu, 10 Jan 2008 07:51:33 +0530
 Balbir Singh [EMAIL PROTECTED] wrote:
 
 #define PAGE_CGROUP_FLAG_CACHE (0x1)   /* charged as cache */
 #define PAGE_CGROUP_FLAG_ACTIVE (0x2)  /* page is active in this 
cgroup */
+#define PAGE_CGROUP_FLAG_FILE  (0x4)   /* page is file system backed */

   
   Now, we don't have control_type and a feature for accounting only CACHE.
   Balbir-san, do you have some new plan ?
  
  
  Hi, KAMEZAWA-San,
  
  The control_type feature is gone. We still have cached page
  accounting, but we do not allow control of only RSS pages anymore. We
  need to control both RSS+cached pages. I do not understand your
  question about new plan? Is it about adding back control_type?
  
 Ah, just wanted to confirm that we can drop PAGE_CGROUP_FLAG_CACHE
 if page_file_cache() function and split-LRU is introduced.
 

Earlier we would have had a problem, since we even accounted for swap
cache with PAGE_CGROUP_FLAG_CACHE and I think page_file_cache() does
not account swap cache pages with page_file_cache(). Our accounting
is based on mapped vs unmapped whereas the new code from Rik accounts
file vs anonymous. I suspect we could live a little while longer
with PAGE_CGROUP_FLAG_CACHE and then if we do not need it at all,
we can mark it down for removal. What do you think?


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-09 Thread KAMEZAWA Hiroyuki
On Thu, 10 Jan 2008 08:56:31 +0530
Balbir Singh [EMAIL PROTECTED] wrote:

   The control_type feature is gone. We still have cached page
   accounting, but we do not allow control of only RSS pages anymore. We
   need to control both RSS+cached pages. I do not understand your
   question about new plan? Is it about adding back control_type?
   
  Ah, just wanted to confirm that we can drop PAGE_CGROUP_FLAG_CACHE
  if page_file_cache() function and split-LRU is introduced.
  
 
 Earlier we would have had a problem, since we even accounted for swap
 cache with PAGE_CGROUP_FLAG_CACHE and I think page_file_cache() does
 not account swap cache pages with page_file_cache(). Our accounting
 is based on mapped vs unmapped whereas the new code from Rik accounts
 file vs anonymous. I suspect we could live a little while longer
 with PAGE_CGROUP_FLAG_CACHE and then if we do not need it at all,
 we can mark it down for removal. What do you think?

Okay, I have no objection. 

Thanks,
-Kame

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-08 Thread KAMEZAWA Hiroyuki
I like this patch set thank you.

On Tue, 08 Jan 2008 15:59:44 -0500
Rik van Riel <[EMAIL PROTECTED]> wrote:
> Index: linux-2.6.24-rc6-mm1/mm/memcontrol.c
> ===
> --- linux-2.6.24-rc6-mm1.orig/mm/memcontrol.c 2008-01-07 11:55:09.0 
> -0500
> +++ linux-2.6.24-rc6-mm1/mm/memcontrol.c  2008-01-07 17:32:53.0 
> -0500


> -enum mem_cgroup_zstat_index {
> - MEM_CGROUP_ZSTAT_ACTIVE,
> - MEM_CGROUP_ZSTAT_INACTIVE,
> -
> - NR_MEM_CGROUP_ZSTAT,
> -};
> -
>  struct mem_cgroup_per_zone {
>   /*
>* spin_lock to protect the per cgroup LRU
>*/
>   spinlock_t  lru_lock;
> - struct list_headactive_list;
> - struct list_headinactive_list;
> - unsigned long count[NR_MEM_CGROUP_ZSTAT];
> + struct list_headlists[NR_LRU_LISTS];
> + unsigned long   count[NR_LRU_LISTS];
>  };
>  /* Macro for accessing counter */
>  #define MEM_CGROUP_ZSTAT(mz, idx)((mz)->count[(idx)])
> @@ -160,6 +152,7 @@ struct page_cgroup {
>  };
>  #define PAGE_CGROUP_FLAG_CACHE   (0x1)   /* charged as cache */
>  #define PAGE_CGROUP_FLAG_ACTIVE (0x2)/* page is active in this 
> cgroup */
> +#define PAGE_CGROUP_FLAG_FILE(0x4)   /* page is file system backed */
> 

Now, we don't have control_type and a feature for accounting only CACHE.
Balbir-san, do you have some new plan ?

BTW, is it better to use PageSwapBacked(pc->page) rather than adding a new flag
PAGE_CGROUP_FLAG_FILE ?


PAGE_CGROUP_FLAG_ACTIVE is used because global reclaim can change
ACTIVE/INACTIVE attribute without accessing memory cgroup.
(Then, we cannot trust PageActive(pc->page))

ANON <-> FILE attribute can be changed dinamically (after added to LRU) ?

If no, using page_file_cache(pc->page) will be easy.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-08 Thread Rik van Riel
On Tue, 8 Jan 2008 14:42:03 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Tue, 8 Jan 2008, Rik van Riel wrote:
> 
> > > Also would it be possible to create generic functions that can move pages 
> > > in pagevecs to an arbitrary lru list?
> > 
> > What would you use those functions for?
> 
> We keep on duplicating the pagevec lru operation functions in mm/swap.c. 
> Some generic stuff would reduce the code size.

Good idea.  Added to my TODO list :)

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-08 Thread Christoph Lameter
On Tue, 8 Jan 2008, Rik van Riel wrote:

> > Also would it be possible to create generic functions that can move pages 
> > in pagevecs to an arbitrary lru list?
> 
> What would you use those functions for?

We keep on duplicating the pagevec lru operation functions in mm/swap.c. 
Some generic stuff would reduce the code size.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-08 Thread Rik van Riel
On Tue, 8 Jan 2008 14:22:38 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> It may be good to coordinate this with Andrea Arcangeli's OOM fixes.

Probably.  With the split LRU lists (and the noreclaim LRUs), we can
simplify the OOM test a lot:

If free + file_active + file_inactive <= zone->pages_high and swap
space is full, the system is doomed.  No need for guesswork.

> Also would it be possible to create generic functions that can move pages 
> in pagevecs to an arbitrary lru list?

What would you use those functions for?

Or am I simply misunderstanding your idea?

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon & file sets

2008-01-08 Thread Christoph Lameter
It may be good to coordinate this with Andrea Arcangeli's OOM fixes.

Also would it be possible to create generic functions that can move pages 
in pagevecs to an arbitrary lru list?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-08 Thread Rik van Riel
On Tue, 8 Jan 2008 14:22:38 -0800 (PST)
Christoph Lameter [EMAIL PROTECTED] wrote:

 It may be good to coordinate this with Andrea Arcangeli's OOM fixes.

Probably.  With the split LRU lists (and the noreclaim LRUs), we can
simplify the OOM test a lot:

If free + file_active + file_inactive = zone-pages_high and swap
space is full, the system is doomed.  No need for guesswork.

 Also would it be possible to create generic functions that can move pages 
 in pagevecs to an arbitrary lru list?

What would you use those functions for?

Or am I simply misunderstanding your idea?

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-08 Thread Christoph Lameter
On Tue, 8 Jan 2008, Rik van Riel wrote:

  Also would it be possible to create generic functions that can move pages 
  in pagevecs to an arbitrary lru list?
 
 What would you use those functions for?

We keep on duplicating the pagevec lru operation functions in mm/swap.c. 
Some generic stuff would reduce the code size.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-08 Thread Christoph Lameter
It may be good to coordinate this with Andrea Arcangeli's OOM fixes.

Also would it be possible to create generic functions that can move pages 
in pagevecs to an arbitrary lru list?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-08 Thread Rik van Riel
On Tue, 8 Jan 2008 14:42:03 -0800 (PST)
Christoph Lameter [EMAIL PROTECTED] wrote:
 On Tue, 8 Jan 2008, Rik van Riel wrote:
 
   Also would it be possible to create generic functions that can move pages 
   in pagevecs to an arbitrary lru list?
  
  What would you use those functions for?
 
 We keep on duplicating the pagevec lru operation functions in mm/swap.c. 
 Some generic stuff would reduce the code size.

Good idea.  Added to my TODO list :)

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/19] split LRU lists into anon file sets

2008-01-08 Thread KAMEZAWA Hiroyuki
I like this patch set thank you.

On Tue, 08 Jan 2008 15:59:44 -0500
Rik van Riel [EMAIL PROTECTED] wrote:
 Index: linux-2.6.24-rc6-mm1/mm/memcontrol.c
 ===
 --- linux-2.6.24-rc6-mm1.orig/mm/memcontrol.c 2008-01-07 11:55:09.0 
 -0500
 +++ linux-2.6.24-rc6-mm1/mm/memcontrol.c  2008-01-07 17:32:53.0 
 -0500
snip

 -enum mem_cgroup_zstat_index {
 - MEM_CGROUP_ZSTAT_ACTIVE,
 - MEM_CGROUP_ZSTAT_INACTIVE,
 -
 - NR_MEM_CGROUP_ZSTAT,
 -};
 -
  struct mem_cgroup_per_zone {
   /*
* spin_lock to protect the per cgroup LRU
*/
   spinlock_t  lru_lock;
 - struct list_headactive_list;
 - struct list_headinactive_list;
 - unsigned long count[NR_MEM_CGROUP_ZSTAT];
 + struct list_headlists[NR_LRU_LISTS];
 + unsigned long   count[NR_LRU_LISTS];
  };
  /* Macro for accessing counter */
  #define MEM_CGROUP_ZSTAT(mz, idx)((mz)-count[(idx)])
 @@ -160,6 +152,7 @@ struct page_cgroup {
  };
  #define PAGE_CGROUP_FLAG_CACHE   (0x1)   /* charged as cache */
  #define PAGE_CGROUP_FLAG_ACTIVE (0x2)/* page is active in this 
 cgroup */
 +#define PAGE_CGROUP_FLAG_FILE(0x4)   /* page is file system backed */
 

Now, we don't have control_type and a feature for accounting only CACHE.
Balbir-san, do you have some new plan ?

BTW, is it better to use PageSwapBacked(pc-page) rather than adding a new flag
PAGE_CGROUP_FLAG_FILE ?


PAGE_CGROUP_FLAG_ACTIVE is used because global reclaim can change
ACTIVE/INACTIVE attribute without accessing memory cgroup.
(Then, we cannot trust PageActive(pc-page))

ANON - FILE attribute can be changed dinamically (after added to LRU) ?

If no, using page_file_cache(pc-page) will be easy.

Thanks,
-Kame

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/