Re: [patch 10/19] No Reclaim LRU Infrastructure

2008-01-14 Thread KOSAKI Motohiro
Hi Lee-san

> > > +config NORECLAIM
> > > + bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
> > > + depends on EXPERIMENTAL && 64BIT
> > > + help
> > > +   Supports tracking of non-reclaimable pages off the [in]active lists
> > > +   to avoid excessive reclaim overhead on large memory systems.  Pages
> > > +   may be non-reclaimable because:  they are locked into memory, they
> > > +   are anonymous pages for which no swap space exists, or they are anon
> > > +   pages that are expensive to unmap [long anon_vma "related vma" list.]
> > 
> > Why did you choose to make the default NO?
> > I think this is a real improvement, and no 64-bit user other than
> > a NORECLAIM developer would want to turn it off :)
> 
> This was my doing.  I left the default == NO during
> development/experimental stage so that one would have to take explicit
> action to enable this function.  If the feature makes it into mainline
> and we decide that the default should be 'yes', that will be an easy
> change.

Oh, I see.
I will help with testing too, so that it can be merged into mainline early.

thanks.


- kosaki


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 10/19] No Reclaim LRU Infrastructure

2008-01-11 Thread Lee Schermerhorn
On Fri, 2008-01-11 at 13:36 +0900, KOSAKI Motohiro wrote:
> Hi Rik
> 
> > +config NORECLAIM
> > +   bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
> > +   depends on EXPERIMENTAL && 64BIT
> > +   help
> > + Supports tracking of non-reclaimable pages off the [in]active lists
> > + to avoid excessive reclaim overhead on large memory systems.  Pages
> > + may be non-reclaimable because:  they are locked into memory, they
> > + are anonymous pages for which no swap space exists, or they are anon
> > + pages that are expensive to unmap [long anon_vma "related vma" list.]
> 
> Why did you choose to make the default NO?
> I think this is a real improvement, and no 64-bit user other than
> a NORECLAIM developer would want to turn it off :)
> 

Hello, Kosaki-san:

This was my doing.  I left the default == NO during
development/experimental stage so that one would have to take explicit
action to enable this function.  If the feature makes it into mainline
and we decide that the default should be 'yes', that will be an easy
change.

Thanks for looking at this,
Lee Schermerhorn



Re: [patch 10/19] No Reclaim LRU Infrastructure

2008-01-10 Thread KOSAKI Motohiro
Hi Rik

> +config NORECLAIM
> + bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
> + depends on EXPERIMENTAL && 64BIT
> + help
> +   Supports tracking of non-reclaimable pages off the [in]active lists
> +   to avoid excessive reclaim overhead on large memory systems.  Pages
> +   may be non-reclaimable because:  they are locked into memory, they
> +   are anonymous pages for which no swap space exists, or they are anon
> +   pages that are expensive to unmap [long anon_vma "related vma" list.]

Why did you choose to make the default NO?
I think this is a real improvement, and no 64-bit user other than
a NORECLAIM developer would want to turn it off :)


- kosaki




[patch 10/19] No Reclaim LRU Infrastructure

2008-01-08 Thread Rik van Riel
V1 -> V3:
+ rebase to 23-mm1 atop RvR's split LRU series
+ define NR_NORECLAIM and LRU_NORECLAIM to avoid errors when not
  configured.

V1 -> V2:
+  handle review comments -- various typos and errors.
+  extract "putback_all_noreclaim_pages()" into a separate patch
   and rework as "scan_all_zones_noreclaim_pages()".

Infrastructure to manage pages excluded from reclaim--i.e., hidden
from vmscan.  Based on a patch by Larry Woodman of Red Hat. Reworked
to maintain "nonreclaimable" pages on a separate per-zone LRU list,
to "hide" them from vmscan.  A separate noreclaim pagevec is provided
for shrink_active_list() to move nonreclaimable pages to the noreclaim
list without overburdening the zone lru_lock.
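
The batching idea behind the noreclaim pagevec can be sketched in plain
user-space C.  This is only an illustration under stated assumptions, not
the code from the patch: the batch_* names, BATCH_SIZE, and the
lock_acquisitions counter (a stand-in for taking the zone lru_lock) are
all hypothetical.  It shows pages being collected into a small array and
moved to the list in one locked pass per full batch, rather than one
lock round trip per page.

```c
#include <stddef.h>

#define BATCH_SIZE 14	/* hypothetical batch capacity, for illustration only */

struct batch_page {
	struct batch_page *next;
	int id;
};

struct batch_zone {
	struct batch_page *noreclaim_head;	/* stand-in for the per-zone noreclaim list */
	unsigned long nr_noreclaim;		/* pages currently on that list */
	unsigned long lock_acquisitions;	/* counts simulated lru_lock round trips */
};

struct batch_vec {
	size_t nr;
	struct batch_page *pages[BATCH_SIZE];
};

/* Move every batched page to the noreclaim list under one simulated lock hold. */
static void batch_flush(struct batch_zone *zone, struct batch_vec *vec)
{
	size_t i;

	if (vec->nr == 0)
		return;

	zone->lock_acquisitions++;	/* "lock" taken once for the whole batch */
	for (i = 0; i < vec->nr; i++) {
		vec->pages[i]->next = zone->noreclaim_head;
		zone->noreclaim_head = vec->pages[i];
		zone->nr_noreclaim++;
	}
	vec->nr = 0;			/* batch is empty again */
}

/* Queue one page; flush automatically when the batch fills up. */
static void batch_add(struct batch_zone *zone, struct batch_vec *vec,
		      struct batch_page *page)
{
	vec->pages[vec->nr++] = page;
	if (vec->nr == BATCH_SIZE)
		batch_flush(zone, vec);
}
```

A caller would batch_add() each nonreclaimable page it finds and call
batch_flush() once more at the end to drain any partial batch.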

Pages on the noreclaim list have both PG_noreclaim and PG_lru set.
Thus, PG_noreclaim is analogous to and mutually exclusive with
PG_active--it specifies which LRU list the page is on.  
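
A rough user-space sketch of how these flags could select the list
follows.  Only the value 30 for PG_noreclaim comes from this patch; the
flagsel_* names, the other bit numbers, and the 64-bit flags word are
assumptions made for illustration.

```c
#include <stdint.h>

/* Illustrative bit numbers; only bit 30 is taken from the patch text. */
#define FLAGSEL_PG_lru		5
#define FLAGSEL_PG_active	6
#define FLAGSEL_PG_noreclaim	30

enum flagsel_list {
	FLAGSEL_NOT_ON_LRU,	/* PG_lru clear: page is on no LRU list */
	FLAGSEL_INACTIVE,	/* PG_lru only */
	FLAGSEL_ACTIVE,		/* PG_lru + PG_active */
	FLAGSEL_NORECLAIM,	/* PG_lru + PG_noreclaim */
	FLAGSEL_INVALID		/* PG_active and PG_noreclaim both set */
};

static int flagsel_test_bit(uint64_t flags, int bit)
{
	return (int)((flags >> bit) & 1u);
}

/* Decide which LRU list a page belongs to from its flags word. */
static enum flagsel_list flagsel_page_list(uint64_t flags)
{
	int active = flagsel_test_bit(flags, FLAGSEL_PG_active);
	int noreclaim = flagsel_test_bit(flags, FLAGSEL_PG_noreclaim);

	if (!flagsel_test_bit(flags, FLAGSEL_PG_lru))
		return FLAGSEL_NOT_ON_LRU;
	if (active && noreclaim)
		return FLAGSEL_INVALID;	/* the two flags are mutually exclusive */
	if (noreclaim)
		return FLAGSEL_NORECLAIM;
	return active ? FLAGSEL_ACTIVE : FLAGSEL_INACTIVE;
}
```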

The noreclaim infrastructure is enabled by a new mm Kconfig option
[CONFIG_]NORECLAIM.

A new function 'page_reclaimable(page, vma)' in vmscan.c tests whether
or not a page is reclaimable.  Subsequent patches will add the various
!reclaimable tests.  We'll want to keep these tests light-weight for
use in shrink_active_list() and, possibly, the fault path.
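
To give a feel for what such light-weight tests might look like once the
later patches add them, here is a user-space sketch.  It is not the
function from this series: the sketch_* types and fields, the extra
"limits" argument, and the threshold on related vmas are all
hypothetical.  The three conditions mirror the reasons listed in the
Kconfig help text (locked into memory, anon with no swap space, anon
with a long "related vma" list).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct sketch_vma {
	bool vm_locked;			/* mapping is locked into memory */
};

struct sketch_page {
	bool anon;			/* anonymous page */
	unsigned int nr_related_vmas;	/* length of the anon_vma "related vma" list */
};

struct sketch_limits {
	unsigned long free_swap_pages;	/* 0 means no swap space is available */
	unsigned int max_related_vmas;	/* assumed cut-off for "expensive to unmap" */
};

/* Return 1 if the page looks reclaimable, 0 otherwise.  vma may be NULL. */
static int sketch_page_reclaimable(const struct sketch_page *page,
				   const struct sketch_vma *vma,
				   const struct sketch_limits *lim)
{
	if (vma != NULL && vma->vm_locked)
		return 0;		/* locked into memory */

	if (page->anon) {
		if (lim->free_swap_pages == 0)
			return 0;	/* nowhere to swap it out to */
		if (page->nr_related_vmas > lim->max_related_vmas)
			return 0;	/* too expensive to unmap */
	}
	return 1;
}
```

Each check is a simple field comparison, which is the kind of cost one
would want in a hot path such as shrink_active_list().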

Notes:

1.  for now, use bit 30 in page flags.  This restricts the no reclaim
infrastructure to 64-bit systems.  [The mlock patch, later in this
series, uses another of these 64-bit-system-only flags.]

Rationale:  32-bit systems have no free page flags and are less
likely to have the large amounts of memory that exhibit the problems
this series attempts to solve.  [I'm sure someone will disabuse me
of this notion.]

Thus, NORECLAIM currently depends on [CONFIG_]64BIT.

2.  The pagevec to move pages to the noreclaim list results in another
loop at the end of shrink_active_list().  If we ultimately adopt Rik
van Riel's split lru approach, I think we'll need to find a way to
factor all of these loops into some common code.

3.  TODO:  Memory Controllers maintain separate active and inactive lists.
Need to consider whether they should also maintain a noreclaim list.  
Also, convert to use Christoph's array of indexed lru variables?

See //TODO note in mm/memcontrol.c re:  isolating non-reclaimable
pages. 

4.  TODO:  more factoring of lru list handling.  But, I want to get this
as close to functionally correct as possible before introducing those
perturbations.
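
As an illustration of note 1, the 64-bit-only restriction could be
expressed with a compile-time gate along the following lines.  This is a
user-space sketch under assumptions, not the patch's code: the gate_*
names are hypothetical, and ULONG_MAX stands in for the kernel's own
64BIT configuration test.  When the flag is unavailable, the helpers
compile to stubs so that callers need no #ifdefs of their own.

```c
#include <limits.h>

/* Assumed stand-in for "depends on 64BIT": is unsigned long wider than 32 bits? */
#if ULONG_MAX > 0xffffffffUL
#define GATE_NORECLAIM_AVAILABLE 1
#define GATE_PG_noreclaim 30	/* bit number taken from the patch */
#else
#define GATE_NORECLAIM_AVAILABLE 0
#endif

/* Test the flag; always 0 when the feature is compiled out. */
static int gate_page_noreclaim(unsigned long flags)
{
#if GATE_NORECLAIM_AVAILABLE
	return (int)((flags >> GATE_PG_noreclaim) & 1UL);
#else
	(void)flags;
	return 0;
#endif
}

/* Set the flag; a no-op when the feature is compiled out. */
static unsigned long gate_set_page_noreclaim(unsigned long flags)
{
#if GATE_NORECLAIM_AVAILABLE
	return flags | (1UL << GATE_PG_noreclaim);
#else
	return flags;
#endif
}
```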

Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc6-mm1/mm/Kconfig
===
--- linux-2.6.24-rc6-mm1.orig/mm/Kconfig 2008-01-08 12:08:03.0 -0500
+++ linux-2.6.24-rc6-mm1/mm/Kconfig 2008-01-08 12:17:10.0 -0500
@@ -193,3 +193,13 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config NORECLAIM
+	bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
+	depends on EXPERIMENTAL && 64BIT
+	help
+	  Supports tracking of non-reclaimable pages off the [in]active lists
+	  to avoid excessive reclaim overhead on large memory systems.  Pages
+	  may be non-reclaimable because:  they are locked into memory, they
+	  are anonymous pages for which no swap space exists, or they are anon
+	  pages that are expensive to unmap [long anon_vma "related vma" list.]
Index: linux-2.6.24-rc6-mm1/include/linux/page-flags.h
===
--- linux-2.6.24-rc6-mm1.orig/include/linux/page-flags.h 2008-01-08 12:08:03.0 -0500
+++ linux-2.6.24-rc6-mm1/include/linux/page-flags.h 2008-01-08 12:17:10.0 -0500
@@ -94,6 +94,7 @@
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead   PG_reclaim /* Reminder to do async read-ahead */
 
+
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
@@ -107,6 +108,8 @@
  *         63                            32                              0
  */
 #define PG_uncached		31	/* Page has been mapped as uncached */
+
+#define PG_noreclaim   30  /* Page is "non-reclaimable"  */
 #endif
 
 /*
@@ -160,6 +163,7 @@ static inline void SetPageUptodate(struc
 #define SetPageActive(page)	set_bit(PG_active, &(page)->flags)
 #define ClearPageActive(page)	clear_bit(PG_active, &(page)->flags)
 #define __ClearPageActive(page)	__clear_bit(PG_active, &(page)->flags)
+#define TestClearPageActive(page) test_and_clear_bit(PG_active, &(page)->flags)
 
 #define PageSlab(page)		test_bit(PG_slab, &(page)->flags)
 #define __SetPageSlab(page)	__set_bit(PG_slab, &(page)->flags)
@@ -261,6 +265,21 @@ static inline void __ClearP

[patch 10/19] No Reclaim LRU Infrastructure

2008-01-02 Thread linux-kernel
V1 -> V3:
+ rebase to 23-mm1 atop RvR's split LRU series
+ define NR_NORECLAIM and LRU_NORECLAIM to avoid errors when not
  configured.

V1 -> V2:
+  handle review comments -- various typos and errors.
+  extract "putback_all_noreclaim_pages()" into a separate patch
   and rework as "scan_all_zones_noreclaim_pages()".

Infrastructure to manage pages excluded from reclaim--i.e., hidden
from vmscan.  Based on a patch by Larry Woodman of Red Hat. Reworked
to maintain "nonreclaimable" pages on a separate per-zone LRU list,
to "hide" them from vmscan.  A separate noreclaim pagevec is provided
for shrink_active_list() to move nonreclaimable pages to the noreclaim
list without overburdening the zone lru_lock.

Pages on the noreclaim list have both PG_noreclaim and PG_lru set.
Thus, PG_noreclaim is analogous to and mutually exclusive with
PG_active--it specifies which LRU list the page is on.  

The noreclaim infrastructure is enabled by a new mm Kconfig option
[CONFIG_]NORECLAIM.

A new function 'page_reclaimable(page, vma)' in vmscan.c tests whether
or not a page is reclaimable.  Subsequent patches will add the various
!reclaimable tests.  We'll want to keep these tests light-weight for
use in shrink_active_list() and, possibly, the fault path.

Notes:

1.  for now, use bit 30 in page flags.  This restricts the no reclaim
infrastructure to 64-bit systems.  [The mlock patch, later in this
series, uses another of these 64-bit-system-only flags.]

Rationale:  32-bit systems have no free page flags and are less
likely to have the large amounts of memory that exhibit the problems
this series attempts to solve.  [I'm sure someone will disabuse me
of this notion.]

Thus, NORECLAIM currently depends on [CONFIG_]64BIT.

2.  The pagevec to move pages to the noreclaim list results in another
loop at the end of shrink_active_list().  If we ultimately adopt Rik
van Riel's split lru approach, I think we'll need to find a way to
factor all of these loops into some common code.

3.  TODO:  Memory Controllers maintain separate active and inactive lists.
Need to consider whether they should also maintain a noreclaim list.  
Also, convert to use Christoph's array of indexed lru variables?

See //TODO note in mm/memcontrol.c re:  isolating non-reclaimable
pages. 

4.  TODO:  more factoring of lru list handling.  But, I want to get this
as close to functionally correct as possible before introducing those
perturbations.

Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc6-mm1/mm/Kconfig
===
--- linux-2.6.24-rc6-mm1.orig/mm/Kconfig 2008-01-02 16:00:39.0 -0500
+++ linux-2.6.24-rc6-mm1/mm/Kconfig 2008-01-02 16:00:54.0 -0500
@@ -193,3 +193,13 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config NORECLAIM
+	bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
+	depends on EXPERIMENTAL && 64BIT
+	help
+	  Supports tracking of non-reclaimable pages off the [in]active lists
+	  to avoid excessive reclaim overhead on large memory systems.  Pages
+	  may be non-reclaimable because:  they are locked into memory, they
+	  are anonymous pages for which no swap space exists, or they are anon
+	  pages that are expensive to unmap [long anon_vma "related vma" list.]
Index: linux-2.6.24-rc6-mm1/include/linux/page-flags.h
===
--- linux-2.6.24-rc6-mm1.orig/include/linux/page-flags.h 2008-01-02 16:00:39.0 -0500
+++ linux-2.6.24-rc6-mm1/include/linux/page-flags.h 2008-01-02 16:00:54.0 -0500
@@ -94,6 +94,7 @@
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead   PG_reclaim /* Reminder to do async read-ahead */
 
+
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
@@ -107,6 +108,8 @@
  *         63                            32                              0
  */
 #define PG_uncached		31	/* Page has been mapped as uncached */
+
+#define PG_noreclaim   30  /* Page is "non-reclaimable"  */
 #endif
 
 /*
@@ -160,6 +163,7 @@ static inline void SetPageUptodate(struc
 #define SetPageActive(page)	set_bit(PG_active, &(page)->flags)
 #define ClearPageActive(page)	clear_bit(PG_active, &(page)->flags)
 #define __ClearPageActive(page)	__clear_bit(PG_active, &(page)->flags)
+#define TestClearPageActive(page) test_and_clear_bit(PG_active, &(page)->flags)
 
 #define PageSlab(page)		test_bit(PG_slab, &(page)->flags)
 #define __SetPageSlab(page)	__set_bit(PG_slab, &(page)->flags)
@@ -261,6 +265,21 @@ static inline void __ClearP