Re: [patch 10/19] No Reclaim LRU Infrastructure
Hi Lee-san

> > > +config NORECLAIM
> > > +	bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
> > > +	depends on EXPERIMENTAL && 64BIT
> > > +	help
> > > +	  Supports tracking of non-reclaimable pages off the [in]active lists
> > > +	  to avoid excessive reclaim overhead on large memory systems.  Pages
> > > +	  may be non-reclaimable because:  they are locked into memory, they
> > > +	  are anonymous pages for which no swap space exists, or they are anon
> > > +	  pages that are expensive to unmap [long anon_vma "related vma" list.]
> >
> > Why did you choose a default of NO?
> > I think this is a real improvement, and no 64-bit user, apart from
> > NORECLAIM developers, would want to turn it off :)
>
> This was my doing.  I left the default == NO during
> development/experimental stage so that one would have to take explicit
> action to enable this function.  If the feature makes it into mainline
> and we decide that the default should be 'yes', that will be an easy
> change.

Oh, I see.
I will help with testing too, so that it can be merged into mainline early.

thanks.

- kosaki

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [patch 10/19] No Reclaim LRU Infrastructure
On Fri, 2008-01-11 at 13:36 +0900, KOSAKI Motohiro wrote:
> Hi Rik
>
> > +config NORECLAIM
> > +	bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
> > +	depends on EXPERIMENTAL && 64BIT
> > +	help
> > +	  Supports tracking of non-reclaimable pages off the [in]active lists
> > +	  to avoid excessive reclaim overhead on large memory systems.  Pages
> > +	  may be non-reclaimable because:  they are locked into memory, they
> > +	  are anonymous pages for which no swap space exists, or they are anon
> > +	  pages that are expensive to unmap [long anon_vma "related vma" list.]
>
> Why did you choose a default of NO?
> I think this is a real improvement, and no 64-bit user, apart from
> NORECLAIM developers, would want to turn it off :)
>

Hello, Kosaki-san:

This was my doing.  I left the default == NO during
development/experimental stage so that one would have to take explicit
action to enable this function.  If the feature makes it into mainline
and we decide that the default should be 'yes', that will be an easy
change.

Thanks for looking at this,
Lee Schermerhorn

Re: [patch 10/19] No Reclaim LRU Infrastructure
Hi Rik

> +config NORECLAIM
> +	bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
> +	depends on EXPERIMENTAL && 64BIT
> +	help
> +	  Supports tracking of non-reclaimable pages off the [in]active lists
> +	  to avoid excessive reclaim overhead on large memory systems.  Pages
> +	  may be non-reclaimable because:  they are locked into memory, they
> +	  are anonymous pages for which no swap space exists, or they are anon
> +	  pages that are expensive to unmap [long anon_vma "related vma" list.]

Why did you choose a default of NO?
I think this is a real improvement, and no 64-bit user, apart from
NORECLAIM developers, would want to turn it off :)

- kosaki

[patch 10/19] No Reclaim LRU Infrastructure
V1 -> V3:
+ rebase to 23-mm1 atop RvR's split LRU series
+ define NR_NORECLAIM and LRU_NORECLAIM to avoid errors when not
  configured.

V1 -> V2:
+ handle review comments -- various typos and errors.
+ extract "putback_all_noreclaim_pages()" into a separate patch
  and rework as "scan_all_zones_noreclaim_pages()".

Infrastructure to manage pages excluded from reclaim--i.e., hidden
from vmscan.  Based on a patch by Larry Woodman of Red Hat.  Reworked
to maintain "nonreclaimable" pages on a separate per-zone LRU list,
to "hide" them from vmscan.  A separate noreclaim pagevec is provided
for shrink_active_list() to move nonreclaimable pages to the noreclaim
list without over burdening the zone lru_lock.

Pages on the noreclaim list have both PG_noreclaim and PG_lru set.
Thus, PG_noreclaim is analogous to and mutually exclusive with
PG_active--it specifies which LRU list the page is on.

The noreclaim infrastructure is enabled by a new mm Kconfig option
[CONFIG_]NORECLAIM.

A new function 'page_reclaimable(page, vma)' in vmscan.c tests whether
or not a page is reclaimable.  Subsequent patches will add the various
!reclaimable tests.  We'll want to keep these tests light-weight for
use in shrink_active_list() and, possibly, the fault path.

Notes:

1.  for now, use bit 30 in page flags.  This restricts the no reclaim
    infrastructure to 64-bit systems.  [The mlock patch, later in this
    series, uses another of these 64-bit-system-only flags.]

    Rationale:  32-bit systems have no free page flags and are less
    likely to have the large amounts of memory that exhibit the problems
    this series attempts to solve.  [I'm sure someone will disabuse me
    of this notion.]

    Thus, NORECLAIM currently depends on [CONFIG_]64BIT.

2.  The pagevec to move pages to the noreclaim list results in another
    loop at the end of shrink_active_list().  If we ultimately adopt Rik
    van Riel's split lru approach, I think we'll need to find a way to
    factor all of these loops into some common code.

3.  TODO:  Memory Controllers maintain separate active and inactive lists.
    Need to consider whether they should also maintain a noreclaim list.
    Also, convert to use Christoph's array of indexed lru variables?

    See //TODO note in mm/memcontrol.c re:  isolating non-reclaimable
    pages.

4.  TODO:  more factoring of lru list handling.  But, I want to get this
    as close to functionally correct as possible before introducing those
    perturbations.

Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc6-mm1/mm/Kconfig
===
--- linux-2.6.24-rc6-mm1.orig/mm/Kconfig	2008-01-08 12:08:03.0 -0500
+++ linux-2.6.24-rc6-mm1/mm/Kconfig	2008-01-08 12:17:10.0 -0500
@@ -193,3 +193,13 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config NORECLAIM
+	bool "Track non-reclaimable pages (EXPERIMENTAL; 64BIT only)"
+	depends on EXPERIMENTAL && 64BIT
+	help
+	  Supports tracking of non-reclaimable pages off the [in]active lists
+	  to avoid excessive reclaim overhead on large memory systems.  Pages
+	  may be non-reclaimable because:  they are locked into memory, they
+	  are anonymous pages for which no swap space exists, or they are anon
+	  pages that are expensive to unmap [long anon_vma "related vma" list.]
Index: linux-2.6.24-rc6-mm1/include/linux/page-flags.h
===
--- linux-2.6.24-rc6-mm1.orig/include/linux/page-flags.h	2008-01-08 12:08:03.0 -0500
+++ linux-2.6.24-rc6-mm1/include/linux/page-flags.h	2008-01-08 12:17:10.0 -0500
@@ -94,6 +94,7 @@
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead		PG_reclaim /* Reminder to do async read-ahead */
+
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked		PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned		PG_owner_priv_1 /* Xen pinned pagetable */
@@ -107,6 +108,8 @@
  *         63                            32                              0
  */
 #define PG_uncached		31	/* Page has been mapped as uncached */
+
+#define PG_noreclaim		30	/* Page is "non-reclaimable"  */
 #endif
 
 /*
@@ -160,6 +163,7 @@ static inline void SetPageUptodate(struc
 #define SetPageActive(page)	set_bit(PG_active, &(page)->flags)
 #define ClearPageActive(page)	clear_bit(PG_active, &(page)->flags)
 #define __ClearPageActive(page)	__clear_bit(PG_active, &(page)->flags)
+#define TestClearPageActive(page) test_and_clear_bit(PG_active, &(page)->flags)
 
 #define PageSlab(page)		test_bit(PG_slab, &(page)->flags)
 #define __SetPageSlab(page)	__set_bit(PG_slab, &(page)->flags)
@@ -261,6 +265,21 @@ static inline void __ClearP
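
For readers following the flag discussion, here is a rough userspace sketch (not part of the patch) of how PG_lru, PG_active and PG_noreclaim could together select an LRU list, and of the test-and-clear pattern that TestClearPageActive() relies on. Only the bit number 30 for PG_noreclaim comes from the patch; the other bit numbers, the sk_ prefix, and every type and helper name are assumptions made for illustration, and these helpers are plain read-modify-write operations, unlike the kernel's atomic bitops.

```c
/*
 * Userspace sketch only.  Page flags are modelled as bits in a 64-bit word.
 * SK_PG_noreclaim = 30 is taken from the patch; SK_PG_lru and SK_PG_active
 * are illustrative values, and all sk_* names are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>

enum {
	SK_PG_lru       = 5,	/* illustrative bit number */
	SK_PG_active    = 6,	/* illustrative bit number */
	SK_PG_noreclaim = 30,	/* bit number used by the patch */
};

enum sk_lru_list {
	SK_NOT_ON_LRU,
	SK_LRU_INACTIVE,
	SK_LRU_ACTIVE,
	SK_LRU_NORECLAIM,
};

struct sk_page {
	unsigned long long flags;
};

static inline bool sk_test_bit(int nr, const unsigned long long *flags)
{
	return (*flags >> nr) & 1ULL;
}

static inline void sk_set_bit(int nr, unsigned long long *flags)
{
	*flags |= 1ULL << nr;
}

/* Non-atomic stand-in for test_and_clear_bit(): returns the old bit value. */
static inline bool sk_test_and_clear_bit(int nr, unsigned long long *flags)
{
	bool old = sk_test_bit(nr, flags);

	*flags &= ~(1ULL << nr);
	return old;
}

/* PG_lru says "on some LRU list"; PG_active/PG_noreclaim say which one. */
static inline enum sk_lru_list sk_page_lru_list(const struct sk_page *page)
{
	bool active = sk_test_bit(SK_PG_active, &page->flags);
	bool noreclaim = sk_test_bit(SK_PG_noreclaim, &page->flags);

	if (!sk_test_bit(SK_PG_lru, &page->flags))
		return SK_NOT_ON_LRU;

	/* The description states the two flags are mutually exclusive. */
	assert(!(active && noreclaim));

	if (noreclaim)
		return SK_LRU_NORECLAIM;
	return active ? SK_LRU_ACTIVE : SK_LRU_INACTIVE;
}

/* Retag a page for the noreclaim list: drop PG_active, set PG_noreclaim. */
static inline void sk_mark_noreclaim(struct sk_page *page)
{
	(void)sk_test_and_clear_bit(SK_PG_active, &page->flags);
	sk_set_bit(SK_PG_noreclaim, &page->flags);
}
```

Clearing PG_active before setting PG_noreclaim keeps the "exactly one list" invariant intact in this model; the real patch also has to move the page between the per-zone lists under the zone lru_lock, which this sketch does not attempt to show.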