Re: [QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Christoph Lameter
On Mon, 9 Apr 2007, Andrew Morton wrote:

> On Mon,  9 Apr 2007 11:25:20 -0700 (PDT)
> Christoph Lameter <[EMAIL PROTECTED]> wrote:
> 
> > -static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> > +static inline void pgd_ctor(void *x)
> > +static inline void pgd_dtor(void *x)
> 
> Seems dumb to inline these - they're only ever called indirectly, aren't
> they?

Yes. In most cases they are not called at all because NULL is passed. 
Then the compiler can remove the function call from the inline 
functions.

> This means (I think) that the compiler will need to generate an out-of-line
> copy of these within each compilation unit which passes the address of these
> functions into some other function.

The function is constant. Constant propagation will lead to the function 
being included in the inline function.
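
A minimal user-space sketch of the argument above, not the kernel code: 
sketch_alloc(), sketch_ctor() and SKETCH_PAGE_SIZE are hypothetical names 
made up for illustration. When the allocator is an inline function and the 
ctor argument is a compile-time constant, the compiler can see at each call 
site whether the pointer is NULL. Whether it really drops the branch or 
inlines the constructor depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* Hypothetical stand-in for an inline allocator taking an optional ctor. */
static inline void *sketch_alloc(void *page, void (*ctor)(void *))
{
	assert(page != NULL);
	memset(page, 0, SKETCH_PAGE_SIZE);
	if (ctor)		/* a compile-time constant once inlined */
		ctor(page);
	return page;
}

/* Hypothetical constructor, playing the role pgd_ctor() has in the patch. */
static inline void sketch_ctor(void *x)
{
	((unsigned char *)x)[0] = 0xAA;
}

/* ctor == NULL: after inlining, the test and the call can both disappear. */
static inline void *alloc_plain(void *page)
{
	return sketch_alloc(page, NULL);
}

/* ctor is a known function: the indirect call can become a direct one. */
static inline void *alloc_with_ctor(void *page)
{
	return sketch_alloc(page, sketch_ctor);
}
```

Comparing the generated assembly of alloc_plain() and alloc_with_ctor() at 
-O2 is one way to check what a given compiler actually does here.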
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Andrew Morton
On Mon,  9 Apr 2007 11:25:20 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> -static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> +static inline void pgd_ctor(void *x)
> +static inline void pgd_dtor(void *x)

Seems dumb to inline these - they're only ever called indirectly, aren't
they?

This means (I think) that the compiler will need to generate an out-of-line
copy of these within each compilation unit which passes the address of these
functions into some other function.


Re: [QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Christoph Lameter
On Mon, 9 Apr 2007, Andi Kleen wrote:

> > Otherwise you will leak pages to the page allocator before the tlb flush 
> > occurred.
> 
> I don't get it sorry. Can you please explain in more detail?

On process teardown pages are freed via the tlb mechanism. That mechanism 
guarantees that TLBs for pages are flushed before they can be reused. We 
tie into that and put pages on quicklists. The quicklists are trimmed
after the TLB flush.

If a shrinker independently freed pages from the quicklists, this 
mechanism would no longer work, and pages that still have a valid TLB 
entry for one process could be reused by other processes.
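
The ordering problem can be modelled in a few lines of user-space C. This 
is only a toy model under stated assumptions: struct sketch_state and the 
sketch_*() helpers are hypothetical and stand in for the quicklist, the 
TLB flush and the trim step. It counts how many pages reach the page 
allocator while a stale translation may still exist.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 8	/* arbitrary size for this toy model */

struct sketch_state {
	int quicklist[MAX_PAGES];	/* page ids parked on the quicklist */
	size_t nr_quick;
	int stale_tlb[MAX_PAGES];	/* 1 while a CPU may still cache a translation */
	size_t nr_reusable;		/* pages handed back to the page allocator */
	size_t nr_unsafe;		/* pages handed back while still stale */
};

/* Teardown path: park the page; its TLB entry may still be live. */
static void sketch_free_page(struct sketch_state *s, int page)
{
	assert(page >= 0 && page < MAX_PAGES);
	assert(s->nr_quick < MAX_PAGES);
	s->stale_tlb[page] = 1;
	s->quicklist[s->nr_quick++] = page;
}

/* Model of the TLB flush: no stale translations remain afterwards. */
static void sketch_tlb_flush(struct sketch_state *s)
{
	for (size_t i = 0; i < MAX_PAGES; i++)
		s->stale_tlb[i] = 0;
}

/* Hand everything on the quicklist back to the page allocator. */
static void sketch_trim(struct sketch_state *s)
{
	while (s->nr_quick > 0) {
		int page = s->quicklist[--s->nr_quick];

		if (s->stale_tlb[page])
			s->nr_unsafe++;	/* reusable while a TLB entry is live */
		s->nr_reusable++;
	}
}
```

Calling sketch_trim() only after sketch_tlb_flush() leaves nr_unsafe at 
zero. Calling it before the flush, as an independent shrinker could, 
releases pages that are still marked stale.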




Re: [QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Andi Kleen
On Monday 09 April 2007 20:51:00 Christoph Lameter wrote:
> On Mon, 9 Apr 2007, Andi Kleen wrote:
> 
> > > It has to be done in sync with tlb flushing.
> > 
> > Why?
> 
> Otherwise you will leak pages to the page allocator before the tlb flush 
> occurred.

I don't get it sorry. Can you please explain in more detail?

-Andi




Re: [QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Christoph Lameter
On Mon, 9 Apr 2007, Andi Kleen wrote:

> > It has to be done in sync with tlb flushing.
> 
> Why?

Otherwise you will leak pages to the page allocator before the tlb flush 
occurred.


Re: [QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Andi Kleen

> 
> It has to be done in sync with tlb flushing.

Why?

> Doing that on memory pressure  
> would complicate things significantly. 

Again why? 

> Also idling means that the cache  
> grows cold.

Does it? Unless you worry about interrupts nothing in idle
is going to thrash caches.

-Andi
 


Re: [QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Christoph Lameter
On Mon, 9 Apr 2007, Andi Kleen wrote:

> On Monday 09 April 2007 20:25:20 Christoph Lameter wrote:
> 
> >  #endif /* _X86_64_PGALLOC_H */
> > Index: linux-2.6.21-rc5-mm4/arch/x86_64/kernel/process.c
> > ===
> > --- linux-2.6.21-rc5-mm4.orig/arch/x86_64/kernel/process.c  2007-04-07 
> > 18:07:47.0 -0700
> > +++ linux-2.6.21-rc5-mm4/arch/x86_64/kernel/process.c   2007-04-07 
> > 18:09:30.0 -0700
> > @@ -207,6 +207,7 @@
> > if (__get_cpu_var(cpu_idle_state))
> > __get_cpu_var(cpu_idle_state) = 0;
> >  
> > +   check_pgt_cache();
> 
> Wouldn't it be better to do that on memory pressure only (register
> it as a shrinker)?

It has to be done in sync with tlb flushing. Doing that on memory pressure 
would complicate things significantly. Also idling means that the cache 
grows cold.



Re: [QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Andi Kleen
On Monday 09 April 2007 20:25:20 Christoph Lameter wrote:

>  #endif /* _X86_64_PGALLOC_H */
> Index: linux-2.6.21-rc5-mm4/arch/x86_64/kernel/process.c
> ===
> --- linux-2.6.21-rc5-mm4.orig/arch/x86_64/kernel/process.c2007-04-07 
> 18:07:47.0 -0700
> +++ linux-2.6.21-rc5-mm4/arch/x86_64/kernel/process.c 2007-04-07 
> 18:09:30.0 -0700
> @@ -207,6 +207,7 @@
>   if (__get_cpu_var(cpu_idle_state))
>   __get_cpu_var(cpu_idle_state) = 0;
>  
> + check_pgt_cache();

Wouldn't it be better to do that on memory pressure only (register
it as a shrinker)?

>   rmb();
>   idle = pm_idle;
>   if (!idle)
> Index: linux-2.6.21-rc5-mm4/arch/x86_64/kernel/smp.c
> ===
> --- linux-2.6.21-rc5-mm4.orig/arch/x86_64/kernel/smp.c2007-04-07 
> 18:07:47.0 -0700
> +++ linux-2.6.21-rc5-mm4/arch/x86_64/kernel/smp.c 2007-04-07 
> 18:09:30.0 -0700
> @@ -241,7 +241,7 @@
>   }
>   if (!cpus_empty(cpu_mask))
>   flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
> -
> + check_pgt_cache();

Why is that here?

>   preempt_enable();
>  }


-Andi


[QUICKLIST 3/4] Quicklist support for x86_64

2007-04-09 Thread Christoph Lameter
Convert x86_64 to using quicklists

This adds caching of pgds, puds, pmds and ptes. That way we can
avoid costly zeroing and initialization of special mappings in the
pgd.

A second quicklist is useful to separate out PGD handling. We can carry
the initialized pgds over to the next process needing them.

Also clean up the pgd_list handling to use regular list macros.
There is no need anymore to avoid the lru field.

Move the add/removal of the pgds to the pgdlist into the
constructor / destructor. That way the implementation is
congruent with i386.
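
For readers unfamiliar with the idea, here is a rough user-space sketch of 
a quicklist, not the kernel implementation. struct sketch_quicklist and the 
sketch_quicklist_*() functions are hypothetical names, and calloc() stands 
in for the page allocator. Free pages are chained through their first 
word, and the constructor only runs for pages that come fresh from the 
allocator, so an initialized page can be carried over to the next user.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

struct sketch_quicklist {
	void *head;		/* first cached page, or NULL */
	unsigned long nr_pages;	/* number of cached pages */
};

/* Pop a cached page if there is one, else get a zeroed page and run ctor. */
static void *sketch_quicklist_alloc(struct sketch_quicklist *q,
				    void (*ctor)(void *))
{
	void **p = q->head;

	if (p) {
		q->head = p[0];	/* unlink from the chain */
		p[0] = NULL;	/* restore the word used as link */
		q->nr_pages--;
		return p;
	}
	p = calloc(1, SKETCH_PAGE_SIZE);
	if (p && ctor)
		ctor(p);
	return p;
}

/* Park a page on the list; the caller is assumed to return it zeroed. */
static void sketch_quicklist_free(struct sketch_quicklist *q, void *page)
{
	void **p = page;

	assert(page != NULL);
	p[0] = q->head;		/* chain through the first word */
	q->head = p;
	q->nr_pages++;
}
```

A page freed and then allocated again comes back without touching the 
allocator and without being zeroed a second time, which is the saving the 
description above refers to.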

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86_64/Kconfig  |4 ++
 arch/x86_64/kernel/process.c |1 
 arch/x86_64/kernel/smp.c |2 -
 arch/x86_64/mm/fault.c   |5 +-
 include/asm-x86_64/pgalloc.h |   76 +--
 include/asm-x86_64/pgtable.h |3 -
 mm/Kconfig   |5 ++
 7 files changed, 52 insertions(+), 44 deletions(-)

Index: linux-2.6.21-rc5-mm4/arch/x86_64/Kconfig
===
--- linux-2.6.21-rc5-mm4.orig/arch/x86_64/Kconfig   2007-04-07 
18:09:17.0 -0700
+++ linux-2.6.21-rc5-mm4/arch/x86_64/Kconfig2007-04-07 18:09:30.0 
-0700
@@ -56,6 +56,14 @@
bool
default y
 
+config QUICKLIST
+   bool
+   default y
+
+config NR_QUICK
+   int
+   default 2
+
 config ISA
bool
 
Index: linux-2.6.21-rc5-mm4/include/asm-x86_64/pgalloc.h
===
--- linux-2.6.21-rc5-mm4.orig/include/asm-x86_64/pgalloc.h  2007-04-07 
18:07:47.0 -0700
+++ linux-2.6.21-rc5-mm4/include/asm-x86_64/pgalloc.h   2007-04-07 
18:47:03.0 -0700
@@ -4,6 +4,10 @@
 #include <asm/pda.h>
 #include <linux/threads.h>
 #include <linux/mm.h>
+#include <linux/quicklist.h>
+
+#define QUICK_PGD 0   /* We preserve special mappings over free */
+#define QUICK_PT 1    /* Other page table pages that are zero on free */
 
 #define pmd_populate_kernel(mm, pmd, pte) \
set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -20,23 +24,23 @@
 static inline void pmd_free(pmd_t *pmd)
 {
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-   free_page((unsigned long)pmd);
+   quicklist_free(QUICK_PT, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-   return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+   return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+   return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-   free_page((unsigned long)pud);
+   quicklist_free(QUICK_PT, NULL, pud);
 }
 
 static inline void pgd_list_add(pgd_t *pgd)
@@ -57,41 +61,57 @@
spin_unlock(&pgd_lock);
 }
 
-static inline pgd_t *pgd_alloc(struct mm_struct *mm)
+static inline void pgd_ctor(void *x)
 {
unsigned boundary;
-   pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
-   if (!pgd)
-   return NULL;
-   pgd_list_add(pgd);
+   pgd_t *pgd = x;
+   struct page *page = virt_to_page(pgd);
+
/*
 * Copy kernel pointers in from init.
-* Could keep a freelist or slab cache of those because the kernel
-* part never changes.
 */
boundary = pgd_index(__PAGE_OFFSET);
-   memset(pgd, 0, boundary * sizeof(pgd_t));
memcpy(pgd + boundary,
-  init_level4_pgt + boundary,
-  (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+   init_level4_pgt + boundary,
+   (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
+   spin_lock(&pgd_lock);
+   list_add(&page->lru, &pgd_list);
+   spin_unlock(&pgd_lock);
+}
+
+static inline void pgd_dtor(void *x)
+{
+   pgd_t *pgd = x;
+   struct page *page = virt_to_page(pgd);
+
+   spin_lock(&pgd_lock);
+   list_del(&page->lru);
+   spin_unlock(&pgd_lock);
+}
+
+static inline pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+   pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+   GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
return pgd;
 }
 
 static inline void pgd_free(pgd_t *pgd)
 {
BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
-   pgd_list_del(pgd);
-   free_page((unsigned long)pgd);
+   quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-   return (pte_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+   return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-   void *p = (void 

[QUICKLIST 3/4] Quicklist support for x86_64

2007-03-13 Thread Christoph Lameter
Convert x86_64 to using quicklists

This adds caching of pgds, puds, pmds and ptes. That way we can
avoid costly zeroing and initialization of special mappings in the
pgd.

A second quicklist is useful to separate out PGD handling. We can carry
the initialized pgds over to the next process needing them.

Also clean up the pgd_list handling to use regular list macros.
There is no need anymore to avoid the lru field.

Move the add/removal of the pgds to the pgdlist into the
constructor / destructor. That way the implementation is
congruent with i386.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86_64/Kconfig  |4 ++
 arch/x86_64/kernel/process.c |1 
 arch/x86_64/kernel/smp.c |2 -
 arch/x86_64/mm/fault.c   |5 +-
 include/asm-x86_64/pgalloc.h |   76 +--
 include/asm-x86_64/pgtable.h |3 -
 mm/Kconfig   |5 ++
 7 files changed, 52 insertions(+), 44 deletions(-)

Index: linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/Kconfig   2007-03-12 
22:49:20.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig2007-03-12 22:53:28.0 
-0700
@@ -56,6 +56,10 @@ config ZONE_DMA
bool
default y
 
+config NR_QUICK
+   int
+   default 2
+
 config ISA
bool
 
Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h
===
--- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgalloc.h  2007-03-12 
22:49:20.0 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h   2007-03-12 
22:53:28.0 -0700
@@ -4,6 +4,10 @@
 #include <asm/pda.h>
 #include <linux/threads.h>
 #include <linux/mm.h>
+#include <linux/quicklist.h>
+
+#define QUICK_PGD 0   /* We preserve special mappings over free */
+#define QUICK_PT 1    /* Other page table pages that are zero on free */
 
 #define pmd_populate_kernel(mm, pmd, pte) \
set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -20,86 +24,77 @@ static inline void pmd_populate(struct m
 static inline void pmd_free(pmd_t *pmd)
 {
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-   free_page((unsigned long)pmd);
+   quicklist_free(QUICK_PT, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-   return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+   return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+   return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-   free_page((unsigned long)pud);
+   quicklist_free(QUICK_PT, NULL, pud);
 }
 
-static inline void pgd_list_add(pgd_t *pgd)
+static inline void pgd_ctor(void *x)
 {
+   unsigned boundary;
+   pgd_t *pgd = x;
struct page *page = virt_to_page(pgd);
 
+   /*
+* Copy kernel pointers in from init.
+*/
+   boundary = pgd_index(__PAGE_OFFSET);
+   memcpy(pgd + boundary,
+   init_level4_pgt + boundary,
+   (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
spin_lock(&pgd_lock);
-   page->index = (pgoff_t)pgd_list;
-   if (pgd_list)
-   pgd_list->private = (unsigned long)&page->index;
-   pgd_list = page;
-   page->private = (unsigned long)&pgd_list;
+   list_add(&page->lru, &pgd_list);
spin_unlock(&pgd_lock);
 }
 
-static inline void pgd_list_del(pgd_t *pgd)
+static inline void pgd_dtor(void *x)
 {
-   struct page *next, **pprev, *page = virt_to_page(pgd);
+   pgd_t *pgd = x;
+   struct page *page = virt_to_page(pgd);
 
spin_lock(&pgd_lock);
-   next = (struct page *)page->index;
-   pprev = (struct page **)page->private;
-   *pprev = next;
-   if (next)
-   next->private = (unsigned long)pprev;
+   list_del(&page->lru);
spin_unlock(&pgd_lock);
 }
 
+
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   unsigned boundary;
-   pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
-   if (!pgd)
-   return NULL;
-   pgd_list_add(pgd);
-   /*
-* Copy kernel pointers in from init.
-* Could keep a freelist or slab cache of those because the kernel
-* part never changes.
-*/
-   boundary = pgd_index(__PAGE_OFFSET);
-   memset(pgd, 0, boundary * sizeof(pgd_t));
-   memcpy(pgd + boundary,
-  init_level4_pgt + boundary,
-  (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+   pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
+
return pgd;
 }
 
 static inline void pgd_free(pgd_t *pgd)
 {
