Re: [PATCH 3/8] sparc64: Eliminate PTE table memory wastage.

2012-10-09 Thread Chris Metcalf
On 10/4/2012 2:23 PM, David Miller wrote:
> From: "Aneesh Kumar K.V" 
> Date: Thu, 04 Oct 2012 22:00:48 +0530
> 
>> David Miller <da...@davemloft.net> writes:
>>
>>> We've split up the PTE tables so that they take up half a page instead
>>> of a full page.  This is in order to facilitate transparent huge page
>>> support, which works much better if our PMDs cover 4MB instead of 8MB.
>>>
>>> What we do is have a one-behind cache for PTE table allocations in the
>>> mm struct.
>>>
>>> This logic triggers only on allocations.  For example, we don't try to
>>> keep track of freed-up page table blocks in the style that the s390
>>> port does.
>>
>> I am also implementing a similar change for powerpc. We have a 64K page
>> size, and want to make sure PMDs cover 16MB, which is the huge page size
>> supported by the hardware. I was looking at using the s390 logic,
>> considering we have 16 PMDs mapping to the same PTE page. Should we look
>> at generalizing the case so that other architectures can start using the
>> same code?
> 
> I think until we have multiple cases we won't know what's common or not.
> 
> Each arch has different needs.  I need to split the page into two pieces
> so my code is simpler, and just uses page counting to manage alloc/free.
> 
> Whereas s390 uses a bitmask to manage page state, and also reclaims
> pgtable pages into a per-mm list on free.  I decided not to do that
> and to just let the page allocator do the work.
> 
> So I don't think it's appropriate to think about commonization at this
> time, as even the only two existing cases are very non-common :-)

I'll add arch/tile to the list of architectures that would benefit.
We currently allocate PTEs using the page allocator, but by default
we use 64K pages and 16M huge pages, so with 8-byte PTEs a page
table is just 2K, and we could fit 32 of them on a page if we
wished.  Instead, for the time being, we just waste the space.
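
For concreteness, that arithmetic works out as in the sketch below; it is
only an illustrative userspace calculation using the numbers above, not
actual arch/tile definitions.

#include <stdio.h>

int main(void)
{
	const unsigned long page_size  = 64UL << 10;	/* 64K base page */
	const unsigned long hpage_size = 16UL << 20;	/* 16M huge page */
	const unsigned long pte_size   = 8;		/* 8-byte PTE    */

	unsigned long ptes_per_table = hpage_size / page_size;	   /* 256 */
	unsigned long table_bytes    = ptes_per_table * pte_size; /* 2K  */
	unsigned long tables_per_pg  = page_size / table_bytes;   /* 32  */

	printf("%lu PTEs per table, %luK per table, %lu tables per 64K page\n",
	       ptes_per_table, table_bytes >> 10, tables_per_pg);
	return 0;
}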

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com


Re: [PATCH 3/8] sparc64: Eliminate PTE table memory wastage.

2012-10-04 Thread David Miller
From: "Aneesh Kumar K.V" 
Date: Thu, 04 Oct 2012 22:00:48 +0530

> David Miller <da...@davemloft.net> writes:
> 
>> We've split up the PTE tables so that they take up half a page instead
>> of a full page.  This is in order to facilitate transparent huge page
>> support, which works much better if our PMDs cover 4MB instead of 8MB.
>>
>> What we do is have a one-behind cache for PTE table allocations in the
>> mm struct.
>>
>> This logic triggers only on allocations.  For example, we don't try to
>> keep track of freed-up page table blocks in the style that the s390
>> port does.
> 
> I am also implementing a similar change for powerpc. We have a 64K page
> size, and want to make sure PMDs cover 16MB, which is the huge page size
> supported by the hardware. I was looking at using the s390 logic,
> considering we have 16 PMDs mapping to the same PTE page. Should we look
> at generalizing the case so that other architectures can start using the
> same code?

I think until we have multiple cases we won't know what's common or not.

Each arch has different needs.  I need to split the page into two pieces
so my code is simpler, and just uses page counting to manage alloc/free.

Whereas s390 uses a bitmask to manage page state, and also reclaims
pgtable pages into a per-mm list on free.  I decided not to do that
and to just let the page allocator do the work.

So I don't think it's appropriate to think about commonization at this
time, as even the only two existing cases are very non-common :-)
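
To make that contrast concrete, the page-counting scheme boils down to
something like the sketch below (helper names are hypothetical and this is
not the sparc64 code); the s390 approach would instead mark the freed
fragment in a per-page bitmask and queue the page on a per-mm list for
reuse.

#include <linux/mm.h>

/* Hedged sketch: each half-page PTE table in use holds one reference on
 * its backing page, so no bitmask and no per-mm reclaim list is needed. */
static pte_t *pte_half_get(struct page *page, int second_half)
{
	if (second_half)
		get_page(page);		/* first half keeps the allocation ref */
	return (pte_t *)((char *)page_address(page) +
			 (second_half ? PAGE_SIZE / 2 : 0));
}

static void pte_half_put(pte_t *table)
{
	/* the page goes back to the page allocator once both halves are freed */
	put_page(virt_to_page(table));
}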


Re: [PATCH 3/8] sparc64: Eliminate PTE table memory wastage.

2012-10-04 Thread Aneesh Kumar K.V
David Miller <da...@davemloft.net> writes:

> We've split up the PTE tables so that they take up half a page instead
> of a full page.  This is in order to facilitate transparent huge page
> support, which works much better if our PMDs cover 4MB instead of 8MB.
>
> What we do is have a one-behind cache for PTE table allocations in the
> mm struct.
>
> This logic triggers only on allocations.  For example, we don't try to
> keep track of freed-up page table blocks in the style that the s390
> port does.

I am also implementing a similar change for powerpc. We have a 64K page
size, and want to make sure PMDs cover 16MB, which is the huge page size
supported by the hardware. I was looking at using the s390 logic,
considering we have 16 PMDs mapping to the same PTE page. Should we look
at generalizing the case so that other architectures can start using the
same code?

-aneesh



[PATCH 3/8] sparc64: Eliminate PTE table memory wastage.

2012-10-02 Thread David Miller

We've split up the PTE tables so that they take up half a page instead
of a full page.  This is in order to facilitate transparent huge page
support, which works much better if our PMDs cover 4MB instead of 8MB.

What we do is have a one-behind cache for PTE table allocations in the
mm struct.

This logic triggers only on allocations.  For example, we don't try to
keep track of freed-up page table blocks in the style that the s390
port does.

We never allocate from this cache from interrupts, so we can access it
safely using only preemption protection.
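
In outline, the allocation side behaves roughly like the sketch below.
This is only a hedged paraphrase of the scheme described here, not the
code in arch/sparc/mm/init_64.c; the helper name is made up and cross-CPU
serialization is glossed over.

#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/preempt.h>

/* Hedged sketch of the one-behind PTE-table cache; illustrative only. */
static pte_t *pte_alloc_half_page(struct mm_struct *mm)
{
	struct page *page;
	pte_t *pte = NULL;

	/* Fast path: hand out the unused second half of the page left
	 * behind by the previous allocation.  This is never reached from
	 * interrupt context, so preemption protection is enough. */
	preempt_disable();
	page = mm->context.pgtable_page;
	if (page) {
		mm->context.pgtable_page = NULL;
		pte = (pte_t *)((char *)page_address(page) + PAGE_SIZE / 2);
	}
	preempt_enable();
	if (pte)
		return pte;

	/* Slow path: take a fresh page, return its first half, and cache
	 * the page so the next allocation can use the other half.  The
	 * cache slot owns a reference of its own, handed over to the
	 * second half when it is allocated (mm teardown must drop it if
	 * that half is never used). */
	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
	if (!page)
		return NULL;
	get_page(page);
	preempt_disable();
	if (!mm->context.pgtable_page)
		mm->context.pgtable_page = page;
	else
		put_page(page);		/* lost a race; just waste the half */
	preempt_enable();
	return (pte_t *)page_address(page);
}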

There were only two slightly annoying aspects to this change:

1) Changing pgtable_t to be a "pte_t *".  There's all of this special
   logic in the TLB free paths that needed adjustments, as did the
   PMD populate interfaces.

2) init_new_context() needs to zap the pointer, since the mm struct
   just gets copied from the parent on fork.

Signed-off-by: David S. Miller <da...@davemloft.net>
---
 arch/sparc/include/asm/mmu_64.h |1 +
 arch/sparc/include/asm/page_64.h|2 +-
 arch/sparc/include/asm/pgalloc_64.h |   54 ---
 arch/sparc/mm/init_64.c |  101 +++
 arch/sparc/mm/tsb.c |9 
 5 files changed, 123 insertions(+), 44 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_64.h b/arch/sparc/include/asm/mmu_64.h
index b4d685d..0acfb46 100644
--- a/arch/sparc/include/asm/mmu_64.h
+++ b/arch/sparc/include/asm/mmu_64.h
@@ -100,6 +100,7 @@ typedef struct {
spinlock_t  lock;
unsigned long   sparc64_ctx_val;
unsigned long   huge_pte_count;
+   struct page *pgtable_page;
struct tsb_config   tsb_block[MM_NUM_TSBS];
struct hv_tsb_descr tsb_descr[MM_NUM_TSBS];
 } mm_context_t;
diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index 08bb5f7..fc2b229 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -92,7 +92,7 @@ typedef unsigned long pgprot_t;
 
 #endif /* (STRICT_MM_TYPECHECKS) */
 
-typedef struct page *pgtable_t;
+typedef pte_t *pgtable_t;
 
 #define TASK_UNMAPPED_BASE (test_thread_flag(TIF_32BIT) ? \
 (_AC(0x7000,UL)) : \
diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h
index 40b2d7a..0ebca93 100644
--- a/arch/sparc/include/asm/pgalloc_64.h
+++ b/arch/sparc/include/asm/pgalloc_64.h
@@ -38,51 +38,20 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
kmem_cache_free(pgtable_cache, pmd);
 }
 
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
- unsigned long address)
-{
-   return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
-}
-
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
-   unsigned long address)
-{
-   struct page *page;
-   pte_t *pte;
-
-   pte = pte_alloc_one_kernel(mm, address);
-   if (!pte)
-   return NULL;
-   page = virt_to_page(pte);
-   pgtable_page_ctor(page);
-   return page;
-}
-
-static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
-{
-   free_page((unsigned long)pte);
-}
-
-static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
-{
-   pgtable_page_dtor(ptepage);
-   __free_page(ptepage);
-}
+extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+  unsigned long address);
+extern pgtable_t pte_alloc_one(struct mm_struct *mm,
+  unsigned long address);
+extern void pte_free_kernel(struct mm_struct *mm, pte_t *pte);
+extern void pte_free(struct mm_struct *mm, pgtable_t ptepage);
 
 #define pmd_populate_kernel(MM, PMD, PTE)  pmd_set(PMD, PTE)
-#define pmd_populate(MM,PMD,PTE_PAGE)  \
-   pmd_populate_kernel(MM,PMD,page_address(PTE_PAGE))
-#define pmd_pgtable(pmd) pmd_page(pmd)
+#define pmd_populate(MM, PMD, PTE) pmd_set(PMD, PTE)
+#define pmd_pgtable(PMD)   ((pte_t *)__pmd_page(PMD))
 
 #define check_pgt_cache()  do { } while (0)
 
-static inline void pgtable_free(void *table, bool is_page)
-{
-   if (is_page)
-   free_page((unsigned long)table);
-   else
-   kmem_cache_free(pgtable_cache, table);
-}
+extern void pgtable_free(void *table, bool is_page);
 
 #ifdef CONFIG_SMP
 
@@ -113,11 +82,10 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, bool is
 }
 #endif /* !CONFIG_SMP */
 
-static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *ptepage,
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pte_t *pte,
  unsigned long address)
 {
-   pgtable_page_dtor(ptepage);
-   pgtable_free_tlb(tlb, page_address(ptepage), true);
+   pgtable_free_tlb(tlb, pte, true);
 }
 
