Re: [PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-14 Thread Benjamin Herrenschmidt
On Thu, 2014-02-13 at 08:03 +1100, Benjamin Herrenschmidt wrote:

 It looks very different because the function that needs to be fixed
 changed a lot upstream in 3.13.

 .../...

Hi Greg !

You didn't say if that explanation was to your liking :-)

If it is, do you want Aneesh to re-submit the patch with such an
explanation in the changelog ?

Cheers,
Ben.

 In practice it's *not* very different in behaviour. It's just that
 on powerpc we need to unconditionally call withdraw and deposit when
 moving PTEs or it will crash, due to how we keep the transparent
 huge page in sync with the hash table.
 
 With the 3.13 code, due to lock breaking introduced by Kirill in
 3.13-rc's, there's already a generic case for doing that (if we dropped
 the lock). So we just changed the condition to essentially force the
 condition to true to always do it under control of an arch helper.
 
 The pre-3.13 code didn't do the withdraw and deposit at all in that
 function however, so in that case, the patch (this 3.12 one) basically
 just adds the calls to withdraw and deposit under control of an ifdef
 which is only enabled for powerpc64.
 
 So you are taking 0 risk with other architecture and as the powerpc
 maintainer I'm happy with the patch.
 
 Cheers,
 Ben.
 
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-12 Thread Greg KH
On Wed, Feb 12, 2014 at 08:22:02AM +0530, Aneesh Kumar K.V wrote:
 Greg KH gre...@linuxfoundation.org writes:
 
  On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
  This patch fix the below crash
  
  NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
  LR [c00439ac] .hash_page+0x18c/0x5e0
  ...
  Call Trace:
  [c00736103c40] [1b00] 0x1b00(unreliable)
  [437908.479693] [c00736103d50] [c00439ac] 
  .hash_page+0x18c/0x5e0
  [437908.479699] [c00736103e30] [c000924c] 
  .do_hash_page+0x4c/0x58
  
  On ppc64 we use the pgtable for storing the hpte slot information and
  store address to the pgtable at a constant offset (PTRS_PER_PMD) from
  pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
  the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
  from new pmd.
  
  We also want to move the withdraw and deposit before the set_pmd so
  that, when page fault find the pmd as trans huge we can be sure that
  pgtable can be located at the offset.
  
  variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
  for 3.12 stable series
 
  This doesn't look like a variant, it looks totally different.  Why
  can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
  (and follow-on fix) for 3.12?
 
 Because the code in that function changed in 3.13. Kirill added split
 ptl locks for huge pte, and we decide whether to withdraw and
 deposit again based on the ptl locks in 3.13. In 3.12 we do that only
 for ppc64 using #ifdef

I have no idea what that means...

If you want this patch applied, please be specific as to what is going
on, why the code is _very_ different, and all of that.  Make it
_obvious_ as to what is happening, and why I would be a fool not to take
it in the stable tree.

As it is, the code in this patch looks so different that I'm just
assuming you got something wrong and are trying to really send me
something else, so I'll just ignore it.

greg k-h
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-12 Thread Aneesh Kumar K.V
Greg KH gre...@linuxfoundation.org writes:

 On Wed, Feb 12, 2014 at 08:22:02AM +0530, Aneesh Kumar K.V wrote:
 Greg KH gre...@linuxfoundation.org writes:
 
  On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
  This patch fix the below crash
  
  NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
  LR [c00439ac] .hash_page+0x18c/0x5e0
  ...
  Call Trace:
  [c00736103c40] [1b00] 0x1b00(unreliable)
  [437908.479693] [c00736103d50] [c00439ac] 
  .hash_page+0x18c/0x5e0
  [437908.479699] [c00736103e30] [c000924c] 
  .do_hash_page+0x4c/0x58
  
  On ppc64 we use the pgtable for storing the hpte slot information and
  store address to the pgtable at a constant offset (PTRS_PER_PMD) from
  pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
  the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
  from new pmd.
  
  We also want to move the withdraw and deposit before the set_pmd so
  that, when page fault find the pmd as trans huge we can be sure that
  pgtable can be located at the offset.
  
  variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
  for 3.12 stable series
 
  This doesn't look like a variant, it looks totally different.  Why
  can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
  (and follow-on fix) for 3.12?
 
 Because the code in that function changed in 3.13. Kirill added split
 ptl locks for huge pte, and we decide whether to withdraw and
 deposit again based on the ptl locks in 3.13. In 3.12 we do that only
 for ppc64 using #ifdef

 I have no idea what that means...

 If you want this patch applied, please be specific as to what is going
 on, why the code is _very_ different, and all of that.  Make it
 _obvious_ as to what is happening, and why I would be a fool not to take
 it in the stable tree.

 As it is, the code in this patch looks so different that I'm just
 assuming you got something wrong and are trying to really send me
 something else, so I'll just ignore it.

3.13 we added split huge ptl lock which introduced separate lock at pmd
level for hugepage (bf929152e9f6c49b66fad4ebf08cc95b02ce48f5). This
required us 3592806cfa08b7cca968f793c33f8e9460bab395. ie, when we move
huge page, we need to withdraw and deposit PTE page if we are moving
them across different pmd page. We did that by checking spin lock
address in 3.13. ie, we have


 if (new_ptl != old_ptl) {
.
   pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
   pgtable_trans_huge_deposit(mm, new_pmd,pgtable);
...
}

ppc64 even without using split ptl had PTE page per pmd entry. The
details for that are explained in the commit message above. So when
we move huge page we need to withdraw and deposit PTE page always on
ppc64.

Now on 3.13 we added a new function which did


static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
spinlock_t *old_pmd_ptl)
{
   /*
* With split pmd lock we also need to move preallocated
* PTE page table if new_pmd is on different PMD page table.
*/
   return new_pmd_ptl != old_pmd_ptl;
}

for x86

and on ppc64 we did

static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
spinlock_t *old_pmd_ptl)
{
   /*
* Archs like ppc64 use pgtable to store per pmd
* specific information. So when we switch the pmd,
* we should also withdraw and deposit the pgtable
*/
   return true;
}

ie, on ppc64 we always did withdraw and deposit and on x86 we do that
only when spin lock address are different.

For 3.12, since we don't have split huge ptl locks yet, we did the below

+#ifdef CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
+   /*
+* Archs like ppc64 use pgtable to store per pmd
+* specific information. So when we switch the pmd,
+* we should also withdraw and deposit the pgtable
+*/
+   pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
+   pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
+#endif

CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW is only set for PPC64.

-aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-12 Thread Benjamin Herrenschmidt
On Wed, 2014-02-12 at 06:23 -0800, Greg KH wrote:
 I have no idea what that means...
 
 If you want this patch applied, please be specific as to what is going
 on, why the code is _very_ different, and all of that.  Make it
 _obvious_ as to what is happening, and why I would be a fool not to
 take
 it in the stable tree.
 
 As it is, the code in this patch looks so different that I'm just
 assuming you got something wrong and are trying to really send me
 something else, so I'll just ignore it.

It looks very different because the function that needs to be fixed
changed a lot upstream in 3.13.

In practice it's *not* very different in behaviour. It's just that
on powerpc we need to unconditionally call withdraw and deposit when
moving PTEs or it will crash, due to how we keep the transparent
huge page in sync with the hash table.

With the 3.13 code, due to lock breaking introduced by Kirill in
3.13-rc's, there's already a generic case for doing that (if we dropped
the lock). So we just changed the condition to essentially force the
condition to true to always do it under control of an arch helper.

The pre-3.13 code didn't do the withdraw and deposit at all in that
function however, so in that case, the patch (this 3.12 one) basically
just adds the calls to withdraw and deposit under control of an ifdef
which is only enabled for powerpc64.

So you are taking 0 risk with other architecture and as the powerpc
maintainer I'm happy with the patch.

Cheers,
Ben.



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-11 Thread Greg KH
On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 This patch fix the below crash
 
 NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
 LR [c00439ac] .hash_page+0x18c/0x5e0
 ...
 Call Trace:
 [c00736103c40] [1b00] 0x1b00(unreliable)
 [437908.479693] [c00736103d50] [c00439ac] .hash_page+0x18c/0x5e0
 [437908.479699] [c00736103e30] [c000924c] .do_hash_page+0x4c/0x58
 
 On ppc64 we use the pgtable for storing the hpte slot information and
 store address to the pgtable at a constant offset (PTRS_PER_PMD) from
 pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
 the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
 from new pmd.
 
 We also want to move the withdraw and deposit before the set_pmd so
 that, when page fault find the pmd as trans huge we can be sure that
 pgtable can be located at the offset.
 
 variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
 for 3.12 stable series

This doesn't look like a variant, it looks totally different.  Why
can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
(and follow-on fix) for 3.12?

I _REALLY_ dislike patches that are totally different from Linus's tree
in stable trees, it has caused nothing but problems in the past.

greg k-h
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-11 Thread Benjamin Herrenschmidt
On Tue, 2014-02-11 at 09:31 -0800, Greg KH wrote:
 On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
  This patch fix the below crash
  
  NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
  LR [c00439ac] .hash_page+0x18c/0x5e0
  ...
  Call Trace:
  [c00736103c40] [1b00] 0x1b00(unreliable)
  [437908.479693] [c00736103d50] [c00439ac] .hash_page+0x18c/0x5e0
  [437908.479699] [c00736103e30] [c000924c] 
  .do_hash_page+0x4c/0x58
  
  On ppc64 we use the pgtable for storing the hpte slot information and
  store address to the pgtable at a constant offset (PTRS_PER_PMD) from
  pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
  the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
  from new pmd.
  
  We also want to move the withdraw and deposit before the set_pmd so
  that, when page fault find the pmd as trans huge we can be sure that
  pgtable can be located at the offset.
  
  variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
  for 3.12 stable series
 
 This doesn't look like a variant, it looks totally different.  Why
 can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
 (and follow-on fix) for 3.12?
 
 I _REALLY_ dislike patches that are totally different from Linus's tree
 in stable trees, it has caused nothing but problems in the past.

I don't think it applies... (I tried on an internal tree) but the
affected function changed in 3.13 in various ways. Aneesh, please
provide a more details explanation and whether we should backport those
other changes too or whether this is not necessary.

BTW. Aneesh, we need a 3.11.x one too

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-11 Thread Aneesh Kumar K.V
Greg KH gre...@linuxfoundation.org writes:

 On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 This patch fix the below crash
 
 NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
 LR [c00439ac] .hash_page+0x18c/0x5e0
 ...
 Call Trace:
 [c00736103c40] [1b00] 0x1b00(unreliable)
 [437908.479693] [c00736103d50] [c00439ac] .hash_page+0x18c/0x5e0
 [437908.479699] [c00736103e30] [c000924c] .do_hash_page+0x4c/0x58
 
 On ppc64 we use the pgtable for storing the hpte slot information and
 store address to the pgtable at a constant offset (PTRS_PER_PMD) from
 pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
 the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
 from new pmd.
 
 We also want to move the withdraw and deposit before the set_pmd so
 that, when page fault find the pmd as trans huge we can be sure that
 pgtable can be located at the offset.
 
 variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
 for 3.12 stable series

 This doesn't look like a variant, it looks totally different.  Why
 can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
 (and follow-on fix) for 3.12?

Because the code in that function changed in 3.13. Kirill added split
ptl locks for huge pte, and we decide whether to withdraw and
deposit again based on the ptl locks in 3.13. In 3.12 we do that only
for ppc64 using #ifdef



 I _REALLY_ dislike patches that are totally different from Linus's tree
 in stable trees, it has caused nothing but problems in the past.


-aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-11 Thread Aneesh Kumar K.V
Benjamin Herrenschmidt b...@kernel.crashing.org writes:

 On Tue, 2014-02-11 at 09:31 -0800, Greg KH wrote:
 On Fri, Feb 07, 2014 at 07:21:57PM +0530, Aneesh Kumar K.V wrote:
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
  This patch fix the below crash
  
  NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
  LR [c00439ac] .hash_page+0x18c/0x5e0
  ...
  Call Trace:
  [c00736103c40] [1b00] 0x1b00(unreliable)
  [437908.479693] [c00736103d50] [c00439ac] 
  .hash_page+0x18c/0x5e0
  [437908.479699] [c00736103e30] [c000924c] 
  .do_hash_page+0x4c/0x58
  
  On ppc64 we use the pgtable for storing the hpte slot information and
  store address to the pgtable at a constant offset (PTRS_PER_PMD) from
  pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
  the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
  from new pmd.
  
  We also want to move the withdraw and deposit before the set_pmd so
  that, when page fault find the pmd as trans huge we can be sure that
  pgtable can be located at the offset.
  
  variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
  for 3.12 stable series
 
 This doesn't look like a variant, it looks totally different.  Why
 can't I just take the b3084f4db3aeb991c507ca774337c7e7893ed04f patch
 (and follow-on fix) for 3.12?
 
 I _REALLY_ dislike patches that are totally different from Linus's tree
 in stable trees, it has caused nothing but problems in the past.

 I don't think it applies... (I tried on an internal tree) but the
 affected function changed in 3.13 in various ways. Aneesh, please
 provide a more details explanation and whether we should backport those
 other changes too or whether this is not necessary

Yes the affected function added support for split ptl locks for huge
pte. I don't think that is a stable material.

.

 BTW. Aneesh, we need a 3.11.x one too


3.11.x it is already applied.

-aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH V2] powerpc: thp: Fix crash on mremap

2014-02-07 Thread Aneesh Kumar K.V
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

This patch fix the below crash

NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
LR [c00439ac] .hash_page+0x18c/0x5e0
...
Call Trace:
[c00736103c40] [1b00] 0x1b00(unreliable)
[437908.479693] [c00736103d50] [c00439ac] .hash_page+0x18c/0x5e0
[437908.479699] [c00736103e30] [c000924c] .do_hash_page+0x4c/0x58

On ppc64 we use the pgtable for storing the hpte slot information and
store address to the pgtable at a constant offset (PTRS_PER_PMD) from
pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
from new pmd.

We also want to move the withdraw and deposit before the set_pmd so
that, when page fault find the pmd as trans huge we can be sure that
pgtable can be located at the offset.

variant of upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
for 3.12 stable series

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/Kconfig   |  3 +++
 arch/powerpc/platforms/Kconfig.cputype |  1 +
 mm/huge_memory.c   | 12 
 3 files changed, 16 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index af2cc6eabcc7..bca9e7a18bd2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -365,6 +365,9 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE
 config HAVE_ARCH_SOFT_DIRTY
bool
 
+config ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
+   bool
+
 config HAVE_MOD_ARCH_SPECIFIC
bool
help
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 6704e2e20e6b..0225011231ea 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -71,6 +71,7 @@ config PPC_BOOK3S_64
select PPC_FPU
select PPC_HAVE_PMU_SUPPORT
select SYS_SUPPORTS_HUGETLBFS
+   select ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
 
 config PPC_BOOK3E_64
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 292a266e0d42..89b7a647f1cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1474,8 +1474,20 @@ int move_huge_pmd(struct vm_area_struct *vma, struct 
vm_area_struct *new_vma,
 
ret = __pmd_trans_huge_lock(old_pmd, vma);
if (ret == 1) {
+#ifdef CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
+   pgtable_t pgtable;
+#endif
pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
VM_BUG_ON(!pmd_none(*new_pmd));
+#ifdef CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
+   /*
+* Archs like ppc64 use pgtable to store per pmd
+* specific information. So when we switch the pmd,
+* we should also withdraw and deposit the pgtable
+*/
+   pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
+   pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
+#endif
set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
spin_unlock(mm-page_table_lock);
}
-- 
1.8.3.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH -V2] powerpc: thp: Fix crash on mremap

2014-01-02 Thread Aneesh Kumar K.V
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

This patch fix the below crash

NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
LR [c00439ac] .hash_page+0x18c/0x5e0
...
Call Trace:
[c00736103c40] [1b00] 0x1b00(unreliable)
[437908.479693] [c00736103d50] [c00439ac] .hash_page+0x18c/0x5e0
[437908.479699] [c00736103e30] [c000924c] .do_hash_page+0x4c/0x58

On ppc64 we use the pgtable for storing the hpte slot information and
store address to the pgtable at a constant offset (PTRS_PER_PMD) from
pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
from new pmd.

We also want to move the withdraw and deposit before the set_pmd so
that, when page fault find the pmd as trans huge we can be sure that
pgtable can be located at the offset.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* limit the withraw/deposit to only ppc64

 arch/Kconfig   |  3 +++
 arch/powerpc/platforms/Kconfig.cputype |  1 +
 include/linux/huge_mm.h|  6 ++
 mm/huge_memory.c   | 21 -
 4 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index f1cf895c040f..3759e70a649d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -371,6 +371,9 @@ config HAVE_IRQ_TIME_ACCOUNTING
 config HAVE_ARCH_TRANSPARENT_HUGEPAGE
bool
 
+config ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
+   bool
+
 config HAVE_ARCH_SOFT_DIRTY
bool
 
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index bca2465a9c34..5f83b4334e5f 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -71,6 +71,7 @@ config PPC_BOOK3S_64
select PPC_FPU
select PPC_HAVE_PMU_SUPPORT
select SYS_SUPPORTS_HUGETLBFS
+   select ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
 
 config PPC_BOOK3E_64
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 91672e2deec3..836242a738a5 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -230,4 +230,10 @@ static inline int do_huge_pmd_numa_page(struct mm_struct 
*mm, struct vm_area_str
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifdef CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
+#define ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW 1
+#else
+#define ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW 0
+#endif
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7de1bf85f683..32006b51d102 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1505,19 +1505,22 @@ int move_huge_pmd(struct vm_area_struct *vma, struct 
vm_area_struct *new_vma,
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
VM_BUG_ON(!pmd_none(*new_pmd));
-   set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
-   if (new_ptl != old_ptl) {
+   /*
+* Archs like ppc64 use pgtable to store per pmd
+* specific information. So when we switch the pmd,
+* we should also withdraw and deposit the pgtable
+*
+* With split pmd lock we also need to move preallocated
+* PTE page table if new_pmd is on different PMD page table.
+*/
+   if (new_ptl != old_ptl || ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW) {
pgtable_t pgtable;
-
-   /*
-* Move preallocated PTE page table if new_pmd is on
-* different PMD page table.
-*/
pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
-
-   spin_unlock(new_ptl);
}
+   set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
+   if (new_ptl != old_ptl)
+   spin_unlock(new_ptl);
spin_unlock(old_ptl);
}
 out:
-- 
1.8.3.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH -V2] powerpc: thp: Fix crash on mremap

2014-01-02 Thread Kirill A. Shutemov
Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 This patch fix the below crash
 
 NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
 LR [c00439ac] .hash_page+0x18c/0x5e0
 ...
 Call Trace:
 [c00736103c40] [1b00] 0x1b00(unreliable)
 [437908.479693] [c00736103d50] [c00439ac] .hash_page+0x18c/0x5e0
 [437908.479699] [c00736103e30] [c000924c] .do_hash_page+0x4c/0x58
 
 On ppc64 we use the pgtable for storing the hpte slot information and
 store address to the pgtable at a constant offset (PTRS_PER_PMD) from
 pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
 the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
 from new pmd.
 
 We also want to move the withdraw and deposit before the set_pmd so
 that, when page fault find the pmd as trans huge we can be sure that
 pgtable can be located at the offset.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
 Changes from V1:
 * limit the withraw/deposit to only ppc64
 
  arch/Kconfig   |  3 +++
  arch/powerpc/platforms/Kconfig.cputype |  1 +
  include/linux/huge_mm.h|  6 ++
  mm/huge_memory.c   | 21 -
  4 files changed, 22 insertions(+), 9 deletions(-)
 
 diff --git a/arch/Kconfig b/arch/Kconfig
 index f1cf895c040f..3759e70a649d 100644
 --- a/arch/Kconfig
 +++ b/arch/Kconfig
 @@ -371,6 +371,9 @@ config HAVE_IRQ_TIME_ACCOUNTING
  config HAVE_ARCH_TRANSPARENT_HUGEPAGE
   bool
  
 +config ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW

I don't like name of the option, but can't find any better... :(

 + bool
 +
  config HAVE_ARCH_SOFT_DIRTY
   bool
  
 diff --git a/arch/powerpc/platforms/Kconfig.cputype 
 b/arch/powerpc/platforms/Kconfig.cputype
 index bca2465a9c34..5f83b4334e5f 100644
 --- a/arch/powerpc/platforms/Kconfig.cputype
 +++ b/arch/powerpc/platforms/Kconfig.cputype
 @@ -71,6 +71,7 @@ config PPC_BOOK3S_64
   select PPC_FPU
   select PPC_HAVE_PMU_SUPPORT
   select SYS_SUPPORTS_HUGETLBFS
 + select ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
   select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
  
  config PPC_BOOK3E_64
 diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
 index 91672e2deec3..836242a738a5 100644
 --- a/include/linux/huge_mm.h
 +++ b/include/linux/huge_mm.h
 @@ -230,4 +230,10 @@ static inline int do_huge_pmd_numa_page(struct mm_struct 
 *mm, struct vm_area_str
  
  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
  
 +#ifdef CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
 +#define ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW 1
 +#else
 +#define ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW 0
 +#endif
 +

Just use config option directly:

if (new_ptl != old_ptl ||
IS_ENABLED(CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW))
...


Otherwise, looks good:

Acked-by: Kirill A. Shutemov kirill.shute...@linux.intel.com

  #endif /* _LINUX_HUGE_MM_H */
 diff --git a/mm/huge_memory.c b/mm/huge_memory.c
 index 7de1bf85f683..32006b51d102 100644
 --- a/mm/huge_memory.c
 +++ b/mm/huge_memory.c
 @@ -1505,19 +1505,22 @@ int move_huge_pmd(struct vm_area_struct *vma, struct 
 vm_area_struct *new_vma,
   spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
   pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
   VM_BUG_ON(!pmd_none(*new_pmd));
 - set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
 - if (new_ptl != old_ptl) {
 + /*
 +  * Archs like ppc64 use pgtable to store per pmd
 +  * specific information. So when we switch the pmd,
 +  * we should also withdraw and deposit the pgtable
 +  *
 +  * With split pmd lock we also need to move preallocated
 +  * PTE page table if new_pmd is on different PMD page table.
 +  */
 + if (new_ptl != old_ptl || ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW) {
   pgtable_t pgtable;
 -
 - /*
 -  * Move preallocated PTE page table if new_pmd is on
 -  * different PMD page table.
 -  */
   pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
   pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
 -
 - spin_unlock(new_ptl);
   }
 + set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
 + if (new_ptl != old_ptl)
 + spin_unlock(new_ptl);
   spin_unlock(old_ptl);
   }
  out:
 -- 
 1.8.3.2

-- 
 Kirill A. Shutemov
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH -V2] powerpc: thp: Fix crash on mremap

2014-01-02 Thread Aneesh Kumar K.V
Kirill A. Shutemov kirill.shute...@linux.intel.com writes:

 Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 This patch fix the below crash
 
 NIP [c004cee4] .__hash_page_thp+0x2a4/0x440
 LR [c00439ac] .hash_page+0x18c/0x5e0
 ...
 Call Trace:
 [c00736103c40] [1b00] 0x1b00(unreliable)
 [437908.479693] [c00736103d50] [c00439ac] .hash_page+0x18c/0x5e0
 [437908.479699] [c00736103e30] [c000924c] .do_hash_page+0x4c/0x58
 
 On ppc64 we use the pgtable for storing the hpte slot information and
 store address to the pgtable at a constant offset (PTRS_PER_PMD) from
 pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
 the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
 from new pmd.
 
 We also want to move the withdraw and deposit before the set_pmd so
 that, when page fault find the pmd as trans huge we can be sure that
 pgtable can be located at the offset.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
 Changes from V1:
 * limit the withraw/deposit to only ppc64
 
  arch/Kconfig   |  3 +++
  arch/powerpc/platforms/Kconfig.cputype |  1 +
  include/linux/huge_mm.h|  6 ++
  mm/huge_memory.c   | 21 -
  4 files changed, 22 insertions(+), 9 deletions(-)
 
 diff --git a/arch/Kconfig b/arch/Kconfig
 index f1cf895c040f..3759e70a649d 100644
 --- a/arch/Kconfig
 +++ b/arch/Kconfig
 @@ -371,6 +371,9 @@ config HAVE_IRQ_TIME_ACCOUNTING
  config HAVE_ARCH_TRANSPARENT_HUGEPAGE
  bool
  
 +config ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW

 I don't like name of the option, but can't find any better... :(

 +bool
 +
  config HAVE_ARCH_SOFT_DIRTY
  bool
  
 diff --git a/arch/powerpc/platforms/Kconfig.cputype 
 b/arch/powerpc/platforms/Kconfig.cputype
 index bca2465a9c34..5f83b4334e5f 100644
 --- a/arch/powerpc/platforms/Kconfig.cputype
 +++ b/arch/powerpc/platforms/Kconfig.cputype
 @@ -71,6 +71,7 @@ config PPC_BOOK3S_64
  select PPC_FPU
  select PPC_HAVE_PMU_SUPPORT
  select SYS_SUPPORTS_HUGETLBFS
 +select ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
  select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
  
  config PPC_BOOK3E_64
 diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
 index 91672e2deec3..836242a738a5 100644
 --- a/include/linux/huge_mm.h
 +++ b/include/linux/huge_mm.h
 @@ -230,4 +230,10 @@ static inline int do_huge_pmd_numa_page(struct 
 mm_struct *mm, struct vm_area_str
  
  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
  
 +#ifdef CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW
 +#define ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW 1
 +#else
 +#define ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW 0
 +#endif
 +

 Just use config option directly:

   if (new_ptl != old_ptl ||
   IS_ENABLED(CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW))


I didn't like that. I found the earlier one easier for reading.
If you and others strongly feel about this, I can redo the patch. Please let me 
know


 ...


 Otherwise, looks good:

 Acked-by: Kirill A. Shutemov kirill.shute...@linux.intel.com

  #endif /* _LINUX_HUGE_MM_H */
 diff --git a/mm/huge_memory.c b/mm/huge_memory.c
 index 7de1bf85f683..32006b51d102 100644
 --- a/mm/huge_memory.c
 +++ b/mm/huge_memory.c
 @@ -1505,19 +1505,22 @@ int move_huge_pmd(struct vm_area_struct *vma, struct 
 vm_area_struct *new_vma,
  spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
  pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
  VM_BUG_ON(!pmd_none(*new_pmd));
 -set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
 -if (new_ptl != old_ptl) {
 +/*
 + * Archs like ppc64 use pgtable to store per pmd
 + * specific information. So when we switch the pmd,
 + * we should also withdraw and deposit the pgtable
 + *
 + * With split pmd lock we also need to move preallocated
 + * PTE page table if new_pmd is on different PMD page table.
 + */
 +if (new_ptl != old_ptl || ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW) {
  pgtable_t pgtable;
 -
 -/*
 - * Move preallocated PTE page table if new_pmd is on
 - * different PMD page table.
 - */
  pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
  pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
 -
 -spin_unlock(new_ptl);
  }
 +set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
 +if (new_ptl != old_ptl)
 +spin_unlock(new_ptl);
  spin_unlock(old_ptl);
  }
  out:
 -- 
 1.8.3.2

 -- 
  Kirill A. Shutemov

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org

Re: [PATCH -V2] powerpc: thp: Fix crash on mremap

2014-01-02 Thread Benjamin Herrenschmidt
On Thu, 2014-01-02 at 16:22 +0530, Aneesh Kumar K.V wrote:
  Just use config option directly:
 
if (new_ptl != old_ptl ||
IS_ENABLED(CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW))
 
 
 I didn't like that. I found the earlier one easier for reading.
 If you and others strongly feel about this, I can redo the patch.
 Please let me know

Yes, use IS_ENABLED, no need to have two indirections of #define's

Another option is to have

if (pmd_move_must_withdraw(new,old)) {
}

With in a generic header:

#ifndef pmd_move_must_withdraw
static inline bool pmd_move_must_withdraw(spinlock_t *new_ptl, ...)
{
return new_ptl != old_ptl;
}
#endif

And in powerpc:

static inline bool pmd_move_must_withdraw(spinlock_t *new_ptl, ...)
{
return true;
}
#define pmd_move_must_withdraw pmd_move_must_withdraw

Cheers,
Ben.




___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2] powerpc: thp: Fix crash on mremap

2014-01-02 Thread Aneesh Kumar K.V
Benjamin Herrenschmidt b...@kernel.crashing.org writes:

 On Thu, 2014-01-02 at 16:22 +0530, Aneesh Kumar K.V wrote:
  Just use config option directly:
 
if (new_ptl != old_ptl ||
IS_ENABLED(CONFIG_ARCH_THP_MOVE_PMD_ALWAYS_WITHDRAW))
 
 
 I didn't like that. I found the earlier one easier for reading.
 If you and others strongly feel about this, I can redo the patch.
 Please let me know

 Yes, use IS_ENABLED, no need to have two indirections of #define's

 Another option is to have

   if (pmd_move_must_withdraw(new,old)) {
   }

 With in a generic header:

 #ifndef pmd_move_must_withdraw
 static inline bool pmd_move_must_withdraw(spinlock_t *new_ptl, ...)
 {
   return new_ptl != old_ptl;
 }
 #endif

 And in powerpc:

 static inline bool pmd_move_must_withdraw(spinlock_t *new_ptl, ...)
 {
   return true;
 }
 #define pmd_move_must_withdraw pmd_move_must_withdraw

This is better i guess. It is also in-line with rest of transparent
hugepage functions. I will do this.

-aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev