Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-06 Thread Anshuman Khandual



On 11/06/2018 06:05 AM, Will Deacon wrote:
> On Fri, Nov 02, 2018 at 11:45:00AM +0530, Anshuman Khandual wrote:
>> On 10/17/2018 07:39 AM, Andrea Arcangeli wrote:
>>> What we need to do during split is an invalidate of the huge TLB.
>>> There's no pmd_trans_splitting anymore, so we only clear the present
>>> bit in the PTE despite pmd_present still returns true (just like
>>> PROT_NONE, nothing new in this respect). pmd_present never meant the
>>
>> On arm64, the problem is that pmd_present() is tied with pte_present() which
>> checks for PTE_VALID (also PTE_PROT_NONE) but which gets cleared during PTE
>> invalidation. pmd_present() returns false just after the first step of PMD
>> splitting. So pmd_present() needs to be decoupled from PTE_VALID which is
>> same as PMD_SECT_VALID and instead should depend upon a pte bit which sticks
>> around like PAGE_PSE as in case of x86. I am working towards a solution.
> 
> Could we not just go via a PROT_NONE mapping during the split, instead of
> having to allocate a new software bit to treat these invalid ptes as
> present?

The problem might occur during page fault (i.e __handle_mm_fault). As discussed
previously on this thread any potential PTE sticky bit would be used for both
pmd_trans_huge() and pmd_present() wrappers to maintain existing semantics. At
present, PMD state analysis during page fault has conditional block like this.

if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
return do_huge_pmd_numa_page(, orig_pmd);

Using PROT_NONE for pmd_trans_huge() might force PMD page fault to go through
NUMA fault handling all the time as both pmd_trans_huge() and pmd_protnone()
will return true in that situation.


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-06 Thread Anshuman Khandual



On 11/06/2018 06:05 AM, Will Deacon wrote:
> On Fri, Nov 02, 2018 at 11:45:00AM +0530, Anshuman Khandual wrote:
>> On 10/17/2018 07:39 AM, Andrea Arcangeli wrote:
>>> What we need to do during split is an invalidate of the huge TLB.
>>> There's no pmd_trans_splitting anymore, so we only clear the present
>>> bit in the PTE despite pmd_present still returns true (just like
>>> PROT_NONE, nothing new in this respect). pmd_present never meant the
>>
>> On arm64, the problem is that pmd_present() is tied with pte_present() which
>> checks for PTE_VALID (also PTE_PROT_NONE) but which gets cleared during PTE
>> invalidation. pmd_present() returns false just after the first step of PMD
>> splitting. So pmd_present() needs to be decoupled from PTE_VALID which is
>> same as PMD_SECT_VALID and instead should depend upon a pte bit which sticks
>> around like PAGE_PSE as in case of x86. I am working towards a solution.
> 
> Could we not just go via a PROT_NONE mapping during the split, instead of
> having to allocate a new software bit to treat these invalid ptes as
> present?

The problem might occur during page fault (i.e __handle_mm_fault). As discussed
previously on this thread any potential PTE sticky bit would be used for both
pmd_trans_huge() and pmd_present() wrappers to maintain existing semantics. At
present, PMD state analysis during page fault has conditional block like this.

if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
return do_huge_pmd_numa_page(, orig_pmd);

Using PROT_NONE for pmd_trans_huge() might force PMD page fault to go through
NUMA fault handling all the time as both pmd_trans_huge() and pmd_protnone()
will return true in that situation.


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-05 Thread Will Deacon
On Fri, Nov 02, 2018 at 11:45:00AM +0530, Anshuman Khandual wrote:
> On 10/17/2018 07:39 AM, Andrea Arcangeli wrote:
> > What we need to do during split is an invalidate of the huge TLB.
> > There's no pmd_trans_splitting anymore, so we only clear the present
> > bit in the PTE despite pmd_present still returns true (just like
> > PROT_NONE, nothing new in this respect). pmd_present never meant the
> 
> On arm64, the problem is that pmd_present() is tied with pte_present() which
> checks for PTE_VALID (also PTE_PROT_NONE) but which gets cleared during PTE
> invalidation. pmd_present() returns false just after the first step of PMD
> splitting. So pmd_present() needs to be decoupled from PTE_VALID which is
> same as PMD_SECT_VALID and instead should depend upon a pte bit which sticks
> around like PAGE_PSE as in case of x86. I am working towards a solution.

Could we not just go via a PROT_NONE mapping during the split, instead of
having to allocate a new software bit to treat these invalid ptes as
present?

Will


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-05 Thread Will Deacon
On Fri, Nov 02, 2018 at 11:45:00AM +0530, Anshuman Khandual wrote:
> On 10/17/2018 07:39 AM, Andrea Arcangeli wrote:
> > What we need to do during split is an invalidate of the huge TLB.
> > There's no pmd_trans_splitting anymore, so we only clear the present
> > bit in the PTE despite pmd_present still returns true (just like
> > PROT_NONE, nothing new in this respect). pmd_present never meant the
> 
> On arm64, the problem is that pmd_present() is tied with pte_present() which
> checks for PTE_VALID (also PTE_PROT_NONE) but which gets cleared during PTE
> invalidation. pmd_present() returns false just after the first step of PMD
> splitting. So pmd_present() needs to be decoupled from PTE_VALID which is
> same as PMD_SECT_VALID and instead should depend upon a pte bit which sticks
> around like PAGE_PSE as in case of x86. I am working towards a solution.

Could we not just go via a PROT_NONE mapping during the split, instead of
having to allocate a new software bit to treat these invalid ptes as
present?

Will


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-02 Thread Anshuman Khandual



On 10/17/2018 07:39 AM, Andrea Arcangeli wrote:
> Hello Zi,
> 
> On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote:
>> Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns 
>> true
>> for a THP under splitting? Does it cause problems when ARM64’s pmd_present()
>> returns false in the same situation?
Thank you Andrea for a such a detailed explanation. It really helped us in
understanding certain subtle details about pmd_present() & pmd_trans_huge().

> 
> !pmd_present means it's a migration entry or swap entry and doesn't
> point to RAM. It means if you do pmd_to_page(*pmd) it will return you
> an undefined result.

Sure but this needs to be made clear some where. Not sure whether its better
just by adding some in-code documentation or enforcing it in generic paths.

> 
> During splitting the physical page is still very well pointed by the
> pmd as long as pmd_trans_huge returns true and you hold the
> pmd_lock.

Agreed, it still does point to a huge page in RAM. So pmd_present() should
just return true in such cases as you have explained above.

> 
> pmd_trans_huge must be true at all times for a transhuge pmd that
> points to a hugepage, or all VM fast paths won't serialize with the

But as Naoya mentioned we should not check for pmd_trans_huge() on swap or
migration entries. If this makes sense, I will be happy to look into this
further and remove/replace pmd_trans_huge() check from affected code paths.

> pmd_lock, that is the only reason why, and it's a very good reason
> because it avoids to take the pmd_lock when walking over non transhuge
> pmds (i.e. when there are no THP allocated).
> 
> Now if we've to keep _PAGE_PSE set and return true in pmd_trans_huge
> at all times, why would you want to make pmd_present return false? How
> could it help if pmd_trans_huge returns true, but pmd_present returns
> false despite pmd_to_page works fine and the pmd is really still
> pointing to the page?

Then what is the difference between pmd_trans_huge() and pmd_present()
if both should return true if the PMD points to a huge page in RAM and
pmd_page() also returns a valid huge page in RAM.

> 
> When userland faults on such pmd !pmd_present it will make the page
> fault take a swap or migration path, but that's the wrong path if the
> pmd points to RAM.
This is a real concern. __handle_mm_fault() does check for a swap entry
(which can only be a migration entry at the moment) and then wait on
till the migration is completed.

if (unlikely(is_swap_pmd(orig_pmd))) {
VM_BUG_ON(thp_migration_supported() &&
  !is_pmd_migration_entry(orig_pmd));
if (is_pmd_migration_entry(orig_pmd))
pmd_migration_entry_wait(mm, vmf.pmd);
return 0;
}

> 
> What we need to do during split is an invalidate of the huge TL> There's no 
> pmd_trans_splitting anymore, so we only clear the present
> bit in the PTE despite pmd_present still returns true (just like
> PROT_NONE, nothing new in this respect). pmd_present never meant the

On arm64, the problem is that pmd_present() is tied with pte_present() which
checks for PTE_VALID (also PTE_PROT_NONE) but which gets cleared during PTE
invalidation. pmd_present() returns false just after the first step of PMD
splitting. So pmd_present() needs to be decoupled from PTE_VALID which is
same as PMD_SECT_VALID and instead should depend upon a pte bit which sticks
around like PAGE_PSE as in case of x86. I am working towards a solution.

> real present bit in the pte was set, it just means the pmd points to
> RAM. It means it doesn't point to swap or migration entry and you can
> do pmd_to_page and it works fine
> We need to invalidate the TLB by clearing the present bit and by
> flushing the TLB before overwriting the transhuge pmd with the regular
> pte (i.e. to make it non huge). That is actually required by an errata
> (l1 cache aliasing of the same mapping through two different TLB of
> two different sizes broke some old CPU and triggered machine checks).
> It's not something fundamentally necessary from a common code point of

TLB entries mapping same VA -> PA space with different pages sizes might
not co-exist with each other which requires TLB invalidation. PMD split
phase initiating a TLB invalidation is not like getting around a CPU HW
problem but its just that SW should not assume behavior on behalf of the
architecture regarding which TLB entries can co-exist at any point.

> view. It's more risky from an hardware (not software) standpoint and
> before you can get rid of the pmd you need to do a TLB flush anyway to
> be sure CPUs stops using it, so better clear the present bit before
> doing the real costly thing (the tlb flush with IPIs). Clearing the
> present bit during the TLB flush is a cost that gets lost in the noise.

Doing TLB invalidation is not tied to whether present bit is 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-02 Thread Anshuman Khandual



On 10/17/2018 07:39 AM, Andrea Arcangeli wrote:
> Hello Zi,
> 
> On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote:
>> Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns 
>> true
>> for a THP under splitting? Does it cause problems when ARM64’s pmd_present()
>> returns false in the same situation?
Thank you Andrea for a such a detailed explanation. It really helped us in
understanding certain subtle details about pmd_present() & pmd_trans_huge().

> 
> !pmd_present means it's a migration entry or swap entry and doesn't
> point to RAM. It means if you do pmd_to_page(*pmd) it will return you
> an undefined result.

Sure but this needs to be made clear some where. Not sure whether its better
just by adding some in-code documentation or enforcing it in generic paths.

> 
> During splitting the physical page is still very well pointed by the
> pmd as long as pmd_trans_huge returns true and you hold the
> pmd_lock.

Agreed, it still does point to a huge page in RAM. So pmd_present() should
just return true in such cases as you have explained above.

> 
> pmd_trans_huge must be true at all times for a transhuge pmd that
> points to a hugepage, or all VM fast paths won't serialize with the

But as Naoya mentioned we should not check for pmd_trans_huge() on swap or
migration entries. If this makes sense, I will be happy to look into this
further and remove/replace pmd_trans_huge() check from affected code paths.

> pmd_lock, that is the only reason why, and it's a very good reason
> because it avoids to take the pmd_lock when walking over non transhuge
> pmds (i.e. when there are no THP allocated).
> 
> Now if we've to keep _PAGE_PSE set and return true in pmd_trans_huge
> at all times, why would you want to make pmd_present return false? How
> could it help if pmd_trans_huge returns true, but pmd_present returns
> false despite pmd_to_page works fine and the pmd is really still
> pointing to the page?

Then what is the difference between pmd_trans_huge() and pmd_present()
if both should return true if the PMD points to a huge page in RAM and
pmd_page() also returns a valid huge page in RAM.

> 
> When userland faults on such pmd !pmd_present it will make the page
> fault take a swap or migration path, but that's the wrong path if the
> pmd points to RAM.
This is a real concern. __handle_mm_fault() does check for a swap entry
(which can only be a migration entry at the moment) and then wait on
till the migration is completed.

if (unlikely(is_swap_pmd(orig_pmd))) {
VM_BUG_ON(thp_migration_supported() &&
  !is_pmd_migration_entry(orig_pmd));
if (is_pmd_migration_entry(orig_pmd))
pmd_migration_entry_wait(mm, vmf.pmd);
return 0;
}

> 
> What we need to do during split is an invalidate of the huge TL> There's no 
> pmd_trans_splitting anymore, so we only clear the present
> bit in the PTE despite pmd_present still returns true (just like
> PROT_NONE, nothing new in this respect). pmd_present never meant the

On arm64, the problem is that pmd_present() is tied with pte_present() which
checks for PTE_VALID (also PTE_PROT_NONE) but which gets cleared during PTE
invalidation. pmd_present() returns false just after the first step of PMD
splitting. So pmd_present() needs to be decoupled from PTE_VALID which is
same as PMD_SECT_VALID and instead should depend upon a pte bit which sticks
around like PAGE_PSE as in case of x86. I am working towards a solution.

> real present bit in the pte was set, it just means the pmd points to
> RAM. It means it doesn't point to swap or migration entry and you can
> do pmd_to_page and it works fine
> We need to invalidate the TLB by clearing the present bit and by
> flushing the TLB before overwriting the transhuge pmd with the regular
> pte (i.e. to make it non huge). That is actually required by an errata
> (l1 cache aliasing of the same mapping through two different TLB of
> two different sizes broke some old CPU and triggered machine checks).
> It's not something fundamentally necessary from a common code point of

TLB entries mapping same VA -> PA space with different pages sizes might
not co-exist with each other which requires TLB invalidation. PMD split
phase initiating a TLB invalidation is not like getting around a CPU HW
problem but its just that SW should not assume behavior on behalf of the
architecture regarding which TLB entries can co-exist at any point.

> view. It's more risky from an hardware (not software) standpoint and
> before you can get rid of the pmd you need to do a TLB flush anyway to
> be sure CPUs stops using it, so better clear the present bit before
> doing the real costly thing (the tlb flush with IPIs). Clearing the
> present bit during the TLB flush is a cost that gets lost in the noise.

Doing TLB invalidation is not tied to whether present bit is 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-01 Thread Anshuman Khandual
 

On 10/18/2018 07:47 AM, Naoya Horiguchi wrote:
> On Tue, Oct 16, 2018 at 10:31:50AM -0400, Zi Yan wrote:
>> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
>>
>>> On 10/15/2018 06:23 AM, Zi Yan wrote:
 On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:

> On 10/10/2018 06:13 PM, Zi Yan wrote:
>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>
>>> On 10/09/2018 07:28 PM, Zi Yan wrote:
 cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
 for x86
 PMD migration entry check)

 On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:

> A normal mapped THP page at PMD level should be correctly 
> differentiated
> from a PMD migration entry while walking the page table. A mapped THP 
> would
> additionally check positive for pmd_present() along with 
> pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new 
> conditional test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
> mutually
> exclusive which makes the current conditional block work for both 
> mapped
> and migration entries. This is not same with arm64 where 
> pmd_trans_huge()

 !pmd_present() && pmd_trans_huge() is used to represent THPs under 
 splitting,
>>>
>>> Not really if we just look at code in the conditional blocks.
>>
>> Yeah, I explained it wrong above. Sorry about that.
>>
>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | 
>> _PAGE_PSE),
>> thus, it returns true even if the present bit is cleared but PSE bit is 
>> set.
>
> Okay.
>
>> This is done so, because THPs under splitting are regarded as present in 
>> the kernel
>> but not present when a hardware page table walker checks it.
>
> Okay.
>
>>
>> For PMD migration entry, which should be regarded as not present, if PSE 
>> bit
>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>> PMD migration entries will be regarded as present
>
> Okay to make pmd_present() return false pmd_trans_huge() has to return 
> false
> as well. Is there anything which can be done to get around this problem on
> X86 ? pmd_trans_huge() returning true for a migration entry sounds 
> logical.
> Otherwise we would revert the condition block order to accommodate both 
> the
> implementation for pmd_trans_huge() as suggested by Kirill before or just
> consider this patch forward.
>
> Because I am not really sure yet about the idea of getting pmd_present()
> check into pmd_trans_huge() on arm64 just to make it fit into this 
> semantics
> as suggested by Will. If a PMD is trans huge page or not should not 
> depend on
> whether it is present or not.

 In terms of THPs, we have three cases: a present THP, a THP under 
 splitting,
 and a THP under migration. pmd_present() and pmd_trans_huge() both return 
 true
 for a present THP and a THP under splitting, because they discover 
 _PAGE_PSE bit
>>>
>>> Then how do we differentiate between a mapped THP and a splitting THP.
>>
>> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
>> THP
>> using helper functions.
>>
>> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a 
>> splitting THP
>> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() 
>> return
>> true as long as _PAGE_PSE bit is set.
>>
>>>
 is set for both cases, whereas they both return false for a THP under 
 migration.
 You want to change them to make pmd_trans_huge() returns true for a THP 
 under migration
 instead of false to help ARM64’s support for THP migration.
>>> I am just trying to understand the rationale behind this semantics and see 
>>> where
>>> it should be fixed.
>>>
>>> I think the fundamental problem here is that THP under split has been 
>>> difficult
>>> to be re-presented through the available helper functions and in turn PTE 
>>> bits.
>>>
>>> The following checks
>>>
>>> 1) pmd_present()
>>> 2) pmd_trans_huge()
>>>
>>> Represent three THP states
>>>
>>> 1) Mapped THP   (pmd_present && pmd_trans_huge)
>>> 2) Splitting THP(pmd_present && pmd_trans_huge)
>>> 3) Migrating THP(!pmd_present && !pmd_trans_huge)
>>>
>>> The problem is if we make pmd_trans_huge() return true for all the three 
>>> states
>>> which sounds logical because they are all still trans huge PMD, then 
>>> pmd_present()
>>> can only represent two states not three as required.
>>
>> We are on the same page about 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-01 Thread Anshuman Khandual
 

On 10/18/2018 07:47 AM, Naoya Horiguchi wrote:
> On Tue, Oct 16, 2018 at 10:31:50AM -0400, Zi Yan wrote:
>> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
>>
>>> On 10/15/2018 06:23 AM, Zi Yan wrote:
 On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:

> On 10/10/2018 06:13 PM, Zi Yan wrote:
>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>
>>> On 10/09/2018 07:28 PM, Zi Yan wrote:
 cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
 for x86
 PMD migration entry check)

 On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:

> A normal mapped THP page at PMD level should be correctly 
> differentiated
> from a PMD migration entry while walking the page table. A mapped THP 
> would
> additionally check positive for pmd_present() along with 
> pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new 
> conditional test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
> mutually
> exclusive which makes the current conditional block work for both 
> mapped
> and migration entries. This is not same with arm64 where 
> pmd_trans_huge()

 !pmd_present() && pmd_trans_huge() is used to represent THPs under 
 splitting,
>>>
>>> Not really if we just look at code in the conditional blocks.
>>
>> Yeah, I explained it wrong above. Sorry about that.
>>
>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | 
>> _PAGE_PSE),
>> thus, it returns true even if the present bit is cleared but PSE bit is 
>> set.
>
> Okay.
>
>> This is done so, because THPs under splitting are regarded as present in 
>> the kernel
>> but not present when a hardware page table walker checks it.
>
> Okay.
>
>>
>> For PMD migration entry, which should be regarded as not present, if PSE 
>> bit
>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>> PMD migration entries will be regarded as present
>
> Okay to make pmd_present() return false pmd_trans_huge() has to return 
> false
> as well. Is there anything which can be done to get around this problem on
> X86 ? pmd_trans_huge() returning true for a migration entry sounds 
> logical.
> Otherwise we would revert the condition block order to accommodate both 
> the
> implementation for pmd_trans_huge() as suggested by Kirill before or just
> consider this patch forward.
>
> Because I am not really sure yet about the idea of getting pmd_present()
> check into pmd_trans_huge() on arm64 just to make it fit into this 
> semantics
> as suggested by Will. If a PMD is trans huge page or not should not 
> depend on
> whether it is present or not.

 In terms of THPs, we have three cases: a present THP, a THP under 
 splitting,
 and a THP under migration. pmd_present() and pmd_trans_huge() both return 
 true
 for a present THP and a THP under splitting, because they discover 
 _PAGE_PSE bit
>>>
>>> Then how do we differentiate between a mapped THP and a splitting THP.
>>
>> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
>> THP
>> using helper functions.
>>
>> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a 
>> splitting THP
>> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() 
>> return
>> true as long as _PAGE_PSE bit is set.
>>
>>>
 is set for both cases, whereas they both return false for a THP under 
 migration.
 You want to change them to make pmd_trans_huge() returns true for a THP 
 under migration
 instead of false to help ARM64’s support for THP migration.
>>> I am just trying to understand the rationale behind this semantics and see 
>>> where
>>> it should be fixed.
>>>
>>> I think the fundamental problem here is that THP under split has been 
>>> difficult
>>> to be re-presented through the available helper functions and in turn PTE 
>>> bits.
>>>
>>> The following checks
>>>
>>> 1) pmd_present()
>>> 2) pmd_trans_huge()
>>>
>>> Represent three THP states
>>>
>>> 1) Mapped THP   (pmd_present && pmd_trans_huge)
>>> 2) Splitting THP(pmd_present && pmd_trans_huge)
>>> 3) Migrating THP(!pmd_present && !pmd_trans_huge)
>>>
>>> The problem is if we make pmd_trans_huge() return true for all the three 
>>> states
>>> which sounds logical because they are all still trans huge PMD, then 
>>> pmd_present()
>>> can only represent two states not three as required.
>>
>> We are on the same page about 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-25 Thread Anshuman Khandual



On 10/26/2018 12:15 AM, Zi Yan wrote:
> On 25 Oct 2018, at 4:10, Anshuman Khandual wrote:
> 
>> On 10/16/2018 08:01 PM, Zi Yan wrote:
>>> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
>>>
 On 10/15/2018 06:23 AM, Zi Yan wrote:
> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
>
>> On 10/10/2018 06:13 PM, Zi Yan wrote:
>>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>>
 On 10/09/2018 07:28 PM, Zi Yan wrote:
> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
> for x86
> PMD migration entry check)
>
> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>
>> A normal mapped THP page at PMD level should be correctly 
>> differentiated
>> from a PMD migration entry while walking the page table. A mapped 
>> THP would
>> additionally check positive for pmd_present() along with 
>> pmd_trans_huge()
>> as compared to a PMD migration entry. This just adds a new 
>> conditional test
>> differentiating the two while walking the page table.
>>
>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic 
>> path")
>> Signed-off-by: Anshuman Khandual 
>> ---
>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
>> mutually
>> exclusive which makes the current conditional block work for both 
>> mapped
>> and migration entries. This is not same with arm64 where 
>> pmd_trans_huge()
>
> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
> splitting,

 Not really if we just look at code in the conditional blocks.
>>>
>>> Yeah, I explained it wrong above. Sorry about that.
>>>
>>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | 
>>> _PAGE_PSE),
>>> thus, it returns true even if the present bit is cleared but PSE bit is 
>>> set.
>>
>> Okay.
>>
>>> This is done so, because THPs under splitting are regarded as present 
>>> in the kernel
>>> but not present when a hardware page table walker checks it.
>>
>> Okay.
>>
>>>
>>> For PMD migration entry, which should be regarded as not present, if 
>>> PSE bit
>>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>>> PMD migration entries will be regarded as present
>>
>> Okay to make pmd_present() return false pmd_trans_huge() has to return 
>> false
>> as well. Is there anything which can be done to get around this problem 
>> on
>> X86 ? pmd_trans_huge() returning true for a migration entry sounds 
>> logical.
>> Otherwise we would revert the condition block order to accommodate both 
>> the
>> implementation for pmd_trans_huge() as suggested by Kirill before or just
>> consider this patch forward.
>>
>> Because I am not really sure yet about the idea of getting pmd_present()
>> check into pmd_trans_huge() on arm64 just to make it fit into this 
>> semantics
>> as suggested by Will. If a PMD is trans huge page or not should not 
>> depend on
>> whether it is present or not.
>
> In terms of THPs, we have three cases: a present THP, a THP under 
> splitting,
> and a THP under migration. pmd_present() and pmd_trans_huge() both return 
> true
> for a present THP and a THP under splitting, because they discover 
> _PAGE_PSE bit

 Then how do we differentiate between a mapped THP and a splitting THP.
>>>
>>> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
>>> THP
>>> using helper functions.
>>>
>>> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a 
>>> splitting THP
>>> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() 
>>> return
>>> true as long as _PAGE_PSE bit is set.
>>
>> I understand that. What I was wondering was since there is a need to 
>> differentiate
>> between a mapped THP and a splitting THP at various places in generic THP, 
>> we would
>> need to way to identify each of them unambiguously some how. Is that 
>> particular
>> assumption wrong ? Dont we need to differentiate between a mapped THP and 
>> THP under
>> splitting ?
> 
> According to Andrea's explanation here: 
> https://lore.kernel.org/patchwork/patch/997412/#1184298,
> we do not distinguish between a mapped THP and a splitting THP, because 
> pmd_to_page()
> can return valid pages for both cases.

I have gone through Andrea's explanation but still trying to understand all 
those
details in there. Will get back on this.

> 
>>>

> is set for both cases, whereas they both return false for a THP under 
> migration.
> You want to change them to make pmd_trans_huge() returns true for a THP 
> under migration
> instead of false to help ARM64’s 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-25 Thread Anshuman Khandual



On 10/26/2018 12:15 AM, Zi Yan wrote:
> On 25 Oct 2018, at 4:10, Anshuman Khandual wrote:
> 
>> On 10/16/2018 08:01 PM, Zi Yan wrote:
>>> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
>>>
 On 10/15/2018 06:23 AM, Zi Yan wrote:
> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
>
>> On 10/10/2018 06:13 PM, Zi Yan wrote:
>>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>>
 On 10/09/2018 07:28 PM, Zi Yan wrote:
> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
> for x86
> PMD migration entry check)
>
> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>
>> A normal mapped THP page at PMD level should be correctly 
>> differentiated
>> from a PMD migration entry while walking the page table. A mapped 
>> THP would
>> additionally check positive for pmd_present() along with 
>> pmd_trans_huge()
>> as compared to a PMD migration entry. This just adds a new 
>> conditional test
>> differentiating the two while walking the page table.
>>
>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic 
>> path")
>> Signed-off-by: Anshuman Khandual 
>> ---
>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
>> mutually
>> exclusive which makes the current conditional block work for both 
>> mapped
>> and migration entries. This is not same with arm64 where 
>> pmd_trans_huge()
>
> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
> splitting,

 Not really if we just look at code in the conditional blocks.
>>>
>>> Yeah, I explained it wrong above. Sorry about that.
>>>
>>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | 
>>> _PAGE_PSE),
>>> thus, it returns true even if the present bit is cleared but PSE bit is 
>>> set.
>>
>> Okay.
>>
>>> This is done so, because THPs under splitting are regarded as present 
>>> in the kernel
>>> but not present when a hardware page table walker checks it.
>>
>> Okay.
>>
>>>
>>> For PMD migration entry, which should be regarded as not present, if 
>>> PSE bit
>>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>>> PMD migration entries will be regarded as present
>>
>> Okay to make pmd_present() return false pmd_trans_huge() has to return 
>> false
>> as well. Is there anything which can be done to get around this problem 
>> on
>> X86 ? pmd_trans_huge() returning true for a migration entry sounds 
>> logical.
>> Otherwise we would revert the condition block order to accommodate both 
>> the
>> implementation for pmd_trans_huge() as suggested by Kirill before or just
>> consider this patch forward.
>>
>> Because I am not really sure yet about the idea of getting pmd_present()
>> check into pmd_trans_huge() on arm64 just to make it fit into this 
>> semantics
>> as suggested by Will. If a PMD is trans huge page or not should not 
>> depend on
>> whether it is present or not.
>
> In terms of THPs, we have three cases: a present THP, a THP under 
> splitting,
> and a THP under migration. pmd_present() and pmd_trans_huge() both return 
> true
> for a present THP and a THP under splitting, because they discover 
> _PAGE_PSE bit

 Then how do we differentiate between a mapped THP and a splitting THP.
>>>
>>> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
>>> THP
>>> using helper functions.
>>>
>>> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a 
>>> splitting THP
>>> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() 
>>> return
>>> true as long as _PAGE_PSE bit is set.
>>
>> I understand that. What I was wondering was since there is a need to 
>> differentiate
>> between a mapped THP and a splitting THP at various places in generic THP, 
>> we would
>> need to way to identify each of them unambiguously some how. Is that 
>> particular
>> assumption wrong ? Dont we need to differentiate between a mapped THP and 
>> THP under
>> splitting ?
> 
> According to Andrea's explanation here: 
> https://lore.kernel.org/patchwork/patch/997412/#1184298,
> we do not distinguish between a mapped THP and a splitting THP, because 
> pmd_to_page()
> can return valid pages for both cases.

I have gone through Andrea's explanation but still trying to understand all 
those
details in there. Will get back on this.

> 
>>>

> is set for both cases, whereas they both return false for a THP under 
> migration.
> You want to change them to make pmd_trans_huge() returns true for a THP 
> under migration
> instead of false to help ARM64’s 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-25 Thread Zi Yan
On 25 Oct 2018, at 4:10, Anshuman Khandual wrote:

> On 10/16/2018 08:01 PM, Zi Yan wrote:
>> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
>>
>>> On 10/15/2018 06:23 AM, Zi Yan wrote:
 On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:

> On 10/10/2018 06:13 PM, Zi Yan wrote:
>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>
>>> On 10/09/2018 07:28 PM, Zi Yan wrote:
 cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
 for x86
 PMD migration entry check)

 On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:

> A normal mapped THP page at PMD level should be correctly 
> differentiated
> from a PMD migration entry while walking the page table. A mapped THP 
> would
> additionally check positive for pmd_present() along with 
> pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new 
> conditional test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
> mutually
> exclusive which makes the current conditional block work for both 
> mapped
> and migration entries. This is not same with arm64 where 
> pmd_trans_huge()

 !pmd_present() && pmd_trans_huge() is used to represent THPs under 
 splitting,
>>>
>>> Not really if we just look at code in the conditional blocks.
>>
>> Yeah, I explained it wrong above. Sorry about that.
>>
>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | 
>> _PAGE_PSE),
>> thus, it returns true even if the present bit is cleared but PSE bit is 
>> set.
>
> Okay.
>
>> This is done so, because THPs under splitting are regarded as present in 
>> the kernel
>> but not present when a hardware page table walker checks it.
>
> Okay.
>
>>
>> For PMD migration entry, which should be regarded as not present, if PSE 
>> bit
>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>> PMD migration entries will be regarded as present
>
> Okay to make pmd_present() return false pmd_trans_huge() has to return 
> false
> as well. Is there anything which can be done to get around this problem on
> X86 ? pmd_trans_huge() returning true for a migration entry sounds 
> logical.
> Otherwise we would revert the condition block order to accommodate both 
> the
> implementation for pmd_trans_huge() as suggested by Kirill before or just
> consider this patch forward.
>
> Because I am not really sure yet about the idea of getting pmd_present()
> check into pmd_trans_huge() on arm64 just to make it fit into this 
> semantics
> as suggested by Will. If a PMD is trans huge page or not should not 
> depend on
> whether it is present or not.

 In terms of THPs, we have three cases: a present THP, a THP under 
 splitting,
 and a THP under migration. pmd_present() and pmd_trans_huge() both return 
 true
 for a present THP and a THP under splitting, because they discover 
 _PAGE_PSE bit
>>>
>>> Then how do we differentiate between a mapped THP and a splitting THP.
>>
>> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
>> THP
>> using helper functions.
>>
>> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a 
>> splitting THP
>> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() 
>> return
>> true as long as _PAGE_PSE bit is set.
>
> I understand that. What I was wondering was since there is a need to 
> differentiate
> between a mapped THP and a splitting THP at various places in generic THP, we 
> would
> need to way to identify each of them unambiguously some how. Is that 
> particular
> assumption wrong ? Dont we need to differentiate between a mapped THP and THP 
> under
> splitting ?

According to Andrea's explanation here: 
https://lore.kernel.org/patchwork/patch/997412/#1184298,
we do not distinguish between a mapped THP and a splitting THP, because 
pmd_to_page()
can return valid pages for both cases.

>>
>>>
 is set for both cases, whereas they both return false for a THP under 
 migration.
 You want to change them to make pmd_trans_huge() returns true for a THP 
 under migration
 instead of false to help ARM64’s support for THP migration.
>>> I am just trying to understand the rationale behind this semantics and see 
>>> where
>>> it should be fixed.
>>>
>>> I think the fundamental problem here is that THP under split has been 
>>> difficult
>>> to be re-presented through the available helper functions and in turn PTE 
>>> bits.

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-25 Thread Zi Yan
On 25 Oct 2018, at 4:10, Anshuman Khandual wrote:

> On 10/16/2018 08:01 PM, Zi Yan wrote:
>> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
>>
>>> On 10/15/2018 06:23 AM, Zi Yan wrote:
 On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:

> On 10/10/2018 06:13 PM, Zi Yan wrote:
>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>
>>> On 10/09/2018 07:28 PM, Zi Yan wrote:
 cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
 for x86
 PMD migration entry check)

 On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:

> A normal mapped THP page at PMD level should be correctly 
> differentiated
> from a PMD migration entry while walking the page table. A mapped THP 
> would
> additionally check positive for pmd_present() along with 
> pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new 
> conditional test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
> mutually
> exclusive which makes the current conditional block work for both 
> mapped
> and migration entries. This is not same with arm64 where 
> pmd_trans_huge()

 !pmd_present() && pmd_trans_huge() is used to represent THPs under 
 splitting,
>>>
>>> Not really if we just look at code in the conditional blocks.
>>
>> Yeah, I explained it wrong above. Sorry about that.
>>
>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | 
>> _PAGE_PSE),
>> thus, it returns true even if the present bit is cleared but PSE bit is 
>> set.
>
> Okay.
>
>> This is done so, because THPs under splitting are regarded as present in 
>> the kernel
>> but not present when a hardware page table walker checks it.
>
> Okay.
>
>>
>> For PMD migration entry, which should be regarded as not present, if PSE 
>> bit
>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>> PMD migration entries will be regarded as present
>
> Okay to make pmd_present() return false pmd_trans_huge() has to return 
> false
> as well. Is there anything which can be done to get around this problem on
> X86 ? pmd_trans_huge() returning true for a migration entry sounds 
> logical.
> Otherwise we would revert the condition block order to accommodate both 
> the
> implementation for pmd_trans_huge() as suggested by Kirill before or just
> consider this patch forward.
>
> Because I am not really sure yet about the idea of getting pmd_present()
> check into pmd_trans_huge() on arm64 just to make it fit into this 
> semantics
> as suggested by Will. If a PMD is trans huge page or not should not 
> depend on
> whether it is present or not.

 In terms of THPs, we have three cases: a present THP, a THP under 
 splitting,
 and a THP under migration. pmd_present() and pmd_trans_huge() both return 
 true
 for a present THP and a THP under splitting, because they discover 
 _PAGE_PSE bit
>>>
>>> Then how do we differentiate between a mapped THP and a splitting THP.
>>
>> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
>> THP
>> using helper functions.
>>
>> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a 
>> splitting THP
>> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() 
>> return
>> true as long as _PAGE_PSE bit is set.
>
> I understand that. What I was wondering was since there is a need to 
> differentiate
> between a mapped THP and a splitting THP at various places in generic THP, we 
> would
> need to way to identify each of them unambiguously some how. Is that 
> particular
> assumption wrong ? Dont we need to differentiate between a mapped THP and THP 
> under
> splitting ?

According to Andrea's explanation here: 
https://lore.kernel.org/patchwork/patch/997412/#1184298,
we do not distinguish between a mapped THP and a splitting THP, because 
pmd_to_page()
can return valid pages for both cases.

>>
>>>
 is set for both cases, whereas they both return false for a THP under 
 migration.
 You want to change them to make pmd_trans_huge() returns true for a THP 
 under migration
 instead of false to help ARM64’s support for THP migration.
>>> I am just trying to understand the rationale behind this semantics and see 
>>> where
>>> it should be fixed.
>>>
>>> I think the fundamental problem here is that THP under split has been 
>>> difficult
>>> to be re-presented through the available helper functions and in turn PTE 
>>> bits.

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-25 Thread Anshuman Khandual



On 10/16/2018 08:01 PM, Zi Yan wrote:
> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
> 
>> On 10/15/2018 06:23 AM, Zi Yan wrote:
>>> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
>>>
 On 10/10/2018 06:13 PM, Zi Yan wrote:
> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>
>> On 10/09/2018 07:28 PM, Zi Yan wrote:
>>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
>>> for x86
>>> PMD migration entry check)
>>>
>>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>>
 A normal mapped THP page at PMD level should be correctly 
 differentiated
 from a PMD migration entry while walking the page table. A mapped THP 
 would
 additionally check positive for pmd_present() along with 
 pmd_trans_huge()
 as compared to a PMD migration entry. This just adds a new conditional 
 test
 differentiating the two while walking the page table.

 Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
 Signed-off-by: Anshuman Khandual 
 ---
 On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
 mutually
 exclusive which makes the current conditional block work for both 
 mapped
 and migration entries. This is not same with arm64 where 
 pmd_trans_huge()
>>>
>>> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
>>> splitting,
>>
>> Not really if we just look at code in the conditional blocks.
>
> Yeah, I explained it wrong above. Sorry about that.
>
> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
> thus, it returns true even if the present bit is cleared but PSE bit is 
> set.

 Okay.

> This is done so, because THPs under splitting are regarded as present in 
> the kernel
> but not present when a hardware page table walker checks it.

 Okay.

>
> For PMD migration entry, which should be regarded as not present, if PSE 
> bit
> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
> PMD migration entries will be regarded as present

 Okay to make pmd_present() return false pmd_trans_huge() has to return 
 false
 as well. Is there anything which can be done to get around this problem on
 X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
 Otherwise we would revert the condition block order to accommodate both the
 implementation for pmd_trans_huge() as suggested by Kirill before or just
 consider this patch forward.

 Because I am not really sure yet about the idea of getting pmd_present()
 check into pmd_trans_huge() on arm64 just to make it fit into this 
 semantics
 as suggested by Will. If a PMD is trans huge page or not should not depend 
 on
 whether it is present or not.
>>>
>>> In terms of THPs, we have three cases: a present THP, a THP under splitting,
>>> and a THP under migration. pmd_present() and pmd_trans_huge() both return 
>>> true
>>> for a present THP and a THP under splitting, because they discover 
>>> _PAGE_PSE bit
>>
>> Then how do we differentiate between a mapped THP and a splitting THP.
> 
> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
> THP
> using helper functions.
> 
> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a splitting 
> THP
> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() return
> true as long as _PAGE_PSE bit is set.

I understand that. What I was wondering was since there is a need to 
differentiate
between a mapped THP and a splitting THP at various places in generic THP, we 
would
need to way to identify each of them unambiguously some how. Is that particular
assumption wrong ? Dont we need to differentiate between a mapped THP and THP 
under
splitting ?

> 
>>
>>> is set for both cases, whereas they both return false for a THP under 
>>> migration.
>>> You want to change them to make pmd_trans_huge() returns true for a THP 
>>> under migration
>>> instead of false to help ARM64’s support for THP migration.
>> I am just trying to understand the rationale behind this semantics and see 
>> where
>> it should be fixed.
>>
>> I think the fundamental problem here is that THP under split has been 
>> difficult
>> to be re-presented through the available helper functions and in turn PTE 
>> bits.
>>
>> The following checks
>>
>> 1) pmd_present()
>> 2) pmd_trans_huge()
>>
>> Represent three THP states
>>
>> 1) Mapped THP(pmd_present && pmd_trans_huge)
>> 2) Splitting THP (pmd_present && pmd_trans_huge)
>> 3) Migrating THP (!pmd_present && !pmd_trans_huge)
>>
>> The problem is if we make pmd_trans_huge() return true for all the three 
>> states
>> which sounds logical because they are all still 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-25 Thread Anshuman Khandual



On 10/16/2018 08:01 PM, Zi Yan wrote:
> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
> 
>> On 10/15/2018 06:23 AM, Zi Yan wrote:
>>> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
>>>
 On 10/10/2018 06:13 PM, Zi Yan wrote:
> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>
>> On 10/09/2018 07:28 PM, Zi Yan wrote:
>>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
>>> for x86
>>> PMD migration entry check)
>>>
>>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>>
 A normal mapped THP page at PMD level should be correctly 
 differentiated
 from a PMD migration entry while walking the page table. A mapped THP 
 would
 additionally check positive for pmd_present() along with 
 pmd_trans_huge()
 as compared to a PMD migration entry. This just adds a new conditional 
 test
 differentiating the two while walking the page table.

 Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
 Signed-off-by: Anshuman Khandual 
 ---
 On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
 mutually
 exclusive which makes the current conditional block work for both 
 mapped
 and migration entries. This is not same with arm64 where 
 pmd_trans_huge()
>>>
>>> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
>>> splitting,
>>
>> Not really if we just look at code in the conditional blocks.
>
> Yeah, I explained it wrong above. Sorry about that.
>
> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
> thus, it returns true even if the present bit is cleared but PSE bit is 
> set.

 Okay.

> This is done so, because THPs under splitting are regarded as present in 
> the kernel
> but not present when a hardware page table walker checks it.

 Okay.

>
> For PMD migration entry, which should be regarded as not present, if PSE 
> bit
> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
> PMD migration entries will be regarded as present

 Okay to make pmd_present() return false pmd_trans_huge() has to return 
 false
 as well. Is there anything which can be done to get around this problem on
 X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
 Otherwise we would revert the condition block order to accommodate both the
 implementation for pmd_trans_huge() as suggested by Kirill before or just
 consider this patch forward.

 Because I am not really sure yet about the idea of getting pmd_present()
 check into pmd_trans_huge() on arm64 just to make it fit into this 
 semantics
 as suggested by Will. If a PMD is trans huge page or not should not depend 
 on
 whether it is present or not.
>>>
>>> In terms of THPs, we have three cases: a present THP, a THP under splitting,
>>> and a THP under migration. pmd_present() and pmd_trans_huge() both return 
>>> true
>>> for a present THP and a THP under splitting, because they discover 
>>> _PAGE_PSE bit
>>
>> Then how do we differentiate between a mapped THP and a splitting THP.
> 
> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
> THP
> using helper functions.
> 
> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a splitting 
> THP
> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() return
> true as long as _PAGE_PSE bit is set.

I understand that. What I was wondering was since there is a need to 
differentiate
between a mapped THP and a splitting THP at various places in generic THP, we 
would
need to way to identify each of them unambiguously some how. Is that particular
assumption wrong ? Dont we need to differentiate between a mapped THP and THP 
under
splitting ?

> 
>>
>>> is set for both cases, whereas they both return false for a THP under 
>>> migration.
>>> You want to change them to make pmd_trans_huge() returns true for a THP 
>>> under migration
>>> instead of false to help ARM64’s support for THP migration.
>> I am just trying to understand the rationale behind this semantics and see 
>> where
>> it should be fixed.
>>
>> I think the fundamental problem here is that THP under split has been 
>> difficult
>> to be re-presented through the available helper functions and in turn PTE 
>> bits.
>>
>> The following checks
>>
>> 1) pmd_present()
>> 2) pmd_trans_huge()
>>
>> Represent three THP states
>>
>> 1) Mapped THP(pmd_present && pmd_trans_huge)
>> 2) Splitting THP (pmd_present && pmd_trans_huge)
>> 3) Migrating THP (!pmd_present && !pmd_trans_huge)
>>
>> The problem is if we make pmd_trans_huge() return true for all the three 
>> states
>> which sounds logical because they are all still 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-22 Thread Zi Yan
Hi Andrea,

On 16 Oct 2018, at 22:09, Andrea Arcangeli wrote:

> Hello Zi,
>
> On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote:
>> Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns 
>> true
>> for a THP under splitting? Does it cause problems when ARM64’s pmd_present()
>> returns false in the same situation?
>
> !pmd_present means it's a migration entry or swap entry and doesn't
> point to RAM. It means if you do pmd_to_page(*pmd) it will return you
> an undefined result.
>
> During splitting the physical page is still very well pointed by the
> pmd as long as pmd_trans_huge returns true and you hold the
> pmd_lock.
>
> pmd_trans_huge must be true at all times for a transhuge pmd that
> points to a hugepage, or all VM fast paths won't serialize with the
> pmd_lock, that is the only reason why, and it's a very good reason
> because it avoids to take the pmd_lock when walking over non transhuge
> pmds (i.e. when there are no THP allocated).
>
> Now if we've to keep _PAGE_PSE set and return true in pmd_trans_huge
> at all times, why would you want to make pmd_present return false? How
> could it help if pmd_trans_huge returns true, but pmd_present returns
> false despite pmd_to_page works fine and the pmd is really still
> pointing to the page?
>
> When userland faults on such pmd !pmd_present it will make the page
> fault take a swap or migration path, but that's the wrong path if the
> pmd points to RAM.
>
> What we need to do during split is an invalidate of the huge TLB.
> There's no pmd_trans_splitting anymore, so we only clear the present
> bit in the PTE despite pmd_present still returns true (just like
> PROT_NONE, nothing new in this respect). pmd_present never meant the
> real present bit in the pte was set, it just means the pmd points to
> RAM. It means it doesn't point to swap or migration entry and you can
> do pmd_to_page and it works fine.
>
> We need to invalidate the TLB by clearing the present bit and by
> flushing the TLB before overwriting the transhuge pmd with the regular
> pte (i.e. to make it non huge). That is actually required by an errata
> (l1 cache aliasing of the same mapping through two different TLB of
> two different sizes broke some old CPU and triggered machine checks).
> It's not something fundamentally necessary from a common code point of
> view. It's more risky from an hardware (not software) standpoint and
> before you can get rid of the pmd you need to do a TLB flush anyway to
> be sure CPUs stops using it, so better clear the present bit before
> doing the real costly thing (the tlb flush with IPIs). Clearing the
> present bit during the TLB flush is a cost that gets lost in the noise.
>
> The clear of the real present bit during pmd (virtual) splitting is
> done with pmdp_invalidate, that is created specifically to keeps
> pmd_trans_huge=true, pmd_present=true despite the present bit is not
> set. So you could imagine _PAGE_PSE as the real present bit.
>
> Before the physical split was deferred and decoupled from the virtual
> memory pmd split, pmd_trans_splitting allowed to wait the split to
> finish and to keep all gup_fast at bay during it (while the page was
> still mapped readable and writable in userland by other CPUs). Now the
> physical split is deferred so you just split the pmd locally and only
> a physical split invoked on the page (not the virtual split invoked on
> the pmd with split_huge_pmd) has to keep gup at bay, and it does so by
> freezing the refcount so all gup_fast fail with the
> page_cache_get_speculative during the freeze. This removed the need of
> the pmd_splitting flag in gup_fast (when pmd_splitting was set gup
> fast had to go through the non-fast gup), but it means that now a
> hugepage cannot be physically splitted if it's gup pinned. The main
> difference is that freezing the refcount can fail, so the code must
> learn to cope with such failure and defer it. Decoupling the physical
> and virtual splits introduced the need of tracking the doublemap case
> with a new PG_double_map flag too. It makes the refcounting of
> hugepages trivial in comparison (identical to hugetlbfs in fact), but
> it requires total_mapcount to account for all those huge and non huge
> mappings. It primarily pays off to add THP to tmpfs where the physical
> split may have to be deferred for pagecache reasons anyway.

Thanks for your detailed explanation!

Do you think it is worth documenting what you have said? At least on
why we want pmd_present() and pmd_trans_huge() both return true when
a THP is under splitting, so that we can avoid some confusion in the future.
I can send a patch to add it to Document/vm/transhuge.rst.

--
Best Regards
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-22 Thread Zi Yan
Hi Andrea,

On 16 Oct 2018, at 22:09, Andrea Arcangeli wrote:

> Hello Zi,
>
> On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote:
>> Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns 
>> true
>> for a THP under splitting? Does it cause problems when ARM64’s pmd_present()
>> returns false in the same situation?
>
> !pmd_present means it's a migration entry or swap entry and doesn't
> point to RAM. It means if you do pmd_to_page(*pmd) it will return you
> an undefined result.
>
> During splitting the physical page is still very well pointed by the
> pmd as long as pmd_trans_huge returns true and you hold the
> pmd_lock.
>
> pmd_trans_huge must be true at all times for a transhuge pmd that
> points to a hugepage, or all VM fast paths won't serialize with the
> pmd_lock, that is the only reason why, and it's a very good reason
> because it avoids to take the pmd_lock when walking over non transhuge
> pmds (i.e. when there are no THP allocated).
>
> Now if we've to keep _PAGE_PSE set and return true in pmd_trans_huge
> at all times, why would you want to make pmd_present return false? How
> could it help if pmd_trans_huge returns true, but pmd_present returns
> false despite pmd_to_page works fine and the pmd is really still
> pointing to the page?
>
> When userland faults on such pmd !pmd_present it will make the page
> fault take a swap or migration path, but that's the wrong path if the
> pmd points to RAM.
>
> What we need to do during split is an invalidate of the huge TLB.
> There's no pmd_trans_splitting anymore, so we only clear the present
> bit in the PTE despite pmd_present still returns true (just like
> PROT_NONE, nothing new in this respect). pmd_present never meant the
> real present bit in the pte was set, it just means the pmd points to
> RAM. It means it doesn't point to swap or migration entry and you can
> do pmd_to_page and it works fine.
>
> We need to invalidate the TLB by clearing the present bit and by
> flushing the TLB before overwriting the transhuge pmd with the regular
> pte (i.e. to make it non huge). That is actually required by an errata
> (l1 cache aliasing of the same mapping through two different TLB of
> two different sizes broke some old CPU and triggered machine checks).
> It's not something fundamentally necessary from a common code point of
> view. It's more risky from an hardware (not software) standpoint and
> before you can get rid of the pmd you need to do a TLB flush anyway to
> be sure CPUs stops using it, so better clear the present bit before
> doing the real costly thing (the tlb flush with IPIs). Clearing the
> present bit during the TLB flush is a cost that gets lost in the noise.
>
> The clear of the real present bit during pmd (virtual) splitting is
> done with pmdp_invalidate, that is created specifically to keeps
> pmd_trans_huge=true, pmd_present=true despite the present bit is not
> set. So you could imagine _PAGE_PSE as the real present bit.
>
> Before the physical split was deferred and decoupled from the virtual
> memory pmd split, pmd_trans_splitting allowed to wait the split to
> finish and to keep all gup_fast at bay during it (while the page was
> still mapped readable and writable in userland by other CPUs). Now the
> physical split is deferred so you just split the pmd locally and only
> a physical split invoked on the page (not the virtual split invoked on
> the pmd with split_huge_pmd) has to keep gup at bay, and it does so by
> freezing the refcount so all gup_fast fail with the
> page_cache_get_speculative during the freeze. This removed the need of
> the pmd_splitting flag in gup_fast (when pmd_splitting was set gup
> fast had to go through the non-fast gup), but it means that now a
> hugepage cannot be physically splitted if it's gup pinned. The main
> difference is that freezing the refcount can fail, so the code must
> learn to cope with such failure and defer it. Decoupling the physical
> and virtual splits introduced the need of tracking the doublemap case
> with a new PG_double_map flag too. It makes the refcounting of
> hugepages trivial in comparison (identical to hugetlbfs in fact), but
> it requires total_mapcount to account for all those huge and non huge
> mappings. It primarily pays off to add THP to tmpfs where the physical
> split may have to be deferred for pagecache reasons anyway.

Thanks for your detailed explanation!

Do you think it is worth documenting what you have said? At least on
why we want pmd_present() and pmd_trans_huge() both return true when
a THP is under splitting, so that we can avoid some confusion in the future.
I can send a patch to add it to Document/vm/transhuge.rst.

--
Best Regards
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-17 Thread Naoya Horiguchi
On Tue, Oct 16, 2018 at 10:31:50AM -0400, Zi Yan wrote:
> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
> 
> > On 10/15/2018 06:23 AM, Zi Yan wrote:
> >> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
> >>
> >>> On 10/10/2018 06:13 PM, Zi Yan wrote:
>  On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
> 
> > On 10/09/2018 07:28 PM, Zi Yan wrote:
> >> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
> >> for x86
> >> PMD migration entry check)
> >>
> >> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
> >>
> >>> A normal mapped THP page at PMD level should be correctly 
> >>> differentiated
> >>> from a PMD migration entry while walking the page table. A mapped THP 
> >>> would
> >>> additionally check positive for pmd_present() along with 
> >>> pmd_trans_huge()
> >>> as compared to a PMD migration entry. This just adds a new 
> >>> conditional test
> >>> differentiating the two while walking the page table.
> >>>
> >>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> >>> Signed-off-by: Anshuman Khandual 
> >>> ---
> >>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
> >>> mutually
> >>> exclusive which makes the current conditional block work for both 
> >>> mapped
> >>> and migration entries. This is not same with arm64 where 
> >>> pmd_trans_huge()
> >>
> >> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
> >> splitting,
> >
> > Not really if we just look at code in the conditional blocks.
> 
>  Yeah, I explained it wrong above. Sorry about that.
> 
>  In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | 
>  _PAGE_PSE),
>  thus, it returns true even if the present bit is cleared but PSE bit is 
>  set.
> >>>
> >>> Okay.
> >>>
>  This is done so, because THPs under splitting are regarded as present in 
>  the kernel
>  but not present when a hardware page table walker checks it.
> >>>
> >>> Okay.
> >>>
> 
>  For PMD migration entry, which should be regarded as not present, if PSE 
>  bit
>  is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>  PMD migration entries will be regarded as present
> >>>
> >>> Okay to make pmd_present() return false pmd_trans_huge() has to return 
> >>> false
> >>> as well. Is there anything which can be done to get around this problem on
> >>> X86 ? pmd_trans_huge() returning true for a migration entry sounds 
> >>> logical.
> >>> Otherwise we would revert the condition block order to accommodate both 
> >>> the
> >>> implementation for pmd_trans_huge() as suggested by Kirill before or just
> >>> consider this patch forward.
> >>>
> >>> Because I am not really sure yet about the idea of getting pmd_present()
> >>> check into pmd_trans_huge() on arm64 just to make it fit into this 
> >>> semantics
> >>> as suggested by Will. If a PMD is trans huge page or not should not 
> >>> depend on
> >>> whether it is present or not.
> >>
> >> In terms of THPs, we have three cases: a present THP, a THP under 
> >> splitting,
> >> and a THP under migration. pmd_present() and pmd_trans_huge() both return 
> >> true
> >> for a present THP and a THP under splitting, because they discover 
> >> _PAGE_PSE bit
> >
> > Then how do we differentiate between a mapped THP and a splitting THP.
> 
> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
> THP
> using helper functions.
> 
> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a splitting 
> THP
> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() return
> true as long as _PAGE_PSE bit is set.
> 
> >
> >> is set for both cases, whereas they both return false for a THP under 
> >> migration.
> >> You want to change them to make pmd_trans_huge() returns true for a THP 
> >> under migration
> >> instead of false to help ARM64’s support for THP migration.
> > I am just trying to understand the rationale behind this semantics and see 
> > where
> > it should be fixed.
> >
> > I think the fundamental problem here is that THP under split has been 
> > difficult
> > to be re-presented through the available helper functions and in turn PTE 
> > bits.
> >
> > The following checks
> >
> > 1) pmd_present()
> > 2) pmd_trans_huge()
> >
> > Represent three THP states
> >
> > 1) Mapped THP   (pmd_present && pmd_trans_huge)
> > 2) Splitting THP(pmd_present && pmd_trans_huge)
> > 3) Migrating THP(!pmd_present && !pmd_trans_huge)
> >
> > The problem is if we make pmd_trans_huge() return true for all the three 
> > states
> > which sounds logical because they are all still trans huge PMD, then 
> > pmd_present()
> > can only represent two states not three as required.
> 
> We are on the same page about representing three THP states in x86.
> I also agree with you that it 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-17 Thread Naoya Horiguchi
On Tue, Oct 16, 2018 at 10:31:50AM -0400, Zi Yan wrote:
> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
> 
> > On 10/15/2018 06:23 AM, Zi Yan wrote:
> >> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
> >>
> >>> On 10/10/2018 06:13 PM, Zi Yan wrote:
>  On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
> 
> > On 10/09/2018 07:28 PM, Zi Yan wrote:
> >> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
> >> for x86
> >> PMD migration entry check)
> >>
> >> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
> >>
> >>> A normal mapped THP page at PMD level should be correctly 
> >>> differentiated
> >>> from a PMD migration entry while walking the page table. A mapped THP 
> >>> would
> >>> additionally check positive for pmd_present() along with 
> >>> pmd_trans_huge()
> >>> as compared to a PMD migration entry. This just adds a new 
> >>> conditional test
> >>> differentiating the two while walking the page table.
> >>>
> >>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> >>> Signed-off-by: Anshuman Khandual 
> >>> ---
> >>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
> >>> mutually
> >>> exclusive which makes the current conditional block work for both 
> >>> mapped
> >>> and migration entries. This is not same with arm64 where 
> >>> pmd_trans_huge()
> >>
> >> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
> >> splitting,
> >
> > Not really if we just look at code in the conditional blocks.
> 
>  Yeah, I explained it wrong above. Sorry about that.
> 
>  In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | 
>  _PAGE_PSE),
>  thus, it returns true even if the present bit is cleared but PSE bit is 
>  set.
> >>>
> >>> Okay.
> >>>
>  This is done so, because THPs under splitting are regarded as present in 
>  the kernel
>  but not present when a hardware page table walker checks it.
> >>>
> >>> Okay.
> >>>
> 
>  For PMD migration entry, which should be regarded as not present, if PSE 
>  bit
>  is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>  PMD migration entries will be regarded as present
> >>>
> >>> Okay to make pmd_present() return false pmd_trans_huge() has to return 
> >>> false
> >>> as well. Is there anything which can be done to get around this problem on
> >>> X86 ? pmd_trans_huge() returning true for a migration entry sounds 
> >>> logical.
> >>> Otherwise we would revert the condition block order to accommodate both 
> >>> the
> >>> implementation for pmd_trans_huge() as suggested by Kirill before or just
> >>> consider this patch forward.
> >>>
> >>> Because I am not really sure yet about the idea of getting pmd_present()
> >>> check into pmd_trans_huge() on arm64 just to make it fit into this 
> >>> semantics
> >>> as suggested by Will. If a PMD is trans huge page or not should not 
> >>> depend on
> >>> whether it is present or not.
> >>
> >> In terms of THPs, we have three cases: a present THP, a THP under 
> >> splitting,
> >> and a THP under migration. pmd_present() and pmd_trans_huge() both return 
> >> true
> >> for a present THP and a THP under splitting, because they discover 
> >> _PAGE_PSE bit
> >
> > Then how do we differentiate between a mapped THP and a splitting THP.
> 
> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
> THP
> using helper functions.
> 
> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a splitting 
> THP
> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() return
> true as long as _PAGE_PSE bit is set.
> 
> >
> >> is set for both cases, whereas they both return false for a THP under 
> >> migration.
> >> You want to change them to make pmd_trans_huge() returns true for a THP 
> >> under migration
> >> instead of false to help ARM64’s support for THP migration.
> > I am just trying to understand the rationale behind this semantics and see 
> > where
> > it should be fixed.
> >
> > I think the fundamental problem here is that THP under split has been 
> > difficult
> > to be re-presented through the available helper functions and in turn PTE 
> > bits.
> >
> > The following checks
> >
> > 1) pmd_present()
> > 2) pmd_trans_huge()
> >
> > Represent three THP states
> >
> > 1) Mapped THP   (pmd_present && pmd_trans_huge)
> > 2) Splitting THP(pmd_present && pmd_trans_huge)
> > 3) Migrating THP(!pmd_present && !pmd_trans_huge)
> >
> > The problem is if we make pmd_trans_huge() return true for all the three 
> > states
> > which sounds logical because they are all still trans huge PMD, then 
> > pmd_present()
> > can only represent two states not three as required.
> 
> We are on the same page about representing three THP states in x86.
> I also agree with you that it 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Andrea Arcangeli
Hello Zi,

On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote:
> Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns 
> true
> for a THP under splitting? Does it cause problems when ARM64’s pmd_present()
> returns false in the same situation?

!pmd_present means it's a migration entry or swap entry and doesn't
point to RAM. It means if you do pmd_to_page(*pmd) it will return you
an undefined result.

During splitting the physical page is still very well pointed by the
pmd as long as pmd_trans_huge returns true and you hold the
pmd_lock.

pmd_trans_huge must be true at all times for a transhuge pmd that
points to a hugepage, or all VM fast paths won't serialize with the
pmd_lock, that is the only reason why, and it's a very good reason
because it avoids to take the pmd_lock when walking over non transhuge
pmds (i.e. when there are no THP allocated).

Now if we've to keep _PAGE_PSE set and return true in pmd_trans_huge
at all times, why would you want to make pmd_present return false? How
could it help if pmd_trans_huge returns true, but pmd_present returns
false despite pmd_to_page works fine and the pmd is really still
pointing to the page?

When userland faults on such pmd !pmd_present it will make the page
fault take a swap or migration path, but that's the wrong path if the
pmd points to RAM.

What we need to do during split is an invalidate of the huge TLB.
There's no pmd_trans_splitting anymore, so we only clear the present
bit in the PTE despite pmd_present still returns true (just like
PROT_NONE, nothing new in this respect). pmd_present never meant the
real present bit in the pte was set, it just means the pmd points to
RAM. It means it doesn't point to swap or migration entry and you can
do pmd_to_page and it works fine.

We need to invalidate the TLB by clearing the present bit and by
flushing the TLB before overwriting the transhuge pmd with the regular
pte (i.e. to make it non huge). That is actually required by an errata
(l1 cache aliasing of the same mapping through two different TLB of
two different sizes broke some old CPU and triggered machine checks).
It's not something fundamentally necessary from a common code point of
view. It's more risky from an hardware (not software) standpoint and
before you can get rid of the pmd you need to do a TLB flush anyway to
be sure CPUs stops using it, so better clear the present bit before
doing the real costly thing (the tlb flush with IPIs). Clearing the
present bit during the TLB flush is a cost that gets lost in the noise.

The clear of the real present bit during pmd (virtual) splitting is
done with pmdp_invalidate, that is created specifically to keeps
pmd_trans_huge=true, pmd_present=true despite the present bit is not
set. So you could imagine _PAGE_PSE as the real present bit.

Before the physical split was deferred and decoupled from the virtual
memory pmd split, pmd_trans_splitting allowed to wait the split to
finish and to keep all gup_fast at bay during it (while the page was
still mapped readable and writable in userland by other CPUs). Now the
physical split is deferred so you just split the pmd locally and only
a physical split invoked on the page (not the virtual split invoked on
the pmd with split_huge_pmd) has to keep gup at bay, and it does so by
freezing the refcount so all gup_fast fail with the
page_cache_get_speculative during the freeze. This removed the need of
the pmd_splitting flag in gup_fast (when pmd_splitting was set gup
fast had to go through the non-fast gup), but it means that now a
hugepage cannot be physically splitted if it's gup pinned. The main
difference is that freezing the refcount can fail, so the code must
learn to cope with such failure and defer it. Decoupling the physical
and virtual splits introduced the need of tracking the doublemap case
with a new PG_double_map flag too. It makes the refcounting of
hugepages trivial in comparison (identical to hugetlbfs in fact), but
it requires total_mapcount to account for all those huge and non huge
mappings. It primarily pays off to add THP to tmpfs where the physical
split may have to be deferred for pagecache reasons anyway.

Thanks,
Andrea


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Andrea Arcangeli
Hello Zi,

On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote:
> Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns 
> true
> for a THP under splitting? Does it cause problems when ARM64’s pmd_present()
> returns false in the same situation?

!pmd_present means it's a migration entry or swap entry and doesn't
point to RAM. It means if you do pmd_to_page(*pmd) it will return you
an undefined result.

During splitting the physical page is still very well pointed by the
pmd as long as pmd_trans_huge returns true and you hold the
pmd_lock.

pmd_trans_huge must be true at all times for a transhuge pmd that
points to a hugepage, or all VM fast paths won't serialize with the
pmd_lock, that is the only reason why, and it's a very good reason
because it avoids to take the pmd_lock when walking over non transhuge
pmds (i.e. when there are no THP allocated).

Now if we've to keep _PAGE_PSE set and return true in pmd_trans_huge
at all times, why would you want to make pmd_present return false? How
could it help if pmd_trans_huge returns true, but pmd_present returns
false despite pmd_to_page works fine and the pmd is really still
pointing to the page?

When userland faults on such pmd !pmd_present it will make the page
fault take a swap or migration path, but that's the wrong path if the
pmd points to RAM.

What we need to do during split is an invalidate of the huge TLB.
There's no pmd_trans_splitting anymore, so we only clear the present
bit in the PTE despite pmd_present still returns true (just like
PROT_NONE, nothing new in this respect). pmd_present never meant the
real present bit in the pte was set, it just means the pmd points to
RAM. It means it doesn't point to swap or migration entry and you can
do pmd_to_page and it works fine.

We need to invalidate the TLB by clearing the present bit and by
flushing the TLB before overwriting the transhuge pmd with the regular
pte (i.e. to make it non huge). That is actually required by an errata
(l1 cache aliasing of the same mapping through two different TLB of
two different sizes broke some old CPU and triggered machine checks).
It's not something fundamentally necessary from a common code point of
view. It's more risky from an hardware (not software) standpoint and
before you can get rid of the pmd you need to do a TLB flush anyway to
be sure CPUs stops using it, so better clear the present bit before
doing the real costly thing (the tlb flush with IPIs). Clearing the
present bit during the TLB flush is a cost that gets lost in the noise.

The clear of the real present bit during pmd (virtual) splitting is
done with pmdp_invalidate, that is created specifically to keeps
pmd_trans_huge=true, pmd_present=true despite the present bit is not
set. So you could imagine _PAGE_PSE as the real present bit.

Before the physical split was deferred and decoupled from the virtual
memory pmd split, pmd_trans_splitting allowed to wait the split to
finish and to keep all gup_fast at bay during it (while the page was
still mapped readable and writable in userland by other CPUs). Now the
physical split is deferred so you just split the pmd locally and only
a physical split invoked on the page (not the virtual split invoked on
the pmd with split_huge_pmd) has to keep gup at bay, and it does so by
freezing the refcount so all gup_fast fail with the
page_cache_get_speculative during the freeze. This removed the need of
the pmd_splitting flag in gup_fast (when pmd_splitting was set gup
fast had to go through the non-fast gup), but it means that now a
hugepage cannot be physically splitted if it's gup pinned. The main
difference is that freezing the refcount can fail, so the code must
learn to cope with such failure and defer it. Decoupling the physical
and virtual splits introduced the need of tracking the doublemap case
with a new PG_double_map flag too. It makes the refcounting of
hugepages trivial in comparison (identical to hugetlbfs in fact), but
it requires total_mapcount to account for all those huge and non huge
mappings. It primarily pays off to add THP to tmpfs where the physical
split may have to be deferred for pagecache reasons anyway.

Thanks,
Andrea


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Zi Yan
On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:

> On 10/15/2018 06:23 AM, Zi Yan wrote:
>> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
>>
>>> On 10/10/2018 06:13 PM, Zi Yan wrote:
 On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:

> On 10/09/2018 07:28 PM, Zi Yan wrote:
>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
>> for x86
>> PMD migration entry check)
>>
>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>
>>> A normal mapped THP page at PMD level should be correctly differentiated
>>> from a PMD migration entry while walking the page table. A mapped THP 
>>> would
>>> additionally check positive for pmd_present() along with 
>>> pmd_trans_huge()
>>> as compared to a PMD migration entry. This just adds a new conditional 
>>> test
>>> differentiating the two while walking the page table.
>>>
>>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
>>> mutually
>>> exclusive which makes the current conditional block work for both mapped
>>> and migration entries. This is not same with arm64 where 
>>> pmd_trans_huge()
>>
>> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
>> splitting,
>
> Not really if we just look at code in the conditional blocks.

 Yeah, I explained it wrong above. Sorry about that.

 In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
 thus, it returns true even if the present bit is cleared but PSE bit is 
 set.
>>>
>>> Okay.
>>>
 This is done so, because THPs under splitting are regarded as present in 
 the kernel
 but not present when a hardware page table walker checks it.
>>>
>>> Okay.
>>>

 For PMD migration entry, which should be regarded as not present, if PSE 
 bit
 is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
 PMD migration entries will be regarded as present
>>>
>>> Okay to make pmd_present() return false pmd_trans_huge() has to return false
>>> as well. Is there anything which can be done to get around this problem on
>>> X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
>>> Otherwise we would revert the condition block order to accommodate both the
>>> implementation for pmd_trans_huge() as suggested by Kirill before or just
>>> consider this patch forward.
>>>
>>> Because I am not really sure yet about the idea of getting pmd_present()
>>> check into pmd_trans_huge() on arm64 just to make it fit into this semantics
>>> as suggested by Will. If a PMD is trans huge page or not should not depend 
>>> on
>>> whether it is present or not.
>>
>> In terms of THPs, we have three cases: a present THP, a THP under splitting,
>> and a THP under migration. pmd_present() and pmd_trans_huge() both return 
>> true
>> for a present THP and a THP under splitting, because they discover _PAGE_PSE 
>> bit
>
> Then how do we differentiate between a mapped THP and a splitting THP.

AFAIK, in x86, there is no distinction between a mapped THP and a splitting THP
using helper functions.

A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a splitting 
THP
has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() return
true as long as _PAGE_PSE bit is set.

>
>> is set for both cases, whereas they both return false for a THP under 
>> migration.
>> You want to change them to make pmd_trans_huge() returns true for a THP 
>> under migration
>> instead of false to help ARM64’s support for THP migration.
> I am just trying to understand the rationale behind this semantics and see 
> where
> it should be fixed.
>
> I think the fundamental problem here is that THP under split has been 
> difficult
> to be re-presented through the available helper functions and in turn PTE 
> bits.
>
> The following checks
>
> 1) pmd_present()
> 2) pmd_trans_huge()
>
> Represent three THP states
>
> 1) Mapped THP (pmd_present && pmd_trans_huge)
> 2) Splitting THP  (pmd_present && pmd_trans_huge)
> 3) Migrating THP  (!pmd_present && !pmd_trans_huge)
>
> The problem is if we make pmd_trans_huge() return true for all the three 
> states
> which sounds logical because they are all still trans huge PMD, then 
> pmd_present()
> can only represent two states not three as required.

We are on the same page about representing three THP states in x86.
I also agree with you that it is logical to use three distinct representations
for these three states, i.e. splitting THP could be changed to (!pmd_present && 
pmd_trans_huge).


>>
>> For x86, this change requires:
>> 1. changing the condition in pmd_trans_huge(), so that it returns true for
>> PMD migration entries;
>> 2. changing the code, which calls pmd_trans_huge(), to match the new 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Zi Yan
On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:

> On 10/15/2018 06:23 AM, Zi Yan wrote:
>> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
>>
>>> On 10/10/2018 06:13 PM, Zi Yan wrote:
 On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:

> On 10/09/2018 07:28 PM, Zi Yan wrote:
>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE 
>> for x86
>> PMD migration entry check)
>>
>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>
>>> A normal mapped THP page at PMD level should be correctly differentiated
>>> from a PMD migration entry while walking the page table. A mapped THP 
>>> would
>>> additionally check positive for pmd_present() along with 
>>> pmd_trans_huge()
>>> as compared to a PMD migration entry. This just adds a new conditional 
>>> test
>>> differentiating the two while walking the page table.
>>>
>>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always 
>>> mutually
>>> exclusive which makes the current conditional block work for both mapped
>>> and migration entries. This is not same with arm64 where 
>>> pmd_trans_huge()
>>
>> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
>> splitting,
>
> Not really if we just look at code in the conditional blocks.

 Yeah, I explained it wrong above. Sorry about that.

 In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
 thus, it returns true even if the present bit is cleared but PSE bit is 
 set.
>>>
>>> Okay.
>>>
 This is done so, because THPs under splitting are regarded as present in 
 the kernel
 but not present when a hardware page table walker checks it.
>>>
>>> Okay.
>>>

 For PMD migration entry, which should be regarded as not present, if PSE 
 bit
 is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
 PMD migration entries will be regarded as present
>>>
>>> Okay to make pmd_present() return false pmd_trans_huge() has to return false
>>> as well. Is there anything which can be done to get around this problem on
>>> X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
>>> Otherwise we would revert the condition block order to accommodate both the
>>> implementation for pmd_trans_huge() as suggested by Kirill before or just
>>> consider this patch forward.
>>>
>>> Because I am not really sure yet about the idea of getting pmd_present()
>>> check into pmd_trans_huge() on arm64 just to make it fit into this semantics
>>> as suggested by Will. If a PMD is trans huge page or not should not depend 
>>> on
>>> whether it is present or not.
>>
>> In terms of THPs, we have three cases: a present THP, a THP under splitting,
>> and a THP under migration. pmd_present() and pmd_trans_huge() both return 
>> true
>> for a present THP and a THP under splitting, because they discover _PAGE_PSE 
>> bit
>
> Then how do we differentiate between a mapped THP and a splitting THP.

AFAIK, in x86, there is no distinction between a mapped THP and a splitting THP
using helper functions.

A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a splitting 
THP
has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() return
true as long as _PAGE_PSE bit is set.

>
>> is set for both cases, whereas they both return false for a THP under 
>> migration.
>> You want to change them to make pmd_trans_huge() returns true for a THP 
>> under migration
>> instead of false to help ARM64’s support for THP migration.
> I am just trying to understand the rationale behind this semantics and see 
> where
> it should be fixed.
>
> I think the fundamental problem here is that THP under split has been 
> difficult
> to be re-presented through the available helper functions and in turn PTE 
> bits.
>
> The following checks
>
> 1) pmd_present()
> 2) pmd_trans_huge()
>
> Represent three THP states
>
> 1) Mapped THP (pmd_present && pmd_trans_huge)
> 2) Splitting THP  (pmd_present && pmd_trans_huge)
> 3) Migrating THP  (!pmd_present && !pmd_trans_huge)
>
> The problem is if we make pmd_trans_huge() return true for all the three 
> states
> which sounds logical because they are all still trans huge PMD, then 
> pmd_present()
> can only represent two states not three as required.

We are on the same page about representing three THP states in x86.
I also agree with you that it is logical to use three distinct representations
for these three states, i.e. splitting THP could be changed to (!pmd_present && 
pmd_trans_huge).


>>
>> For x86, this change requires:
>> 1. changing the condition in pmd_trans_huge(), so that it returns true for
>> PMD migration entries;
>> 2. changing the code, which calls pmd_trans_huge(), to match the new 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Anshuman Khandual



On 10/15/2018 02:02 PM, Aneesh Kumar K.V wrote:
> On 10/12/18 1:32 PM, Anshuman Khandual wrote:
>>
>>
>> On 10/09/2018 06:48 PM, Will Deacon wrote:
>>> On Tue, Oct 09, 2018 at 04:04:21PM +0300, Kirill A. Shutemov wrote:
 On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
> A normal mapped THP page at PMD level should be correctly differentiated
> from a PMD migration entry while walking the page table. A mapped THP 
> would
> additionally check positive for pmd_present() along with pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new conditional 
> test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> exclusive which makes the current conditional block work for both mapped
> and migration entries. This is not same with arm64 where pmd_trans_huge()
> returns positive for both mapped and migration entries. Could some one
> please explain why pmd_trans_huge() has to return false for migration
> entries which just install swap bits and its still a PMD ?

 I guess it's just a design choice. Any reason why arm64 cannot do the
 same?
>>>
>>> Anshuman, would it work to:
>>>
>>> #define pmd_trans_huge(pmd) (pmd_present(pmd) && !(pmd_val(pmd) & 
>>> PMD_TABLE_BIT))
>> yeah this works but some how does not seem like the right thing to do
>> but can be the very last option.
>>
> 
> 
> There can be other code paths that makes that assumption. I ended up doing 
> the below for pmd_trans_huge on ppc64.
>

Yeah, did see that in one of the previous proposals.

https://patchwork.kernel.org/patch/10544291/

But the existing semantics does not look right and makes vague assumptions.
Zi Yan has already asked Andrea for his input in this regard on the next
thread. So I guess while being here, its a good idea to revisit existing
semantics and it's assumptions before fixing it in arch specific helpers.

- Anshuman 


> /*
>  * Only returns true for a THP. False for pmd migration entry.
>  * We also need to return true when we come across a pte that
>  * in between a thp split. While splitting THP, we mark the pmd
>  * invalid (pmdp_invalidate()) before we set it with pte page
>  * address. A pmd_trans_huge() check against a pmd entry during that time
>  * should return true.
>  * We should not call this on a hugetlb entry. We should check for HugeTLB
>  * entry using vma->vm_flags
>  * The page table walk rule is explained in Documentation/vm/transhuge.rst
>  */
> static inline int pmd_trans_huge(pmd_t pmd)
> {
> if (!pmd_present(pmd))
>     return false;
> 
> if (radix_enabled())
>     return radix__pmd_trans_huge(pmd);
> return hash__pmd_trans_huge(pmd);
> }
> 
> -aneesh
> 


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Anshuman Khandual



On 10/15/2018 02:02 PM, Aneesh Kumar K.V wrote:
> On 10/12/18 1:32 PM, Anshuman Khandual wrote:
>>
>>
>> On 10/09/2018 06:48 PM, Will Deacon wrote:
>>> On Tue, Oct 09, 2018 at 04:04:21PM +0300, Kirill A. Shutemov wrote:
 On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
> A normal mapped THP page at PMD level should be correctly differentiated
> from a PMD migration entry while walking the page table. A mapped THP 
> would
> additionally check positive for pmd_present() along with pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new conditional 
> test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> exclusive which makes the current conditional block work for both mapped
> and migration entries. This is not same with arm64 where pmd_trans_huge()
> returns positive for both mapped and migration entries. Could some one
> please explain why pmd_trans_huge() has to return false for migration
> entries which just install swap bits and its still a PMD ?

 I guess it's just a design choice. Any reason why arm64 cannot do the
 same?
>>>
>>> Anshuman, would it work to:
>>>
>>> #define pmd_trans_huge(pmd) (pmd_present(pmd) && !(pmd_val(pmd) & 
>>> PMD_TABLE_BIT))
>> yeah this works but some how does not seem like the right thing to do
>> but can be the very last option.
>>
> 
> 
> There can be other code paths that makes that assumption. I ended up doing 
> the below for pmd_trans_huge on ppc64.
>

Yeah, did see that in one of the previous proposals.

https://patchwork.kernel.org/patch/10544291/

But the existing semantics does not look right and makes vague assumptions.
Zi Yan has already asked Andrea for his input in this regard on the next
thread. So I guess while being here, its a good idea to revisit existing
semantics and it's assumptions before fixing it in arch specific helpers.

- Anshuman 


> /*
>  * Only returns true for a THP. False for pmd migration entry.
>  * We also need to return true when we come across a pte that
>  * in between a thp split. While splitting THP, we mark the pmd
>  * invalid (pmdp_invalidate()) before we set it with pte page
>  * address. A pmd_trans_huge() check against a pmd entry during that time
>  * should return true.
>  * We should not call this on a hugetlb entry. We should check for HugeTLB
>  * entry using vma->vm_flags
>  * The page table walk rule is explained in Documentation/vm/transhuge.rst
>  */
> static inline int pmd_trans_huge(pmd_t pmd)
> {
> if (!pmd_present(pmd))
>     return false;
> 
> if (radix_enabled())
>     return radix__pmd_trans_huge(pmd);
> return hash__pmd_trans_huge(pmd);
> }
> 
> -aneesh
> 


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Anshuman Khandual



On 10/15/2018 06:23 AM, Zi Yan wrote:
> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
> 
>> On 10/10/2018 06:13 PM, Zi Yan wrote:
>>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>>
 On 10/09/2018 07:28 PM, Zi Yan wrote:
> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for 
> x86
> PMD migration entry check)
>
> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>
>> A normal mapped THP page at PMD level should be correctly differentiated
>> from a PMD migration entry while walking the page table. A mapped THP 
>> would
>> additionally check positive for pmd_present() along with pmd_trans_huge()
>> as compared to a PMD migration entry. This just adds a new conditional 
>> test
>> differentiating the two while walking the page table.
>>
>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>> Signed-off-by: Anshuman Khandual 
>> ---
>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>> exclusive which makes the current conditional block work for both mapped
>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>
> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
> splitting,

 Not really if we just look at code in the conditional blocks.
>>>
>>> Yeah, I explained it wrong above. Sorry about that.
>>>
>>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
>>> thus, it returns true even if the present bit is cleared but PSE bit is set.
>>
>> Okay.
>>
>>> This is done so, because THPs under splitting are regarded as present in 
>>> the kernel
>>> but not present when a hardware page table walker checks it.
>>
>> Okay.
>>
>>>
>>> For PMD migration entry, which should be regarded as not present, if PSE bit
>>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>>> PMD migration entries will be regarded as present
>>
>> Okay to make pmd_present() return false pmd_trans_huge() has to return false
>> as well. Is there anything which can be done to get around this problem on
>> X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
>> Otherwise we would revert the condition block order to accommodate both the
>> implementation for pmd_trans_huge() as suggested by Kirill before or just
>> consider this patch forward.
>>
>> Because I am not really sure yet about the idea of getting pmd_present()
>> check into pmd_trans_huge() on arm64 just to make it fit into this semantics
>> as suggested by Will. If a PMD is trans huge page or not should not depend on
>> whether it is present or not.
> 
> In terms of THPs, we have three cases: a present THP, a THP under splitting,
> and a THP under migration. pmd_present() and pmd_trans_huge() both return true
> for a present THP and a THP under splitting, because they discover _PAGE_PSE 
> bit

Then how do we differentiate between a mapped THP and a splitting THP.

> is set for both cases, whereas they both return false for a THP under 
> migration.
> You want to change them to make pmd_trans_huge() returns true for a THP under 
> migration
> instead of false to help ARM64’s support for THP migration.
I am just trying to understand the rationale behind this semantics and see where
it should be fixed.

I think the fundamental problem here is that THP under split has been difficult
to be re-presented through the available helper functions and in turn PTE bits.

The following checks

1) pmd_present()
2) pmd_trans_huge()

Represent three THP states

1) Mapped THP   (pmd_present && pmd_trans_huge)
2) Splitting THP(pmd_present && pmd_trans_huge)
3) Migrating THP(!pmd_present && !pmd_trans_huge)

The problem is if we make pmd_trans_huge() return true for all the three states
which sounds logical because they are all still trans huge PMD, then 
pmd_present()
can only represent two states not three as required.

> 
> For x86, this change requires:
> 1. changing the condition in pmd_trans_huge(), so that it returns true for
> PMD migration entries;
> 2. changing the code, which calls pmd_trans_huge(), to match the new logic.
Can those be fixed with an additional check for pmd_present() as suggested here
in this patch ? Asking because in case we could not get common semantics for
these helpers on all arch that would be a fall back option for the moment.

> 
> Another problem I see is that x86’s pmd_present() returns true for a THP under
> splitting but ARM64’s pmd_present() returns false for a THP under splitting.

But how did you conclude this ? I dont see any explicit helper for splitting
THP. Could you please point me in the code ?

> I do not know if there is any correctness issue with this. So I copy Andrea
> here, since he made x86’s pmd_present() returns true for a THP under splitting
> as an optimization. I want to understand more about it and potentially make
> x86 and 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Anshuman Khandual



On 10/15/2018 06:23 AM, Zi Yan wrote:
> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
> 
>> On 10/10/2018 06:13 PM, Zi Yan wrote:
>>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>>
 On 10/09/2018 07:28 PM, Zi Yan wrote:
> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for 
> x86
> PMD migration entry check)
>
> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>
>> A normal mapped THP page at PMD level should be correctly differentiated
>> from a PMD migration entry while walking the page table. A mapped THP 
>> would
>> additionally check positive for pmd_present() along with pmd_trans_huge()
>> as compared to a PMD migration entry. This just adds a new conditional 
>> test
>> differentiating the two while walking the page table.
>>
>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>> Signed-off-by: Anshuman Khandual 
>> ---
>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>> exclusive which makes the current conditional block work for both mapped
>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>
> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
> splitting,

 Not really if we just look at code in the conditional blocks.
>>>
>>> Yeah, I explained it wrong above. Sorry about that.
>>>
>>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
>>> thus, it returns true even if the present bit is cleared but PSE bit is set.
>>
>> Okay.
>>
>>> This is done so, because THPs under splitting are regarded as present in 
>>> the kernel
>>> but not present when a hardware page table walker checks it.
>>
>> Okay.
>>
>>>
>>> For PMD migration entry, which should be regarded as not present, if PSE bit
>>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>>> PMD migration entries will be regarded as present
>>
>> Okay to make pmd_present() return false pmd_trans_huge() has to return false
>> as well. Is there anything which can be done to get around this problem on
>> X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
>> Otherwise we would revert the condition block order to accommodate both the
>> implementation for pmd_trans_huge() as suggested by Kirill before or just
>> consider this patch forward.
>>
>> Because I am not really sure yet about the idea of getting pmd_present()
>> check into pmd_trans_huge() on arm64 just to make it fit into this semantics
>> as suggested by Will. If a PMD is trans huge page or not should not depend on
>> whether it is present or not.
> 
> In terms of THPs, we have three cases: a present THP, a THP under splitting,
> and a THP under migration. pmd_present() and pmd_trans_huge() both return true
> for a present THP and a THP under splitting, because they discover _PAGE_PSE 
> bit

Then how do we differentiate between a mapped THP and a splitting THP.

> is set for both cases, whereas they both return false for a THP under 
> migration.
> You want to change them to make pmd_trans_huge() returns true for a THP under 
> migration
> instead of false to help ARM64’s support for THP migration.
I am just trying to understand the rationale behind this semantics and see where
it should be fixed.

I think the fundamental problem here is that THP under split has been difficult
to be re-presented through the available helper functions and in turn PTE bits.

The following checks

1) pmd_present()
2) pmd_trans_huge()

Represent three THP states

1) Mapped THP   (pmd_present && pmd_trans_huge)
2) Splitting THP(pmd_present && pmd_trans_huge)
3) Migrating THP(!pmd_present && !pmd_trans_huge)

The problem is if we make pmd_trans_huge() return true for all the three states
which sounds logical because they are all still trans huge PMD, then 
pmd_present()
can only represent two states not three as required.

> 
> For x86, this change requires:
> 1. changing the condition in pmd_trans_huge(), so that it returns true for
> PMD migration entries;
> 2. changing the code, which calls pmd_trans_huge(), to match the new logic.
Can those be fixed with an additional check for pmd_present() as suggested here
in this patch ? Asking because in case we could not get common semantics for
these helpers on all arch that would be a fall back option for the moment.

> 
> Another problem I see is that x86’s pmd_present() returns true for a THP under
> splitting but ARM64’s pmd_present() returns false for a THP under splitting.

But how did you conclude this ? I dont see any explicit helper for splitting
THP. Could you please point me in the code ?

> I do not know if there is any correctness issue with this. So I copy Andrea
> here, since he made x86’s pmd_present() returns true for a THP under splitting
> as an optimization. I want to understand more about it and potentially make
> x86 and 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-15 Thread Aneesh Kumar K.V

On 10/12/18 1:32 PM, Anshuman Khandual wrote:



On 10/09/2018 06:48 PM, Will Deacon wrote:

On Tue, Oct 09, 2018 at 04:04:21PM +0300, Kirill A. Shutemov wrote:

On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:

A normal mapped THP page at PMD level should be correctly differentiated
from a PMD migration entry while walking the page table. A mapped THP would
additionally check positive for pmd_present() along with pmd_trans_huge()
as compared to a PMD migration entry. This just adds a new conditional test
differentiating the two while walking the page table.

Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
Signed-off-by: Anshuman Khandual 
---
On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
exclusive which makes the current conditional block work for both mapped
and migration entries. This is not same with arm64 where pmd_trans_huge()
returns positive for both mapped and migration entries. Could some one
please explain why pmd_trans_huge() has to return false for migration
entries which just install swap bits and its still a PMD ?


I guess it's just a design choice. Any reason why arm64 cannot do the
same?


Anshuman, would it work to:

#define pmd_trans_huge(pmd) (pmd_present(pmd) && !(pmd_val(pmd) & 
PMD_TABLE_BIT))

yeah this works but some how does not seem like the right thing to do
but can be the very last option.




There can be other code paths that makes that assumption. I ended up 
doing the below for pmd_trans_huge on ppc64.


/*
 * Only returns true for a THP. False for pmd migration entry.
 * We also need to return true when we come across a pte that
 * in between a thp split. While splitting THP, we mark the pmd
 * invalid (pmdp_invalidate()) before we set it with pte page
 * address. A pmd_trans_huge() check against a pmd entry during that time
 * should return true.
 * We should not call this on a hugetlb entry. We should check for HugeTLB
 * entry using vma->vm_flags
 * The page table walk rule is explained in Documentation/vm/transhuge.rst
 */
static inline int pmd_trans_huge(pmd_t pmd)
{
if (!pmd_present(pmd))
return false;

if (radix_enabled())
return radix__pmd_trans_huge(pmd);
return hash__pmd_trans_huge(pmd);
}

-aneesh


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-15 Thread Aneesh Kumar K.V

On 10/12/18 1:32 PM, Anshuman Khandual wrote:



On 10/09/2018 06:48 PM, Will Deacon wrote:

On Tue, Oct 09, 2018 at 04:04:21PM +0300, Kirill A. Shutemov wrote:

On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:

A normal mapped THP page at PMD level should be correctly differentiated
from a PMD migration entry while walking the page table. A mapped THP would
additionally check positive for pmd_present() along with pmd_trans_huge()
as compared to a PMD migration entry. This just adds a new conditional test
differentiating the two while walking the page table.

Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
Signed-off-by: Anshuman Khandual 
---
On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
exclusive which makes the current conditional block work for both mapped
and migration entries. This is not same with arm64 where pmd_trans_huge()
returns positive for both mapped and migration entries. Could some one
please explain why pmd_trans_huge() has to return false for migration
entries which just install swap bits and its still a PMD ?


I guess it's just a design choice. Any reason why arm64 cannot do the
same?


Anshuman, would it work to:

#define pmd_trans_huge(pmd) (pmd_present(pmd) && !(pmd_val(pmd) & 
PMD_TABLE_BIT))

yeah this works but some how does not seem like the right thing to do
but can be the very last option.




There can be other code paths that makes that assumption. I ended up 
doing the below for pmd_trans_huge on ppc64.


/*
 * Only returns true for a THP. False for pmd migration entry.
 * We also need to return true when we come across a pte that
 * in between a thp split. While splitting THP, we mark the pmd
 * invalid (pmdp_invalidate()) before we set it with pte page
 * address. A pmd_trans_huge() check against a pmd entry during that time
 * should return true.
 * We should not call this on a hugetlb entry. We should check for HugeTLB
 * entry using vma->vm_flags
 * The page table walk rule is explained in Documentation/vm/transhuge.rst
 */
static inline int pmd_trans_huge(pmd_t pmd)
{
if (!pmd_present(pmd))
return false;

if (radix_enabled())
return radix__pmd_trans_huge(pmd);
return hash__pmd_trans_huge(pmd);
}

-aneesh


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-14 Thread Zi Yan
On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:

> On 10/10/2018 06:13 PM, Zi Yan wrote:
>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>
>>> On 10/09/2018 07:28 PM, Zi Yan wrote:
 cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for 
 x86
 PMD migration entry check)

 On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:

> A normal mapped THP page at PMD level should be correctly differentiated
> from a PMD migration entry while walking the page table. A mapped THP 
> would
> additionally check positive for pmd_present() along with pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new conditional 
> test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> exclusive which makes the current conditional block work for both mapped
> and migration entries. This is not same with arm64 where pmd_trans_huge()

 !pmd_present() && pmd_trans_huge() is used to represent THPs under 
 splitting,
>>>
>>> Not really if we just look at code in the conditional blocks.
>>
>> Yeah, I explained it wrong above. Sorry about that.
>>
>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
>> thus, it returns true even if the present bit is cleared but PSE bit is set.
>
> Okay.
>
>> This is done so, because THPs under splitting are regarded as present in the 
>> kernel
>> but not present when a hardware page table walker checks it.
>
> Okay.
>
>>
>> For PMD migration entry, which should be regarded as not present, if PSE bit
>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>> PMD migration entries will be regarded as present
>
> Okay to make pmd_present() return false pmd_trans_huge() has to return false
> as well. Is there anything which can be done to get around this problem on
> X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
> Otherwise we would revert the condition block order to accommodate both the
> implementation for pmd_trans_huge() as suggested by Kirill before or just
> consider this patch forward.
>
> Because I am not really sure yet about the idea of getting pmd_present()
> check into pmd_trans_huge() on arm64 just to make it fit into this semantics
> as suggested by Will. If a PMD is trans huge page or not should not depend on
> whether it is present or not.

In terms of THPs, we have three cases: a present THP, a THP under splitting,
and a THP under migration. pmd_present() and pmd_trans_huge() both return true
for a present THP and a THP under splitting, because they discover _PAGE_PSE bit
is set for both cases, whereas they both return false for a THP under migration.
You want to change them to make pmd_trans_huge() returns true for a THP under 
migration
instead of false to help ARM64’s support for THP migration.

For x86, this change requires:
1. changing the condition in pmd_trans_huge(), so that it returns true for
PMD migration entries;
2. changing the code, which calls pmd_trans_huge(), to match the new logic.

Another problem I see is that x86’s pmd_present() returns true for a THP under
splitting but ARM64’s pmd_present() returns false for a THP under splitting.
I do not know if there is any correctness issue with this. So I copy Andrea
here, since he made x86’s pmd_present() returns true for a THP under splitting
as an optimization. I want to understand more about it and potentially make
x86 and ARM64 (maybe all other architectures, too) return the same value
for all three cases mentioned above.


Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns 
true
for a THP under splitting? Does it cause problems when ARM64’s pmd_present()
returns false in the same situation?


>>
>> My concern is that if ARM64’s pmd_trans_huge() returns true for migration
>> entries, unlike x86, there might be bugs triggered in the kernel when
>> THP migration is enabled in ARM64.
>
> Right and that is exactly what we are trying to fix with this patch.
>

I am not sure this patch can fix the problem in ARM64, because many other places
in the kernel, pmd_trans_huge() still returns false for a THP under migration.
We may need more comprehensive fixes for ARM64.

Thanks.

—
Best Regards,
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-14 Thread Zi Yan
On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:

> On 10/10/2018 06:13 PM, Zi Yan wrote:
>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>
>>> On 10/09/2018 07:28 PM, Zi Yan wrote:
 cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for 
 x86
 PMD migration entry check)

 On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:

> A normal mapped THP page at PMD level should be correctly differentiated
> from a PMD migration entry while walking the page table. A mapped THP 
> would
> additionally check positive for pmd_present() along with pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new conditional 
> test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> exclusive which makes the current conditional block work for both mapped
> and migration entries. This is not same with arm64 where pmd_trans_huge()

 !pmd_present() && pmd_trans_huge() is used to represent THPs under 
 splitting,
>>>
>>> Not really if we just look at code in the conditional blocks.
>>
>> Yeah, I explained it wrong above. Sorry about that.
>>
>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
>> thus, it returns true even if the present bit is cleared but PSE bit is set.
>
> Okay.
>
>> This is done so, because THPs under splitting are regarded as present in the 
>> kernel
>> but not present when a hardware page table walker checks it.
>
> Okay.
>
>>
>> For PMD migration entry, which should be regarded as not present, if PSE bit
>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>> PMD migration entries will be regarded as present
>
> Okay to make pmd_present() return false pmd_trans_huge() has to return false
> as well. Is there anything which can be done to get around this problem on
> X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
> Otherwise we would revert the condition block order to accommodate both the
> implementation for pmd_trans_huge() as suggested by Kirill before or just
> consider this patch forward.
>
> Because I am not really sure yet about the idea of getting pmd_present()
> check into pmd_trans_huge() on arm64 just to make it fit into this semantics
> as suggested by Will. If a PMD is trans huge page or not should not depend on
> whether it is present or not.

In terms of THPs, we have three cases: a present THP, a THP under splitting,
and a THP under migration. pmd_present() and pmd_trans_huge() both return true
for a present THP and a THP under splitting, because they discover _PAGE_PSE bit
is set for both cases, whereas they both return false for a THP under migration.
You want to change them to make pmd_trans_huge() returns true for a THP under 
migration
instead of false to help ARM64’s support for THP migration.

For x86, this change requires:
1. changing the condition in pmd_trans_huge(), so that it returns true for
PMD migration entries;
2. changing the code, which calls pmd_trans_huge(), to match the new logic.

Another problem I see is that x86’s pmd_present() returns true for a THP under
splitting but ARM64’s pmd_present() returns false for a THP under splitting.
I do not know if there is any correctness issue with this. So I copy Andrea
here, since he made x86’s pmd_present() returns true for a THP under splitting
as an optimization. I want to understand more about it and potentially make
x86 and ARM64 (maybe all other architectures, too) return the same value
for all three cases mentioned above.


Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns 
true
for a THP under splitting? Does it cause problems when ARM64’s pmd_present()
returns false in the same situation?


>>
>> My concern is that if ARM64’s pmd_trans_huge() returns true for migration
>> entries, unlike x86, there might be bugs triggered in the kernel when
>> THP migration is enabled in ARM64.
>
> Right and that is exactly what we are trying to fix with this patch.
>

I am not sure this patch can fix the problem in ARM64, because many other places
in the kernel, pmd_trans_huge() still returns false for a THP under migration.
We may need more comprehensive fixes for ARM64.

Thanks.

—
Best Regards,
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-12 Thread Anshuman Khandual



On 10/09/2018 06:48 PM, Will Deacon wrote:
> On Tue, Oct 09, 2018 at 04:04:21PM +0300, Kirill A. Shutemov wrote:
>> On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
>>> A normal mapped THP page at PMD level should be correctly differentiated
>>> from a PMD migration entry while walking the page table. A mapped THP would
>>> additionally check positive for pmd_present() along with pmd_trans_huge()
>>> as compared to a PMD migration entry. This just adds a new conditional test
>>> differentiating the two while walking the page table.
>>>
>>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>>> exclusive which makes the current conditional block work for both mapped
>>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>>> returns positive for both mapped and migration entries. Could some one
>>> please explain why pmd_trans_huge() has to return false for migration
>>> entries which just install swap bits and its still a PMD ?
>>
>> I guess it's just a design choice. Any reason why arm64 cannot do the
>> same?
> 
> Anshuman, would it work to:
> 
> #define pmd_trans_huge(pmd) (pmd_present(pmd) && !(pmd_val(pmd) & 
> PMD_TABLE_BIT))
yeah this works but some how does not seem like the right thing to do
but can be the very last option.


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-12 Thread Anshuman Khandual



On 10/09/2018 06:48 PM, Will Deacon wrote:
> On Tue, Oct 09, 2018 at 04:04:21PM +0300, Kirill A. Shutemov wrote:
>> On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
>>> A normal mapped THP page at PMD level should be correctly differentiated
>>> from a PMD migration entry while walking the page table. A mapped THP would
>>> additionally check positive for pmd_present() along with pmd_trans_huge()
>>> as compared to a PMD migration entry. This just adds a new conditional test
>>> differentiating the two while walking the page table.
>>>
>>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>>> exclusive which makes the current conditional block work for both mapped
>>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>>> returns positive for both mapped and migration entries. Could some one
>>> please explain why pmd_trans_huge() has to return false for migration
>>> entries which just install swap bits and its still a PMD ?
>>
>> I guess it's just a design choice. Any reason why arm64 cannot do the
>> same?
> 
> Anshuman, would it work to:
> 
> #define pmd_trans_huge(pmd) (pmd_present(pmd) && !(pmd_val(pmd) & 
> PMD_TABLE_BIT))
yeah this works but some how does not seem like the right thing to do
but can be the very last option.


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-12 Thread Anshuman Khandual



On 10/10/2018 06:13 PM, Zi Yan wrote:
> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
> 
>> On 10/09/2018 07:28 PM, Zi Yan wrote:
>>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for 
>>> x86
>>> PMD migration entry check)
>>>
>>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>>
 A normal mapped THP page at PMD level should be correctly differentiated
 from a PMD migration entry while walking the page table. A mapped THP would
 additionally check positive for pmd_present() along with pmd_trans_huge()
 as compared to a PMD migration entry. This just adds a new conditional test
 differentiating the two while walking the page table.

 Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
 Signed-off-by: Anshuman Khandual 
 ---
 On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
 exclusive which makes the current conditional block work for both mapped
 and migration entries. This is not same with arm64 where pmd_trans_huge()
>>>
>>> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
>>> splitting,
>>
>> Not really if we just look at code in the conditional blocks.
> 
> Yeah, I explained it wrong above. Sorry about that.
> 
> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
> thus, it returns true even if the present bit is cleared but PSE bit is set.

Okay.

> This is done so, because THPs under splitting are regarded as present in the 
> kernel
> but not present when a hardware page table walker checks it.

Okay.

> 
> For PMD migration entry, which should be regarded as not present, if PSE bit
> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
> PMD migration entries will be regarded as present

Okay to make pmd_present() return false pmd_trans_huge() has to return false
as well. Is there anything which can be done to get around this problem on
X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
Otherwise we would revert the condition block order to accommodate both the
implementation for pmd_trans_huge() as suggested by Kirill before or just
consider this patch forward.

Because I am not really sure yet about the idea of getting pmd_present()
check into pmd_trans_huge() on arm64 just to make it fit into this semantics
as suggested by Will. If a PMD is trans huge page or not should not depend on
whether it is present or not.

> 
> My concern is that if ARM64’s pmd_trans_huge() returns true for migration
> entries, unlike x86, there might be bugs triggered in the kernel when
> THP migration is enabled in ARM64.

Right and that is exactly what we are trying to fix with this patch.

>
> Let me know if I explain this clear to you.
> 
>>
>>> since _PAGE_PRESENT is cleared during THP splitting but _PAGE_PSE is not.
>>> See the comment in pmd_present() for x86, in arch/x86/include/asm/pgtable.h
>>
>>
>> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
>> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> if (likely(pmd_trans_huge(*pvmw->pmd))) {
>> if (pvmw->flags & PVMW_MIGRATION)
>> return not_found(pvmw);
>> if (pmd_page(*pvmw->pmd) != page)
>> return not_found(pvmw);
>> return true;
>> } else if (!pmd_present(*pvmw->pmd)) {
>> if (thp_migration_supported()) {
>> if (!(pvmw->flags & PVMW_MIGRATION))
>> return not_found(pvmw);
>> if 
>> (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
>> swp_entry_t entry = 
>> pmd_to_swp_entry(*pvmw->pmd);
>>
>> if (migration_entry_to_page(entry) 
>> != page)
>> return not_found(pvmw);
>> return true;
>> }
>> }
>> return not_found(pvmw);
>> } else {
>> /* THP pmd was split under us: handle on pte level */
>> spin_unlock(pvmw->ptl);
>> pvmw->ptl = NULL;
>> }
>> } else if (!pmd_present(pmde)) { ---> Outer 'else if'
>> return false;
>> }
>>
>> Looking at the above code, it seems the conditional check for a THP
>> splitting case would be (!pmd_trans_huge && pmd_present) instead as
>> it has skipped the first two conditions. But THP splitting must have
>> been initiated once it has cleared the outer check (else it would not
>> have cleared otherwise)
>>
>> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)).
> 
> If a THP is under splitting, both pmd_present() and 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-12 Thread Anshuman Khandual



On 10/10/2018 06:13 PM, Zi Yan wrote:
> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
> 
>> On 10/09/2018 07:28 PM, Zi Yan wrote:
>>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for 
>>> x86
>>> PMD migration entry check)
>>>
>>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>>
 A normal mapped THP page at PMD level should be correctly differentiated
 from a PMD migration entry while walking the page table. A mapped THP would
 additionally check positive for pmd_present() along with pmd_trans_huge()
 as compared to a PMD migration entry. This just adds a new conditional test
 differentiating the two while walking the page table.

 Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
 Signed-off-by: Anshuman Khandual 
 ---
 On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
 exclusive which makes the current conditional block work for both mapped
 and migration entries. This is not same with arm64 where pmd_trans_huge()
>>>
>>> !pmd_present() && pmd_trans_huge() is used to represent THPs under 
>>> splitting,
>>
>> Not really if we just look at code in the conditional blocks.
> 
> Yeah, I explained it wrong above. Sorry about that.
> 
> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
> thus, it returns true even if the present bit is cleared but PSE bit is set.

Okay.

> This is done so, because THPs under splitting are regarded as present in the 
> kernel
> but not present when a hardware page table walker checks it.

Okay.

> 
> For PMD migration entry, which should be regarded as not present, if PSE bit
> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
> PMD migration entries will be regarded as present

Okay to make pmd_present() return false pmd_trans_huge() has to return false
as well. Is there anything which can be done to get around this problem on
X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
Otherwise we would revert the condition block order to accommodate both the
implementation for pmd_trans_huge() as suggested by Kirill before or just
consider this patch forward.

Because I am not really sure yet about the idea of getting pmd_present()
check into pmd_trans_huge() on arm64 just to make it fit into this semantics
as suggested by Will. If a PMD is trans huge page or not should not depend on
whether it is present or not.

> 
> My concern is that if ARM64’s pmd_trans_huge() returns true for migration
> entries, unlike x86, there might be bugs triggered in the kernel when
> THP migration is enabled in ARM64.

Right and that is exactly what we are trying to fix with this patch.

>
> Let me know if I explain this clear to you.
> 
>>
>>> since _PAGE_PRESENT is cleared during THP splitting but _PAGE_PSE is not.
>>> See the comment in pmd_present() for x86, in arch/x86/include/asm/pgtable.h
>>
>>
>> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
>> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> if (likely(pmd_trans_huge(*pvmw->pmd))) {
>> if (pvmw->flags & PVMW_MIGRATION)
>> return not_found(pvmw);
>> if (pmd_page(*pvmw->pmd) != page)
>> return not_found(pvmw);
>> return true;
>> } else if (!pmd_present(*pvmw->pmd)) {
>> if (thp_migration_supported()) {
>> if (!(pvmw->flags & PVMW_MIGRATION))
>> return not_found(pvmw);
>> if 
>> (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
>> swp_entry_t entry = 
>> pmd_to_swp_entry(*pvmw->pmd);
>>
>> if (migration_entry_to_page(entry) 
>> != page)
>> return not_found(pvmw);
>> return true;
>> }
>> }
>> return not_found(pvmw);
>> } else {
>> /* THP pmd was split under us: handle on pte level */
>> spin_unlock(pvmw->ptl);
>> pvmw->ptl = NULL;
>> }
>> } else if (!pmd_present(pmde)) { ---> Outer 'else if'
>> return false;
>> }
>>
>> Looking at the above code, it seems the conditional check for a THP
>> splitting case would be (!pmd_trans_huge && pmd_present) instead as
>> it has skipped the first two conditions. But THP splitting must have
>> been initiated once it has cleared the outer check (else it would not
>> have cleared otherwise)
>>
>> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)).
> 
> If a THP is under splitting, both pmd_present() and 

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-10 Thread Zi Yan
On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:

> On 10/09/2018 07:28 PM, Zi Yan wrote:
>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for x86
>> PMD migration entry check)
>>
>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>
>>> A normal mapped THP page at PMD level should be correctly differentiated
>>> from a PMD migration entry while walking the page table. A mapped THP would
>>> additionally check positive for pmd_present() along with pmd_trans_huge()
>>> as compared to a PMD migration entry. This just adds a new conditional test
>>> differentiating the two while walking the page table.
>>>
>>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>>> exclusive which makes the current conditional block work for both mapped
>>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>>
>> !pmd_present() && pmd_trans_huge() is used to represent THPs under splitting,
>
> Not really if we just look at code in the conditional blocks.

Yeah, I explained it wrong above. Sorry about that.

In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
thus, it returns true even if the present bit is cleared but PSE bit is set.
This is done so, because THPs under splitting are regarded as present in the 
kernel
but not present when a hardware page table walker checks it.

For PMD migration entry, which should be regarded as not present, if PSE bit
is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
PMD migration entries will be regarded as present.

My concern is that if ARM64’s pmd_trans_huge() returns true for migration
entries, unlike x86, there might be bugs triggered in the kernel when
THP migration is enabled in ARM64.

Let me know if I explain this clear to you.

>
>> since _PAGE_PRESENT is cleared during THP splitting but _PAGE_PSE is not.
>> See the comment in pmd_present() for x86, in arch/x86/include/asm/pgtable.h
>
>
> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> if (likely(pmd_trans_huge(*pvmw->pmd))) {
> if (pvmw->flags & PVMW_MIGRATION)
> return not_found(pvmw);
> if (pmd_page(*pvmw->pmd) != page)
> return not_found(pvmw);
> return true;
> } else if (!pmd_present(*pvmw->pmd)) {
> if (thp_migration_supported()) {
> if (!(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
> if 
> (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
> swp_entry_t entry = 
> pmd_to_swp_entry(*pvmw->pmd);
>
> if (migration_entry_to_page(entry) != 
> page)
> return not_found(pvmw);
> return true;
> }
> }
> return not_found(pvmw);
> } else {
> /* THP pmd was split under us: handle on pte level */
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> }
> } else if (!pmd_present(pmde)) { ---> Outer 'else if'
> return false;
> }
>
> Looking at the above code, it seems the conditional check for a THP
> splitting case would be (!pmd_trans_huge && pmd_present) instead as
> it has skipped the first two conditions. But THP splitting must have
> been initiated once it has cleared the outer check (else it would not
> have cleared otherwise)
>
> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)).

If a THP is under splitting, both pmd_present() and pmd_trans_huge() return
true in x86. The else part (/* THP pmd was split under us … */) happens
after splitting is done.

> BTW what PMD state does the outer 'else if' block identify which must
> have cleared the following condition to get there.
>
> (!pmd_present && !pmd_trans_huge && !is_pmd_migration_entry)

I think it is the case that the PMD is gone or equivalently pmd_none().
This PMD entry is not in use.


—
Best Regards,
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-10 Thread Zi Yan
On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:

> On 10/09/2018 07:28 PM, Zi Yan wrote:
>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for x86
>> PMD migration entry check)
>>
>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>
>>> A normal mapped THP page at PMD level should be correctly differentiated
>>> from a PMD migration entry while walking the page table. A mapped THP would
>>> additionally check positive for pmd_present() along with pmd_trans_huge()
>>> as compared to a PMD migration entry. This just adds a new conditional test
>>> differentiating the two while walking the page table.
>>>
>>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>>> exclusive which makes the current conditional block work for both mapped
>>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>>
>> !pmd_present() && pmd_trans_huge() is used to represent THPs under splitting,
>
> Not really if we just look at code in the conditional blocks.

Yeah, I explained it wrong above. Sorry about that.

In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
thus, it returns true even if the present bit is cleared but PSE bit is set.
This is done so, because THPs under splitting are regarded as present in the 
kernel
but not present when a hardware page table walker checks it.

For PMD migration entry, which should be regarded as not present, if PSE bit
is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
PMD migration entries will be regarded as present.

My concern is that if ARM64’s pmd_trans_huge() returns true for migration
entries, unlike x86, there might be bugs triggered in the kernel when
THP migration is enabled in ARM64.

Let me know if I explain this clear to you.

>
>> since _PAGE_PRESENT is cleared during THP splitting but _PAGE_PSE is not.
>> See the comment in pmd_present() for x86, in arch/x86/include/asm/pgtable.h
>
>
> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> if (likely(pmd_trans_huge(*pvmw->pmd))) {
> if (pvmw->flags & PVMW_MIGRATION)
> return not_found(pvmw);
> if (pmd_page(*pvmw->pmd) != page)
> return not_found(pvmw);
> return true;
> } else if (!pmd_present(*pvmw->pmd)) {
> if (thp_migration_supported()) {
> if (!(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
> if 
> (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
> swp_entry_t entry = 
> pmd_to_swp_entry(*pvmw->pmd);
>
> if (migration_entry_to_page(entry) != 
> page)
> return not_found(pvmw);
> return true;
> }
> }
> return not_found(pvmw);
> } else {
> /* THP pmd was split under us: handle on pte level */
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> }
> } else if (!pmd_present(pmde)) { ---> Outer 'else if'
> return false;
> }
>
> Looking at the above code, it seems the conditional check for a THP
> splitting case would be (!pmd_trans_huge && pmd_present) instead as
> it has skipped the first two conditions. But THP splitting must have
> been initiated once it has cleared the outer check (else it would not
> have cleared otherwise)
>
> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)).

If a THP is under splitting, both pmd_present() and pmd_trans_huge() return
true in x86. The else part (/* THP pmd was split under us … */) happens
after splitting is done.

> BTW what PMD state does the outer 'else if' block identify which must
> have cleared the following condition to get there.
>
> (!pmd_present && !pmd_trans_huge && !is_pmd_migration_entry)

I think it is the case that the PMD is gone or equivalently pmd_none().
This PMD entry is not in use.


—
Best Regards,
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Anshuman Khandual



On 10/09/2018 07:28 PM, Zi Yan wrote:
> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for x86
> PMD migration entry check)
> 
> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
> 
>> A normal mapped THP page at PMD level should be correctly differentiated
>> from a PMD migration entry while walking the page table. A mapped THP would
>> additionally check positive for pmd_present() along with pmd_trans_huge()
>> as compared to a PMD migration entry. This just adds a new conditional test
>> differentiating the two while walking the page table.
>>
>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>> Signed-off-by: Anshuman Khandual 
>> ---
>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>> exclusive which makes the current conditional block work for both mapped
>> and migration entries. This is not same with arm64 where pmd_trans_huge()
> 
> !pmd_present() && pmd_trans_huge() is used to represent THPs under splitting,

Not really if we just look at code in the conditional blocks.

> since _PAGE_PRESENT is cleared during THP splitting but _PAGE_PSE is not.
> See the comment in pmd_present() for x86, in arch/x86/include/asm/pgtable.h


if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
if (likely(pmd_trans_huge(*pvmw->pmd))) {
if (pvmw->flags & PVMW_MIGRATION)
return not_found(pvmw);
if (pmd_page(*pvmw->pmd) != page)
return not_found(pvmw);
return true;
} else if (!pmd_present(*pvmw->pmd)) {
if (thp_migration_supported()) {
if (!(pvmw->flags & PVMW_MIGRATION))
return not_found(pvmw);
if 
(is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
swp_entry_t entry = 
pmd_to_swp_entry(*pvmw->pmd);

if (migration_entry_to_page(entry) != 
page)
return not_found(pvmw);
return true;
}
}
return not_found(pvmw);
} else {
/* THP pmd was split under us: handle on pte level */
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
}
} else if (!pmd_present(pmde)) { ---> Outer 'else if'
return false;
}

Looking at the above code, it seems the conditional check for a THP
splitting case would be (!pmd_trans_huge && pmd_present) instead as
it has skipped the first two conditions. But THP splitting must have
been initiated once it has cleared the outer check (else it would not
have cleared otherwise)

if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)).


BTW what PMD state does the outer 'else if' block identify which must
have cleared the following condition to get there.

(!pmd_present && !pmd_trans_huge && !is_pmd_migration_entry)


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Anshuman Khandual



On 10/09/2018 07:28 PM, Zi Yan wrote:
> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for x86
> PMD migration entry check)
> 
> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
> 
>> A normal mapped THP page at PMD level should be correctly differentiated
>> from a PMD migration entry while walking the page table. A mapped THP would
>> additionally check positive for pmd_present() along with pmd_trans_huge()
>> as compared to a PMD migration entry. This just adds a new conditional test
>> differentiating the two while walking the page table.
>>
>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>> Signed-off-by: Anshuman Khandual 
>> ---
>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>> exclusive which makes the current conditional block work for both mapped
>> and migration entries. This is not same with arm64 where pmd_trans_huge()
> 
> !pmd_present() && pmd_trans_huge() is used to represent THPs under splitting,

Not really if we just look at code in the conditional blocks.

> since _PAGE_PRESENT is cleared during THP splitting but _PAGE_PSE is not.
> See the comment in pmd_present() for x86, in arch/x86/include/asm/pgtable.h


if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
if (likely(pmd_trans_huge(*pvmw->pmd))) {
if (pvmw->flags & PVMW_MIGRATION)
return not_found(pvmw);
if (pmd_page(*pvmw->pmd) != page)
return not_found(pvmw);
return true;
} else if (!pmd_present(*pvmw->pmd)) {
if (thp_migration_supported()) {
if (!(pvmw->flags & PVMW_MIGRATION))
return not_found(pvmw);
if 
(is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
swp_entry_t entry = 
pmd_to_swp_entry(*pvmw->pmd);

if (migration_entry_to_page(entry) != 
page)
return not_found(pvmw);
return true;
}
}
return not_found(pvmw);
} else {
/* THP pmd was split under us: handle on pte level */
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
}
} else if (!pmd_present(pmde)) { ---> Outer 'else if'
return false;
}

Looking at the above code, it seems the conditional check for a THP
splitting case would be (!pmd_trans_huge && pmd_present) instead as
it has skipped the first two conditions. But THP splitting must have
been initiated once it has cleared the outer check (else it would not
have cleared otherwise)

if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)).


BTW what PMD state does the outer 'else if' block identify which must
have cleared the following condition to get there.

(!pmd_present && !pmd_trans_huge && !is_pmd_migration_entry)


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Zi Yan
cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for x86
PMD migration entry check)

On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:

> A normal mapped THP page at PMD level should be correctly differentiated
> from a PMD migration entry while walking the page table. A mapped THP would
> additionally check positive for pmd_present() along with pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new conditional test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> exclusive which makes the current conditional block work for both mapped
> and migration entries. This is not same with arm64 where pmd_trans_huge()

!pmd_present() && pmd_trans_huge() is used to represent THPs under splitting,
since _PAGE_PRESENT is cleared during THP splitting but _PAGE_PSE is not.
See the comment in pmd_present() for x86, in arch/x86/include/asm/pgtable.h

> returns positive for both mapped and migration entries. Could some one
> please explain why pmd_trans_huge() has to return false for migration
> entries which just install swap bits and its still a PMD ? Nonetheless
> pmd_present() seems to be a better check to distinguish between mapped
> and (non-mapped non-present) migration entries without any ambiguity.

If arm64 does it differently, I just wonder how THP splitting is handled
in arm64.


--
Best Regards
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Zi Yan
cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for x86
PMD migration entry check)

On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:

> A normal mapped THP page at PMD level should be correctly differentiated
> from a PMD migration entry while walking the page table. A mapped THP would
> additionally check positive for pmd_present() along with pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new conditional test
> differentiating the two while walking the page table.
>
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> exclusive which makes the current conditional block work for both mapped
> and migration entries. This is not same with arm64 where pmd_trans_huge()

!pmd_present() && pmd_trans_huge() is used to represent THPs under splitting,
since _PAGE_PRESENT is cleared during THP splitting but _PAGE_PSE is not.
See the comment in pmd_present() for x86, in arch/x86/include/asm/pgtable.h

> returns positive for both mapped and migration entries. Could some one
> please explain why pmd_trans_huge() has to return false for migration
> entries which just install swap bits and its still a PMD ? Nonetheless
> pmd_present() seems to be a better check to distinguish between mapped
> and (non-mapped non-present) migration entries without any ambiguity.

If arm64 does it differently, I just wonder how THP splitting is handled
in arm64.


--
Best Regards
Yan Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Anshuman Khandual



On 10/09/2018 06:34 PM, Kirill A. Shutemov wrote:
> On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
>> A normal mapped THP page at PMD level should be correctly differentiated
>> from a PMD migration entry while walking the page table. A mapped THP would
>> additionally check positive for pmd_present() along with pmd_trans_huge()
>> as compared to a PMD migration entry. This just adds a new conditional test
>> differentiating the two while walking the page table.
>>
>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>> Signed-off-by: Anshuman Khandual 
>> ---
>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>> exclusive which makes the current conditional block work for both mapped
>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>> returns positive for both mapped and migration entries. Could some one
>> please explain why pmd_trans_huge() has to return false for migration
>> entries which just install swap bits and its still a PMD ?
> 
> I guess it's just a design choice. Any reason why arm64 cannot do the
> same?
>
I think probably it can do. I am happy to look into these in detail what
will make pmd_trans_huge() return false on migration entries but it does
not quite sound like a right semantic at the moment.

>> Nonetheless pmd_present() seems to be a better check to distinguish
>> between mapped and (non-mapped non-present) migration entries without
>> any ambiguity.
> 
> Can we instead reverse order of check:
> 
> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
>   pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>   if (!pmd_present(*pvmw->pmd)) {
>   ...
>   } else if (likely(pmd_trans_huge(*pvmw->pmd))) {
>   ...
>   } else {
>   ...
>   }
> ...
> 
> This should cover both imeplementations of pmd_trans_huge().

Yeah it does cover and I have tested it first before proposing the current
patch. The only problem is that the order saves the code :) Having another
reasonable check like pmd_present() prevents it from being broken if the
code block moves around for some reason. But I am happy to do either way.


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Anshuman Khandual



On 10/09/2018 06:34 PM, Kirill A. Shutemov wrote:
> On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
>> A normal mapped THP page at PMD level should be correctly differentiated
>> from a PMD migration entry while walking the page table. A mapped THP would
>> additionally check positive for pmd_present() along with pmd_trans_huge()
>> as compared to a PMD migration entry. This just adds a new conditional test
>> differentiating the two while walking the page table.
>>
>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>> Signed-off-by: Anshuman Khandual 
>> ---
>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>> exclusive which makes the current conditional block work for both mapped
>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>> returns positive for both mapped and migration entries. Could some one
>> please explain why pmd_trans_huge() has to return false for migration
>> entries which just install swap bits and its still a PMD ?
> 
> I guess it's just a design choice. Any reason why arm64 cannot do the
> same?
>
I think probably it can do. I am happy to look into these in detail what
will make pmd_trans_huge() return false on migration entries but it does
not quite sound like a right semantic at the moment.

>> Nonetheless pmd_present() seems to be a better check to distinguish
>> between mapped and (non-mapped non-present) migration entries without
>> any ambiguity.
> 
> Can we instead reverse order of check:
> 
> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
>   pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>   if (!pmd_present(*pvmw->pmd)) {
>   ...
>   } else if (likely(pmd_trans_huge(*pvmw->pmd))) {
>   ...
>   } else {
>   ...
>   }
> ...
> 
> This should cover both imeplementations of pmd_trans_huge().

Yeah it does cover and I have tested it first before proposing the current
patch. The only problem is that the order saves the code :) Having another
reasonable check like pmd_present() prevents it from being broken if the
code block moves around for some reason. But I am happy to do either way.


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Will Deacon
On Tue, Oct 09, 2018 at 04:04:21PM +0300, Kirill A. Shutemov wrote:
> On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
> > A normal mapped THP page at PMD level should be correctly differentiated
> > from a PMD migration entry while walking the page table. A mapped THP would
> > additionally check positive for pmd_present() along with pmd_trans_huge()
> > as compared to a PMD migration entry. This just adds a new conditional test
> > differentiating the two while walking the page table.
> > 
> > Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> > Signed-off-by: Anshuman Khandual 
> > ---
> > On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> > exclusive which makes the current conditional block work for both mapped
> > and migration entries. This is not same with arm64 where pmd_trans_huge()
> > returns positive for both mapped and migration entries. Could some one
> > please explain why pmd_trans_huge() has to return false for migration
> > entries which just install swap bits and its still a PMD ?
> 
> I guess it's just a design choice. Any reason why arm64 cannot do the
> same?

Anshuman, would it work to:

#define pmd_trans_huge(pmd) (pmd_present(pmd) && !(pmd_val(pmd) & 
PMD_TABLE_BIT))

?

> > Nonetheless pmd_present() seems to be a better check to distinguish
> > between mapped and (non-mapped non-present) migration entries without
> > any ambiguity.
> 
> Can we instead reverse order of check:
> 
> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
>   pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>   if (!pmd_present(*pvmw->pmd)) {
>   ...
>   } else if (likely(pmd_trans_huge(*pvmw->pmd))) {
>   ...
>   } else {
>   ...
>   }
> ...
> 
> This should cover both imeplementations of pmd_trans_huge().

I'd much rather have portable semantics for pmd_trans_huge(), if we can
achieve that efficiently. But that would be fast /and/ correct, so perhaps
I'm being too hopeful :)

Will


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Will Deacon
On Tue, Oct 09, 2018 at 04:04:21PM +0300, Kirill A. Shutemov wrote:
> On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
> > A normal mapped THP page at PMD level should be correctly differentiated
> > from a PMD migration entry while walking the page table. A mapped THP would
> > additionally check positive for pmd_present() along with pmd_trans_huge()
> > as compared to a PMD migration entry. This just adds a new conditional test
> > differentiating the two while walking the page table.
> > 
> > Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> > Signed-off-by: Anshuman Khandual 
> > ---
> > On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> > exclusive which makes the current conditional block work for both mapped
> > and migration entries. This is not same with arm64 where pmd_trans_huge()
> > returns positive for both mapped and migration entries. Could some one
> > please explain why pmd_trans_huge() has to return false for migration
> > entries which just install swap bits and its still a PMD ?
> 
> I guess it's just a design choice. Any reason why arm64 cannot do the
> same?

Anshuman, would it work to:

#define pmd_trans_huge(pmd) (pmd_present(pmd) && !(pmd_val(pmd) & 
PMD_TABLE_BIT))

?

> > Nonetheless pmd_present() seems to be a better check to distinguish
> > between mapped and (non-mapped non-present) migration entries without
> > any ambiguity.
> 
> Can we instead reverse order of check:
> 
> if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
>   pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>   if (!pmd_present(*pvmw->pmd)) {
>   ...
>   } else if (likely(pmd_trans_huge(*pvmw->pmd))) {
>   ...
>   } else {
>   ...
>   }
> ...
> 
> This should cover both imeplementations of pmd_trans_huge().

I'd much rather have portable semantics for pmd_trans_huge(), if we can
achieve that efficiently. But that would be fast /and/ correct, so perhaps
I'm being too hopeful :)

Will


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Kirill A. Shutemov
On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
> A normal mapped THP page at PMD level should be correctly differentiated
> from a PMD migration entry while walking the page table. A mapped THP would
> additionally check positive for pmd_present() along with pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new conditional test
> differentiating the two while walking the page table.
> 
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> exclusive which makes the current conditional block work for both mapped
> and migration entries. This is not same with arm64 where pmd_trans_huge()
> returns positive for both mapped and migration entries. Could some one
> please explain why pmd_trans_huge() has to return false for migration
> entries which just install swap bits and its still a PMD ?

I guess it's just a design choice. Any reason why arm64 cannot do the
same?

> Nonetheless pmd_present() seems to be a better check to distinguish
> between mapped and (non-mapped non-present) migration entries without
> any ambiguity.

Can we instead reverse order of check:

if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
if (!pmd_present(*pvmw->pmd)) {
...
} else if (likely(pmd_trans_huge(*pvmw->pmd))) {
...
} else {
...
}
...

This should cover both imeplementations of pmd_trans_huge().

-- 
 Kirill A. Shutemov


Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-09 Thread Kirill A. Shutemov
On Tue, Oct 09, 2018 at 09:28:58AM +0530, Anshuman Khandual wrote:
> A normal mapped THP page at PMD level should be correctly differentiated
> from a PMD migration entry while walking the page table. A mapped THP would
> additionally check positive for pmd_present() along with pmd_trans_huge()
> as compared to a PMD migration entry. This just adds a new conditional test
> differentiating the two while walking the page table.
> 
> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> Signed-off-by: Anshuman Khandual 
> ---
> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> exclusive which makes the current conditional block work for both mapped
> and migration entries. This is not same with arm64 where pmd_trans_huge()
> returns positive for both mapped and migration entries. Could some one
> please explain why pmd_trans_huge() has to return false for migration
> entries which just install swap bits and its still a PMD ?

I guess it's just a design choice. Any reason why arm64 cannot do the
same?

> Nonetheless pmd_present() seems to be a better check to distinguish
> between mapped and (non-mapped non-present) migration entries without
> any ambiguity.

Can we instead reverse order of check:

if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
if (!pmd_present(*pvmw->pmd)) {
...
} else if (likely(pmd_trans_huge(*pvmw->pmd))) {
...
} else {
...
}
...

This should cover both imeplementations of pmd_trans_huge().

-- 
 Kirill A. Shutemov


[PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-08 Thread Anshuman Khandual
A normal mapped THP page at PMD level should be correctly differentiated
from a PMD migration entry while walking the page table. A mapped THP would
additionally check positive for pmd_present() along with pmd_trans_huge()
as compared to a PMD migration entry. This just adds a new conditional test
differentiating the two while walking the page table.

Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
Signed-off-by: Anshuman Khandual 
---
On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
exclusive which makes the current conditional block work for both mapped
and migration entries. This is not same with arm64 where pmd_trans_huge()
returns positive for both mapped and migration entries. Could some one
please explain why pmd_trans_huge() has to return false for migration
entries which just install swap bits and its still a PMD ? Nonetheless
pmd_present() seems to be a better check to distinguish between mapped
and (non-mapped non-present) migration entries without any ambiguity.

 mm/page_vma_mapped.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a3..b384396 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -161,7 +161,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
pmde = READ_ONCE(*pvmw->pmd);
if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
-   if (likely(pmd_trans_huge(*pvmw->pmd))) {
+   if (likely(pmd_trans_huge(*pvmw->pmd) &&
+   pmd_present(*pvmw->pmd))) {
if (pvmw->flags & PVMW_MIGRATION)
return not_found(pvmw);
if (pmd_page(*pvmw->pmd) != page)
-- 
2.7.4



[PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-08 Thread Anshuman Khandual
A normal mapped THP page at PMD level should be correctly differentiated
from a PMD migration entry while walking the page table. A mapped THP would
additionally check positive for pmd_present() along with pmd_trans_huge()
as compared to a PMD migration entry. This just adds a new conditional test
differentiating the two while walking the page table.

Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
Signed-off-by: Anshuman Khandual 
---
On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
exclusive which makes the current conditional block work for both mapped
and migration entries. This is not same with arm64 where pmd_trans_huge()
returns positive for both mapped and migration entries. Could some one
please explain why pmd_trans_huge() has to return false for migration
entries which just install swap bits and its still a PMD ? Nonetheless
pmd_present() seems to be a better check to distinguish between mapped
and (non-mapped non-present) migration entries without any ambiguity.

 mm/page_vma_mapped.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a3..b384396 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -161,7 +161,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
pmde = READ_ONCE(*pvmw->pmd);
if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
-   if (likely(pmd_trans_huge(*pvmw->pmd))) {
+   if (likely(pmd_trans_huge(*pvmw->pmd) &&
+   pmd_present(*pvmw->pmd))) {
if (pvmw->flags & PVMW_MIGRATION)
return not_found(pvmw);
if (pmd_page(*pvmw->pmd) != page)
-- 
2.7.4