Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On 04/08/2020 05:45 PM, Gerald Schaefer wrote:
> On Wed, 8 Apr 2020 12:41:51 +0530
> Anshuman Khandual wrote:
>
> [...]
>>>> Something like this instead.
>>>>
>>>> pte_t pte = READ_ONCE(*ptep);
>>>> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>>>>
>>>> We cannot use mk_pte_phys() as it is defined only on some platforms
>>>> without any generic fallback for others.
>>>
>>> Oh, didn't know that, sorry. What about using mk_pte() instead, at least
>>> it would result in a present pte:
>>>
>>> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
>>
>> Let's use mk_pte() here, but can we do this instead?
>>
>> paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
>> pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));
>>
>
> Sure, that will also work.
>
> BTW, this RANDOM_ORVALUE is not really very random, the way it is
> defined. For s390 we already changed it to mask out some arch bits,
> but I guess there are other archs and bits that would always be
> set with this "not so random" value, and I wonder if/how that would
> affect all the tests using this value, see also below.

RANDOM_ORVALUE is a constant which was added in order to make sure that
the page table entries have some non-zero value before getting called
with pxx_clear(), followed by a pxx_none() check. It is currently used
only in the pxx_clear_tests() tests, hence there is no impact on the
existing tests.

>
>>> And if you also want to do some with the existing value, which seems
>>> to be an empty pte, then maybe just check if writing and reading that
>>> value with set_huge_pte_at() / huge_ptep_get() returns the same,
>>> i.e. initially w/o RANDOM_ORVALUE.
>>>
>>> So, in combination, like this (BTW, why is the barrier() needed, it
>>> is not used for the other set_huge_pte_at() calls later?):
>>
>> Ahh, missed that; will add them. Earlier we faced a problem without it
>> after set_pte_at() for a test on the powerpc (64) platform. Hence just
>> added it here to be extra careful.
>>>
>>> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
>>> 	struct page *page = pfn_to_page(pfn);
>>> 	pte_t pte = READ_ONCE(*ptep);
>>>
>>> -	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> +	set_huge_pte_at(mm, vaddr, ptep, pte);
>>> +	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>> +
>>> +	pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
>>> 	set_huge_pte_at(mm, vaddr, ptep, pte);
>>> 	barrier();
>>> 	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>>
>>> This would actually add a new test "write empty pte with
>>> set_huge_pte_at(), then verify with huge_ptep_get()", which happens
>>> to trigger a warning on s390 :-)
>>
>> On arm64 as well, which checks for pte_present() in set_huge_pte_at().
>> But the PTE present check is not really there in each set_huge_pte_at()
>> implementation, especially without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
>> Hence wondering if we should add this new test here, which will keep
>> giving warnings on s390 and arm64 (at the least).
>
> Hmm, interesting. I forgot about huge swap / migration, which is not
> (and probably cannot be) supported on s390. The pte_present() check
> on arm64 seems to check for such huge swap / migration entries,
> according to the comment.
>
> The new test "write empty pte with set_huge_pte_at(), then verify
> with huge_ptep_get()" would then probably trigger the
> WARN_ON(!pte_present(pte)) in arm64 code. So I guess "writing empty
> ptes with set_huge_pte_at()" is not really a valid use case in practice,
> or else you would have seen this warning before. In that case, it
> might not be a good idea to add this test.

Got it.

>
> I also do wonder now why the original test with
> "pte = __pte(pte_val(pte) | RANDOM_ORVALUE);"
> did not also trigger that warning on arm64. On s390 this test failed
> exactly because the constructed pte was not present (initially empty,
> or'ing RANDOM_ORVALUE does not make it present for s390).
> I guess this just worked by chance on arm64, because the bits from
> RANDOM_ORVALUE also happened to mark the pte present for arm64.

That is correct. RANDOM_ORVALUE has the PTE_PROT_NONE bit set, which makes
the PTE pass the pte_present() test. On the arm64 platform:

#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))

>
> This brings us back to the question above, regarding the "randomness"
> of RANDOM_ORVALUE. Not really sure what the intention behind that was,
> but maybe it would make sense to restrict this RANDOM_ORVALUE to
> non-arch-specific bits, i.e. only bits that would be part of the
> address value within a page table entry? Or was it intentionally
> chosen to also mess with other bits?

As mentioned before, RANDOM_ORVALUE just helps make a given page table
entry contain non-zero values before getting cleared. AFAICS we should
not need RANDOM_ORVALUE for the HugeTLB test here. I believe
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On Wed, 8 Apr 2020 12:41:51 +0530
Anshuman Khandual wrote:

[...]
>>> Something like this instead.
>>>
>>> pte_t pte = READ_ONCE(*ptep);
>>> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>>>
>>> We cannot use mk_pte_phys() as it is defined only on some platforms
>>> without any generic fallback for others.
>>
>> Oh, didn't know that, sorry. What about using mk_pte() instead, at least
>> it would result in a present pte:
>>
>> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
>
> Let's use mk_pte() here, but can we do this instead?
>
> paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
> pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));
>

Sure, that will also work.

BTW, this RANDOM_ORVALUE is not really very random, the way it is
defined. For s390 we already changed it to mask out some arch bits,
but I guess there are other archs and bits that would always be
set with this "not so random" value, and I wonder if/how that would
affect all the tests using this value, see also below.

>> And if you also want to do some with the existing value, which seems
>> to be an empty pte, then maybe just check if writing and reading that
>> value with set_huge_pte_at() / huge_ptep_get() returns the same,
>> i.e. initially w/o RANDOM_ORVALUE.
>>
>> So, in combination, like this (BTW, why is the barrier() needed, it
>> is not used for the other set_huge_pte_at() calls later?):
>
> Ahh, missed that; will add them. Earlier we faced a problem without it
> after set_pte_at() for a test on the powerpc (64) platform. Hence just
> added it here to be extra careful.
>>
>> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
>> 	struct page *page = pfn_to_page(pfn);
>> 	pte_t pte = READ_ONCE(*ptep);
>>
>> -	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>> +	set_huge_pte_at(mm, vaddr, ptep, pte);
>> +	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>> +
>> +	pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
>> 	set_huge_pte_at(mm, vaddr, ptep, pte);
>> 	barrier();
>> 	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>
>> This would actually add a new test "write empty pte with
>> set_huge_pte_at(), then verify with huge_ptep_get()", which happens
>> to trigger a warning on s390 :-)
>
> On arm64 as well, which checks for pte_present() in set_huge_pte_at().
> But the PTE present check is not really there in each set_huge_pte_at()
> implementation, especially without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
> Hence wondering if we should add this new test here, which will keep
> giving warnings on s390 and arm64 (at the least).

Hmm, interesting. I forgot about huge swap / migration, which is not
(and probably cannot be) supported on s390. The pte_present() check
on arm64 seems to check for such huge swap / migration entries,
according to the comment.

The new test "write empty pte with set_huge_pte_at(), then verify
with huge_ptep_get()" would then probably trigger the
WARN_ON(!pte_present(pte)) in arm64 code. So I guess "writing empty
ptes with set_huge_pte_at()" is not really a valid use case in practice,
or else you would have seen this warning before. In that case, it
might not be a good idea to add this test.

I also do wonder now why the original test with
"pte = __pte(pte_val(pte) | RANDOM_ORVALUE);"
did not also trigger that warning on arm64. On s390 this test failed
exactly because the constructed pte was not present (initially empty,
or'ing RANDOM_ORVALUE does not make it present for s390).
I guess this just worked by chance on arm64, because the bits from
RANDOM_ORVALUE also happened to mark the pte present for arm64.

This brings us back to the question above, regarding the "randomness"
of RANDOM_ORVALUE. Not really sure what the intention behind that was,
but maybe it would make sense to restrict this RANDOM_ORVALUE to
non-arch-specific bits, i.e. only bits that would be part of the
address value within a page table entry? Or was it intentionally
chosen to also mess with other bits?

Regards,
Gerald
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On 04/07/2020 09:24 PM, Gerald Schaefer wrote:
> On Sun, 5 Apr 2020 17:58:14 +0530
> Anshuman Khandual wrote:
>
> [...]
>>>
>>> Could be fixed like this (the first de-reference is a bit special,
>>> because at that point *ptep does not really point to a large (pmd) entry
>>> yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()
>>
>> There seems to be an inconsistency on the s390 platform. Even though it
>> defines a huge_ptep_get() override, it does not subscribe to
>> __HAVE_ARCH_HUGE_PTEP_GET, which should have forced it to fall back on
>> the generic huge_ptep_get(), but it does not :) Then I realized that
>> __HAVE_ARCH_HUGE_PTEP_GET only makes sense when an arch uses
>> <asm-generic/hugetlb.h>. s390 does not use that and hence gets away with
>> its own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds
>> confusing? But I might not have the entire context here.
>
> Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get()
> was initially introduced because of s390, and now we don't select
> __HAVE_ARCH_HUGE_PTEP_GET...
>
> As you realized, I guess this is because we do not use the generic
> hugetlb.h. And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit
> 544db7597ad ("hugetlb: introduce generic version of huge_ptep_get"), that
> was probably the reason why we did not get our share of
> __HAVE_ARCH_HUGE_PTEP_GET.

Understood.

>
> Nothing really wrong with that, but yes, very confusing. Maybe we could
> also select it for s390, even though it wouldn't have any functional
> impact (so far), just for less confusion. Maybe also thinking about
> using the generic hugetlb.h, not sure if the original reasons for not
> doing so would still apply. Now I only need to find the time...

Seems like something worth exploring, if we could remove this confusion.

>
>>> conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
>>> because we do have some special bits there in our large pmds.
>>> It seems to also work w/o that alignment, but it feels a bit wrong):
>>
>> Sure, we can accommodate that.
>>
>>>
>>> @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
>>> 					  unsigned long vaddr, pgprot_t prot)
>>> {
>>> 	struct page *page = pfn_to_page(pfn);
>>> -	pte_t pte = READ_ONCE(*ptep);
>>> +	pte_t pte;
>>>
>>> -	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> +	pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));
>>
>> So that keeps the existing value in the 'ptep' pointer at bay and instead
>> constructs a PTE from scratch. I would rather have READ_ONCE(*ptep) at
>> least provide the seed that can be ORed with RANDOM_ORVALUE before
>> being masked with PMD_MASK. Do you see any problem?
>
> Yes, unfortunately. The problem is that the resulting pte is not marked
> as present. The conversion pte -> (huge) pmd, which is done in
> set_huge_pte_at() for s390, will establish an empty pmd for non-present
> ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent
> huge_ptep_get() will not result in the same original pte value. If you

Ohh.

> want to preserve and check the RANDOM_ORVALUE, it has to be a present
> pte, hence the mk_pte(_phys).

Understood, and mk_pte() is also available on all platforms.

>
>> Something like this instead.
>>
>> pte_t pte = READ_ONCE(*ptep);
>> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>>
>> We cannot use mk_pte_phys() as it is defined only on some platforms
>> without any generic fallback for others.
>
> Oh, didn't know that, sorry.
> What about using mk_pte() instead, at least it would result in a
> present pte:
>
> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));

Let's use mk_pte() here, but can we do this instead?

paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));

>
> And if you also want to do some with the existing value, which seems
> to be an empty pte, then maybe just check if writing and reading that
> value with set_huge_pte_at() / huge_ptep_get() returns the same,
> i.e. initially w/o RANDOM_ORVALUE.
>
> So, in combination, like this (BTW, why is the barrier() needed, it
> is not used for the other set_huge_pte_at() calls later?):

Ahh, missed that; will add them. Earlier we faced a problem without it
after set_pte_at() for a test on the powerpc (64) platform. Hence just
added it here to be extra careful.

>
> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
> 	struct page *page = pfn_to_page(pfn);
> 	pte_t pte = READ_ONCE(*ptep);
>
> -	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> +	set_huge_pte_at(mm, vaddr, ptep, pte);
> +	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> +
> +	pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
> 	set_huge_pte_at(mm, vaddr, ptep, pte);
> 	barrier();
> 	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On Sun, 5 Apr 2020 17:58:14 +0530
Anshuman Khandual wrote:

[...]
>>
>> Could be fixed like this (the first de-reference is a bit special,
>> because at that point *ptep does not really point to a large (pmd) entry
>> yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()
>>
>
> There seems to be an inconsistency on the s390 platform. Even though it
> defines a huge_ptep_get() override, it does not subscribe to
> __HAVE_ARCH_HUGE_PTEP_GET, which should have forced it to fall back on
> the generic huge_ptep_get(), but it does not :) Then I realized that
> __HAVE_ARCH_HUGE_PTEP_GET only makes sense when an arch uses
> <asm-generic/hugetlb.h>. s390 does not use that and hence gets away with
> its own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds
> confusing? But I might not have the entire context here.

Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get()
was initially introduced because of s390, and now we don't select
__HAVE_ARCH_HUGE_PTEP_GET...

As you realized, I guess this is because we do not use the generic
hugetlb.h. And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit
544db7597ad ("hugetlb: introduce generic version of huge_ptep_get"), that
was probably the reason why we did not get our share of
__HAVE_ARCH_HUGE_PTEP_GET.

Nothing really wrong with that, but yes, very confusing. Maybe we could
also select it for s390, even though it wouldn't have any functional
impact (so far), just for less confusion. Maybe also thinking about
using the generic hugetlb.h, not sure if the original reasons for not
doing so would still apply. Now I only need to find the time...

>
>> conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
>> because we do have some special bits there in our large pmds. It seems
>> to also work w/o that alignment, but it feels a bit wrong):
>
> Sure, we can accommodate that.
>>
>> @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
>> 					  unsigned long vaddr, pgprot_t prot)
>> {
>> 	struct page *page = pfn_to_page(pfn);
>> -	pte_t pte = READ_ONCE(*ptep);
>> +	pte_t pte;
>>
>> -	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>> +	pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));
>
> So that keeps the existing value in the 'ptep' pointer at bay and instead
> constructs a PTE from scratch. I would rather have READ_ONCE(*ptep) at
> least provide the seed that can be ORed with RANDOM_ORVALUE before
> being masked with PMD_MASK. Do you see any problem?

Yes, unfortunately. The problem is that the resulting pte is not marked
as present. The conversion pte -> (huge) pmd, which is done in
set_huge_pte_at() for s390, will establish an empty pmd for non-present
ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent
huge_ptep_get() will not result in the same original pte value. If you
want to preserve and check the RANDOM_ORVALUE, it has to be a present
pte, hence the mk_pte(_phys).

>
> Something like this instead.
>
> pte_t pte = READ_ONCE(*ptep);
> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>
> We cannot use mk_pte_phys() as it is defined only on some platforms
> without any generic fallback for others.

Oh, didn't know that, sorry. What about using mk_pte() instead, at least
it would result in a present pte:

pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));

And if you also want to do some with the existing value, which seems
to be an empty pte, then maybe just check if writing and reading that
value with set_huge_pte_at() / huge_ptep_get() returns the same,
i.e. initially w/o RANDOM_ORVALUE.
So, in combination, like this (BTW, why is the barrier() needed, it
is not used for the other set_huge_pte_at() calls later?):

@@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
	struct page *page = pfn_to_page(pfn);
	pte_t pte = READ_ONCE(*ptep);

-	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+	set_huge_pte_at(mm, vaddr, ptep, pte);
+	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
+
+	pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
	set_huge_pte_at(mm, vaddr, ptep, pte);
	barrier();
	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));

This would actually add a new test "write empty pte with
set_huge_pte_at(), then verify with huge_ptep_get()", which happens
to trigger a warning on s390 :-)

That (new) warning actually points to misbehavior on s390. We do not
write a correct empty pmd in this case, but one that is empty and also
marked as large. huge_ptep_get() will then not correctly recognize it
as empty and do a wrong conversion. It is also not consistent with
huge_ptep_get_and_clear(), where we write the empty pmd w/o marking it
as large. Last but not least, it would also break our pmd_protnone()
logic (see below). Another nice finding on s390 :-)

I don't think this has any effect in practice
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On 03/31/2020 06:00 PM, Gerald Schaefer wrote:
> On Tue, 24 Mar 2020 10:52:52 +0530
> Anshuman Khandual wrote:
>
>> This series adds more arch page table helper tests. The new tests here are
>> either related to core memory functions or advanced arch pgtable helpers.
>> This also creates a documentation file listing all expected semantics, as
>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>
>> This series has been tested on arm64 and x86 platforms. There is just one
>> expected failure on arm64 that will be fixed when we enable THP migration.
>>
>> [   21.741634] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:782
>>
>> which corresponds to
>>
>> WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd
>>
>> There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
>> ifdefs scattered across the test. But consolidating all the fallback stubs
>> is not very straightforward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is
>> not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE.
>>
>> This series has been build tested on many platforms including the ones that
>> subscribe to the test through ARCH_HAS_DEBUG_VM_PGTABLE.
>>
>
> Hi Anshuman,
>
> thanks for the update. There are a couple of issues on s390, some might
> also affect other archs.

Sure, thanks for taking a look and giving it a spin on s390.

>
> 1) The pxd_huge_tests are using pxd_set/clear_huge, which defaults to
> returning 0 if !CONFIG_HAVE_ARCH_HUGE_VMAP. As a result, the checks for
> !pxd_test/clear_huge in the pxd_huge_tests will always trigger the
> warning. This should affect all archs w/o CONFIG_HAVE_ARCH_HUGE_VMAP.
> Could be fixed like this:
>
> @@ -923,8 +923,10 @@ void __init debug_vm_pgtable(void)
> 	pmd_leaf_tests(pmd_aligned, prot);
> 	pud_leaf_tests(pud_aligned, prot);
>
> -	pmd_huge_tests(pmdp, pmd_aligned, prot);
> -	pud_huge_tests(pudp, pud_aligned, prot);
> +	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) {
> +		pmd_huge_tests(pmdp, pmd_aligned, prot);
> +		pud_huge_tests(pudp, pud_aligned, prot);
> +	}

That is correct. It was an omission on my part and I will fix it.

>
> 	pte_savedwrite_tests(pte_aligned, prot);
> 	pmd_savedwrite_tests(pmd_aligned, prot);
>
> BTW, please add some comments to the various #ifdef/#else stuff, especially
> when the different parts are far away and/or nested.

Sure, will do.

>
> 2) The hugetlb_advanced_test will fail because it directly de-references
> huge *ptep pointers instead of using huge_ptep_get() for this. We have
> a very different pagetable entry layout for pte and (large) pmd on s390,
> and unfortunately the whole hugetlbfs code is using pte_t instead of pmd_t
> like THP. For this reason, huge_ptep_get() was introduced, which will
> return a "converted" pte, because directly reading from a *ptep (pointing
> to a large pmd) will not return a proper pte. Only ARM also has an
> implementation of huge_ptep_get(), so they could be affected, depending
> on what exactly they need it for.

Currently, we don't support ARM (32). But as huge_ptep_get() already has
a fallback, it is better to use that than a direct READ_ONCE().

>
> Could be fixed like this (the first de-reference is a bit special,
> because at that point *ptep does not really point to a large (pmd) entry
> yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()

There seems to be an inconsistency on the s390 platform.
Even though it defines a huge_ptep_get() override, it does not subscribe to
__HAVE_ARCH_HUGE_PTEP_GET, which should have forced it to fall back on the
generic huge_ptep_get(), but it does not :) Then I realized that
__HAVE_ARCH_HUGE_PTEP_GET only makes sense when an arch uses
<asm-generic/hugetlb.h>. s390 does not use that and hence gets away with
its own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds
confusing? But I might not have the entire context here.

> conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
> because we do have some special bits there in our large pmds. It seems
> to also work w/o that alignment, but it feels a bit wrong):

Sure, we can accommodate that.

>
> @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
> 					  unsigned long vaddr, pgprot_t prot)
> {
> 	struct page *page = pfn_to_page(pfn);
> -	pte_t pte = READ_ONCE(*ptep);
> +	pte_t pte;
>
> -	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> +	pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));

So that keeps the existing value in the 'ptep' pointer at bay and instead
constructs a PTE from scratch. I would rather have READ_ONCE(*ptep) at
least provide the seed that can be ORed with RANDOM_ORVALUE before
being masked with PMD_MASK. Do you see any problem?

Something like this instead.

pte_t pte = READ_ONCE(*ptep);
pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));

We cannot use mk_pte_phys() as it is defined only on some platforms
without any generic fallback for others.
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On Tue, 24 Mar 2020 10:52:52 +0530
Anshuman Khandual wrote:

> This series adds more arch page table helper tests. The new tests here are
> either related to core memory functions or advanced arch pgtable helpers.
> This also creates a documentation file listing all expected semantics, as
> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>
> This series has been tested on arm64 and x86 platforms. There is just one
> expected failure on arm64 that will be fixed when we enable THP migration.
>
> [   21.741634] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:782
>
> which corresponds to
>
> WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd
>
> There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
> ifdefs scattered across the test. But consolidating all the fallback stubs
> is not very straightforward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is
> not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE.
>
> This series has been build tested on many platforms including the ones that
> subscribe to the test through ARCH_HAS_DEBUG_VM_PGTABLE.
>

Hi Anshuman,

thanks for the update. There are a couple of issues on s390, some might
also affect other archs.

1) The pxd_huge_tests are using pxd_set/clear_huge, which defaults to
returning 0 if !CONFIG_HAVE_ARCH_HUGE_VMAP. As a result, the checks for
!pxd_test/clear_huge in the pxd_huge_tests will always trigger the
warning. This should affect all archs w/o CONFIG_HAVE_ARCH_HUGE_VMAP.
Could be fixed like this:

@@ -923,8 +923,10 @@ void __init debug_vm_pgtable(void)
	pmd_leaf_tests(pmd_aligned, prot);
	pud_leaf_tests(pud_aligned, prot);

-	pmd_huge_tests(pmdp, pmd_aligned, prot);
-	pud_huge_tests(pudp, pud_aligned, prot);
+	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) {
+		pmd_huge_tests(pmdp, pmd_aligned, prot);
+		pud_huge_tests(pudp, pud_aligned, prot);
+	}

	pte_savedwrite_tests(pte_aligned, prot);
	pmd_savedwrite_tests(pmd_aligned, prot);

BTW, please add some comments to the various #ifdef/#else stuff, especially
when the different parts are far away and/or nested.

2) The hugetlb_advanced_test will fail because it directly de-references
huge *ptep pointers instead of using huge_ptep_get() for this. We have
a very different pagetable entry layout for pte and (large) pmd on s390,
and unfortunately the whole hugetlbfs code is using pte_t instead of pmd_t
like THP. For this reason, huge_ptep_get() was introduced, which will
return a "converted" pte, because directly reading from a *ptep (pointing
to a large pmd) will not return a proper pte. Only ARM also has an
implementation of huge_ptep_get(), so they could be affected, depending
on what exactly they need it for.

Could be fixed like this (the first de-reference is a bit special,
because at that point *ptep does not really point to a large (pmd) entry
yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()
conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
because we do have some special bits there in our large pmds.
It seems to also work w/o that alignment, but it feels a bit wrong):

@@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
				  unsigned long vaddr, pgprot_t prot)
{
	struct page *page = pfn_to_page(pfn);
-	pte_t pte = READ_ONCE(*ptep);
+	pte_t pte;

-	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+	pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));
	set_huge_pte_at(mm, vaddr, ptep, pte);
	barrier();
	WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));

	huge_pte_clear(mm, vaddr, ptep, PMD_SIZE);
-	pte = READ_ONCE(*ptep);
+	pte = huge_ptep_get(ptep);
	WARN_ON(!huge_pte_none(pte));

	pte = mk_huge_pte(page, prot);
	set_huge_pte_at(mm, vaddr, ptep, pte);
	huge_ptep_set_wrprotect(mm, vaddr, ptep);
-	pte = READ_ONCE(*ptep);
+	pte = huge_ptep_get(ptep);
	WARN_ON(huge_pte_write(pte));

	pte = mk_huge_pte(page, prot);
	set_huge_pte_at(mm, vaddr, ptep, pte);
	huge_ptep_get_and_clear(mm, vaddr, ptep);
-	pte = READ_ONCE(*ptep);
+	pte = huge_ptep_get(ptep);
	WARN_ON(!huge_pte_none(pte));

	pte = mk_huge_pte(page, prot);
@@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test
	pte = huge_pte_mkwrite(pte);
	pte = huge_pte_mkdirty(pte);
	huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
-	pte = READ_ONCE(*ptep);
+	pte = huge_ptep_get(ptep);
	WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
}
#else

3) The pmd_protnone_tests() has an issue, because it passes a pmd to
pmd_protnone() which has not been marked as large. We check for a large
pmd in the s390 implementation of pmd_protnone(), and will fail if a pmd
is not large. We had similar issues before, in other helpers, where I
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On 03/27/2020 12:30 PM, Christophe Leroy wrote:
>
> On 03/27/2020 06:46 AM, Anshuman Khandual wrote:
>>
>> On 03/26/2020 08:53 PM, Christophe Leroy wrote:
>>>
>>> Le 26/03/2020 à 03:23, Anshuman Khandual a écrit :
>>>> On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
>>>>> This series adds more arch page table helper tests. The new tests here
>>>>> are either related to core memory functions or advanced arch pgtable
>>>>> helpers. This also creates a documentation file listing all expected
>>>>> semantics, as suggested by Mike Rapoport
>>>>> (https://lkml.org/lkml/2020/1/30/40).
>>>>>
>>>>> This series has been tested on arm64 and x86 platforms.
>>>>
>>>> If folks can test these patches out on the remaining
>>>> ARCH_HAS_DEBUG_VM_PGTABLE enabled platforms, i.e. s390, arc,
>>>> powerpc (32 and 64), that will be really appreciated. Thank you.
>>>
>>> On powerpc 8xx (PPC32), I get:
>>>
>>> [   53.338368] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
>>> [   53.347403] [ cut here ]
>>> [   53.351832] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:647 debug_vm_pgtable+0x280/0x3f4
>>
>> mm/debug_vm_pgtable.c:647 ?
>>
>> With the following commits in place
>>
>> 53a8338ce (HEAD) Documentation/mm: Add descriptions for arch page table helper
>> 5d4913fc1 mm/debug: Add tests validating arch advanced page table helpers
>> bcaf120a7 mm/debug: Add tests validating arch page table helpers for core features
>> d6ed5a4a5 x86/memory: Drop pud_mknotpresent()
>> 0739d1f8d mm/debug: Add tests validating architecture page table helpers
>> 16fbf79b0 (tag: v5.6-rc7) Linux 5.6-rc7
>
> I have:
>
> facaa5eb5909 (HEAD -> helpers0) mm/debug: Add tests validating arch advanced page table helpers
> 6389fed515fc mm/debug: Add tests validating arch page table helpers for core features
> dc14ecc8b94e mm/debug: add tests validating architecture page table helpers
> c6624071c338 (origin/merge, merge) Automatic merge of branches 'master', 'next' and 'fixes' into merge
> 58e05c5508e6 Automatic merge of branches 'master', 'next' and 'fixes' into merge
> 1b649e0bcae7 (origin/master, origin/HEAD) Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>
> origin is https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
>
> I can't see your last patch on the powerpc mailing list
> (https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166237)

My bad, I did not update the last patch with the required lists (will fix).

>
>> mm/debug_vm_pgtable.c:647 is here.
>
> Line 647 is:
>
> WARN_ON(!pte_same(pte, __swp_entry_to_pte(swp)));

Both sets of definitions suggest that the last three bits (if present) in
the PTE will be discarded during the PTE->SWP->PTE conversion, which might
be leading to this mismatch and the subsequent failure.

arch/powerpc/include/asm/nohash/32/pgtable.h
arch/powerpc/include/asm/book3s/32/pgtable.h

#define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) >> 3 })
#define __swp_entry_to_pte(x)	((pte_t) { (x).val << 3 })

Also there are some more architectures (microblaze, sh, etc) where these
conversions are not always preserving.
On powerpc64, it sets back _PAGE_PTE irrespective of whether the bit was
originally set or not. Probably it is wrong to expect that the
PTE->SWP->PTE conversion will always be preserving. So wondering if it is
worth changing this test to accommodate all such architectures, or just
dropping it instead.

>
>> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>> static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
>> {
>> 	swp_entry_t swp;
>> 	pmd_t pmd;	-> Line #647
>>
>> 	pmd = pfn_pmd(pfn, prot);
>> 	swp = __pmd_to_swp_entry(pmd);
>> 	WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
>> }
>> #else
>> static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
>> #endif
>>
>> Did I miss something?
>
> [...]
>
>> Could you please point me to the exact test which is failing?
>>
>>> [   53.519778] Freeing unused kernel memory: 608K
>>
>> So I assume that the system should have come up till runtime just fine,
>> apart from the above warning message.
>
> Yes, it boots fine otherwise.

Cool, that is good to know.

>
> Christophe
>
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On 03/27/2020 06:46 AM, Anshuman Khandual wrote:
> On 03/26/2020 08:53 PM, Christophe Leroy wrote:
>> Le 26/03/2020 à 03:23, Anshuman Khandual a écrit :
>>> On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
>>>> This series adds more arch page table helper tests. The new tests here are
>>>> either related to core memory functions and advanced arch pgtable helpers.
>>>> This also creates a documentation file enlisting all expected semantics as
>>>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>>>
>>>> This series has been tested on arm64 and x86 platforms.
>>>
>>> If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
>>> enabled platforms i.e. s390, arc, powerpc (32 and 64), that will be really
>>> appreciated. Thank you.
>>
>> On powerpc 8xx (PPC32), I get:
>>
>> [ 53.338368] debug_vm_pgtable: debug_vm_pgtable: Validating architecture
>> page table helpers
>> [ 53.347403] [ cut here ]
>> [ 53.351832] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:647
>> debug_vm_pgtable+0x280/0x3f4
>
> mm/debug_vm_pgtable.c:647 ? With the following commits in place
>
> 53a8338ce (HEAD) Documentation/mm: Add descriptions for arch page table helper
> 5d4913fc1 mm/debug: Add tests validating arch advanced page table helpers
> bcaf120a7 mm/debug: Add tests validating arch page table helpers for core features
> d6ed5a4a5 x86/memory: Drop pud_mknotpresent()
> 0739d1f8d mm/debug: Add tests validating architecture page table helpers
> 16fbf79b0 (tag: v5.6-rc7) Linux 5.6-rc7

I have:

facaa5eb5909 (HEAD -> helpers0) mm/debug: Add tests validating arch advanced page table helpers
6389fed515fc mm/debug: Add tests validating arch page table helpers for core features
dc14ecc8b94e mm/debug: add tests validating architecture page table helpers
c6624071c338 (origin/merge, merge) Automatic merge of branches 'master', 'next' and 'fixes' into merge
58e05c5508e6 Automatic merge of branches 'master', 'next' and 'fixes' into merge
1b649e0bcae7 (origin/master, origin/HEAD) Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

origin is https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git

I can't see your last patch in powerpc mailing list
(https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166237)

> mm/debug_vm_pgtable.c:647 is here.

Line 647 is:

	WARN_ON(!pte_same(pte, __swp_entry_to_pte(swp)));

> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
> {
>	swp_entry_t swp;
>	pmd_t pmd;			-> Line #647
>
>	pmd = pfn_pmd(pfn, prot);
>	swp = __pmd_to_swp_entry(pmd);
>	WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
> }
> #else
> static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
> #endif
>
> Did I miss something ?

[...]

> Could you please point me to the exact test which is failing ?
>
>> [ 53.519778] Freeing unused kernel memory: 608K
>
> So I assume that the system should have come till runtime just fine apart
> from the above warning message.

Yes it boots fine otherwise.

Christophe
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On 03/26/2020 08:53 PM, Christophe Leroy wrote:
>
> Le 26/03/2020 à 03:23, Anshuman Khandual a écrit :
>>
>> On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
>>> This series adds more arch page table helper tests. The new tests here are
>>> either related to core memory functions and advanced arch pgtable helpers.
>>> This also creates a documentation file enlisting all expected semantics as
>>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>>
>>> This series has been tested on arm64 and x86 platforms.
>>
>> If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
>> enabled platforms i.e. s390, arc, powerpc (32 and 64), that will be really
>> appreciated. Thank you.
>
> On powerpc 8xx (PPC32), I get:
>
> [ 53.338368] debug_vm_pgtable: debug_vm_pgtable: Validating architecture
> page table helpers
> [ 53.347403] [ cut here ]
> [ 53.351832] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:647
> debug_vm_pgtable+0x280/0x3f4

mm/debug_vm_pgtable.c:647 ? With the following commits in place

53a8338ce (HEAD) Documentation/mm: Add descriptions for arch page table helper
5d4913fc1 mm/debug: Add tests validating arch advanced page table helpers
bcaf120a7 mm/debug: Add tests validating arch page table helpers for core features
d6ed5a4a5 x86/memory: Drop pud_mknotpresent()
0739d1f8d mm/debug: Add tests validating architecture page table helpers
16fbf79b0 (tag: v5.6-rc7) Linux 5.6-rc7

mm/debug_vm_pgtable.c:647 is here.

#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
{
	swp_entry_t swp;
	pmd_t pmd;			-> Line #647

	pmd = pfn_pmd(pfn, prot);
	swp = __pmd_to_swp_entry(pmd);
	WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
}
#else
static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
#endif

Did I miss something ?

> [ 53.360140] CPU: 0 PID: 1 Comm: swapper Not tainted
> 5.6.0-rc7-s3k-dev-01090-g92710e99881f #3544
> [ 53.368718] NIP: c0777c04 LR: c0777bb8 CTR:
> [ 53.373720] REGS: c9023df0 TRAP: 0700 Not tainted
> (5.6.0-rc7-s3k-dev-01090-g92710e99881f)
> [ 53.382042] MSR: 00029032 CR: 22000222 XER: 2000
> [ 53.388667]
> [ 53.388667] GPR00: c0777bb8 c9023ea8 c612 0001 1e41 007641c9
> [ 53.388667] GPR08: 0001 82000222 c00039b8
> [ 53.388667] GPR16: fff0 065fc000 1e41 c660 01e4
> [ 53.388667] GPR24: 01d9 c062d14c c65fc000 c642d448 06c9 c65f8000 c65fc040
> [ 53.423400] NIP [c0777c04] debug_vm_pgtable+0x280/0x3f4
> [ 53.428559] LR [c0777bb8] debug_vm_pgtable+0x234/0x3f4
> [ 53.433593] Call Trace:
> [ 53.436048] [c9023ea8] [c0777bb8] debug_vm_pgtable+0x234/0x3f4 (unreliable)
> [ 53.442936] [c9023f28] [c00039e0] kernel_init+0x28/0x124
> [ 53.448184] [c9023f38] [c000f174] ret_from_kernel_thread+0x14/0x1c
> [ 53.454245] Instruction dump:
> [ 53.457180] 41a20008 4bea3ed9 62890021 7d36b92e 7d36b82e 71290fd0 3149 7d2a4910
> [ 53.464838] 0f09 5789077e 3149 7d2a4910 <0f09> 38c0 38a0 3880
> [ 53.472671] ---[ end trace fd5dd92744dc0065 ]---

Could you please point me to the exact test which is failing ?

> [ 53.519778] Freeing unused kernel memory: 608K

So I assume that the system should have come till runtime just fine apart
from the above warning message.
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
Le 26/03/2020 à 03:23, Anshuman Khandual a écrit :
> On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
>> This series adds more arch page table helper tests. The new tests here are
>> either related to core memory functions and advanced arch pgtable helpers.
>> This also creates a documentation file enlisting all expected semantics as
>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>
>> This series has been tested on arm64 and x86 platforms.
>
> If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
> enabled platforms i.e. s390, arc, powerpc (32 and 64), that will be really
> appreciated. Thank you.

On powerpc 8xx (PPC32), I get:

[ 53.338368] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
[ 53.347403] [ cut here ]
[ 53.351832] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:647 debug_vm_pgtable+0x280/0x3f4
[ 53.360140] CPU: 0 PID: 1 Comm: swapper Not tainted 5.6.0-rc7-s3k-dev-01090-g92710e99881f #3544
[ 53.368718] NIP: c0777c04 LR: c0777bb8 CTR:
[ 53.373720] REGS: c9023df0 TRAP: 0700 Not tainted (5.6.0-rc7-s3k-dev-01090-g92710e99881f)
[ 53.382042] MSR: 00029032 CR: 22000222 XER: 2000
[ 53.388667]
[ 53.388667] GPR00: c0777bb8 c9023ea8 c612 0001 1e41 007641c9
[ 53.388667] GPR08: 0001 82000222 c00039b8
[ 53.388667] GPR16: fff0 065fc000 1e41 c660 01e4
[ 53.388667] GPR24: 01d9 c062d14c c65fc000 c642d448 06c9 c65f8000 c65fc040
[ 53.423400] NIP [c0777c04] debug_vm_pgtable+0x280/0x3f4
[ 53.428559] LR [c0777bb8] debug_vm_pgtable+0x234/0x3f4
[ 53.433593] Call Trace:
[ 53.436048] [c9023ea8] [c0777bb8] debug_vm_pgtable+0x234/0x3f4 (unreliable)
[ 53.442936] [c9023f28] [c00039e0] kernel_init+0x28/0x124
[ 53.448184] [c9023f38] [c000f174] ret_from_kernel_thread+0x14/0x1c
[ 53.454245] Instruction dump:
[ 53.457180] 41a20008 4bea3ed9 62890021 7d36b92e 7d36b82e 71290fd0 3149 7d2a4910
[ 53.464838] 0f09 5789077e 3149 7d2a4910 <0f09> 38c0 38a0 3880
[ 53.472671] ---[ end trace fd5dd92744dc0065 ]---
[ 53.519778] Freeing unused kernel memory: 608K
Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests
On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
> This series adds more arch page table helper tests. The new tests here are
> either related to core memory functions and advanced arch pgtable helpers.
> This also creates a documentation file enlisting all expected semantics as
> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>
> This series has been tested on arm64 and x86 platforms.

If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
enabled platforms i.e. s390, arc, powerpc (32 and 64), that will be really
appreciated. Thank you.

- Anshuman
[PATCH V2 0/3] mm/debug: Add more arch page table helper tests
This series adds more arch page table helper tests. The new tests here are
either related to core memory functions and advanced arch pgtable helpers.
This also creates a documentation file enlisting all expected semantics as
suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).

This series has been tested on arm64 and x86 platforms. There is just one
expected failure on arm64 that will be fixed when we enable THP migration.

[ 21.741634] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:782

which corresponds to

WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd

There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
ifdefs scattered across the test. But consolidating all the fallback stubs
is not very straightforward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is
not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE.

This series has been build tested on many platforms including the ones that
subscribe the test through ARCH_HAS_DEBUG_VM_PGTABLE.

This series is based on v5.6-rc7 after applying the following patches.

1. https://patchwork.kernel.org/patch/11431277/
2. https://patchwork.kernel.org/patch/11452185/

Changes in V2:

- Dropped CONFIG_ARCH_HAS_PTE_SPECIAL per Christophe
- Dropped CONFIG_NUMA_BALANCING per Christophe
- Dropped CONFIG_HAVE_ARCH_SOFT_DIRTY per Christophe
- Dropped CONFIG_MIGRATION per Christophe
- Replaced CONFIG_S390 with __HAVE_ARCH_PMDP_INVALIDATE
- Moved page allocation & free inside swap_migration_tests() per Christophe
- Added CONFIG_TRANSPARENT_HUGEPAGE to protect pfn_pmd()
- Added CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD to protect pfn_pud()
- Added a patch for other arch advanced page table helper tests
- Added a patch creating a documentation for page table helper semantics

Changes in V1: (https://patchwork.kernel.org/patch/11408253/)

Cc: Jonathan Corbet
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Vineet Gupta
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Christian Borntraeger
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Kirill A. Shutemov
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org

Anshuman Khandual (3):
  mm/debug: Add tests validating arch page table helpers for core features
  mm/debug: Add tests validating arch advanced page table helpers
  Documentation/mm: Add descriptions for arch page table helpers

 Documentation/vm/arch_pgtable_helpers.rst | 256 ++
 mm/debug_vm_pgtable.c                     | 594 +-
 2 files changed, 839 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/vm/arch_pgtable_helpers.rst

--
2.20.1