> > * gup_hugepte() -> gup_fast_hugepte()
>
> I just realized that we end up calling these from follow_hugepd() as well.
> And something seems to be off, because gup_fast_hugepd() won't have the VMA
> even in the slow-GUP case to pass it to gup_must_unshare().
>
> So these
On Tue, Apr 16, 2024 at 10:58:33AM +, Christophe Leroy wrote:
>
>
> Le 15/04/2024 à 21:12, Christophe Leroy a écrit :
> >
> >
> > Le 12/04/2024 à 16:30, Peter Xu a écrit :
> >> On Fri, Apr 12, 2024 at 02:08:03PM +, Christophe Leroy wrote:
> >>
On Fri, Apr 12, 2024 at 02:08:03PM +, Christophe Leroy wrote:
>
>
> Le 11/04/2024 à 18:15, Peter Xu a écrit :
> > On Mon, Mar 25, 2024 at 01:38:40PM -0300, Jason Gunthorpe wrote:
> >> On Mon, Mar 25, 2024 at 03:55:53PM +0100, Christophe Leroy wrote:
> >>>
On Thu, Apr 11, 2024 at 06:55:44PM +0200, Paolo Bonzini wrote:
> On Mon, Apr 8, 2024 at 3:56 PM Peter Xu wrote:
> > Paolo,
> >
> > I may miss a bunch of details here (as I still remember some change_pte
> > patches previously on the list..), however not sure whether
to position my next step; it seems like at least I should not
add any more hugepd code. Should I then go with ARCH_HAS_HUGEPD checks,
or are you going to have an RFC soon that I can base on top of?
Thanks,
--
Peter Xu
On Wed, Apr 10, 2024 at 04:30:41PM +, Christophe Leroy wrote:
>
>
> Le 10/04/2024 à 17:28, Peter Xu a écrit :
> > On Tue, Apr 09, 2024 at 08:43:55PM -0300, Jason Gunthorpe wrote:
> >> On Fri, Apr 05, 2024 at 05:42:44PM -0400, Peter Xu wrote:
> >>> In
On Tue, Apr 09, 2024 at 08:43:55PM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 05, 2024 at 05:42:44PM -0400, Peter Xu wrote:
> > In short, hugetlb mappings shouldn't be special compared to other huge pXd
> > and large folio (cont-pXd) mappings for most of the walkers in my
ked because I remember Andrea used to have a custom tree
maintaining that part:
https://github.com/aagit/aa/commit/c761078df7a77d13ddfaeebe56a0f4bc128b1968
Maybe it can't be enabled for some reason that I overlooked in the current
tree, or we just decided not to?
Thanks,
--
Peter Xu
On Fri, Apr 05, 2024 at 03:16:33PM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 04, 2024 at 05:48:03PM -0400, Peter Xu wrote:
> > On Tue, Mar 26, 2024 at 11:02:52AM -0300, Jason Gunthorpe wrote:
> > > The more I look at this the more I think we need to get to Matthew's
>
r too much if we go the generic route now?
Considering that we already have most of pmd/pud entries around in the mm
walker ops. So far it sounds better to leave it for later, until further
justified to be useful. That won't block it if it's ever needed; I'd say
it can also be seen as a step forward if I can manage to
remove hugetlb_entry() first.
Comments welcomed (before I start to work on anything..).
Thanks,
--
Peter Xu
On Thu, Apr 04, 2024 at 08:24:04AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 03, 2024 at 02:25:20PM -0400, Peter Xu wrote:
>
> > > I'd say the BUILD_BUG has done it's job and found an issue, fix it by
> > > not defining pud_leaf? I don't see any calls
On Wed, Apr 03, 2024 at 09:08:41AM -0300, Jason Gunthorpe wrote:
> On Tue, Apr 02, 2024 at 07:35:45PM -0400, Peter Xu wrote:
> > On Tue, Apr 02, 2024 at 07:53:20PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Apr 02, 2024 at 06:43:56PM -0400, Peter Xu wrote:
> > >
>
On Tue, Apr 02, 2024 at 07:53:20PM -0300, Jason Gunthorpe wrote:
> On Tue, Apr 02, 2024 at 06:43:56PM -0400, Peter Xu wrote:
>
> > I actually tested this without hitting the issue (even though I didn't
> > mention it in the cover letter..). I re-kicked the build test, it
On Tue, Apr 02, 2024 at 12:05:49PM -0700, Nathan Chancellor wrote:
> Hi Peter (and LoongArch folks),
>
> On Wed, Mar 27, 2024 at 11:23:24AM -0400, pet...@redhat.com wrote:
> > From: Peter Xu
> >
> > The comment in the code explains the reasons. We took a differen
lbs of all sizes for the tests:
>
> "transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2"
This helps, thanks.
--
Peter Xu
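As an aside on the command line quoted above: the `hugepages=` values use the
per-node `node:count` syntax, paired with the preceding `hugepagesz=` (or
`default_hugepagesz=`). A small sketch of how those options read
(illustrative only; `parse_hugetlb_opts` is a made-up helper, not the
kernel's actual parser):

```python
# Toy parser for the hugetlb boot options quoted above. Purely illustrative:
# the kernel parses these in mm/hugetlb.c, not like this.
def parse_hugetlb_opts(cmdline):
    pools = {}
    size = None
    for tok in cmdline.split():
        key, _, val = tok.partition("=")
        if key in ("hugepagesz", "default_hugepagesz"):
            size = val                      # remember the size for the next hugepages=
        elif key == "hugepages" and size is not None:
            # "0:2,1:2" means 2 pages on node 0 and 2 pages on node 1
            pools[size] = dict(
                (int(node), int(count))
                for node, count in (pair.split(":") for pair in val.split(",")))
    return pools

cmdline = ("hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2 "
           "default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2")
pools = parse_hugetlb_opts(cmdline)
assert pools["1G"] == {0: 2, 1: 2}
assert pools["2M"] == {0: 64, 1: 64}
```

So the boot line above asks for two 1G, two 32M, and two 64K pages per node,
plus 64 pages of the 2M default size per node.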
On Tue, Apr 02, 2024 at 06:39:31PM +0200, David Hildenbrand wrote:
> On 02.04.24 18:20, Peter Xu wrote:
> > On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
> > > On 02.04.24 16:48, Ryan Roberts wrote:
> > > > Hi Peter,
> >
> >
->flags);
> }
>
> Which is called from can_follow_write_pmd(), called just after the assert I
> just commented out.
>
>
> It's triggered by this test:
>
> # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd
> hugetlb (32768 kB)
>
> Which is the first MAP_PRIVATE test for cont-pmd mapped hugetlb. (All
> MAP_SHARED tests are passing).
>
>
> Looks like can_follow_write_pmd() returns early for VM_SHARED mappings.
>
> I don't think we only keep the PAE flag in the head page for hugetlb pages?
> So we can't just remove this assert?
>
> I tried just commenting it out and get assert further down follow_huge_pmd():
>
> VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
> !PageAnonExclusive(page), page);
I just replied in another email; we can try the two patches I attached, or
we can wait until I do some tests (but I will be mostly unavailable this
afternoon).
Thanks,
--
Peter Xu
On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
> On 02.04.24 16:48, Ryan Roberts wrote:
> > Hi Peter,
Hey, Ryan,
Thanks for the report!
> >
> > On 27/03/2024 15:23, pet...@redhat.com wrote:
> > > From: Peter Xu
> > >
> > &
nfig you tried there; as I am doing some build tests
recently, I found turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS could
avoid a lot of issues; I think that's due to a missing libc. But maybe that's
not the case there.
The series makes sense to me, though the naming is confusing. Btw, thanks for
posting this as RFC. This definitely has a conflict with the other gup
series that I had; I'll post v4 of that shortly.
--
Peter Xu
On Fri, Mar 22, 2024 at 01:10:00PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 21, 2024 at 06:07:50PM -0400, pet...@redhat.com wrote:
> > From: Peter Xu
> >
> > v3:
> > - Rebased to latest mm-unstable (a824831a082f, of March 21st)
> > - Dropped patch to introduc
On Fri, Mar 22, 2024 at 08:45:59PM -0400, Peter Xu wrote:
> On Fri, Mar 22, 2024 at 01:48:18PM -0700, Andrew Morton wrote:
> > On Thu, 21 Mar 2024 18:08:02 -0400 pet...@redhat.com wrote:
> >
> > > From: Peter Xu
> > >
> > > Now follow_page() is read
On Fri, Mar 22, 2024 at 01:48:18PM -0700, Andrew Morton wrote:
> On Thu, 21 Mar 2024 18:08:02 -0400 pet...@redhat.com wrote:
>
> > From: Peter Xu
> >
> > Now follow_page() is ready to handle hugetlb pages in whatever form, and
> > over all architectures. S
On Fri, Mar 22, 2024 at 10:14:56AM -0700, SeongJae Park wrote:
> Hi Peter,
Hi, SeongJae,
>
> On Thu, 21 Mar 2024 18:07:53 -0400 pet...@redhat.com wrote:
>
> > From: Peter Xu
> >
> > These macros can be helpful when we plan to merge hugetlb code into generi
ore
generic issue to solve, IOW, we still don't do that for !hugetlb cont_pte
large folios, before or after this series.
>
> Reviewed-by: Jason Gunthorpe
Thanks!
--
Peter Xu
On Wed, Mar 20, 2024 at 05:40:39PM +, Christophe Leroy wrote:
>
>
> Le 20/03/2024 à 17:09, Peter Xu a écrit :
> > On Wed, Mar 20, 2024 at 06:16:43AM +, Christophe Leroy wrote:
> >> At the first place that was to get a close fit between hardware
> >> paget
o similar with 8M pages.
>
> I'll give it a try and see how it goes.
So you're talking about 8M only for 8xx, am I right?
There seem to be other PowerPC systems using hugepd. Is it possible to
convert all hugepd into cont_pte form?
Thanks,
--
Peter Xu
> > - (pmd_val(pmd) & (_PAGE_VALID|_PAGE_PMD_HUGE)) != _PAGE_VALID;
> > + return pmd_leaf(pmd);;
>
> There is a redundant semicolon at the end.
Will touch it up, thanks. PS: This will be dropped as a whole in patch 12.
--
Peter Xu
On Thu, Mar 14, 2024 at 08:56:59AM +, Christophe Leroy wrote:
>
>
> Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> > From: Peter Xu
> >
> > This API is not used anymore, drop it for the whole tree.
> >
> > Signed-off-by: Peter
On Thu, Mar 14, 2024 at 08:50:20AM +, Christophe Leroy wrote:
>
>
> Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> > From: Peter Xu
> >
> > Now after we're sure all pXd_huge() definitions are the same as pXd_leaf(),
> > reuse it.
On Thu, Mar 14, 2024 at 08:45:34AM +, Christophe Leroy wrote:
>
>
> Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> > From: Peter Xu
> >
> > PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> > constantly returns 0 for hash M
hould define pgd_huge*() instead of pud_huge*(), so
that it looks like the only way to provide such a treewide clean API is to
properly define those APIs for aarch64, and define different pud helpers
for either 3/4 levels. But I confess I don't think I fully digested all
the bits.
Thanks,
--
Peter Xu
On Thu, Mar 07, 2024 at 02:12:33PM -0400, Jason Gunthorpe wrote:
> On Wed, Mar 06, 2024 at 06:41:35PM +0800, pet...@redhat.com wrote:
> > From: Peter Xu
> >
> > Swap pud entries do not always return true for pud_huge() for all archs.
> > x86 and sparc (so far) allow
On Wed, Mar 06, 2024 at 11:56:56PM +1100, Michael Ellerman wrote:
> pet...@redhat.com writes:
> > From: Peter Xu
> >
> > PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> > constantly returns 0 for hash MMUs. AFAICT that is fine to be re
On Mon, Mar 04, 2024 at 09:03:34AM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 29, 2024 at 04:42:55PM +0800, pet...@redhat.com wrote:
> > From: Peter Xu
> >
> > pud_leaf() has a fallback macro defined in include/linux/pgtable.h already.
> > Drop the extra two f
liction, while in the past it was a silent conflict between the old
pud_leaf() macro and the pud_leaf() definition: the macro could have silently
overwritten the function.
IIUC such a pud_leaf() is not needed as we have a global fallback. I'll add
a prerequisite patch to remove such pXd_leaf() definitions.
--
Peter Xu
On Wed, Feb 28, 2024 at 09:50:52AM +, Christophe Leroy wrote:
> Le 28/02/2024 à 09:53, pet...@redhat.com a écrit :
> > From: Peter Xu
> >
> > [based on latest akpm/mm-unstable, commit 1274e7646240]
> >
> > These two APIs are mostly always the same. It
On Wed, Feb 21, 2024 at 08:57:53AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 21, 2024 at 05:37:37PM +0800, Peter Xu wrote:
> > On Mon, Jan 15, 2024 at 01:55:51PM -0400, Jason Gunthorpe wrote:
> > > On Wed, Jan 03, 2024 at 05:14:13PM +0800, pet...@redhat.com wrote:
>
er of this function in this series? When
> does this re-use happen??
It's reused in patch 12 ("mm/gup: Handle hugepd for follow_page()").
Thanks,
--
Peter Xu
> > pud = READ_ONCE(*pudp);
> > - if (pud_none(pud))
> > + if (pud_none(pud) || !pud_present(pud))
> > return no_page_table(vma, flags, address);
>
> Isn't 'pud_none() || !pud_present()' redundant? A none pud is
> non-present, by definition?
Hmm yes, seems redundant. Let me drop it.
>
> > - if (pud_devmap(pud)) {
> > + if (pud_huge(pud)) {
> > ptl = pud_lock(mm, pudp);
> > - page = follow_devmap_pud(vma, address, pudp, flags,
> > &ctx->pgmap);
> > + page = follow_huge_pud(vma, address, pudp, flags, ctx);
> > spin_unlock(ptl);
> > if (page)
> > return page;
>
> Otherwise it looks OK to me
>
> Reviewed-by: Jason Gunthorpe
Thanks!
--
Peter Xu
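For what it's worth, the redundancy noted above can be demonstrated with a
toy model (plain Python, not kernel code; the bit layout is invented for
illustration): a "none" entry is all-zeroes, so it can never have the
present bit set, which makes `pud_none(pud) || !pud_present(pud)` collapse
to `!pud_present(pud)`:

```python
PRESENT = 1 << 0  # invented "valid/present" bit for this toy model

def pud_none(val):
    return val == 0            # a none entry is completely empty

def pud_present(val):
    return bool(val & PRESENT)

# none implies !present, so the disjunction collapses to !present alone.
# Try an empty entry, a mapped entry, a swap-like (!present) entry, and a
# present entry with extra bits set:
for val in (0, PRESENT, 1 << 1, PRESENT | (1 << 5)):
    assert (pud_none(val) or not pud_present(val)) == (not pud_present(val))
```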
On Mon, Jan 15, 2024 at 01:55:51PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 03, 2024 at 05:14:13PM +0800, pet...@redhat.com wrote:
> > From: Peter Xu
> >
> > ARM defines pmd_thp_or_huge(), detecting either a THP or a huge PMD. It
> > can be a helpful helper if we
On Mon, Jan 15, 2024 at 01:37:37PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 03, 2024 at 05:14:11PM +0800, pet...@redhat.com wrote:
> > From: Peter Xu
> >
> > Introduce a config option that will be selected as long as huge leaves are
> > involved in pgtable (thp
will even walk that radix table. But I can be overlooking important
things here.
It'll definitely be great if hugepd can be merged into some existing form
like a generic pgtable (IMHO cont_* is such a case: it's the same as having
no cont_* entries as far as software is concerned, while hardware can
accelerate with TLB hits on larger ranges). But I may be asking a very
silly question here too.
Thanks,
--
Peter Xu
On Mon, Dec 25, 2023 at 02:34:48PM +0800, Muchun Song wrote:
> Reviewed-by: Muchun Song
You're using the old email address here. Do you want me to also use the
linux.dev one that you suggested I use?
--
Peter Xu
/asm/pgtable.h:#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
So far this series only touches generic code. Would you mind if I keep this
patch as-is, and leave the renaming for later?
>
> BTW, please cc me via the new email (muchun.s...@linux.dev) next edition.
Sure. Thanks for taking a look.
--
Peter Xu
Copy Muchun, which I forgot since the start, sorry.
--
Peter Xu
On Tue, Dec 19, 2023 at 11:28:54AM -0500, James Houghton wrote:
> On Tue, Dec 19, 2023 at 2:57 AM wrote:
> >
> > From: Peter Xu
> >
> > Introduce "pud_t pud" in the function, so the code won't dereference *pudp
> > multiple times. Not only b
8XX also has cont_pte support, so we
actually have three users indeed, not counting potential future archs
adding support to get that same TLB benefit.
Thanks,
--
Peter Xu
On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
> On Fri, Nov 24, 2023 at 09:06:01AM +, Ryan Roberts wrote:
> > I don't have any micro-benchmarks for GUP though, if that's your question.
> > Is
> > there an easy-to-use test I can run to get some numbe
ch of "#ifdef ARCH_HAS_HUGEPD" in generic
code, which is not preferred either. For gup, it might be relatively easy
compared to the rest. I'm still hesitant about the long-term plan.
Please let me know if you have any thoughts on any of the above.
Thanks!
--
Peter Xu
be tested if gup is not yet touched
from your side, afaict. I'll see whether I can provide some rough numbers
instead in the next post (I'll probably only be able to test it in a VM,
though, but hopefully that should still mostly reflect the truth).
--
Peter Xu
haven't yet worked on gup then, after I glimpsed the above
series.
It's a matter of whether one follow_page_mask() call can fetch more than
one page* for a cont_pte entry on aarch64 for a large non-hugetlb folio
(and if this series lands, it'll be the same for hugetlb and non-hugetlb).
Currently the code can only fetch one page, I think.
Thanks,
--
Peter Xu
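The batching question above can be sketched with a toy calculation (Python,
purely illustrative numbers: 4K base pages and a 64K cont-pte run; not the
kernel's actual GUP code): one walk could return every remaining page up to
the end of the contiguous run, instead of one page per call:

```python
# Toy model of batched page fetching over a cont-pte run.
PAGE_SIZE = 4096
CONT_PTE_SIZE = 16 * PAGE_SIZE   # e.g. a 64K contiguous run on a 4K-page arch

def pages_from_one_walk(addr, end):
    """Pages one walk could fetch: from addr up to min(end, cont-run boundary)."""
    run_end = (addr // CONT_PTE_SIZE + 1) * CONT_PTE_SIZE
    return (min(end, run_end) - addr) // PAGE_SIZE

# Walking 64K that exactly covers one cont-pte run: a single walk suffices,
# where the current one-page-per-call code would need 16 walks.
assert pages_from_one_walk(0x10000, 0x10000 + CONT_PTE_SIZE) == 16
# Starting mid-run, only the remainder of the run is batched.
assert pages_from_one_walk(0x10000 + 4 * PAGE_SIZE, 0x10000 + CONT_PTE_SIZE) == 12
```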
err = walk_hugetlb_range(start, end, walk);
} else
err = walk_pgd_range(start, end, walk);
To me it means that as long as the vma is hugetlb, it won't trigger any code
in walk_pgd_range(), only walk_hugetlb_range(). Do you perhaps mean
hugepd is used outside hugetlbfs?
Thanks,
--
Peter Xu
to know if Ryan has worked on cont_pte
support for gup on large folios, and whether there are any performance
numbers to share. It's definitely good news to me because it would mean
Ryan's work can then also benefit hugetlb if this series is merged; I just
don't know how much difference there will be.
Thanks,
--
Peter Xu
d(hugepd_t hugepd, unsigned long addr,
unsigned int pdshift, unsigned long end, unsigned int flags,
struct page **pages, int *nr)
--
Peter Xu
On Wed, Nov 22, 2023 at 12:00:24AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 21, 2023 at 10:59:35AM -0500, Peter Xu wrote:
> > > What prevents us from ever using hugepd with file mappings? I think
> > > it would naturally fit in with how large folios for the pagecache
On Mon, Nov 20, 2023 at 12:26:24AM -0800, Christoph Hellwig wrote:
> On Wed, Nov 15, 2023 at 08:29:02PM -0500, Peter Xu wrote:
> > Hugepd format is only used in PowerPC with hugetlbfs. In commit
> > a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to
> >
apply to hugepd.
Drop that check, not only because it'll never be true for hugepd, but also
because it paves the way for reusing the function outside fast-gup.
Cc: Lorenzo Stoakes
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu
---
mm/gup.c | 5 -
1 file changed,
, so we must not drop
> + * ptl before pgt_pmd is removed, so uffd private needs rechecking.
> + */
> + if (userfaultfd_armed(vma) &&
> + !(vma->vm_flags & VM_SHARED))
> + goto recheck;
> + }
> + }
>
> - /* Huge page lock is still held, so page table must remain empty */
> - pml = pmd_lock(mm, pmd);
> - if (ptl != pml)
> - spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
> pmdp_get_lockless_sync();
> if (ptl != pml)
> @@ -1648,6 +1665,8 @@ int collapse_pte_mapped_thp(struct mm_struct *mm,
> unsigned long addr,
> }
> if (start_pte)
> pte_unmap_unlock(start_pte, ptl);
> + if (pml && pml != ptl)
> + spin_unlock(pml);
> if (notified)
> mmu_notifier_invalidate_range_end(&range);
> drop_hpage:
> --
> 2.35.3
--
Peter Xu
> mmap_read_lock()")
> Signed-off-by: Hugh Dickins
The locking is indeed slightly complicated.. but I didn't spot anything
wrong.
Acked-by: Peter Xu
Thanks,
--
Peter Xu
ck_t *ptl;
> +#else
> + spinlock_t ptl;
> +#endif
> + };
> + unsigned int __page_type;
> + atomic_t _refcount;
> +#ifdef CONFIG_MEMCG
> + unsigned long pt_memcg_data;
> +#endif
> +};
--
Peter Xu
_rcu)
We need to be careful about the lock being freed in pgtable_pte_page_dtor();
in Hugh's series, IIUC, we need the spinlock to stay around for the RCU
section alongside the page itself. So to do that we'd also need to RCU-call
pgtable_pte_page_dtor() when needed.
--
Peter Xu
the empty pgtable)? As that seems to defeat the
purpose of the patchset, since notifiers shouldn't fail.
>
> (FWIW, last I looked, there also seemed to be some other issues with
> MMU notifier usage wrt IOMMUv2, see the thread
> <https://lore.kernel.org/linux-mm/yzbaf9hw1%2frek...@nvidia.com/>.)
>
>
> > + if (ptl != pml)
> > + spin_unlock(ptl);
> > + spin_unlock(pml);
> > +
> > + mm_dec_nr_ptes(mm);
> > + page_table_check_pte_clear_range(mm, addr, pgt_pmd);
> > + pte_free_defer(mm, pmd_pgtable(pgt_pmd));
> > }
> > - i_mmap_unlock_write(mapping);
> > - return target_result;
> > + i_mmap_unlock_read(mapping);
> > }
> >
> > /**
> > @@ -2261,9 +2210,11 @@ static int collapse_file(struct mm_struct *mm,
> > unsigned long addr,
> >
> > /*
> > * Remove pte page tables, so we can re-fault the page as huge.
> > +* If MADV_COLLAPSE, adjust result to call
> > collapse_pte_mapped_thp().
> > */
> > - result = retract_page_tables(mapping, start, mm, addr, hpage,
> > -cc);
> > + retract_page_tables(mapping, start);
> > + if (cc && !cc->is_khugepaged)
> > + result = SCAN_PTE_MAPPED_HUGEPAGE;
> > unlock_page(hpage);
> >
> > /*
> > --
> > 2.35.3
> >
>
--
Peter Xu
into
> detail in responses to you there - thanks for your patience :)
Not a problem at all here!
>
> On Mon, 29 May 2023, Peter Xu wrote:
> > On Sun, May 28, 2023 at 11:25:15PM -0700, Hugh Dickins wrote:
> ...
> > > @@ -1748,123 +1747,73 @@ static void
> &g
> + spin_unlock(pml);
> +
> + mm_dec_nr_ptes(mm);
> + page_table_check_pte_clear_range(mm, addr, pgt_pmd);
> + pte_free_defer(mm, pmd_pgtable(pgt_pmd));
> }
> - i_mmap_unlock_write(mapping);
> - return target_result;
> + i_mmap_unlock_read(mapping);
> }
>
> /**
> @@ -2261,9 +2210,11 @@ static int collapse_file(struct mm_struct *mm,
> unsigned long addr,
>
> /*
>* Remove pte page tables, so we can re-fault the page as huge.
> + * If MADV_COLLAPSE, adjust result to call collapse_pte_mapped_thp().
>*/
> - result = retract_page_tables(mapping, start, mm, addr, hpage,
> - cc);
> + retract_page_tables(mapping, start);
> + if (cc && !cc->is_khugepaged)
> + result = SCAN_PTE_MAPPED_HUGEPAGE;
> unlock_page(hpage);
>
> /*
> --
> 2.35.3
>
--
Peter Xu
_range_single().
> - Remove zap_page_range.
>
> [1]
> https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.krav...@oracle.com/
> Suggested-by: Peter Xu
> Signed-off-by: Mike Kravetz
Acked-by: Peter Xu
--
Peter Xu
>
> [1]
> https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.krav...@oracle.com/
> Suggested-by: Peter Xu
> Signed-off-by: Mike Kravetz
Acked-by: Peter Xu
Thanks!
--
Peter Xu
ration_entry_wait_huge(pte, ptl);
> + goto retry;
> + }
> + /*
> + * hwpoisoned entry is treated as no_page_table in
> + * follow_page_mask().
> + */
> + }
> +out:
> + spin_unlock(ptl);
> + return page;
> +}
--
Peter Xu
off-work on Mon & Tue,
but maybe I'll still try).
--
Peter Xu
On Fri, Oct 28, 2022 at 08:27:57AM -0700, Mike Kravetz wrote:
> On 10/27/22 15:34, Peter Xu wrote:
> > On Wed, Oct 26, 2022 at 05:34:04PM -0700, Mike Kravetz wrote:
> > > On 10/26/22 17:59, Peter Xu wrote:
> >
> > If we want to use the vma read lock to protect here
On Wed, Oct 26, 2022 at 05:34:04PM -0700, Mike Kravetz wrote:
> On 10/26/22 17:59, Peter Xu wrote:
> > Hi, Mike,
> >
> > On Sun, Sep 18, 2022 at 07:13:48PM -0700, Mike Kravetz wrote:
> > > +struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> >
uld be safe here because the worst case
is that the caller will fetch a wrong page, but then it should be invalidated
very soon via mmu notifiers. One thing worth mentioning is that pmd unshare
should never free a pgtable page.
IIUC it's also the same for fast-gup - afaiu we don't take the vma read lock
in fast-gup either, but I also think it's safe. Still, I hope I didn't miss
something.
--
Peter Xu
mar K.V
> Signed-off-by: Yang Shi
Acked-by: Peter Xu
--
Peter Xu
On Tue, Sep 06, 2022 at 01:08:10PM -0700, Suren Baghdasaryan wrote:
> On Tue, Sep 6, 2022 at 12:39 PM Peter Xu wrote:
> >
> > On Thu, Sep 01, 2022 at 10:35:07AM -0700, Suren Baghdasaryan wrote:
> > > Due to the possibility of do_swap_page dropping mmap_lock, abort fault
)
> vm_fault_t ret = 0;
> void *shadow = NULL;
>
> + if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> + ret = VM_FAULT_RETRY;
> + goto out;
> + }
> +
We may want to fail early similarly for handle_userfault() too, for a
similar reason. Thanks,
--
Peter Xu
just become no-diff after rebase, though. I'm not sure what the ordering
will be in the end, but anyway I think this patch stands on its own too.
Acked-by: Peter Xu
Thanks for tolerating my nitpicking,
>
> ---
>
> New for v4
> ---
> mm/migrate_device.c | 2 +-
try
> after madvise returns. Fix this by flushing the TLB while holding the
> PTL.
>
> Signed-off-by: Alistair Popple
> Reported-by: Nadav Amit
> Reviewed-by: "Huang, Ying"
> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while
> collecting pages")
> Cc: sta...@vger.kernel.org
Acked-by: Peter Xu
--
Peter Xu
On Fri, Aug 26, 2022 at 06:46:02PM +0200, David Hildenbrand wrote:
> On 26.08.22 17:55, Peter Xu wrote:
> > On Fri, Aug 26, 2022 at 04:47:22PM +0200, David Hildenbrand wrote:
> >>> To me anon exclusive only shows this mm exclusively owns this page. I
> >>> d
ast is the magic bit, we have to make sure that we won't see new
> GUP pins, thus the TLB flush.
>
> include/linux/mm.h:gup_must_unshare() contains documentation.
Hmm.. Shouldn't ptep_get_and_clear() (e.g., xchg() on x86_64) already
guarantee that no other process/thread will see this pte anymore
afterwards?
--
Peter Xu
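The question above can be sketched with a toy model (Python, not kernel code;
a single-level "page table" and per-CPU TLB dicts, invented for
illustration). An atomic PTE clear is only atomic with respect to the page
table: new page-table walkers see the clear immediately, but a CPU that
already cached the translation in its TLB keeps using it until the flush:

```python
# Toy model: why clearing a PTE atomically is not enough without a TLB flush.
class CPU:
    def __init__(self):
        self.tlb = {}                    # cached virt -> phys translations

    def access(self, page_table, virt):
        if virt in self.tlb:             # TLB hit: the page table isn't consulted
            return self.tlb[virt]
        phys = page_table.get(virt)      # TLB miss: walk the table
        if phys is not None:
            self.tlb[virt] = phys        # cache the translation
        return phys

page_table = {0x1000: 0xAA000}
cpu0, cpu1 = CPU(), CPU()
cpu1.access(page_table, 0x1000)          # cpu1 caches the translation

pte = page_table.pop(0x1000)             # "ptep_get_and_clear": atomic on the table
assert cpu0.access(page_table, 0x1000) is None       # new walkers see the clear...
assert cpu1.access(page_table, 0x1000) == 0xAA000    # ...but cpu1's stale TLB still hits

cpu1.tlb.clear()                         # the TLB flush closes the window
assert cpu1.access(page_table, 0x1000) is None
```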
On Fri, Aug 26, 2022 at 11:02:58AM +1000, Alistair Popple wrote:
>
> Peter Xu writes:
>
> > On Fri, Aug 26, 2022 at 08:21:44AM +1000, Alistair Popple wrote:
> >>
> >> Peter Xu writes:
> >>
> >> > On Wed, Aug 24, 2022 at 01:03:38PM +1000, Al
ght be pinned (or have
> swap-cache allocated to it, but I'm hoping to at least get that fixed).
If so I'd suggest even more straightforward documentation, either on this
trylock() or on the APIs (e.g. for migrate_vma_setup()). This behavior is
IMHO deeply hidden and many people may not realize it. I'll comment on the
comment update patch.
Thanks.
--
Peter Xu
On Fri, Aug 26, 2022 at 08:21:44AM +1000, Alistair Popple wrote:
>
> Peter Xu writes:
>
> > On Wed, Aug 24, 2022 at 01:03:38PM +1000, Alistair Popple wrote:
> >> migrate_vma_setup() has a fast path in migrate_vma_collect_pmd() that
> >> installs migration en
).
I think the order can be changed if one explicitly does so (e.g. fork() plus
mremap() for anonymous memory here), but I just want to make sure I get the
whole point of it.
Thanks,
--
Peter Xu
On Thu, Aug 25, 2022 at 10:42:41AM +1000, Alistair Popple wrote:
>
> Peter Xu writes:
>
> > On Wed, Aug 24, 2022 at 04:25:44PM -0400, Peter Xu wrote:
> >> On Wed, Aug 24, 2022 at 11:56:25AM +1000, Alistair Popple wrote:
> >> > >> Still I don't kno
On Wed, Aug 24, 2022 at 04:25:44PM -0400, Peter Xu wrote:
> On Wed, Aug 24, 2022 at 11:56:25AM +1000, Alistair Popple wrote:
> > >> Still I don't know whether there'll be any side effect of having stale
> > >> tlbs
> > >> in !present ptes bec
age, or will we try to lock_page() again
somewhere?
The future unmap op is also based on this "cpages", not "npages":
if (args->cpages)
migrate_vma_unmap(args);
So I never figured out how this code really works. It'd be great if you
could shed some light on it.
Thanks,
--
Peter Xu
claimed data
> won't be written back to swap storage as it is considered uptodate,
> resulting in data loss if the page is subsequently accessed.
>
> Prevent this by copying the dirty bit to the page when removing the pte
> to match what try_to_migrate_one() does.
>
> S
ike
e.g. mprotect(), there's a strong barrier of not allowing further writes
after mprotect() returns.
Still I don't know whether there'll be any side effect of having stale TLBs
in !present ptes, because I'm not familiar enough with the private device
swap migration code. But I think having them is safe, even if redundant.
Thanks,
--
Peter Xu
arch_leave_lazy_mmu_mode();
pte_unmap_unlock();
I may be missing something, but even if not, it already doesn't look pretty.
Thanks,
--
Peter Xu
hed solution can be worse than using per-pte
ptep_clear_flush(). It may enlarge the race window, but fundamentally
(iiuc) they're the same thing here, as long as there's no atomic way to both
"clear pte and flush tlb".
[1] https://lore.kernel.org/lkml/e37036e0-566e-40c7-ad15-720cdb003...@gmail.com/
--
Peter Xu
On Wed, Aug 17, 2022 at 11:49:03AM +1000, Alistair Popple wrote:
>
> Peter Xu writes:
>
> > On Tue, Aug 16, 2022 at 04:10:29PM +0800, huang ying wrote:
> >> > @@ -193,11 +194,10 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >> >
hen we can keep using ptep_get_and_clear() afaiu but keep "pte"
updated.
Thanks,
--
Peter Xu
tectures can provide their own version. */
> +__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
> +{
> + return ~(0UL);
I'm wondering whether it's better to return 0 rather than ~0 by default.
Could an arch with !CONFIG_ARCH_WANT_GENERAL_HUGETLB wrongly skip some
valid address ranges with ~0, or perhaps I misread?
Thanks,
--
Peter Xu
keeping
them as-is.
Acked-by: Geert Uytterhoeven
Acked-by: Peter Zijlstra (Intel)
Acked-by: Johannes Weiner
Acked-by: Vineet Gupta
Acked-by: Guo Ren
Acked-by: Max Filippov
Acked-by: Christian Borntraeger
Acked-by: Michael Ellerman (powerpc)
Acked-by: Catalin Marinas
Reviewed-by: Alistair Po
dev@lists.ozlabs.org, "David
S . Miller"
Errors-To: linuxppc-dev-bounces+archive=mail-archive@lists.ozlabs.org
Sender: "Linuxppc-dev"
On Mon, May 30, 2022 at 07:03:31PM +0200, Heiko Carstens wrote:
> On Mon, May 30, 2022 at 12:00:52PM -0400, Peter Xu wrote:
> > On
On Mon, May 30, 2022 at 11:52:54AM -0400, Peter Xu wrote:
> On Mon, May 30, 2022 at 11:35:10AM +0200, Christian Borntraeger wr
> > -*/
> > - mmap_read_lock(mm);
> > - goto out_gmap;
> > + if (gmap) {
> > + mmap_read_lock(mm);
> > + goto out_gmap;
> > + }
> > + goto out;
>
> Yes, that makes sense. With that
>
> Acked-by: Christian Borntraeger
Looks sane, thanks Heiko, Christian. I'll cook another one.
--
Peter Xu
Acked-by: Max Filippov
Reviewed-by: Alistair Popple
Reviewed-by: Ingo Molnar
Signed-off-by: Peter Xu
---
v4:
- Picked up a-bs and r-bs
- Fix grammar in the comment of faultin_page() [Ingo]
- Fix s390 for gmap since gmap needs the mmap lock [Heiko]
v3:
- Rebase to akpm/mm-unstable
- Copy ar
On Fri, May 27, 2022 at 12:46:31PM +0200, Ingo Molnar wrote:
>
> * Peter Xu wrote:
>
> > This patch provides a ~12% perf boost on my aarch64 test VM with a simple
> > pr
Hi, Heiko,
On Fri, May 27, 2022 at 02:23:42PM +0200, Heiko Carstens wrote:
> On Tue, May 24, 2022 at 07:45:31PM -0400, Peter Xu wrote:
> > I observed that for each of the shared fil
citly didn't touch hmm_vma_fault() and break_ksm() because they do
not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
them as-is.
Signed-off-by: Peter Xu
---
v3:
- Rebase to akpm/mm-unstable
- Copy arch maintainers
---
arch/alpha/mm/fault.c | 4 +