Re: [PATCH 1/2] Fix userfaultfd_api to return EINVAL as expected

2024-05-08 Thread Peter Xu
> > > Signed-off-by: Audra Mitchell > > > --- > > > fs/userfaultfd.c | 5 + > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > > index 60dcfafdc11a..17210558de79 100644 > > > --- a/fs/userfaultfd.c > > > +++ b/fs/userfaultfd.c > > > @@ -2073,6 +2073,11 @@ static int userfaultfd_api(struct userfaultfd_ctx > > > *ctx, > > > uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED; > > > uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC; > > > #endif > > > + > > > + ret = -EINVAL; > > > + if (features & ~uffdio_api.features) > > > + goto err_out; > > > + > > > uffdio_api.ioctls = UFFD_API_IOCTLS; > > > ret = -EFAULT; > > > if (copy_to_user(buf, _api, sizeof(uffdio_api))) > > > > CCing Peter. > > > > -- > > Cheers, > > > > David / dhildenb > > > -- Peter Xu

Re: [PATCH v1 2/2] mm/hugetlb: document why hugetlb uses folio_mapcount() for COW reuse decisions

2024-05-02 Thread Peter Xu
ally around how hugetlb pages > cannot really be overcommitted, and why we don't particularly care about > these vmsplice() leaks for hugetlb -- in contrast to ordinary memory. > > [1] > https://lore.kernel.org/all/8b42a24d-caf0-46ef-9e15-0f88d47d2...@redhat.com/ > > Suggested-

Re: [PATCH v1] selftests: mm: gup_longterm: test unsharing logic when R/O pinning

2024-04-30 Thread Peter Xu
gt; provided with the VMA to decide at some point that long-term R/O pinning > a !anon page is fine. > > Cc: Andrew Morton > Cc: Shuah Khan > Cc: Peter Xu > Signed-off-by: David Hildenbrand Acked-by: Peter Xu Thanks, -- Peter Xu

Re: BUG selftests/mm]

2024-03-12 Thread Peter Xu
On Mon, Mar 11, 2024 at 03:28:28PM -0700, Jiaqi Yan wrote: > On Mon, Mar 11, 2024 at 2:27 PM James Houghton wrote: > > > > On Mon, Mar 11, 2024 at 12:28 PM Peter Xu wrote: > > > > > > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote: >

Re: BUG selftests/mm]

2024-03-11 Thread Peter Xu
s possibly even less of a concern to directly use pr_err_ratelimited(). Thanks, -- Peter Xu

Re: BUG selftests/mm]

2024-03-11 Thread Peter Xu
On Mon, Mar 11, 2024 at 03:48:14PM +0100, David Hildenbrand wrote: > On 11.03.24 15:35, Peter Xu wrote: > > On Mon, Mar 11, 2024 at 10:31:41AM +0100, David Hildenbrand wrote: > > > On 09.03.24 20:12, Mirsad Todorovac wrote: > > > > Hi, > > > > > > &

Re: BUG selftests/mm]

2024-03-11 Thread Peter Xu
flooding whoever kicks off the selftests, but indeed this seems to be able to be used by anyone to trigger such endless reports in dmesg. The issue with requiring a privilege means any hypervisor that will need to use this to emulate memory errors will also require such privilege, and it can be a problem. Logically such "hwpoison errors" are not real so it is not needed to be reported in dmesg, but now we're leveraging it to be exactly the same as a real hw error to share the code path, iiuc (e.g. on MCE injections). One option is to use a different marker reflecting that such hwpoison error is internal, so we don't need to report in dmesg. That'll also require (besides another bit in pte markers) one extra VM_FAULT_* flag just for such reports. Might be slightly an overkill, but I don't see another better way; not reporting HWPOISON will complicate at least kvm use case even more. Or.. does syslog has its own protection in general for such printk floods? It'll be easier if that's not a concern to flood then, but I'm not sure from that regard. Thanks, -- Peter Xu

Re: [PATCH v3] KVM: selftests: Fix the dirty_log_test semaphore imbalance

2024-02-05 Thread Peter Xu
On Mon, Feb 05, 2024 at 06:05:02PM +0800, Peter Xu wrote: > Shaoqin, Sean, > > Apologies for a late comment. I'm trying to remember what I wrote.. > > On Fri, Feb 02, 2024 at 01:43:32AM -0500, Shaoqin Huang wrote: > > Why sem_vcpu_cont and sem_vcpu_stop can be non-zer

Re: [PATCH v3] KVM: selftests: Fix the dirty_log_test semaphore imbalance

2024-02-05 Thread Peter Xu
u_cont) > at the end of each dirty-ring test. It can cause two cases: As a possible alternative, would it work if we simply reset all the sems for each run? Then we don't care about the leftovers. E.g. sem_destroy() at the end of run_test(), then always init to 0 at entry. -- Peter Xu

Re: [PATCH v2 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock

2024-01-02 Thread Peter Xu
itting the large folio we > unlock and release it because after the split the old folio might not be > the one that contains the src_addr. > > Fixes: 94b01c885131 ("userfaultfd: UFFDIO_MOVE uABI") > Signed-off-by: Suren Baghdasaryan Reviewed-by: Peter Xu Thanks, -- Peter Xu

Re: [PATCH 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock

2024-01-02 Thread Peter Xu
gt; if (err) > goto out; > + goto retry; > } Do we also need to clear src_folio and src_folio_pte? If the folio is a thp, I think it means it's pte mapped here. Then after the split we may want to fetch the small folio after the split, not the head one? -- Peter Xu

Re: [PATCH v4 5/5] selftests/mm: add UFFDIO_MOVE ioctl test

2023-10-30 Thread Peter Xu
h > thp test are you referring to? The new "move-pmd" test. I meant maybe it makes sense to have one separate MOVE test for when one ioctl(MOVE) covers a large range which can cover some thps. Then that will trigger thp paths. Assuming the fault paths are already covered in the generic "move" test. Thanks, -- Peter Xu

Re: [PATCH v4 1/5] mm/rmap: support move to different root anon_vma in folio_move_anon_rmap()

2023-10-30 Thread Peter Xu
s is a preparation for UFFDIO_MOVE, which will hold the folio lock, > the anon_vma lock in write mode, and the mmap_lock in read mode. > > Signed-off-by: Andrea Arcangeli > Signed-off-by: Suren Baghdasaryan Acked-by: Peter Xu -- Peter Xu

Re: [PATCH v4 3/5] selftests/mm: call uffd_test_ctx_clear at the end of the test

2023-10-30 Thread Peter Xu
nr_pages > which might differ from one test run to another. > Fix this by calling uffd_test_ctx_clear() after each test is done. > > Signed-off-by: Suren Baghdasaryan Reviewed-by: Peter Xu -- Peter Xu

Re: [PATCH v4 5/5] selftests/mm: add UFFDIO_MOVE ioctl test

2023-10-30 Thread Peter Xu
st_alloc = pmd_align_areas, > + .pre_release = pmd_restore_areas, > +}; > + > /* > * Test the returned uffdio_register.ioctls with different register modes. > * Note that _UFFDIO_ZEROPAGE is tested separately in the zeropage test. > @@ -1141,6 +1268,20 @@ uffd_test_case_t uffd_tests[] = { > .mem_targets = MEM_ALL, > .uffd_feature_required = 0, > }, > + { > + .name = "move", > + .uffd_fn = uffd_move_test, > + .mem_targets = MEM_ANON, > + .uffd_feature_required = UFFD_FEATURE_MOVE, > + .test_case_ops = _move_test_case_ops, > + }, > + { > + .name = "move-pmd", > + .uffd_fn = uffd_move_test, > + .mem_targets = MEM_ANON, > + .uffd_feature_required = UFFD_FEATURE_MOVE, > + .test_case_ops = _move_pmd_test_case_ops, > + }, > { > .name = "wp-fork", > .uffd_fn = uffd_wp_fork_test, > -- > 2.42.0.820.g83a721a137-goog > -- Peter Xu

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-23 Thread Peter Xu
Now rethinking with the recently merged WP_ASYNC: we ignore uffd-wp, which means dirty from uffd-wp async tracking POV, that matches with soft-dirty always set. Looks all good. Perhaps something like "Follow mremap() behavior; ignore uffd-wp for now" should work? -- Peter Xu

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-22 Thread Peter Xu
On Thu, Oct 19, 2023 at 02:24:06PM -0700, Suren Baghdasaryan wrote: > On Thu, Oct 12, 2023 at 3:00 PM Peter Xu wrote: > > > > On Sun, Oct 08, 2023 at 11:42:27PM -0700, Suren Baghdasaryan wrote: > > > From: Andrea Arcangeli > > > > > > Implement the uA

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-22 Thread Peter Xu
en in a single-mm support. Anyone would like to add that can already refer to discussion in this thread. I hope I'm clear. -- Peter Xu

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-19 Thread Peter Xu
h() (so pte is cleared) we recheck page pinning, if pinned fail MOVE and put the page back. Note that we can't do that check after installing it into dest pgtables, because then someone can start to pin it from dest mm already. Thanks, -- Peter Xu

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-19 Thread Peter Xu
work more thoroughly to remove cross-mm implications if so, just like when renaming REMAP to MOVE. Thanks, -- Peter Xu

Re: [PATCH v3 3/3] selftests/mm: add UFFDIO_MOVE ioctl test

2023-10-19 Thread Peter Xu
On Thu, Oct 19, 2023 at 10:29:27AM -0700, Axel Rasmussen wrote: > On Thu, Oct 19, 2023 at 8:43 AM Suren Baghdasaryan wrote: > > > > On Thu, Oct 12, 2023 at 3:29 PM Peter Xu wrote: > > > > > > On Sun, Oct 08, 2023 at 11:42:28PM -0700, Suren Baghdasaryan

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-17 Thread Peter Xu
om its own opinion. The next is forwarding that to someone else. Parent process is fine taking uffd of child with EVENT_FORK, as I mentioned, but besides that nothing else I can think of that can violate this guard to manipulate a random process. Thanks, -- Peter Xu

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-16 Thread Peter Xu
in mind that it won't be affected by someone mremap() pinned pages and we want to keep it working.. All of it just seems to be an accident.. One step back, we're free to define UFFDIO_MOVE anyway, and we don't necessarily need to always follow mremap(). E.g., mremap() also supports ksm pages, but IIUC we already decide to not support that for now on UFFDIO_MOVE. UFFDIO_MOVE seems all fine to make it clear on failing at pinned pages from the 1st day if that satisfies our goals, too. > > > > > In general, pinning lose its whole point here to me for an userspace either > > if it DONTNEEDs it or REMAP it. What would be great to do here is we unpin > > it upon DONTNEED/REMAP/whatever drops the page, because it loses its > > coherency anyway, IMHO. > > Further, moving a part of a THP would fail either way, because the pinned > THP cannot get split. > > -- > Cheers, > > David / dhildenb > Thanks, -- Peter Xu

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-13 Thread Peter Xu
On Fri, Oct 13, 2023 at 09:49:10AM -0700, Lokesh Gidra wrote: > On Fri, Oct 13, 2023 at 9:08 AM Peter Xu wrote: > > > > On Fri, Oct 13, 2023 at 11:56:31AM +0200, David Hildenbrand wrote: > > > Hi Peter, > > > > Hi, David, > > > > > > > >

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-13 Thread Peter Xu
* Flags controlling behavior of move */ >__s64 move; /* Number of bytes moved, or negated error */ >}; > > That probably has to be documented as well, in which address space dst and > src reside. Agreed, some better documentation will never hurt. Dst should be in the mm address space that was bound to the userfault descriptor. Src should be in the current mm address space. Thanks, -- Peter Xu

Re: [PATCH v3 3/3] selftests/mm: add UFFDIO_MOVE ioctl test

2023-10-12 Thread Peter Xu
); > + if (pthread_join(uffd_mon, NULL)) > + err("join() failed"); > + > + if (args.missing_faults != nr_pages || args.minor_faults != 0) > + uffd_test_fail("stats check error"); > + else > + uffd_test_pass();

Re: [PATCH v3 1/3] mm/rmap: support move to different root anon_vma in folio_move_anon_rmap()

2023-10-12 Thread Peter Xu
ad(_anon_vma->rwsem); > + goto retry_under_rcu; Is adding this specific label worthwhile? How about rcu unlock and goto retry (then it'll also be clear that we won't hold rcu read lock for unpredictable time)? > + } > + > /* >* If the folio is still mapped, then this anon_vma is still >* its anon_vma, and holding the mutex ensures that it will -- Peter Xu

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-12 Thread Peter Xu
fail. > + */ > + if (unlikely(pmd_trans_huge(dst_pmdval))) { > + err = -EEXIST; > + break; > + } > + > + ptl =

Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

2023-10-12 Thread Peter Xu
t;> ENOMEM Allocating memory needed for the operation failed. > > >>> > > >>> ESRCH > > >>> The faulting process has exited at the time of a > > >>> UFFDIO_MOVE operation. > > >>> > > >> > > >> A general comment simply because I realized that just now: does anything > > >> speak against limiting the operations now to a single MM? > > >> > > >> The use cases I heard so far don't need it. If ever required, we could > > >> consider extending it. > > >> > > >> Let's reduce complexity and KIS unless really required. > > > > > > Let me check if there are use cases that require moves between MMs. > > > Andrea seems to have put considerable effort to make it work between > > > MMs and it would be a pity to lose that. I can send a follow-up patch > > > to recover that functionality and even if it does not get merged, it > > > can be used in the future as a reference. But first let me check if we > > > can drop it. > > For the compaction use case that we have it's fine to limit it to > single MM. However, for general use I think Peter will have a better > idea. I used to have the same thought with David on whether we can simplify the design to e.g. limit it to single mm. Then I found that the trickiest is actually patch 1 together with the anon_vma manipulations, and the problem is that's not avoidable even if we restrict the api to apply on single mm. What else we can benefit from single mm? One less mmap read lock, but probably that's all we can get; IIUC we need to keep most of the rest of the code, e.g. pgtable walks, double pgtable lockings, etc. Actually, even though I have no solid clue, but I had a feeling that there can be some interesting way to leverage this across-mm movement, while keeping things all safe (by e.g. elaborately requiring other proc to create uffd and deliver to this proc). Considering Andrea's original version already contains those bits and all above, I'd vote that we go ahead with supporting two MMs. Thanks, -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-10-03 Thread Peter Xu
the basis for > making decisions. For example: https://lore.kernel.org/all/1425575884-2574-21-git-send-email-aarca...@redhat.com/ -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-10-03 Thread Peter Xu
me, no strong opinion, but is there any strong one over MOVE? MOVE is a fine name, however considering UFFDIO_REMAP's long history.. I tend to prefer keeping it called as REMAP - it still sounds sane, and anyone who knows REMAP will know this is exactly that. Thanks, -- Peter Xu

Re: [PATCH v2 1/3] userfaultfd: UFFDIO_REMAP: rmap preparation

2023-10-02 Thread Peter Xu
> > I'm curious: why don't we need this for existing users of > page_move_anon_rmap()? What's special about UFFDIO_REMAP? Totally no expert on anon vma so I'm prone to errors, but IIUC the difference here is root anon vma cannot change in page_move_anon_rmap(), while UFFDIO_REMAP can. Thanks, -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-10-02 Thread Peter Xu
, data won't get lost during movement, and it will generate a missing event after moved, with latest data showing up on dest. I'm not sure that means such a fallback is a problem, Suren may know better with the use case. Thanks, -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-09-28 Thread Peter Xu
hat regard. Again, I don't know whether anyone cares on any of those, though.. Thanks, -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-09-28 Thread Peter Xu
ch we can already hit similar case, mfill_atomic_install_pte() has: ret = -EAGAIN; dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, ); if (!dst_pte) goto out; Thanks, -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-09-28 Thread Peter Xu
On Thu, Sep 28, 2023 at 07:51:18PM +0200, David Hildenbrand wrote: > On 28.09.23 19:21, Peter Xu wrote: > > On Thu, Sep 28, 2023 at 07:05:40PM +0200, David Hildenbrand wrote: > > > As described as reply to v1, without fork() and KSM, the PAE bit should > > > stick aro

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-09-28 Thread Peter Xu
ing folio_mapcount() for pte, and another helper (perhaps: _nr_pages_mapped==COMPOUND_MAPPED && _entire_mapcount==1) for thp. But I know that's not ideal either. -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-09-28 Thread Peter Xu
if (fatal_signal_pending(current)) { err = -EINTR; goto out; } /* Continue with the loop */ break; default: goto out; } Not super good but maybe slightly better? No strong opinions, but if you agree that looks better we can use it. > + } > + > +out: > + mmap_read_unlock(dst_mm); > + if (dst_mm != src_mm) > + mmap_read_unlock(src_mm); > + BUG_ON(moved < 0); > + BUG_ON(err > 0); > + BUG_ON(!moved && !err); > + return moved ? moved : err; > +} I think for the rest I'll read the new version (e.g. I saw discussion on proper handling of pmd swap entries, which is not yet addressed, but probably will in the next one). Thanks, -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-09-28 Thread Peter Xu
a page or installing a new one with valid content imply VM_WRITE to me on either side.. Thanks, -- Peter Xu

Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

2023-09-28 Thread Peter Xu
arers(src_folio) != 1) { > > ^ here you should check PageAnonExclusive. Please get rid of any implicit > explicit/implcit mapcount checks. David, is PageAnon 100% accurate now in the current tree? IOW, can it be possible that the page has total_mapcount==1 but missing AnonExclusive bit in any possible way? Thanks, -- Peter Xu

Re: [PATCH v2 1/3] userfaultfd: UFFDIO_REMAP: rmap preparation

2023-09-28 Thread Peter Xu
> + > if (atomic_dec_and_test(_vma->refcount)) { > /* > * Oops, we held the last refcount, release the lock > -- > 2.42.0.515.g380fc7ccd1-goog > -- Peter Xu