> > > Signed-off-by: Audra Mitchell
> > > ---
> > > fs/userfaultfd.c | 5 +
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index 60dcfafdc11a..17210558de79 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -2073,6 +2073,11 @@ static int userfaultfd_api(struct userfaultfd_ctx
> > > *ctx,
> > > uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
> > > uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
> > > #endif
> > > +
> > > + ret = -EINVAL;
> > > + if (features & ~uffdio_api.features)
> > > + goto err_out;
> > > +
> > > uffdio_api.ioctls = UFFD_API_IOCTLS;
> > > ret = -EFAULT;
> > > if (copy_to_user(buf, _api, sizeof(uffdio_api)))
> >
> > CCing Peter.
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
--
Peter Xu
ally around how hugetlb pages
> cannot really be overcommitted, and why we don't particularly care about
> these vmsplice() leaks for hugetlb -- in contrast to ordinary memory.
>
> [1]
> https://lore.kernel.org/all/8b42a24d-caf0-46ef-9e15-0f88d47d2...@redhat.com/
>
> Suggested-
gt; provided with the VMA to decide at some point that long-term R/O pinning
> a !anon page is fine.
>
> Cc: Andrew Morton
> Cc: Shuah Khan
> Cc: Peter Xu
> Signed-off-by: David Hildenbrand
Acked-by: Peter Xu
Thanks,
--
Peter Xu
On Mon, Mar 11, 2024 at 03:28:28PM -0700, Jiaqi Yan wrote:
> On Mon, Mar 11, 2024 at 2:27 PM James Houghton wrote:
> >
> > On Mon, Mar 11, 2024 at 12:28 PM Peter Xu wrote:
> > >
> > > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
>
s possibly even less of a concern
to directly use pr_err_ratelimited().
Thanks,
--
Peter Xu
On Mon, Mar 11, 2024 at 03:48:14PM +0100, David Hildenbrand wrote:
> On 11.03.24 15:35, Peter Xu wrote:
> > On Mon, Mar 11, 2024 at 10:31:41AM +0100, David Hildenbrand wrote:
> > > On 09.03.24 20:12, Mirsad Todorovac wrote:
> > > > Hi,
> > > >
> > &
flooding whoever kicks off the selftests, but
indeed this seems to be able to be used by anyone to trigger such endless
reports in dmesg.
The issue with requiring a privilege means any hypervisor that will need to
use this to emulate memory errors will also require such privilege, and it
can be a problem.
Logically such "hwpoison errors" are not real so it is not needed to be
reported in dmesg, but now we're leveraging it to be exactly the same as a
real hw error to share the code path, iiuc (e.g. on MCE injections).
One option is to use a different marker reflecting that such hwpoison error
is internal, so we don't need to report in dmesg. That'll also require
(besides another bit in pte markers) one extra VM_FAULT_* flag just for
such reports. Might be slightly an overkill, but I don't see another
better way; not reporting HWPOISON will complicate at least kvm use case
even more.
Or.. does syslog has its own protection in general for such printk floods?
It'll be easier if that's not a concern to flood then, but I'm not sure
from that regard.
Thanks,
--
Peter Xu
On Mon, Feb 05, 2024 at 06:05:02PM +0800, Peter Xu wrote:
> Shaoqin, Sean,
>
> Apologies for a late comment. I'm trying to remember what I wrote..
>
> On Fri, Feb 02, 2024 at 01:43:32AM -0500, Shaoqin Huang wrote:
> > Why sem_vcpu_cont and sem_vcpu_stop can be non-zer
u_cont)
> at the end of each dirty-ring test. It can cause two cases:
As a possible alternative, would it work if we simply reset all the sems
for each run? Then we don't care about the leftovers. E.g. sem_destroy()
at the end of run_test(), then always init to 0 at entry.
--
Peter Xu
itting the large folio we
> unlock and release it because after the split the old folio might not be
> the one that contains the src_addr.
>
> Fixes: 94b01c885131 ("userfaultfd: UFFDIO_MOVE uABI")
> Signed-off-by: Suren Baghdasaryan
Reviewed-by: Peter Xu
Thanks,
--
Peter Xu
gt; if (err)
> goto out;
> + goto retry;
> }
Do we also need to clear src_folio and src_folio_pte? If the folio is a
thp, I think it means it's pte mapped here. Then after the split we may
want to fetch the small folio after the split, not the head one?
--
Peter Xu
h
> thp test are you referring to?
The new "move-pmd" test.
I meant maybe it makes sense to have one separate MOVE test for when one
ioctl(MOVE) covers a large range which can cover some thps. Then that will
trigger thp paths. Assuming the fault paths are already covered in the
generic "move" test.
Thanks,
--
Peter Xu
s is a preparation for UFFDIO_MOVE, which will hold the folio lock,
> the anon_vma lock in write mode, and the mmap_lock in read mode.
>
> Signed-off-by: Andrea Arcangeli
> Signed-off-by: Suren Baghdasaryan
Acked-by: Peter Xu
--
Peter Xu
nr_pages
> which might differ from one test run to another.
> Fix this by calling uffd_test_ctx_clear() after each test is done.
>
> Signed-off-by: Suren Baghdasaryan
Reviewed-by: Peter Xu
--
Peter Xu
st_alloc = pmd_align_areas,
> + .pre_release = pmd_restore_areas,
> +};
> +
> /*
> * Test the returned uffdio_register.ioctls with different register modes.
> * Note that _UFFDIO_ZEROPAGE is tested separately in the zeropage test.
> @@ -1141,6 +1268,20 @@ uffd_test_case_t uffd_tests[] = {
> .mem_targets = MEM_ALL,
> .uffd_feature_required = 0,
> },
> + {
> + .name = "move",
> + .uffd_fn = uffd_move_test,
> + .mem_targets = MEM_ANON,
> + .uffd_feature_required = UFFD_FEATURE_MOVE,
> + .test_case_ops = _move_test_case_ops,
> + },
> + {
> + .name = "move-pmd",
> + .uffd_fn = uffd_move_test,
> + .mem_targets = MEM_ANON,
> + .uffd_feature_required = UFFD_FEATURE_MOVE,
> + .test_case_ops = _move_pmd_test_case_ops,
> + },
> {
> .name = "wp-fork",
> .uffd_fn = uffd_wp_fork_test,
> --
> 2.42.0.820.g83a721a137-goog
>
--
Peter Xu
Now rethinking with the recently merged WP_ASYNC: we ignore uffd-wp, which
means dirty from uffd-wp async tracking POV, that matches with soft-dirty
always set. Looks all good.
Perhaps something like "Follow mremap() behavior; ignore uffd-wp for now"
should work?
--
Peter Xu
On Thu, Oct 19, 2023 at 02:24:06PM -0700, Suren Baghdasaryan wrote:
> On Thu, Oct 12, 2023 at 3:00 PM Peter Xu wrote:
> >
> > On Sun, Oct 08, 2023 at 11:42:27PM -0700, Suren Baghdasaryan wrote:
> > > From: Andrea Arcangeli
> > >
> > > Implement the uA
en in a single-mm support. Anyone
would like to add that can already refer to discussion in this thread.
I hope I'm clear.
--
Peter Xu
h() (so pte is
cleared) we recheck page pinning, if pinned fail MOVE and put the page
back. Note that we can't do that check after installing it into dest
pgtables, because then someone can start to pin it from dest mm already.
Thanks,
--
Peter Xu
work more thoroughly to remove cross-mm implications
if so, just like when renaming REMAP to MOVE.
Thanks,
--
Peter Xu
On Thu, Oct 19, 2023 at 10:29:27AM -0700, Axel Rasmussen wrote:
> On Thu, Oct 19, 2023 at 8:43 AM Suren Baghdasaryan wrote:
> >
> > On Thu, Oct 12, 2023 at 3:29 PM Peter Xu wrote:
> > >
> > > On Sun, Oct 08, 2023 at 11:42:28PM -0700, Suren Baghdasaryan
om its own opinion. The next is forwarding that to someone else.
Parent process is fine taking uffd of child with EVENT_FORK, as I
mentioned, but besides that nothing else I can think of that can violate
this guard to manipulate a random process.
Thanks,
--
Peter Xu
in mind that
it won't be affected by someone mremap() pinned pages and we want to keep
it working..
All of it just seems to be an accident..
One step back, we're free to define UFFDIO_MOVE anyway, and we don't
necessarily need to always follow mremap(). E.g., mremap() also supports
ksm pages, but IIUC we already decide to not support that for now on
UFFDIO_MOVE. UFFDIO_MOVE seems all fine to make it clear on failing at
pinned pages from the 1st day if that satisfies our goals, too.
>
> >
> > In general, pinning lose its whole point here to me for an userspace either
> > if it DONTNEEDs it or REMAP it. What would be great to do here is we unpin
> > it upon DONTNEED/REMAP/whatever drops the page, because it loses its
> > coherency anyway, IMHO.
>
> Further, moving a part of a THP would fail either way, because the pinned
> THP cannot get split.
>
> --
> Cheers,
>
> David / dhildenb
>
Thanks,
--
Peter Xu
On Fri, Oct 13, 2023 at 09:49:10AM -0700, Lokesh Gidra wrote:
> On Fri, Oct 13, 2023 at 9:08 AM Peter Xu wrote:
> >
> > On Fri, Oct 13, 2023 at 11:56:31AM +0200, David Hildenbrand wrote:
> > > Hi Peter,
> >
> > Hi, David,
> >
> > >
> > >
* Flags controlling behavior of move */
>__s64 move; /* Number of bytes moved, or negated error */
>};
>
> That probably has to be documented as well, in which address space dst and
> src reside.
Agreed, some better documentation will never hurt. Dst should be in the mm
address space that was bound to the userfault descriptor. Src should be in
the current mm address space.
Thanks,
--
Peter Xu
);
> + if (pthread_join(uffd_mon, NULL))
> + err("join() failed");
> +
> + if (args.missing_faults != nr_pages || args.minor_faults != 0)
> + uffd_test_fail("stats check error");
> + else
> + uffd_test_pass();
ad(_anon_vma->rwsem);
> + goto retry_under_rcu;
Is adding this specific label worthwhile? How about rcu unlock and goto
retry (then it'll also be clear that we won't hold rcu read lock for
unpredictable time)?
> + }
> +
> /*
>* If the folio is still mapped, then this anon_vma is still
>* its anon_vma, and holding the mutex ensures that it will
--
Peter Xu
fail.
> + */
> + if (unlikely(pmd_trans_huge(dst_pmdval))) {
> + err = -EEXIST;
> + break;
> + }
> +
> + ptl =
t;> ENOMEM Allocating memory needed for the operation failed.
> > >>>
> > >>> ESRCH
> > >>> The faulting process has exited at the time of a
> > >>> UFFDIO_MOVE operation.
> > >>>
> > >>
> > >> A general comment simply because I realized that just now: does anything
> > >> speak against limiting the operations now to a single MM?
> > >>
> > >> The use cases I heard so far don't need it. If ever required, we could
> > >> consider extending it.
> > >>
> > >> Let's reduce complexity and KIS unless really required.
> > >
> > > Let me check if there are use cases that require moves between MMs.
> > > Andrea seems to have put considerable effort to make it work between
> > > MMs and it would be a pity to lose that. I can send a follow-up patch
> > > to recover that functionality and even if it does not get merged, it
> > > can be used in the future as a reference. But first let me check if we
> > > can drop it.
>
> For the compaction use case that we have it's fine to limit it to
> single MM. However, for general use I think Peter will have a better
> idea.
I used to have the same thought with David on whether we can simplify the
design to e.g. limit it to single mm. Then I found that the trickiest is
actually patch 1 together with the anon_vma manipulations, and the problem
is that's not avoidable even if we restrict the api to apply on single mm.
What else we can benefit from single mm? One less mmap read lock, but
probably that's all we can get; IIUC we need to keep most of the rest of
the code, e.g. pgtable walks, double pgtable lockings, etc.
Actually, even though I have no solid clue, but I had a feeling that there
can be some interesting way to leverage this across-mm movement, while
keeping things all safe (by e.g. elaborately requiring other proc to create
uffd and deliver to this proc).
Considering Andrea's original version already contains those bits and all
above, I'd vote that we go ahead with supporting two MMs.
Thanks,
--
Peter Xu
the basis for
> making decisions.
For example:
https://lore.kernel.org/all/1425575884-2574-21-git-send-email-aarca...@redhat.com/
--
Peter Xu
me, no strong opinion, but is there any strong one over MOVE?
MOVE is a fine name, however considering UFFDIO_REMAP's long history.. I
tend to prefer keeping it called as REMAP - it still sounds sane, and
anyone who knows REMAP will know this is exactly that.
Thanks,
--
Peter Xu
>
> I'm curious: why don't we need this for existing users of
> page_move_anon_rmap()? What's special about UFFDIO_REMAP?
Totally no expert on anon vma so I'm prone to errors, but IIUC the
difference here is root anon vma cannot change in page_move_anon_rmap(),
while UFFDIO_REMAP can.
Thanks,
--
Peter Xu
, data won't get lost
during movement, and it will generate a missing event after moved, with
latest data showing up on dest.
I'm not sure that means such a fallback is a problem, Suren may know
better with the use case.
Thanks,
--
Peter Xu
hat regard.
Again, I don't know whether anyone cares on any of those, though..
Thanks,
--
Peter Xu
ch we can already hit similar
case, mfill_atomic_install_pte() has:
ret = -EAGAIN;
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, );
if (!dst_pte)
goto out;
Thanks,
--
Peter Xu
On Thu, Sep 28, 2023 at 07:51:18PM +0200, David Hildenbrand wrote:
> On 28.09.23 19:21, Peter Xu wrote:
> > On Thu, Sep 28, 2023 at 07:05:40PM +0200, David Hildenbrand wrote:
> > > As described as reply to v1, without fork() and KSM, the PAE bit should
> > > stick aro
ing folio_mapcount() for pte, and another
helper (perhaps: _nr_pages_mapped==COMPOUND_MAPPED && _entire_mapcount==1)
for thp. But I know that's not ideal either.
--
Peter Xu
if (fatal_signal_pending(current)) {
err = -EINTR;
goto out;
}
/* Continue with the loop */
break;
default:
goto out;
}
Not super good but maybe slightly better? No strong opinions, but if you
agree that looks better we can use it.
> + }
> +
> +out:
> + mmap_read_unlock(dst_mm);
> + if (dst_mm != src_mm)
> + mmap_read_unlock(src_mm);
> + BUG_ON(moved < 0);
> + BUG_ON(err > 0);
> + BUG_ON(!moved && !err);
> + return moved ? moved : err;
> +}
I think for the rest I'll read the new version (e.g. I saw discussion on
proper handling of pmd swap entries, which is not yet addressed, but
probably will in the next one).
Thanks,
--
Peter Xu
a page or installing a new one with valid
content imply VM_WRITE to me on either side..
Thanks,
--
Peter Xu
arers(src_folio) != 1) {
>
> ^ here you should check PageAnonExclusive. Please get rid of any implicit
> explicit/implcit mapcount checks.
David, is PageAnon 100% accurate now in the current tree?
IOW, can it be possible that the page has total_mapcount==1 but missing
AnonExclusive bit in any possible way?
Thanks,
--
Peter Xu
> +
> if (atomic_dec_and_test(_vma->refcount)) {
> /*
> * Oops, we held the last refcount, release the lock
> --
> 2.42.0.515.g380fc7ccd1-goog
>
--
Peter Xu
41 matches
Mail list logo