Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization

2015-06-23 Thread Dave Hansen
On 05/14/2015 10:31 AM, Andrea Arcangeli wrote:
 +static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode,
 +  int wake_flags, void *key)
 +{
 + struct userfaultfd_wake_range *range = key;
 + int ret;
 + struct userfaultfd_wait_queue *uwq;
 + unsigned long start, len;
 +
 + uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
 + ret = 0;
 + /* don't wake the pending ones to avoid reads to block */
 + if (uwq-pending  !ACCESS_ONCE(uwq-ctx-released))
 + goto out;
 + /* len == 0 means wake all */
 + start = range-start;
 + len = range-len;
 + if (len  (start  uwq-address || start + len = uwq-address))
 + goto out;
 + ret = wake_up_state(wq-private, mode);
 + if (ret)
 + /* wake only once, autoremove behavior */
 + list_del_init(wq-task_list);
 +out:
 + return ret;
 +}
...
 +static __always_inline int validate_range(struct mm_struct *mm,
 +   __u64 start, __u64 len)
 +{
 + __u64 task_size = mm-task_size;
 +
 + if (start  ~PAGE_MASK)
 + return -EINVAL;
 + if (len  ~PAGE_MASK)
 + return -EINVAL;
 + if (!len)
 + return -EINVAL;
 + if (start  mmap_min_addr)
 + return -EINVAL;
 + if (start = task_size)
 + return -EINVAL;
 + if (len  task_size - start)
 + return -EINVAL;
 + return 0;
 +}

Hey Andrea,

Down in userfaultfd_wake_function(), it looks like you intended for a
len=0 to mean wake all.  But the validate_range() that we do from
userspace has a !len check in it, which keeps us from passing a len=0 in
from userspace.

Was that wake all for some internal use, or is the check too strict?

I was trying to use the wake ioctl after an madvise() (as opposed to
filling things in using a userfd copy).
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization

2015-06-23 Thread Andrea Arcangeli
Hi Dave,

On Tue, Jun 23, 2015 at 12:00:19PM -0700, Dave Hansen wrote:
 Down in userfaultfd_wake_function(), it looks like you intended for a
 len=0 to mean wake all.  But the validate_range() that we do from
 userspace has a !len check in it, which keeps us from passing a len=0 in
 from userspace.
 Was that wake all for some internal use, or is the check too strict?

It's for internal use or userfaultfd_release that has to wake them all
(after setting ctx-released) if the uffd is closed. It avoids to
enlarge the structure by depending on the invariant that userland
cannot pass len=0.

If we'd accept len=0 from userland as valid, I'd be safer if it does
nothing like in madvise, I doubt we want to expose this non standard
kernel internal behavior to userland.

 I was trying to use the wake ioctl after an madvise() (as opposed to
 filling things in using a userfd copy).

madvise will return 0 if len=0, mremap would return -EINVAL if new_len
is zero, mmap also returns -EINVAL if len is 0, not all MM syscalls
are as permissive as madvise. Can't you pass the same len you pass to
madvise to UFFDIO_WAKE (or just skip the call if the madvise len is
zero)?

Thanks,
Andrea
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization

2015-05-27 Thread Thomas Martitz
Andrea Arcangeli aarcange at redhat.com writes:

 
 Once an userfaultfd has been created and certain region of the process
 virtual address space have been registered into it, the thread
 responsible for doing the memory externalization can manage the page
 faults in userland by talking to the kernel using the userfaultfd
 protocol.
 
 poll() can be used to know when there are new pending userfaults to be
 read (POLLIN).
 


Hello,

I already asked this for v3 but got no reply, so trying again:

I'm wondering why a new syscall was chosen over a simple special file
/dev/userfault (analogous to /dev/shm) to obtain an fd. In my book the
special file has only advantanges: no additional syscall is needed, system
admins can tweak access to this feature via normal file permissions, and
signaling the availability of the feature in the kernel simply by the
existence of the dev file.

I already wondered the same for memfd(). Here I can perhaps follow that
there is a need such fds before /dev is mounted (because PID1 might need
it). But not for this case as devtmpfs should be mounted early enough.

Not saying it's the wrong decision, but I want to learn about the rationale.

Best regards.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html