Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization
On 05/14/2015 10:31 AM, Andrea Arcangeli wrote: +static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode, + int wake_flags, void *key) +{ + struct userfaultfd_wake_range *range = key; + int ret; + struct userfaultfd_wait_queue *uwq; + unsigned long start, len; + + uwq = container_of(wq, struct userfaultfd_wait_queue, wq); + ret = 0; + /* don't wake the pending ones to avoid reads to block */ + if (uwq-pending !ACCESS_ONCE(uwq-ctx-released)) + goto out; + /* len == 0 means wake all */ + start = range-start; + len = range-len; + if (len (start uwq-address || start + len = uwq-address)) + goto out; + ret = wake_up_state(wq-private, mode); + if (ret) + /* wake only once, autoremove behavior */ + list_del_init(wq-task_list); +out: + return ret; +} ... +static __always_inline int validate_range(struct mm_struct *mm, + __u64 start, __u64 len) +{ + __u64 task_size = mm-task_size; + + if (start ~PAGE_MASK) + return -EINVAL; + if (len ~PAGE_MASK) + return -EINVAL; + if (!len) + return -EINVAL; + if (start mmap_min_addr) + return -EINVAL; + if (start = task_size) + return -EINVAL; + if (len task_size - start) + return -EINVAL; + return 0; +} Hey Andrea, Down in userfaultfd_wake_function(), it looks like you intended for a len=0 to mean wake all. But the validate_range() that we do from userspace has a !len check in it, which keeps us from passing a len=0 in from userspace. Was that wake all for some internal use, or is the check too strict? I was trying to use the wake ioctl after an madvise() (as opposed to filling things in using a userfd copy). -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization
Hi Dave, On Tue, Jun 23, 2015 at 12:00:19PM -0700, Dave Hansen wrote: Down in userfaultfd_wake_function(), it looks like you intended for a len=0 to mean wake all. But the validate_range() that we do from userspace has a !len check in it, which keeps us from passing a len=0 in from userspace. Was that wake all for some internal use, or is the check too strict? It's for internal use or userfaultfd_release that has to wake them all (after setting ctx-released) if the uffd is closed. It avoids to enlarge the structure by depending on the invariant that userland cannot pass len=0. If we'd accept len=0 from userland as valid, I'd be safer if it does nothing like in madvise, I doubt we want to expose this non standard kernel internal behavior to userland. I was trying to use the wake ioctl after an madvise() (as opposed to filling things in using a userfd copy). madvise will return 0 if len=0, mremap would return -EINVAL if new_len is zero, mmap also returns -EINVAL if len is 0, not all MM syscalls are as permissive as madvise. Can't you pass the same len you pass to madvise to UFFDIO_WAKE (or just skip the call if the madvise len is zero)? Thanks, Andrea -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization
Andrea Arcangeli aarcange at redhat.com writes: Once an userfaultfd has been created and certain region of the process virtual address space have been registered into it, the thread responsible for doing the memory externalization can manage the page faults in userland by talking to the kernel using the userfaultfd protocol. poll() can be used to know when there are new pending userfaults to be read (POLLIN). Hello, I already asked this for v3 but got no reply, so trying again: I'm wondering why a new syscall was chosen over a simple special file /dev/userfault (analogous to /dev/shm) to obtain an fd. In my book the special file has only advantanges: no additional syscall is needed, system admins can tweak access to this feature via normal file permissions, and signaling the availability of the feature in the kernel simply by the existence of the dev file. I already wondered the same for memfd(). Here I can perhaps follow that there is a need such fds before /dev is mounted (because PID1 might need it). But not for this case as devtmpfs should be mounted early enough. Not saying it's the wrong decision, but I want to learn about the rationale. Best regards. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html