On Fri, Nov 20, 2020 at 02:04:46PM +0300, Andrey Gruzdev wrote:
> > > +    RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
> > > +        /* Nothing to do with read-only and MMIO-writable regions */
> > > +        if (bs->mr->readonly || bs->mr->rom_device) {
> > > +            continue;
> > > +        }
> > > +
> > > +        /* Register block memory with UFFD to track writes */
> > > +        if (uffd_register_memory(rs->uffdio_fd, (hwaddr) bs->host,
> > > +                bs->max_length, false, true)) {
> > > +            goto fail;
> > > +        }
> > > +        /* Apply UFFD write protection to the block memory range */
> > > +        if (uffd_protect_memory(rs->uffdio_fd, (hwaddr) bs->host,
> > > +                bs->max_length, true)) {
> > 
> > Here logically we need to undo the previous register first, however 
> > userfaultfd
> > will auto-clean these when close(fd), so it's ok.  However still better to
> > unwind the protection of pages, I think.  So...
> > 
> 
> It should auto-clean, but as an additional safety measure - yes.

I'm afraid it will only clean up the registers, but not the page table updates;
at least that should be what we do now in the kernel. I'm not sure whether we
should always force the kernel to unprotect those when close(). The problem is
the registered range is normally quite large while the wr-protect range can be
very small (page-based), so that could take a lot of time, which can be
unnecessary, since the userspace is the one that knows the best on which range
was protected.

Indeed I can't think if anything really bad even if not unprotect the pages as
you do right now - what will happen is that the wr-protected pages will have
UFFD_WP set and PAGE_RW cleared in the page tables even after the close(fd).
It means after the snapshot got cancelled those wr-protected pages could still
trigger page fault again when being written, though since it's not covered by
uffd-wp protected vmas, it'll become a "normal cow" fault, and the write bit
will be recovered.  However the UFFD_WP bit in the page tables could got
leftover there...  So maybe it's still best to unprotect from userspace.

There's an idea that maybe we can auto-remove the UFFD_WP bit in kernel when
cow happens for a page, but that's definitely out of topic (and we must make
sure things like "enforced cow for read-only get_user_pages() won't happen
again").  No hard to do that in userspace, anyways.

-- 
Peter Xu


Reply via email to