On Thu, Feb 03, 2022 at 06:19:22PM +0000, Dr. David Alan Gilbert wrote: > * Peter Xu (pet...@redhat.com) wrote: > > On Wed, Jan 19, 2022 at 01:42:47PM +0000, Dr. David Alan Gilbert wrote: > > > * Peter Xu (pet...@redhat.com) wrote: > > > > Commit ba1b7c812c ("migration/ram: Optimize ram_save_host_page()") > > > > managed to > > > > optimize host huge page use case by scanning the dirty bitmap when > > > > looking for > > > > the next dirty small page to migrate. > > > > > > > > However when updating the pss->page before returning from that > > > > function, we > > > > used MIN() of these two values: (1) next dirty bit, or (2) end of > > > > current sent > > > > huge page, to fix up pss->page. > > > > > > > > That sounds unnecessary, because I see nowhere that requires pss->page > > > > to be > > > > not going over current huge page boundary. > > > > > > > > What we need here is probably MAX() instead of MIN() so that we'll start > > > > scanning from the next dirty bit next time. Since pss->page can't be > > > > smaller > > > > than hostpage_boundary (the loop guarantees it), it probably means we > > > > don't > > > > need to fix it up at all. > > > > > > > > Cc: Keqian Zhu <zhukeqi...@huawei.com> > > > > Cc: Kunkun Jiang <jiangkun...@huawei.com> > > > > Signed-off-by: Peter Xu <pet...@redhat.com> > > > > > > > > > Hmm, I think that's potentially necessary. note that the start of > > > ram_save_host_page stores the 'start_page' at entry. > > > That' start_page' goes to the ram_save_release_protection and so > > > I think it needs to be pagesize aligned for the mmap/uffd that happens. > > > > Right, that's indeed a functional change, but IMHO it's also fine. > > > > When reaching ram_save_release_protection(), what we guaranteed is that > > below > > page range contains no dirty bits in ramblock dirty bitmap: > > > > range0 = [start_page, pss->page) > > > > Side note: inclusive on start, but not inclusive on the end side of range0 > > (that is, pss->page can be pointing to a dirty page). > > > > What ram_save_release_protection() does is to unprotect the pages and let > > them > > run free. If we're sure range0 contains no dirty page, it means we have > > already copied them over into the snapshot, so IIUC it's safe to unprotect > > all > > of it (even if it's already bigger than the host page size)? > > I think what's worrying me is the alignment of the address going into > UFFDIO_WRITEPROTECT in uffd_change_protection - if it was previously > huge page aligned and now isn't, what breaks? (Did it support > hugepages?)
Good point.. It doesn't support huge pages yet, but we'd better keep it always page aligned for the unprotect ioctl. > > > That can be slightly less efficient for live snapshot in some extreme cases > > (when unprotect, we'll need to walk the pgtables in the uffd ioctl()), but I > > don't assume live snapshot to be run on a huge VM, so hopefully it's still > > fine? Not to mention it should make live migration a little bit faster, > > assuming that's more frequently used.. > > Hmm I don't think I understand that statement. I meant since we've scanned over those clean pages we don't need to scan it again in the next find_dirty_block() call for precopy, per the "faster" statement. But to make it simple I think I'll drop this patch in the next version. Thanks! -- Peter Xu