Re: [Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()
On 12.08.15 at 19:24, andrew.coop...@citrix.com wrote:
> On 12/08/15 16:26, Jan Beulich wrote:
>> On 12.08.15 at 17:13, andrew.coop...@citrix.com wrote:
>>> On 12/08/15 15:19, Jan Beulich wrote:
>>>> @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
>>>>      if ( !resuming )
>>>>      {
>>>> +        /*
>>>> +         * Mark dirty all currently write-mapped pages on the final
>>>> +         * iteration of a HVM save operation.
>>>> +         */
>>>> +        if ( has_hvm_container_domain(d) && d->is_shut_down &&
>>>> +             d->shutdown_code == SHUTDOWN_suspend )
>>>> +            hvm_mapped_guest_frames_mark_dirty(d);
>>> I am not sure whether this is useful. There are situations such as
>>> memory snapshot where it is insufficient, as the domain doesn't
>>> suspend.
>> Perhaps the condition could be refined then? And then - isn't memory
>> snapshot what Remus does (which I checked does a suspend in the tool
>> stack)?
> XenServer live memory snapshots of Windows VMs use a Windows API to
> quiesce IO, pause the domain, perform a non-live save, then unpause the
> domain. Such an action would want these bits in the bitmap, but would
> not match those conditions.

As said - the conditions could be adjusted (e.g. to also include tool
stack paused domains).

>>> It would probably be better to hook this off a specific request from
>>> the toolstack, as the migration code is in a far better position to
>>> know whether this information is needed or can be deferred to the
>>> next iteration.
>> Which next iteration (when we're talking about the final one)? I
>> considered tool stack involvement, but would like to go that route
>> only if unavoidable.
> It is wrong for Xen to guess. The toolstack should explicitly ask for
> them when it wants them.
>
> I have just written my slides for Seattle, which cover some of the
> outstanding issues with regards to guests and logdirty. As migration
> with nested virt doesn't function at all, fixing these entries in the
> logdirty bitmap is not urgent if you don't fancy extending/tweaking
> xen_domctl_shadow_op.

Agreed - while I'd like patch 1 to go in for 4.6, this one can wait.
Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()
On 12/08/15 15:19, Jan Beulich wrote:
> Rather than dirtying a page when establishing a (permanent) mapping,
> dirty it when the page gets unmapped, or - if still mapped - on the
> final iteration of a save operation. (Transient mappings continue to
> get dirtied upon getting mapped, to avoid the overhead of tracking.)
>
> Signed-off-by: Jan Beulich jbeul...@suse.com
>
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1556,6 +1556,8 @@ int hvm_domain_initialise(struct domain
>      INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
>      spin_lock_init(&d->arch.hvm_domain.uc_lock);
> +    spin_lock_init(&d->arch.hvm_domain.write_map.lock);
> +    INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
>
>      hvm_init_cacheattr_region_list(d);
> @@ -3667,6 +3669,11 @@ int hvm_virtual_to_linear_addr(
>      return 1;
>  }
>
> +struct hvm_write_map {
> +    struct list_head list;
> +    struct page_info *page;
> +};
> +
>  /* On non-NULL return, we leave this function holding an additional
>   * ref on the underlying mfn, if any */
>  static void *_hvm_map_guest_frame(unsigned long gfn, bool_t permanent,
> @@ -3694,15 +3701,30 @@ static void *_hvm_map_guest_frame(unsign
>      if ( writable )
>      {
> -        if ( !p2m_is_discard_write(p2mt) )
> -            paging_mark_dirty(d, page_to_mfn(page));
> -        else
> +        if ( unlikely(p2m_is_discard_write(p2mt)) )
>              *writable = 0;
> +        else if ( !permanent )
> +            paging_mark_dirty(d, page_to_mfn(page));
>      }
>
>      if ( !permanent )
>          return __map_domain_page(page);
>
> +    if ( writable && *writable )
> +    {
> +        struct hvm_write_map *track = xmalloc(struct hvm_write_map);
> +
> +        if ( !track )
> +        {
> +            put_page(page);
> +            return NULL;
> +        }
> +        track->page = page;
> +        spin_lock(&d->arch.hvm_domain.write_map.lock);
> +        list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);

Map and unmap of non-permanent pages will happen in a stacked fashion,
so putting non-persistent pages on the head of the list will be more
efficient when walking the list for removal.

> +        spin_unlock(&d->arch.hvm_domain.write_map.lock);
> +    }
> +
>      map = __map_domain_page_global(page);
>      if ( !map )
>          put_page(page);
> @@ -3725,18 +3747,45 @@ void *hvm_map_guest_frame_ro(unsigned lo
>  void hvm_unmap_guest_frame(void *p, bool_t permanent)
>  {
>      unsigned long mfn;
> +    struct page_info *page;
>
>      if ( !p )
>          return;
>
>      mfn = domain_page_map_to_mfn(p);
> +    page = mfn_to_page(mfn);
>      if ( !permanent )
>          unmap_domain_page(p);
>      else
> +    {
> +        struct domain *d = page_get_owner(page);
> +        struct hvm_write_map *track;
> +
>          unmap_domain_page_global(p);
> +        spin_lock(&d->arch.hvm_domain.write_map.lock);
> +        list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
> +            if ( track->page == page )
> +            {
> +                paging_mark_dirty(d, mfn);
> +                list_del(&track->list);
> +                xfree(track);
> +                break;
> +            }
> +        spin_unlock(&d->arch.hvm_domain.write_map.lock);
> +    }
> +
> +    put_page(page);
> +}
> +
> +void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
> +{
> +    struct hvm_write_map *track;
>
> -    put_page(mfn_to_page(mfn));
> +    spin_lock(&d->arch.hvm_domain.write_map.lock);
> +    list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
> +        paging_mark_dirty(d, page_to_mfn(track->page));

This is potentially a long running operation. It might be easier to
ASSERT() an upper limit of mapped pages than to make this restartable.

> +    spin_unlock(&d->arch.hvm_domain.write_map.lock);
>  }
>
>  static void *hvm_map_entry(unsigned long va, bool_t *writable)
> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -29,6 +29,7 @@
>  #include <asm/hvm/nestedhvm.h>
>  #include <xen/numa.h>
>  #include <xsm/xsm.h>
> +#include <public/sched.h> /* SHUTDOWN_suspend */
>
>  #include "mm-locks.h"
> @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
>      if ( !resuming )
>      {
> +        /*
> +         * Mark dirty all currently write-mapped pages on the final
> +         * iteration of a HVM save operation.
> +         */
> +        if ( has_hvm_container_domain(d) && d->is_shut_down &&
> +             d->shutdown_code == SHUTDOWN_suspend )
> +            hvm_mapped_guest_frames_mark_dirty(d);

I am not sure whether this is useful. There are situations such as
memory snapshot where it is insufficient, as the domain doesn't
suspend.

It would probably be better to hook this off a specific request from
the toolstack, as the migration code is in a far better position to
know whether this information is needed or can be deferred to the next
iteration.

~Andrew
Re: [Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()
On 12.08.15 at 17:13, andrew.coop...@citrix.com wrote:
> On 12/08/15 15:19, Jan Beulich wrote:
>> +    if ( writable && *writable )
>> +    {
>> +        struct hvm_write_map *track = xmalloc(struct hvm_write_map);
>> +
>> +        if ( !track )
>> +        {
>> +            put_page(page);
>> +            return NULL;
>> +        }
>> +        track->page = page;
>> +        spin_lock(&d->arch.hvm_domain.write_map.lock);
>> +        list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);
> Map and unmap of non-permanent pages will happen in a stacked fashion,
> so putting non-persistent pages on the head of the list will be more
> efficient when walking the list for removal.

But this is dealing with permanent mappings.

>> +void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
>> +{
>> +    struct hvm_write_map *track;
>>
>> -    put_page(mfn_to_page(mfn));
>> +    spin_lock(&d->arch.hvm_domain.write_map.lock);
>> +    list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
>> +        paging_mark_dirty(d, page_to_mfn(track->page));
> This is potentially a long running operation. It might be easier to
> ASSERT() an upper limit of mapped pages than to make this restartable.

I don't think I can come up with a sensible upper limit - do you have a
good suggestion (including a rationale)? Also I don't view this as an
immediate problem - as long as we're limiting HVM guests to 128 vCPU-s
we're not going to see many more than 128 such mappings, and even those
only from nested HVM code. (So to answer my question - 256 would seem a
reasonable limit for now.)

>> --- a/xen/arch/x86/mm/paging.c
>> +++ b/xen/arch/x86/mm/paging.c
>> @@ -29,6 +29,7 @@
>>  #include <asm/hvm/nestedhvm.h>
>>  #include <xen/numa.h>
>>  #include <xsm/xsm.h>
>> +#include <public/sched.h> /* SHUTDOWN_suspend */
>>
>>  #include "mm-locks.h"
>> @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
>>      if ( !resuming )
>>      {
>> +        /*
>> +         * Mark dirty all currently write-mapped pages on the final
>> +         * iteration of a HVM save operation.
>> +         */
>> +        if ( has_hvm_container_domain(d) && d->is_shut_down &&
>> +             d->shutdown_code == SHUTDOWN_suspend )
>> +            hvm_mapped_guest_frames_mark_dirty(d);
> I am not sure whether this is useful. There are situations such as
> memory snapshot where it is insufficient, as the domain doesn't
> suspend.

Perhaps the condition could be refined then? And then - isn't memory
snapshot what Remus does (which I checked does a suspend in the tool
stack)?

> It would probably be better to hook this off a specific request from
> the toolstack, as the migration code is in a far better position to
> know whether this information is needed or can be deferred to the next
> iteration.

Which next iteration (when we're talking about the final one)? I
considered tool stack involvement, but would like to go that route only
if unavoidable.

Jan
[Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()
Rather than dirtying a page when establishing a (permanent) mapping,
dirty it when the page gets unmapped, or - if still mapped - on the
final iteration of a save operation. (Transient mappings continue to
get dirtied upon getting mapped, to avoid the overhead of tracking.)

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1556,6 +1556,8 @@ int hvm_domain_initialise(struct domain
     INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
+    spin_lock_init(&d->arch.hvm_domain.write_map.lock);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);

     hvm_init_cacheattr_region_list(d);
@@ -3667,6 +3669,11 @@ int hvm_virtual_to_linear_addr(
     return 1;
 }

+struct hvm_write_map {
+    struct list_head list;
+    struct page_info *page;
+};
+
 /* On non-NULL return, we leave this function holding an additional
  * ref on the underlying mfn, if any */
 static void *_hvm_map_guest_frame(unsigned long gfn, bool_t permanent,
@@ -3694,15 +3701,30 @@ static void *_hvm_map_guest_frame(unsign
     if ( writable )
     {
-        if ( !p2m_is_discard_write(p2mt) )
-            paging_mark_dirty(d, page_to_mfn(page));
-        else
+        if ( unlikely(p2m_is_discard_write(p2mt)) )
             *writable = 0;
+        else if ( !permanent )
+            paging_mark_dirty(d, page_to_mfn(page));
     }

     if ( !permanent )
         return __map_domain_page(page);

+    if ( writable && *writable )
+    {
+        struct hvm_write_map *track = xmalloc(struct hvm_write_map);
+
+        if ( !track )
+        {
+            put_page(page);
+            return NULL;
+        }
+        track->page = page;
+        spin_lock(&d->arch.hvm_domain.write_map.lock);
+        list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);
+        spin_unlock(&d->arch.hvm_domain.write_map.lock);
+    }
+
     map = __map_domain_page_global(page);
     if ( !map )
         put_page(page);
@@ -3725,18 +3747,45 @@ void *hvm_map_guest_frame_ro(unsigned lo
 void hvm_unmap_guest_frame(void *p, bool_t permanent)
 {
     unsigned long mfn;
+    struct page_info *page;

     if ( !p )
         return;

     mfn = domain_page_map_to_mfn(p);
+    page = mfn_to_page(mfn);
     if ( !permanent )
         unmap_domain_page(p);
     else
+    {
+        struct domain *d = page_get_owner(page);
+        struct hvm_write_map *track;
+
         unmap_domain_page_global(p);
+        spin_lock(&d->arch.hvm_domain.write_map.lock);
+        list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
+            if ( track->page == page )
+            {
+                paging_mark_dirty(d, mfn);
+                list_del(&track->list);
+                xfree(track);
+                break;
+            }
+        spin_unlock(&d->arch.hvm_domain.write_map.lock);
+    }
+
+    put_page(page);
+}
+
+void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
+{
+    struct hvm_write_map *track;

-    put_page(mfn_to_page(mfn));
+    spin_lock(&d->arch.hvm_domain.write_map.lock);
+    list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
+        paging_mark_dirty(d, page_to_mfn(track->page));
+    spin_unlock(&d->arch.hvm_domain.write_map.lock);
 }

 static void *hvm_map_entry(unsigned long va, bool_t *writable)
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -29,6 +29,7 @@
 #include <asm/hvm/nestedhvm.h>
 #include <xen/numa.h>
 #include <xsm/xsm.h>
+#include <public/sched.h> /* SHUTDOWN_suspend */

 #include "mm-locks.h"
@@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
     if ( !resuming )
     {
+        /*
+         * Mark dirty all currently write-mapped pages on the final
+         * iteration of a HVM save operation.
+         */
+        if ( has_hvm_container_domain(d) && d->is_shut_down &&
+             d->shutdown_code == SHUTDOWN_suspend )
+            hvm_mapped_guest_frames_mark_dirty(d);
+
         domain_pause(d);
         /*
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -145,6 +145,12 @@ struct hvm_domain {
     unsigned long *io_bitmap;

+    /* List of permanently write-mapped pages. */
+    struct {
+        spinlock_t lock;
+        struct list_head list;
+    } write_map;
+
     union {
         struct vmx_domain vmx;
         struct svm_domain svm;
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -444,6 +444,7 @@ void *hvm_map_guest_frame_rw(unsigned lo
                              bool_t *writable);
 void *hvm_map_guest_frame_ro(unsigned long gfn, bool_t permanent);
 void hvm_unmap_guest_frame(void *p, bool_t permanent);
+void hvm_mapped_guest_frames_mark_dirty(struct domain *);

 static inline void hvm_set_info_guest(struct vcpu *v)
 {
Re: [Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()
On 12/08/15 16:26, Jan Beulich wrote:
> On 12.08.15 at 17:13, andrew.coop...@citrix.com wrote:
>> On 12/08/15 15:19, Jan Beulich wrote:
>>> +    if ( writable && *writable )
>>> +    {
>>> +        struct hvm_write_map *track = xmalloc(struct hvm_write_map);
>>> +
>>> +        if ( !track )
>>> +        {
>>> +            put_page(page);
>>> +            return NULL;
>>> +        }
>>> +        track->page = page;
>>> +        spin_lock(&d->arch.hvm_domain.write_map.lock);
>>> +        list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);
>> Map and unmap of non-permanent pages will happen in a stacked fashion,
>> so putting non-persistent pages on the head of the list will be more
>> efficient when walking the list for removal.
> But this is dealing with permanent mappings.

So it does. Sorry for the noise.

>>> +void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
>>> +{
>>> +    struct hvm_write_map *track;
>>>
>>> -    put_page(mfn_to_page(mfn));
>>> +    spin_lock(&d->arch.hvm_domain.write_map.lock);
>>> +    list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
>>> +        paging_mark_dirty(d, page_to_mfn(track->page));
>> This is potentially a long running operation. It might be easier to
>> ASSERT() an upper limit of mapped pages than to make this restartable.
> I don't think I can come up with a sensible upper limit - do you have a
> good suggestion (including a rationale)? Also I don't view this as an
> immediate problem - as long as we're limiting HVM guests to 128 vCPU-s
> we're not going to see many more than 128 such mappings, and even those
> only from nested HVM code. (So to answer my question - 256 would seem a
> reasonable limit for now.)
>
>>> --- a/xen/arch/x86/mm/paging.c
>>> +++ b/xen/arch/x86/mm/paging.c
>>> @@ -29,6 +29,7 @@
>>>  #include <asm/hvm/nestedhvm.h>
>>>  #include <xen/numa.h>
>>>  #include <xsm/xsm.h>
>>> +#include <public/sched.h> /* SHUTDOWN_suspend */
>>>
>>>  #include "mm-locks.h"
>>> @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
>>>      if ( !resuming )
>>>      {
>>> +        /*
>>> +         * Mark dirty all currently write-mapped pages on the final
>>> +         * iteration of a HVM save operation.
>>> +         */
>>> +        if ( has_hvm_container_domain(d) && d->is_shut_down &&
>>> +             d->shutdown_code == SHUTDOWN_suspend )
>>> +            hvm_mapped_guest_frames_mark_dirty(d);
>> I am not sure whether this is useful. There are situations such as
>> memory snapshot where it is insufficient, as the domain doesn't
>> suspend.
> Perhaps the condition could be refined then? And then - isn't memory
> snapshot what Remus does (which I checked does a suspend in the tool
> stack)?

XenServer live memory snapshots of Windows VMs use a Windows API to
quiesce IO, pause the domain, perform a non-live save, then unpause the
domain. Such an action would want these bits in the bitmap, but would
not match those conditions.

>> It would probably be better to hook this off a specific request from
>> the toolstack, as the migration code is in a far better position to
>> know whether this information is needed or can be deferred to the next
>> iteration.
> Which next iteration (when we're talking about the final one)? I
> considered tool stack involvement, but would like to go that route only
> if unavoidable.

It is wrong for Xen to guess. The toolstack should explicitly ask for
them when it wants them.

I have just written my slides for Seattle, which cover some of the
outstanding issues with regards to guests and logdirty. As migration
with nested virt doesn't function at all, fixing these entries in the
logdirty bitmap is not urgent if you don't fancy extending/tweaking
xen_domctl_shadow_op.

~Andrew