Re: [Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()

2015-08-13 Thread Jan Beulich
 On 12.08.15 at 19:24, andrew.coop...@citrix.com wrote:
 On 12/08/15 16:26, Jan Beulich wrote:
 On 12.08.15 at 17:13, andrew.coop...@citrix.com wrote:
 On 12/08/15 15:19, Jan Beulich wrote:
 @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
  
  if ( !resuming )
  {
 +/*
 + * Mark dirty all currently write-mapped pages on the final iteration
 + * of a HVM save operation.
 + */
 +if ( has_hvm_container_domain(d) && d->is_shut_down &&
 + d->shutdown_code == SHUTDOWN_suspend )
 +hvm_mapped_guest_frames_mark_dirty(d);
 I am not sure whether this is useful.  There are situations such as
 memory snapshot where it is insufficient, as the domain doesn't suspend.
 Perhaps the condition could be refined then? And then - isn't
 memory snapshot what Remus does (which I checked does a
 suspend in the tool stack)?
 
 XenServer live memory snapshots of Windows VMs use a Windows API to
 quiesce I/O, pause the domain, perform a non-live save, then unpause
 the domain.
 
 Such an action would want these bits in the bitmap, but would not
 match those conditions.

As said - the conditions could be adjusted (e.g. to also include
tool stack paused domains).
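
For example (just a sketch - whether controller_pause_count is the
right signal for "tool stack paused" here is an open question):

    if ( has_hvm_container_domain(d) &&
         ((d->is_shut_down && d->shutdown_code == SHUTDOWN_suspend) ||
          d->controller_pause_count) )
        hvm_mapped_guest_frames_mark_dirty(d);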

 It would probably be better to hook this off a specific request from the
 toolstack, as the migration code is in a far better position to know
 whether this information is needed or can be deferred to the next iteration.
 Which next iteration (when we're talking about the final one)?

 I considered tool stack involvement, but would like to go that
 route only if unavoidable.
 
 It is wrong for Xen to guess.  The toolstack should explicitly ask for
 them when it wants them.
 
 I have just written my slides for Seattle, which cover some of the
 outstanding issues with regards to guests and logdirty.  As migration
 with nested virt doesn't function at all, fixing these entries in the
 logdirty bitmap is not urgent if you don't fancy extending/tweaking
 xen_domctl_shadow_op.

Agreed - while I'd like patch 1 to go in for 4.6, this one can wait.

Jan




Re: [Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()

2015-08-12 Thread Andrew Cooper
On 12/08/15 15:19, Jan Beulich wrote:
 Rather than dirtying a page when establishing a (permanent) mapping,
 dirty it when the page gets unmapped, or - if still mapped - on the
 final iteration of a save operation. (Transient mappings continue to
 get dirtied upon getting mapped, to avoid the overhead of tracking.)

 Signed-off-by: Jan Beulich <jbeul...@suse.com>

 --- a/xen/arch/x86/hvm/hvm.c
 +++ b/xen/arch/x86/hvm/hvm.c
 @@ -1556,6 +1556,8 @@ int hvm_domain_initialise(struct domain 
  INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
  spin_lock_init(&d->arch.hvm_domain.irq_lock);
  spin_lock_init(&d->arch.hvm_domain.uc_lock);
 +spin_lock_init(&d->arch.hvm_domain.write_map.lock);
 +INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
  
  hvm_init_cacheattr_region_list(d);
  
 @@ -3667,6 +3669,11 @@ int hvm_virtual_to_linear_addr(
  return 1;
  }
  
 +struct hvm_write_map {
 +struct list_head list;
 +struct page_info *page;
 +};
 +
  /* On non-NULL return, we leave this function holding an additional 
   * ref on the underlying mfn, if any */
  static void *_hvm_map_guest_frame(unsigned long gfn, bool_t permanent,
 @@ -3694,15 +3701,30 @@ static void *_hvm_map_guest_frame(unsign
  
  if ( writable )
  {
 -if ( !p2m_is_discard_write(p2mt) )
 -paging_mark_dirty(d, page_to_mfn(page));
 -else
 +if ( unlikely(p2m_is_discard_write(p2mt)) )
  *writable = 0;
 +else if ( !permanent )
 +paging_mark_dirty(d, page_to_mfn(page));
  }
  
  if ( !permanent )
  return __map_domain_page(page);
  
 +if ( writable && *writable )
 +{
 +struct hvm_write_map *track = xmalloc(struct hvm_write_map);
 +
 +if ( !track )
 +{
 +put_page(page);
 +return NULL;
 +}
 +track->page = page;
 +spin_lock(&d->arch.hvm_domain.write_map.lock);
 +list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);

Map and unmap of non-permanent pages will happen in a stacked fashion,
so putting non-permanent pages on the head of the list will be more
efficient when walking the list for removal.
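
I.e. something like this (illustrative only - list_add() inserts at
the head of the list):

    /* LIFO insertion to match stacked map/unmap ordering. */
    list_add(&track->list, &d->arch.hvm_domain.write_map.list);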

 +spin_unlock(&d->arch.hvm_domain.write_map.lock);
 +}
 +
  map = __map_domain_page_global(page);
  if ( !map )
  put_page(page);
 @@ -3725,18 +3747,45 @@ void *hvm_map_guest_frame_ro(unsigned lo
  void hvm_unmap_guest_frame(void *p, bool_t permanent)
  {
  unsigned long mfn;
 +struct page_info *page;
  
  if ( !p )
  return;
  
  mfn = domain_page_map_to_mfn(p);
 +page = mfn_to_page(mfn);
  
  if ( !permanent )
  unmap_domain_page(p);
  else
 +{
 +struct domain *d = page_get_owner(page);
 +struct hvm_write_map *track;
 +
  unmap_domain_page_global(p);
 +spin_lock(&d->arch.hvm_domain.write_map.lock);
 +list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
 +if ( track->page == page )
 +{
 +paging_mark_dirty(d, mfn);
 +list_del(&track->list);
 +xfree(track);
 +break;
 +}
 +spin_unlock(&d->arch.hvm_domain.write_map.lock);
 +}
 +
 +put_page(page);
 +}
 +
 +void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
 +{
 +struct hvm_write_map *track;
  
 -put_page(mfn_to_page(mfn));
 +spin_lock(&d->arch.hvm_domain.write_map.lock);
 +list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
 +paging_mark_dirty(d, page_to_mfn(track->page));

This is potentially a long running operation.  It might be easier to
ASSERT() an upper limit of mapped pages than to make this restartable.

 +spin_unlock(&d->arch.hvm_domain.write_map.lock);
  }
  
  static void *hvm_map_entry(unsigned long va, bool_t *writable)
 --- a/xen/arch/x86/mm/paging.c
 +++ b/xen/arch/x86/mm/paging.c
 @@ -29,6 +29,7 @@
  #include <asm/hvm/nestedhvm.h>
  #include <xen/numa.h>
  #include <xsm/xsm.h>
 +#include <public/sched.h> /* SHUTDOWN_suspend */
  
  #include "mm-locks.h"
  
 @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
  
  if ( !resuming )
  {
 +/*
 + * Mark dirty all currently write-mapped pages on the final iteration
 + * of a HVM save operation.
 + */
 +if ( has_hvm_container_domain(d) && d->is_shut_down &&
 + d->shutdown_code == SHUTDOWN_suspend )
 +hvm_mapped_guest_frames_mark_dirty(d);

I am not sure whether this is useful.  There are situations such as
memory snapshot where it is insufficient, as the domain doesn't suspend.

It would probably be better to hook this off a specific request from the
toolstack, as the migration code is in a far better position to know
whether this information is needed or can be deferred to the next iteration.

~Andrew


Re: [Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()

2015-08-12 Thread Jan Beulich
 On 12.08.15 at 17:13, andrew.coop...@citrix.com wrote:
 On 12/08/15 15:19, Jan Beulich wrote:
 +if ( writable && *writable )
 +{
 +struct hvm_write_map *track = xmalloc(struct hvm_write_map);
 +
 +if ( !track )
 +{
 +put_page(page);
 +return NULL;
 +}
 +track->page = page;
 +spin_lock(&d->arch.hvm_domain.write_map.lock);
 +list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);
 
 Map and unmap of non-permanent pages will happen in a stacked fashion,
 so putting non-permanent pages on the head of the list will be more
 efficient when walking the list for removal.

But this is dealing with permanent mappings.

 +void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
 +{
 +struct hvm_write_map *track;
  
 -put_page(mfn_to_page(mfn));
 +spin_lock(&d->arch.hvm_domain.write_map.lock);
 +list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
 +paging_mark_dirty(d, page_to_mfn(track->page));
 
 This is potentially a long running operation.  It might be easier to
 ASSERT() an upper limit of mapped pages than to make this restartable.

I don't think I can come up with a sensible upper limit - do you have a
good suggestion (including a rationale)? Also I don't view this as an
immediate problem - as long as we're limiting HVM guests to 128 vCPU-s
we're not going to see many more than 128 such mappings, and even
those only from nested HVM code. (So to answer my question - 256
would seem a reasonable limit for now.)
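
I.e. something along these lines (just a sketch - the count field and
the limit are illustrative, not part of the patch):

    /* Hypothetical: maintain a count alongside write_map.list. */
    #define HVM_WRITE_MAP_LIMIT 256

    spin_lock(&d->arch.hvm_domain.write_map.lock);
    ASSERT(d->arch.hvm_domain.write_map.count < HVM_WRITE_MAP_LIMIT);
    d->arch.hvm_domain.write_map.count++;
    list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);
    spin_unlock(&d->arch.hvm_domain.write_map.lock);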

 --- a/xen/arch/x86/mm/paging.c
 +++ b/xen/arch/x86/mm/paging.c
 @@ -29,6 +29,7 @@
  #include <asm/hvm/nestedhvm.h>
  #include <xen/numa.h>
  #include <xsm/xsm.h>
 +#include <public/sched.h> /* SHUTDOWN_suspend */
  
  #include "mm-locks.h"
  
 @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
  
  if ( !resuming )
  {
 +/*
 + * Mark dirty all currently write-mapped pages on the final iteration
 + * of a HVM save operation.
 + */
 +if ( has_hvm_container_domain(d) && d->is_shut_down &&
 + d->shutdown_code == SHUTDOWN_suspend )
 +hvm_mapped_guest_frames_mark_dirty(d);
 
 I am not sure whether this is useful.  There are situations such as
 memory snapshot where it is insufficient, as the domain doesn't suspend.

Perhaps the condition could be refined then? And then - isn't
memory snapshot what Remus does (which I checked does a
suspend in the tool stack)?

 It would probably be better to hook this off a specific request from the
 toolstack, as the migration code is in a far better position to know
 whether this information is needed or can be deferred to the next iteration.

Which next iteration (when we're talking about the final one)?

I considered tool stack involvement, but would like to go that
route only if unavoidable.

Jan




[Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()

2015-08-12 Thread Jan Beulich
Rather than dirtying a page when establishing a (permanent) mapping,
dirty it when the page gets unmapped, or - if still mapped - on the
final iteration of a save operation. (Transient mappings continue to
get dirtied upon getting mapped, to avoid the overhead of tracking.)

Signed-off-by: Jan Beulich <jbeul...@suse.com>

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1556,6 +1556,8 @@ int hvm_domain_initialise(struct domain 
 INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
 spin_lock_init(&d->arch.hvm_domain.irq_lock);
 spin_lock_init(&d->arch.hvm_domain.uc_lock);
+spin_lock_init(&d->arch.hvm_domain.write_map.lock);
+INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
 
 hvm_init_cacheattr_region_list(d);
 
@@ -3667,6 +3669,11 @@ int hvm_virtual_to_linear_addr(
 return 1;
 }
 
+struct hvm_write_map {
+struct list_head list;
+struct page_info *page;
+};
+
 /* On non-NULL return, we leave this function holding an additional 
  * ref on the underlying mfn, if any */
 static void *_hvm_map_guest_frame(unsigned long gfn, bool_t permanent,
@@ -3694,15 +3701,30 @@ static void *_hvm_map_guest_frame(unsign
 
 if ( writable )
 {
-if ( !p2m_is_discard_write(p2mt) )
-paging_mark_dirty(d, page_to_mfn(page));
-else
+if ( unlikely(p2m_is_discard_write(p2mt)) )
 *writable = 0;
+else if ( !permanent )
+paging_mark_dirty(d, page_to_mfn(page));
 }
 
 if ( !permanent )
 return __map_domain_page(page);
 
+if ( writable && *writable )
+{
+struct hvm_write_map *track = xmalloc(struct hvm_write_map);
+
+if ( !track )
+{
+put_page(page);
+return NULL;
+}
+track->page = page;
+spin_lock(&d->arch.hvm_domain.write_map.lock);
+list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);
+spin_unlock(&d->arch.hvm_domain.write_map.lock);
+}
+
 map = __map_domain_page_global(page);
 if ( !map )
 put_page(page);
@@ -3725,18 +3747,45 @@ void *hvm_map_guest_frame_ro(unsigned lo
 void hvm_unmap_guest_frame(void *p, bool_t permanent)
 {
 unsigned long mfn;
+struct page_info *page;
 
 if ( !p )
 return;
 
 mfn = domain_page_map_to_mfn(p);
+page = mfn_to_page(mfn);
 
 if ( !permanent )
 unmap_domain_page(p);
 else
+{
+struct domain *d = page_get_owner(page);
+struct hvm_write_map *track;
+
 unmap_domain_page_global(p);
+spin_lock(&d->arch.hvm_domain.write_map.lock);
+list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
+if ( track->page == page )
+{
+paging_mark_dirty(d, mfn);
+list_del(&track->list);
+xfree(track);
+break;
+}
+spin_unlock(&d->arch.hvm_domain.write_map.lock);
+}
+
+put_page(page);
+}
+
+void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
+{
+struct hvm_write_map *track;
 
-put_page(mfn_to_page(mfn));
+spin_lock(&d->arch.hvm_domain.write_map.lock);
+list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
+paging_mark_dirty(d, page_to_mfn(track->page));
+spin_unlock(&d->arch.hvm_domain.write_map.lock);
 }
 
 static void *hvm_map_entry(unsigned long va, bool_t *writable)
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -29,6 +29,7 @@
 #include <asm/hvm/nestedhvm.h>
 #include <xen/numa.h>
 #include <xsm/xsm.h>
+#include <public/sched.h> /* SHUTDOWN_suspend */
 
 #include "mm-locks.h"
 
@@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
 
 if ( !resuming )
 {
+/*
+ * Mark dirty all currently write-mapped pages on the final iteration
+ * of a HVM save operation.
+ */
+if ( has_hvm_container_domain(d) && d->is_shut_down &&
+ d->shutdown_code == SHUTDOWN_suspend )
+hvm_mapped_guest_frames_mark_dirty(d);
+
 domain_pause(d);
 
 /*
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -145,6 +145,12 @@ struct hvm_domain {
 
 unsigned long *io_bitmap;
 
+/* List of permanently write-mapped pages. */
+struct {
+spinlock_t lock;
+struct list_head list;
+} write_map;
+
 union {
 struct vmx_domain vmx;
 struct svm_domain svm;
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -444,6 +444,7 @@ void *hvm_map_guest_frame_rw(unsigned lo
  bool_t *writable);
 void *hvm_map_guest_frame_ro(unsigned long gfn, bool_t permanent);
 void hvm_unmap_guest_frame(void *p, bool_t permanent);
+void hvm_mapped_guest_frames_mark_dirty(struct domain *);
 
 static inline void hvm_set_info_guest(struct vcpu *v)
 {


Re: [Xen-devel] [PATCH v2 2/2] x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()

2015-08-12 Thread Andrew Cooper
On 12/08/15 16:26, Jan Beulich wrote:
 On 12.08.15 at 17:13, andrew.coop...@citrix.com wrote:
 On 12/08/15 15:19, Jan Beulich wrote:
 +if ( writable && *writable )
 +{
 +struct hvm_write_map *track = xmalloc(struct hvm_write_map);
 +
 +if ( !track )
 +{
 +put_page(page);
 +return NULL;
 +}
 +track->page = page;
 +spin_lock(&d->arch.hvm_domain.write_map.lock);
 +list_add_tail(&track->list, &d->arch.hvm_domain.write_map.list);
 Map and unmap of non-permanent pages will happen in a stacked fashion,
 so putting non-permanent pages on the head of the list will be more
 efficient when walking the list for removal.
 But this is dealing with permanent mappings.

So it does.  Sorry for the noise.


 +void hvm_mapped_guest_frames_mark_dirty(struct domain *d)
 +{
 +struct hvm_write_map *track;
  
 -put_page(mfn_to_page(mfn));
 +spin_lock(&d->arch.hvm_domain.write_map.lock);
 +list_for_each_entry(track, &d->arch.hvm_domain.write_map.list, list)
 +paging_mark_dirty(d, page_to_mfn(track->page));
 This is potentially a long running operation.  It might be easier to
 ASSERT() an upper limit of mapped pages than to make this restartable.
 I don't think I can come up with a sensible upper limit - do you have a
 good suggestion (including a rationale)? Also I don't view this as an
 immediate problem - as long as we're limiting HVM guests to 128 vCPU-s
 we're not going to see many more than 128 such mappings, and even
 those only from nested HVM code. (So to answer my question - 256
 would seem a reasonable limit for now.)

 --- a/xen/arch/x86/mm/paging.c
 +++ b/xen/arch/x86/mm/paging.c
 @@ -29,6 +29,7 @@
  #include <asm/hvm/nestedhvm.h>
  #include <xen/numa.h>
  #include <xsm/xsm.h>
 +#include <public/sched.h> /* SHUTDOWN_suspend */
  
  #include "mm-locks.h"
  
 @@ -422,6 +423,14 @@ static int paging_log_dirty_op(struct do
  
  if ( !resuming )
  {
 +/*
 + * Mark dirty all currently write-mapped pages on the final iteration
 + * of a HVM save operation.
 + */
 +if ( has_hvm_container_domain(d) && d->is_shut_down &&
 + d->shutdown_code == SHUTDOWN_suspend )
 +hvm_mapped_guest_frames_mark_dirty(d);
 I am not sure whether this is useful.  There are situations such as
 memory snapshot where it is insufficient, as the domain doesn't suspend.
 Perhaps the condition could be refined then? And then - isn't
 memory snapshot what Remus does (which I checked does a
 suspend in the tool stack)?

XenServer live memory snapshots of Windows VMs use a Windows API to
quiesce I/O, pause the domain, perform a non-live save, then unpause
the domain.

Such an action would want these bits in the bitmap, but would not
match those conditions.


 It would probably be better to hook this off a specific request from the
 toolstack, as the migration code is in a far better position to know
 whether this information is needed or can be deferred to the next iteration.
 Which next iteration (when we're talking about the final one)?

 I considered tool stack involvement, but would like to go that
 route only if unavoidable.

It is wrong for Xen to guess.  The toolstack should explicitly ask for
them when it wants them.

I have just written my slides for Seattle, which cover some of the
outstanding issues with regards to guests and logdirty.  As migration
with nested virt doesn't function at all, fixing these entries in the
logdirty bitmap is not urgent if you don't fancy extending/tweaking
xen_domctl_shadow_op.
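
For instance (purely illustrative - the op name and plumbing are
hypothetical, not an existing interface):

    /* Hypothetical new shadow op (value TBD): the toolstack marks the
     * final log-dirty iteration explicitly, in paging_log_dirty_op(). */
    if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN_FINAL &&
         has_hvm_container_domain(d) )
        hvm_mapped_guest_frames_mark_dirty(d);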

~Andrew
