Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-30 Thread Olaf Hering
On Wed, Aug 30, Wei Liu wrote:

> > Can this actually happen with the available senders? If not, this is
> > again the missing memory map.
> Probably not now, but as I said, you shouldn't rely on the structure of
> the stream unless it is stated in the spec.

Well, what can happen with today's implementation on the sender side is
the case of a ballooned guest with enough holes within a batch. These
holes will trigger 1G allocations before the memory is released. To
solve this, the releasing of memory has to happen more often, probably
after crossing each 2M boundary.
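
A minimal sketch of that idea (not part of the posted series; the function
names are hypothetical, x86_hvm_free_unused_in_range() standing in for the
existing decrease_reservation loop, and SUPERPAGE_2MB_SHIFT assumed to be
the 2M pfn shift already used elsewhere in libxc):

/*
 * Hypothetical sketch, not from the posted series: run the freeing pass
 * (x86_hvm_free_unused_in_range() is a made-up stand-in for the existing
 * decrease_reservation loop) once per 2M chunk of the batch instead of
 * once over the whole 1G-aligned min_pfn..max_pfn range.
 */
static int x86_hvm_free_per_2m_chunk(struct xc_sr_context *ctx,
                                     xen_pfn_t min_pfn, xen_pfn_t max_pfn)
{
    /* Align the start down to a 2M boundary. */
    xen_pfn_t start = (min_pfn >> SUPERPAGE_2MB_SHIFT) << SUPERPAGE_2MB_SHIFT;
    int rc;

    while ( start <= max_pfn )
    {
        /* Last pfn of the current 2M chunk, clamped to the batch maximum. */
        xen_pfn_t end = start + (1UL << SUPERPAGE_2MB_SHIFT) - 1;

        if ( end > max_pfn )
            end = max_pfn;

        /* Release pfns in this chunk that were allocated but not populated. */
        rc = x86_hvm_free_unused_in_range(ctx, start, end);
        if ( rc )
            return rc;

        start = end + 1;
    }

    return 0;
}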


Olaf




Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-30 Thread Wei Liu
On Wed, Aug 30, 2017 at 04:27:19PM +0200, Olaf Hering wrote:
> On Wed, Aug 30, Wei Liu wrote:
> 
> > As far as I can tell the algorithm in the patch can't handle:
> > 
> > 1. First pfn in a batch points to start of second 1G address space
> > 2. Second pfn in a batch points to a page in the middle of first 1G
> > 3. Guest can only use 1G ram
> 
> In which way does it not handle it? Over-allocation is supposed to be
> handled by the "ctx->restore.tot_pages + sp->count >
> ctx->restore.max_pages" checks. Do you mean the second 1G is allocated,
> then max_pages is reached, and allocation in other areas is not possible
> anymore?

Yes.

> Can this actually happen with the available senders? If not, this is
> again the missing memory map.
> 

Probably not now, but as I said, you shouldn't rely on the structure of
the stream unless it is stated in the spec.

And it can definitely happen with a post-copy algorithm.



Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-30 Thread Olaf Hering
On Wed, Aug 30, Wei Liu wrote:

> As far as I can tell the algorithm in the patch can't handle:
> 
> 1. First pfn in a batch points to start of second 1G address space
> 2. Second pfn in a batch points to a page in the middle of first 1G
> 3. Guest can only use 1G ram

In which way does it not handle it? Over-allocation is supposed to be
handled by the "ctx->restore.tot_pages + sp->count >
ctx->restore.max_pages" checks. Do you mean the second 1G is allocated,
then max_pages is reached, and allocation in other areas is not possible
anymore?
Can this actually happen with the available senders? If not, this is
again the missing memory map.
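
For reference, a minimal sketch of that guard (names are placeholders for
illustration, not the hunk from the patch; sp->count is the number of 4k
pages the candidate superpage allocation would add):

/* Placeholder types/names for illustration only, not the patch's exact code. */
struct sp_extent {
    xen_pfn_t base;       /* first pfn covered by the candidate superpage */
    unsigned long count;  /* number of 4k pages the allocation would add */
};

/* Reject any candidate allocation that would exceed the domain's page limit. */
static bool sp_would_overallocate(const struct xc_sr_context *ctx,
                                  const struct sp_extent *sp)
{
    return ctx->restore.tot_pages + sp->count > ctx->restore.max_pages;
}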

Olaf




Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-30 Thread Wei Liu
On Sat, Aug 26, 2017 at 12:33:32PM +0200, Olaf Hering wrote:
[...]
> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
> + const xen_pfn_t *original_pfns,
> + const uint32_t *types)
> +{
> +xc_interface *xch = ctx->xch;
> +xen_pfn_t pfn, min_pfn = original_pfns[0], max_pfn = original_pfns[0];
> +unsigned i, freed = 0, order;
> +int rc = -1;
> +
> +for ( i = 0; i < count; ++i )
> +{
> +if ( original_pfns[i] < min_pfn )
> +min_pfn = original_pfns[i];
> +if ( original_pfns[i] > max_pfn )
> +max_pfn = original_pfns[i];
> +}
> +DPRINTF("batch of %u pfns between %" PRI_xen_pfn " %" PRI_xen_pfn "\n",
> +count, min_pfn, max_pfn);
> +
> +for ( i = 0; i < count; ++i )
> +{
> +if ( (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
> +  types[i] != XEN_DOMCTL_PFINFO_BROKEN) &&
> + !pfn_is_populated(ctx, original_pfns[i]) )
> +{
> +rc = x86_hvm_allocate_pfn(ctx, original_pfns[i]);
> +if ( rc )
> +goto err;
> +rc = pfn_set_populated(ctx, original_pfns[i]);
> +if ( rc )
> +goto err;
> +}
> +}
>
As far as I can tell the algorithm in the patch can't handle:

1. First pfn in a batch points to start of second 1G address space
2. Second pfn in a batch points to a page in the middle of first 1G
3. Guest can only use 1G ram

This is a valid scenario in a post-copy migration algorithm.

Please correct me if I'm wrong.
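
To make that concrete (numbers assumed for illustration, 4k pages, so
max_pages is roughly 0x40000 for a 1G guest): if the first pfn of a batch is
0x40000, the restore side can still allocate the whole second gigabyte as a
1G superpage, which brings tot_pages up to max_pages. When the second pfn of
the same batch, somewhere in the middle of the first gigabyte, is processed,
the "tot_pages + sp->count > max_pages" check now rejects even a single 4k
allocation, and the unused tail of the second gigabyte is only released after
the allocation loop for the batch has run.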

> +
> +/*
> + * Scan the entire superpage because several batches will fit into
> + * a superpage, and it is unknown which pfn triggered the allocation.
> + */
> +order = SUPERPAGE_1GB_SHIFT;
> +pfn = min_pfn = (min_pfn >> order) << order;
> +
> +while ( pfn <= max_pfn )
> +{
> +struct xc_sr_bitmap *bm;
> +bm = &ctx->x86_hvm.restore.allocated_pfns;
> +if ( !xc_sr_bitmap_resize(bm, pfn) )
> +{
> +PERROR("Failed to realloc allocated_pfns %" PRI_xen_pfn, pfn);
> +goto err;
> +}
> +if ( !pfn_is_populated(ctx, pfn) &&
> +xc_sr_test_and_clear_bit(pfn, bm) ) {
> +xen_pfn_t p = pfn;
> +rc = xc_domain_decrease_reservation_exact(xch, ctx->domid, 1, 0, &p);
> +if ( rc )
> +{
> +PERROR("Failed to release pfn %" PRI_xen_pfn, pfn);
> +goto err;
> +}
> +ctx->restore.tot_pages--;
> +freed++;
> +}
> +pfn++;
> +}
> +if ( freed )
> +DPRINTF("freed %u between %" PRI_xen_pfn " %" PRI_xen_pfn "\n",
> +freed, min_pfn, max_pfn);
> +
> +rc = 0;
> +
> + err:
> +return rc;
> +}
> +



Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-29 Thread Olaf Hering
On Sat, Aug 26, Olaf Hering wrote:

> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,

> +/*
> + * Scan the entire superpage because several batches will fit into
> + * a superpage, and it is unknown which pfn triggered the allocation.
> + */
> +order = SUPERPAGE_1GB_SHIFT;
> +pfn = min_pfn = (min_pfn >> order) << order;

Scanning an entire superpage again and again looked expensive, but with
the debug change below it turned out that the loop which peeks at each
single bit in populated_pfns is likely not a bottleneck.

Migrating a domU with a simple workload that touches pages to mark them
dirty will set min_pfn/max_pfn to a large range anyway after the first
iteration. This large range may also occur with an idle domU. A small
domU takes 78 seconds to migrate, and just the freeing part takes 1.4
seconds. Similarly, for a large domain the loop takes about 1% of the
time.

 78 seconds, 1.4 seconds, 2119 calls  (8GB, 12*512M memdirty)
695 seconds, 7.6 seconds, 18076 calls (72GB, 12*5G memdirty)
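
That is roughly 0.66 ms per call (1.4 s / 2119) for the small domain and
0.42 ms per call (7.6 s / 18076) for the large one, i.e. about 1.8% and
1.1% of the respective total migration times.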

Olaf

track time spent if decrease_reservation is needed

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 0fa0fbea4d..5ec8b6fee6 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -353,6 +353,9 @@ struct xc_sr_context
 struct xc_sr_bitmap attempted_1g;
 struct xc_sr_bitmap attempted_2m;
 struct xc_sr_bitmap allocated_pfns;
+
+unsigned long tv_nsec;
+unsigned long iterations;
 } restore;
 };
 } x86_hvm;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 8cd9289d1a..f6aad329e2 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -769,6 +769,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 {
 ctx.restore.ops = restore_ops_x86_hvm;
 if ( restore(&ctx) )
+;
 return -1;
 }
 else
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
index 2b0eca0c7c..11758b3f7d 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -1,5 +1,6 @@
 #include <assert.h>
 #include <arpa/inet.h>
+#include <time.h>
 
 #include "xc_sr_common_x86.h"
 
@@ -248,6 +249,12 @@ static int x86_hvm_stream_complete(struct xc_sr_context *ctx)
 
 static int x86_hvm_cleanup(struct xc_sr_context *ctx)
 {
+xc_interface *xch = ctx->xch;
+errno = 0;
+PERROR("tv_nsec %lu.%lu iterations %lu",
+ctx->x86_hvm.restore.tv_nsec / 1000000000UL,
+ctx->x86_hvm.restore.tv_nsec % 1000000000UL,
+ctx->x86_hvm.restore.iterations);
 free(ctx->x86_hvm.restore.context);
 xc_sr_bitmap_free(&ctx->x86_hvm.restore.attempted_1g);
 xc_sr_bitmap_free(&ctx->x86_hvm.restore.attempted_2m);
@@ -440,6 +447,28 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn)
 return rc;
 }
 
+static void diff_timespec(struct xc_sr_context *ctx, const struct timespec *old, const struct timespec *new, struct timespec *diff)
+{
+xc_interface *xch = ctx->xch;
+if (new->tv_sec == old->tv_sec && new->tv_nsec == old->tv_nsec)
+PERROR("%s: time did not move: %ld/%ld == %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec);
+if ( (new->tv_sec < old->tv_sec) || (new->tv_sec == old->tv_sec && new->tv_nsec < old->tv_nsec) )
+{
+PERROR("%s: time went backwards: %ld/%ld -> %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec);
+diff->tv_sec = diff->tv_nsec = 0;
+return;
+}
+if ((new->tv_nsec - old->tv_nsec) < 0) {
+diff->tv_sec = new->tv_sec - old->tv_sec - 1;
+diff->tv_nsec = new->tv_nsec - old->tv_nsec + 1000000000UL;
+} else {
+diff->tv_sec = new->tv_sec - old->tv_sec;
+diff->tv_nsec = new->tv_nsec - old->tv_nsec;
+}
+if (diff->tv_sec < 0)
+PERROR("%s: time diff broken. old: %ld/%ld new: %ld/%ld diff: %ld/%ld ", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec, diff->tv_sec, diff->tv_nsec);
+}
+
 static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
  const xen_pfn_t *original_pfns,
  const uint32_t *types)
@@ -448,6 +477,7 @@ static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
 xen_pfn_t pfn, min_pfn = original_pfns[0], max_pfn = original_pfns[0];
 unsigned i, freed = 0, order;
 int rc = -1;
+struct timespec a, b, d;
 
 for ( i = 0; i < count; ++i )
 {
@@ -474,6 +504,8 @@ static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
 }
 }
 
+if (clock_gettime(CLOCK_MONOTONIC, &a))
+PERROR("clock_gettime start");
 /*
  * Scan the entire superpage because several batches will fit into
 * a superpage, and it is unknown which pfn triggered the allocation.

[Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-26 Thread Olaf Hering
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost and everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.
Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  25 +++-
 tools/libxc/xc_sr_restore.c |  75 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 288 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 4 files changed, 382 insertions(+), 78 deletions(-)
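
(Not one of the hunks below: as a rough illustration of the allocation
strategy the series aims at, a simplified largest-order-first fallback could
look like the following. The function name is a placeholder, and the
attempted_1g/attempted_2m/allocated_pfns bookkeeping as well as the
over-allocation check discussed elsewhere in the thread are omitted.)

/*
 * Illustration only, not a hunk from this patch: try the largest extent
 * order first and fall back to smaller orders, using
 * xc_domain_populate_physmap() with the matching extent_order.  The real
 * patch additionally tracks attempted_1g/attempted_2m/allocated_pfns and
 * checks tot_pages against max_pages before each attempt.
 */
static int populate_around_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn)
{
    xc_interface *xch = ctx->xch;
    static const unsigned int orders[] = {
        SUPERPAGE_1GB_SHIFT, SUPERPAGE_2MB_SHIFT, 0
    };
    unsigned int i;

    for ( i = 0; i < sizeof(orders) / sizeof(orders[0]); i++ )
    {
        unsigned int order = orders[i];
        /* Superpage-aligned start of the extent that contains pfn. */
        xen_pfn_t extent = (pfn >> order) << order;

        /* One extent of 2^order contiguous pages; returns the number done. */
        if ( xc_domain_populate_physmap(xch, ctx->domid, 1, order, 0,
                                        &extent) == 1 )
        {
            ctx->restore.tot_pages += 1UL << order;
            return 0;
        }
    }

    PERROR("Failed to populate memory around pfn %" PRI_xen_pfn, pfn);
    return -1;
}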

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index da2691ba79..0fa0fbea4d 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +348,11 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
 } restore;
 };
 } x86_hvm;
@@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-
-rc = 0;
-
- err:
-free(pfns);
-free(mfns);
-
-return rc;
-}
-
 /*
  * Given a list of