Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Aug 30, Wei Liu wrote:
> > Can this actually happen with the available senders? If not, this is
> > again the missing memory map.
> Probably not now, but as said, you shouldn't rely on the structure of
> the stream unless it is stated in the spec.

Well, what can happen with today's implementation on the sender side is
the case of a ballooned guest with enough holes within a batch. These
will trigger 1G allocations before the releasing of memory happens. To
solve this, the releasing of memory has to happen more often, probably
after crossing each 2M boundary.

Olaf

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
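The refinement suggested above (release memory after crossing each 2M boundary) boils down to comparing the 2M-superpage index of consecutive pfns. A minimal sketch of that boundary test, assuming 4k pages and libxc's SUPERPAGE_2MB_SHIFT of 9 (512 pages per 2M superpage); the helper name is invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUPERPAGE_2MB_SHIFT 9  /* 2M superpage = 2^9 4k pages */

/* Illustrative helper: true when moving from prev_pfn to pfn crosses a
 * 2M superpage boundary, i.e. the point at which the proposed release
 * of unpopulated memory would run. */
static bool crossed_2m_boundary(uint64_t prev_pfn, uint64_t pfn)
{
    return (prev_pfn >> SUPERPAGE_2MB_SHIFT) != (pfn >> SUPERPAGE_2MB_SHIFT);
}
```

With this, the restore loop could trigger the decrease_reservation pass each time the test fires instead of once per batch.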
Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Aug 30, 2017 at 04:27:19PM +0200, Olaf Hering wrote:
> On Wed, Aug 30, Wei Liu wrote:
> > As far as I can tell the algorithm in the patch can't handle:
> >
> > 1. First pfn in a batch points to start of second 1G address space
> > 2. Second pfn in a batch points to a page in the middle of first 1G
> > 3. Guest can only use 1G ram
>
> In which way does it not handle it? Over-allocation is supposed to be
> handled by the "ctx->restore.tot_pages + sp->count >
> ctx->restore.max_pages" checks. Do you mean the second 1G is allocated,
> then max_pages is reached, and allocation in other areas is not possible
> anymore?

Yes.

> Can this actually happen with the available senders? If not, this is
> again the missing memory map.

Probably not now, but as said, you shouldn't rely on the structure of
the stream unless it is stated in the spec.

And it can definitely happen with the post-copy algorithm.
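The failure mode confirmed here can be shown with a toy model of the allocation bookkeeping: a guest whose max_pages equals one 1G superpage (2^18 4k pages), where the first batch pfn lands in the second gigabyte. This is only a sketch of the accounting, not the real xc_sr code; `can_allocate` and `PAGES_PER_1G` are illustrative names:

```c
#include <assert.h>
#include <stdbool.h>

#define PAGES_PER_1G (1UL << 18)  /* 1G superpage in 4k pages (order 18) */

/* Toy version of the "tot_pages + sp->count > max_pages" guard:
 * either the allocation fits and is accounted, or it is refused. */
static bool can_allocate(unsigned long *tot_pages, unsigned long max_pages,
                         unsigned long count)
{
    if (*tot_pages + count > max_pages)
        return false;        /* would over-allocate */
    *tot_pages += count;     /* account the allocated superpage */
    return true;
}
```

In this model the 1G allocation triggered by the second-gigabyte pfn consumes the whole budget, so a later attempt to populate a pfn in the first gigabyte is refused; this is exactly the max_pages exhaustion discussed above.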
Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Aug 30, Wei Liu wrote:
> As far as I can tell the algorithm in the patch can't handle:
>
> 1. First pfn in a batch points to start of second 1G address space
> 2. Second pfn in a batch points to a page in the middle of first 1G
> 3. Guest can only use 1G ram

In which way does it not handle it? Over-allocation is supposed to be
handled by the "ctx->restore.tot_pages + sp->count >
ctx->restore.max_pages" checks. Do you mean the second 1G is allocated,
then max_pages is reached, and allocation in other areas is not possible
anymore?

Can this actually happen with the available senders? If not, this is
again the missing memory map.

Olaf
Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
On Sat, Aug 26, 2017 at 12:33:32PM +0200, Olaf Hering wrote:
[...]
> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
> +                                 const xen_pfn_t *original_pfns,
> +                                 const uint32_t *types)
> +{
> +    xc_interface *xch = ctx->xch;
> +    xen_pfn_t pfn, min_pfn = original_pfns[0], max_pfn = original_pfns[0];
> +    unsigned i, freed = 0, order;
> +    int rc = -1;
> +
> +    for ( i = 0; i < count; ++i )
> +    {
> +        if ( original_pfns[i] < min_pfn )
> +            min_pfn = original_pfns[i];
> +        if ( original_pfns[i] > max_pfn )
> +            max_pfn = original_pfns[i];
> +    }
> +    DPRINTF("batch of %u pfns between %" PRI_xen_pfn " %" PRI_xen_pfn "\n",
> +            count, min_pfn, max_pfn);
> +
> +    for ( i = 0; i < count; ++i )
> +    {
> +        if ( (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
> +              types[i] != XEN_DOMCTL_PFINFO_BROKEN) &&
> +             !pfn_is_populated(ctx, original_pfns[i]) )
> +        {
> +            rc = x86_hvm_allocate_pfn(ctx, original_pfns[i]);
> +            if ( rc )
> +                goto err;
> +            rc = pfn_set_populated(ctx, original_pfns[i]);
> +            if ( rc )
> +                goto err;
> +        }
> +    }

As far as I can tell the algorithm in the patch can't handle:

1. First pfn in a batch points to start of second 1G address space
2. Second pfn in a batch points to a page in the middle of first 1G
3. Guest can only use 1G ram

This is a valid scenario in a post-copy migration algorithm.

Please correct me if I'm wrong.

> +
> +    /*
> +     * Scan the entire superpage because several batches will fit into
> +     * a superpage, and it is unknown which pfn triggered the allocation.
> +     */
> +    order = SUPERPAGE_1GB_SHIFT;
> +    pfn = min_pfn = (min_pfn >> order) << order;
> +
> +    while ( pfn <= max_pfn )
> +    {
> +        struct xc_sr_bitmap *bm;
> +        bm = &ctx->x86_hvm.restore.allocated_pfns;
> +        if ( !xc_sr_bitmap_resize(bm, pfn) )
> +        {
> +            PERROR("Failed to realloc allocated_pfns %" PRI_xen_pfn, pfn);
> +            goto err;
> +        }
> +        if ( !pfn_is_populated(ctx, pfn) &&
> +             xc_sr_test_and_clear_bit(pfn, bm) )
> +        {
> +            xen_pfn_t p = pfn;
> +            rc = xc_domain_decrease_reservation_exact(xch, ctx->domid, 1, 0, &p);
> +            if ( rc )
> +            {
> +                PERROR("Failed to release pfn %" PRI_xen_pfn, pfn);
> +                goto err;
> +            }
> +            ctx->restore.tot_pages--;
> +            freed++;
> +        }
> +        pfn++;
> +    }
> +    if ( freed )
> +        DPRINTF("freed %u between %" PRI_xen_pfn " %" PRI_xen_pfn "\n",
> +                freed, min_pfn, max_pfn);
> +
> +    rc = 0;
> +
> + err:
> +    return rc;
> +}
> +
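The scan quoted above rounds min_pfn down to the start of its enclosing 1G superpage with `(min_pfn >> order) << order` before walking every pfn up to max_pfn. A standalone sketch of that rounding arithmetic (SUPERPAGE_1GB_SHIFT = 18 as in the patch; `superpage_start` is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>

#define SUPERPAGE_1GB_SHIFT 18  /* 1G superpage = 2^18 4k pages */

/* Round a pfn down to the first pfn of its enclosing superpage by
 * discarding the low `order` bits, as the scan loop in
 * x86_hvm_populate_pfns does. */
static uint64_t superpage_start(uint64_t pfn, unsigned order)
{
    return (pfn >> order) << order;
}
```

This is why every batch rescans from a 1G-aligned pfn: any pfn in the batch may have been the one that triggered the superpage allocation.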
Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
On Sat, Aug 26, Olaf Hering wrote:
> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
> +    /*
> +     * Scan the entire superpage because several batches will fit into
> +     * a superpage, and it is unknown which pfn triggered the allocation.
> +     */
> +    order = SUPERPAGE_1GB_SHIFT;
> +    pfn = min_pfn = (min_pfn >> order) << order;

Scanning an entire superpage again and again looked expensive, but with
the debug change below it turned out that the loop which peeks at each
single bit in populated_pfns is likely not a bottleneck. Migrating a
domU with a simple workload that touches pages to mark them dirty will
set min_pfn/max_pfn to a large range anyway after the first iteration.
This large range may also happen with an idle domU.

A small domU takes 78 seconds to migrate, and just the freeing part
takes 1.4 seconds. Similarly for a large domain, the loop takes 1% of
the time:

 78 seconds, 1.4 seconds,  2119 calls (8GB, 12*512M memdirty)
695 seconds, 7.6 seconds, 18076 calls (72GB, 12*5G memdirty)

Olaf

track time spent if decrease_reservation is needed

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 0fa0fbea4d..5ec8b6fee6 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -353,6 +353,9 @@ struct xc_sr_context
             struct xc_sr_bitmap attempted_1g;
             struct xc_sr_bitmap attempted_2m;
             struct xc_sr_bitmap allocated_pfns;
+
+            unsigned long tv_nsec;
+            unsigned long iterations;
         } restore;
     };
 } x86_hvm;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 8cd9289d1a..f6aad329e2 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -769,6 +769,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     {
         ctx.restore.ops = restore_ops_x86_hvm;
         if ( restore(&ctx) )
+            ;
             return -1;
     }
     else
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
index 2b0eca0c7c..11758b3f7d 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -1,5 +1,6 @@
 #include
 #include
+#include <time.h>

 #include "xc_sr_common_x86.h"

@@ -248,6 +249,12 @@ static int x86_hvm_stream_complete(struct xc_sr_context *ctx)

 static int x86_hvm_cleanup(struct xc_sr_context *ctx)
 {
+    xc_interface *xch = ctx->xch;
+    errno = 0;
+    PERROR("tv_nsec %lu.%lu iterations %lu",
+           ctx->x86_hvm.restore.tv_nsec / 1000000000UL,
+           ctx->x86_hvm.restore.tv_nsec % 1000000000UL,
+           ctx->x86_hvm.restore.iterations);
     free(ctx->x86_hvm.restore.context);
     xc_sr_bitmap_free(&ctx->x86_hvm.restore.attempted_1g);
     xc_sr_bitmap_free(&ctx->x86_hvm.restore.attempted_2m);
@@ -440,6 +447,28 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn)
     return rc;
 }

+static void diff_timespec(struct xc_sr_context *ctx, const struct timespec *old, const struct timespec *new, struct timespec *diff)
+{
+    xc_interface *xch = ctx->xch;
+    if (new->tv_sec == old->tv_sec && new->tv_nsec == old->tv_nsec)
+        PERROR("%s: time did not move: %ld/%ld == %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec);
+    if ( (new->tv_sec < old->tv_sec) || (new->tv_sec == old->tv_sec && new->tv_nsec < old->tv_nsec) )
+    {
+        PERROR("%s: time went backwards: %ld/%ld -> %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec);
+        diff->tv_sec = diff->tv_nsec = 0;
+        return;
+    }
+    if ((new->tv_nsec - old->tv_nsec) < 0) {
+        diff->tv_sec = new->tv_sec - old->tv_sec - 1;
+        diff->tv_nsec = new->tv_nsec - old->tv_nsec + 1000000000UL;
+    } else {
+        diff->tv_sec = new->tv_sec - old->tv_sec;
+        diff->tv_nsec = new->tv_nsec - old->tv_nsec;
+    }
+    if (diff->tv_sec < 0)
+        PERROR("%s: time diff broken. old: %ld/%ld new: %ld/%ld diff: %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec, diff->tv_sec, diff->tv_nsec);
+}
+
 static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
                                  const xen_pfn_t *original_pfns,
                                  const uint32_t *types)
@@ -448,6 +477,7 @@ static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
     xen_pfn_t pfn, min_pfn = original_pfns[0], max_pfn = original_pfns[0];
     unsigned i, freed = 0, order;
     int rc = -1;
+    struct timespec a, b, d;

     for ( i = 0; i < count; ++i )
     {
@@ -474,6 +504,8 @@ static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
         }
     }

+    if (clock_gettime(CLOCK_MONOTONIC, &a))
+        PERROR("clock_gettime start");
     /*
      * Scan the entire superpage because several batches will fit
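The timespec subtraction in the debug patch uses the standard borrow when the nanosecond field underflows. A compact standalone version of that arithmetic (assuming the new timestamp is not earlier than the old one; the helper names are illustrative, not from the patch):

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Subtract two timespecs (new minus old), borrowing one second when
 * the nanosecond difference goes negative. */
static struct timespec timespec_sub(struct timespec old, struct timespec new)
{
    struct timespec d;
    d.tv_sec = new.tv_sec - old.tv_sec;
    d.tv_nsec = new.tv_nsec - old.tv_nsec;
    if (d.tv_nsec < 0) {
        d.tv_sec -= 1;
        d.tv_nsec += NSEC_PER_SEC;
    }
    return d;
}

/* Flattened wrapper returning a single nanosecond count, convenient
 * for accumulating into a counter like restore.tv_nsec. */
static long long elapsed_ns(long old_s, long old_ns, long new_s, long new_ns)
{
    struct timespec o = { .tv_sec = old_s, .tv_nsec = old_ns };
    struct timespec n = { .tv_sec = new_s, .tv_nsec = new_ns };
    struct timespec d = timespec_sub(o, n);
    return (long long)d.tv_sec * NSEC_PER_SEC + d.tv_nsec;
}
```

Paired with clock_gettime(CLOCK_MONOTONIC, ...) samples before and after the decrease_reservation pass, this yields the per-call cost that the numbers above summarize.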
[Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost, everything is
allocated in single pages. This causes a performance degradation after
migration.

Add necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side
it must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering
---
 tools/libxc/xc_sr_common.h          |  25 +++-
 tools/libxc/xc_sr_restore.c         |  75 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 288
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 4 files changed, 382 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index da2691ba79..0fa0fbea4d 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
      */
     int (*setup)(struct xc_sr_context *ctx);

+    /**
+     * Populate PFNs
+     *
+     * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+     * unpopulated subset.
+     */
+    int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+                         const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
     /**
      * Process an individual record from the stream. The caller shall take
      * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
         int send_back_fd;
         unsigned long p2m_size;
+        unsigned long max_pages;
+        unsigned long tot_pages;
         xc_hypercall_buffer_t dirty_bitmap_hbuf;

         /* From Image Header. */
@@ -336,6 +348,11 @@ struct xc_sr_context
             /* HVM context blob. */
             void *context;
             size_t contextsz;
+
+            /* Bitmap of currently allocated PFNs during restore. */
+            struct xc_sr_bitmap attempted_1g;
+            struct xc_sr_bitmap attempted_2m;
+            struct xc_sr_bitmap allocated_pfns;
         } restore;
     };
 } x86_hvm;
@@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);

-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }

-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset. If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-    xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-        *pfns = malloc(count * sizeof(*pfns));
-    unsigned i, nr_pfns = 0;
-    int rc = -1;
-
-    if ( !mfns || !pfns )
-    {
-        ERROR("Failed to allocate %zu bytes for populating the physmap",
-              2 * count * sizeof(*mfns));
-        goto err;
-    }
-
-    for ( i = 0; i < count; ++i )
-    {
-        if ( (!types || (types &&
-                         (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-                          types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
-             !pfn_is_populated(ctx, original_pfns[i]) )
-        {
-            rc = pfn_set_populated(ctx, original_pfns[i]);
-            if ( rc )
-                goto err;
-            pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-            ++nr_pfns;
-        }
-    }
-
-    if ( nr_pfns )
-    {
-        rc = xc_domain_populate_physmap_exact(
-            xch, ctx->domid, nr_pfns, 0, 0, mfns);
-        if ( rc )
-        {
-            PERROR("Failed to populate physmap");
-            goto err;
-        }
-
-        for ( i = 0; i < nr_pfns; ++i )
-        {
-            if ( mfns[i] == INVALID_MFN )
-            {
-                ERROR("Populate physmap failed for pfn %u", i);
-                rc = -1;
-                goto err;
-            }
-
-            ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-        }
-    }
-
-    rc = 0;
-
- err:
-    free(pfns);
-    free(mfns);
-
-    return rc;
-}
-
 /*
  * Given a list of pfns, their types, a