Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 25, Olaf Hering wrote:
> I think with the new check of max_pages an overallocation can not happen
> anymore. If at some point the domU still has room for a superpage, it
> will be allocated. In case the batch does not fully fill the superpage,
> the holes will be freed. In the next batch no superpage can be allocated
> anymore, but single pages will be used.

There is one case where over-allocation will still happen: assume
x86_hvm_populate_pfns gets a batch of pfns that triggers the allocation
of a 1G page. All pfns of that batch fit into the now partly populated
superpage. Then the guest has a hole right after the max_pfn of that
batch, so the next batch starts in a new superpage. As a result the
freeing part of x86_hvm_populate_pfns will not consider the previous
superpage anymore. Now 512MB are allocated, but unpopulated.

To handle this case the min_pfn/max_pfn have to be global so that the
current batch can free allocated pfns from previous batches.

Olaf
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
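The case described above can be played through in a small toy model. This is only a sketch under stated assumptions, not libxc code: struct restore_model, model_allocate(), model_populate_batch() and model_leaked_below() are hypothetical names, and a "superpage" is shrunk to 8 pfns so the arrays stay small. It compares freeing holes within the range of the current batch only against freeing within a range that is kept across batches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SP_PAGES 8u   /* toy superpage size in pfns (assumption) */
#define MAX_PFNS 64u  /* toy guest size in pfns (assumption) */

struct restore_model {
    bool allocated[MAX_PFNS];            /* pfn is backed by memory */
    bool populated[MAX_PFNS];            /* pfn was received in some batch */
    bool sp_tried[MAX_PFNS / SP_PAGES];  /* superpage was allocated once */
    unsigned long min_pfn, max_pfn;      /* range seen across all batches */
    bool have_range;
};

/* Back a pfn: first touch of a superpage allocates all of it. */
static void model_allocate(struct restore_model *m, unsigned long pfn)
{
    unsigned long sp = pfn / SP_PAGES, i;

    if (m->allocated[pfn])
        return;
    if (!m->sp_tried[sp]) {
        m->sp_tried[sp] = true;
        for (i = sp * SP_PAGES; i < (sp + 1) * SP_PAGES; i++)
            m->allocated[i] = true;
    } else {
        m->allocated[pfn] = true;        /* later on: single pages only */
    }
}

/* Populate one batch, then free allocated-but-unpopulated pfns in
 * [lo, hi). With global_range the range spans all batches so far. */
static void model_populate_batch(struct restore_model *m,
                                 const unsigned long *pfns, size_t count,
                                 bool global_range)
{
    unsigned long lo, hi, pfn;
    size_t i;

    assert(count > 0);
    lo = hi = pfns[0];
    for (i = 0; i < count; i++) {
        assert(pfns[i] < MAX_PFNS);
        if (pfns[i] < lo)
            lo = pfns[i];
        if (pfns[i] > hi)
            hi = pfns[i];
        model_allocate(m, pfns[i]);
        m->populated[pfns[i]] = true;
    }
    if (global_range) {
        if (m->have_range && m->min_pfn < lo)
            lo = m->min_pfn;
        if (m->have_range && m->max_pfn > hi)
            hi = m->max_pfn;
        m->min_pfn = lo;
        m->max_pfn = hi;
        m->have_range = true;
    }
    for (pfn = lo; pfn < hi; pfn++)
        if (m->allocated[pfn] && !m->populated[pfn])
            m->allocated[pfn] = false;   /* stands in for decrease_reservation */
}

/* Count pfns below limit that are allocated but were never populated. */
static unsigned model_leaked_below(const struct restore_model *m,
                                   unsigned long limit)
{
    unsigned long pfn;
    unsigned leaked = 0;

    for (pfn = 0; pfn < limit && pfn < MAX_PFNS; pfn++)
        if (m->allocated[pfn] && !m->populated[pfn])
            leaked++;
    return leaked;
}
```

With a first batch {0,1,2,3} and a second batch {8,9,10,11}, the per-batch range leaves pfns 4..7 allocated but unpopulated, while the range kept across batches releases them. Rescanning the whole range on every batch is wasteful; a real implementation would presumably remember how far it has already freed.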
Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 25, Wei Liu wrote:
> Maybe a middle ground is to scan the batch to see if pfns can be fit
> into a whole super page? I don't think you can get a batch as big as 1G
> but there should be a lot of 2M batches?

I think with the new check of max_pages an overallocation can not happen
anymore. If at some point the domU still has room for a superpage, it
will be allocated. In case the batch does not fully fill the superpage,
the holes will be freed. In the next batch no superpage can be allocated
anymore, but single pages will be used.

This punching of holes might be inefficient; the win is the usage of
superpages in case of contiguous pfns.

Olaf
Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 25, 2017 at 02:51:01PM +0200, Olaf Hering wrote:
> On Fri, Aug 25, Wei Liu wrote:
>
> > I'm still unconvinced this works all the time because it still needs the
> > assumption that the stream contains contiguous pfns.
>
> This is how it is done today. If the pfns start to arrive in another
> order the format has to be changed to send a memory layout in advance.
>

Maybe a middle ground is to scan the batch to see if pfns can be fit
into a whole super page? I don't think you can get a batch as big as 1G
but there should be a lot of 2M batches?

> I will check if some sort of retry logic can be added.
>

This would be useful too.

>
> Olaf
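The "scan the batch" idea could look roughly like the sketch below. It is not taken from the patch: batch_covers_2m() is a hypothetical helper, and the figure of 512 pfns per 2M superpage assumes 4K pages. The function reports whether a batch, in any order, contains every pfn of the 2M-aligned range around a given pfn, which is the condition under which a 2M allocation would leave no holes to punch afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PFNS_PER_2M 512u  /* 2M / 4K pages (assumes 4K page size) */

/* Return true if the batch contains all pfns of the 2M-aligned
 * superpage that covers 'pfn'. The order of the batch does not matter. */
static bool batch_covers_2m(const uint64_t *pfns, size_t count, uint64_t pfn)
{
    uint8_t seen[PFNS_PER_2M / 8];   /* one bit per pfn in the superpage */
    uint64_t base = pfn & ~(uint64_t)(PFNS_PER_2M - 1);
    size_t i, found = 0;

    assert(pfns != NULL || count == 0);
    memset(seen, 0, sizeof(seen));

    for (i = 0; i < count; i++) {
        uint64_t off;

        if (pfns[i] < base || pfns[i] - base >= PFNS_PER_2M)
            continue;                /* pfn belongs to another superpage */
        off = pfns[i] - base;
        if (!(seen[off / 8] & (1u << (off % 8)))) {
            seen[off / 8] |= (uint8_t)(1u << (off % 8));
            found++;                 /* count each pfn only once */
        }
    }
    return found == PFNS_PER_2M;
}
```

A caller could run this check before choosing between a 2M allocation and single pages. It does not help for superpages that are filled across several batches, which is the situation the rest of this thread is concerned with.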
Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 25, Wei Liu wrote:
> I'm still unconvinced this works all the time because it still needs the
> assumption that the stream contains contiguous pfns.

This is how it is done today. If the pfns start to arrive in another
order the format has to be changed to send a memory layout in advance.

I will check if some sort of retry logic can be added.

Olaf
Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
On Thu, Aug 24, 2017 at 12:14:43PM +0200, Olaf Hering wrote:
> During creating of a HVM domU meminit_hvm() tries to map superpages.
> After save/restore or migration this mapping is lost, everything is
> allocated in single pages. This causes a performance degradation after
> migration.
>
> Add necessary code to preallocate a superpage for the chunk of pfns
> that is received. In case a pfn was not populated on the sending side it
> must be freed on the receiving side to avoid over-allocation.
>
> The existing code for x86_pv is moved unmodified into its own file.
>
[...]
> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
> +                                 const xen_pfn_t *original_pfns,
> +                                 const uint32_t *types)
> +{
> +    xc_interface *xch = ctx->xch;
> +    xen_pfn_t min_pfn = original_pfns[0], max_pfn = original_pfns[0];
> +    unsigned i;
> +    int rc = -1;
> +
> +    for ( i = 0; i < count; ++i )
> +    {
> +        if ( original_pfns[i] < min_pfn )
> +            min_pfn = original_pfns[i];
> +        if ( original_pfns[i] > max_pfn )
> +            max_pfn = original_pfns[i];
> +        if ( (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
> +              types[i] != XEN_DOMCTL_PFINFO_BROKEN) &&
> +             !pfn_is_populated(ctx, original_pfns[i]) )
> +        {
> +            rc = x86_hvm_allocate_pfn(ctx, original_pfns[i]);
> +            if ( rc )
> +                goto err;
> +            rc = pfn_set_populated(ctx, original_pfns[i]);
> +            if ( rc )
> +                goto err;
> +        }
> +    }
> +
> +    while ( min_pfn < max_pfn )
> +    {
> +        if ( !xc_sr_bitmap_resize(&ctx->x86_hvm.restore.allocated_pfns, min_pfn) )
> +        {
> +            PERROR("Failed to realloc allocated_pfns %" PRI_xen_pfn, min_pfn);
> +            goto err;
> +        }
> +        if ( !pfn_is_populated(ctx, min_pfn) &&
> +            xc_sr_test_and_clear(min_pfn, &ctx->x86_hvm.restore.allocated_pfns) ) {
> +            xen_pfn_t pfn = min_pfn;
> +            rc = xc_domain_decrease_reservation_exact(xch, ctx->domid, 1, 0, &pfn);
> +            if ( rc )
> +            {
> +                PERROR("Failed to release pfn %" PRI_xen_pfn, min_pfn);
> +                goto err;
> +            }
> +            ctx->restore.tot_pages--;
> +        }
> +        min_pfn++;
> +    }
> +

I'm still unconvinced this works all the time because it still needs the
assumption that the stream contains contiguous pfns. It works now probably
because the mmio hole is placed right under 1G and you preemptively avoid
allocating a 1G page for the first GB.

Suppose we have the following memory layout:

  [0...A) [mmio_start...B) [B...C]
                  ^1G here
  A-0 < 1G
  B-C < 1G
  The guest can use up to 1G ram

And then you receive the following batch of pfns from remote:

  B,A-1

When B arrives, a 1G page is thus allocated, using up all of the guest's
allowance. Decreasing the reservation happens only after the batch has
been processed, which means A-1 will trigger over-allocation even on a 4K
page. Then there is no fallback in x86_hvm_allocate_pfn, which will cause
migration to fail.

Another less contrived example, without expanding the mmio hole beyond 1G:

  [0...A) [mmio_start...1G) [1G...C...2G...D...E]
  E < 2G
  The guest can use up to 2G ram

And then the batch of pfns from remote:

  C,D,A-1

So maybe you need to adjust the location of decrease reservation? Or do I
misread your code?
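The accounting in the first example (batch B,A-1) can be traced with a toy model. Everything in this sketch is an assumption made for illustration: struct domain_model, model_allocate_pfn() and model_process_batch() are hypothetical, a "superpage" is 8 pfns, and the guest allowance max_pages equals exactly one superpage. The model allocates a superpage when there is room under max_pages and falls back to a single page otherwise. Releasing unpopulated pfns is left out on purpose, because in the quoted code it happens only after the whole batch has been handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SP_PAGES 8u   /* toy superpage size in pfns (assumption) */
#define MAX_PFNS 64u  /* toy guest address space in pfns (assumption) */

struct domain_model {
    bool allocated[MAX_PFNS];
    unsigned long tot_pages;   /* pages currently allocated */
    unsigned long max_pages;   /* allowance of the guest */
};

/* Returns 0 on success, -1 if even a single page exceeds max_pages. */
static int model_allocate_pfn(struct domain_model *d, unsigned long pfn)
{
    unsigned long base = pfn - pfn % SP_PAGES, i;
    bool sp_unused = true;

    if (d->allocated[pfn])
        return 0;

    for (i = base; i < base + SP_PAGES; i++)
        if (d->allocated[i])
            sp_unused = false;

    /* Prefer a superpage while the allowance still has room for one. */
    if (sp_unused && d->tot_pages + SP_PAGES <= d->max_pages) {
        for (i = base; i < base + SP_PAGES; i++)
            d->allocated[i] = true;
        d->tot_pages += SP_PAGES;
        return 0;
    }

    /* Otherwise try a single page. */
    if (d->tot_pages + 1 > d->max_pages)
        return -1;
    d->allocated[pfn] = true;
    d->tot_pages++;
    return 0;
}

/* Returns the index of the first pfn that could not be allocated,
 * or count if the whole batch succeeded. */
static size_t model_process_batch(struct domain_model *d,
                                  const unsigned long *pfns, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        assert(pfns[i] < MAX_PFNS);
        if (model_allocate_pfn(d, pfns[i]))
            return i;
    }
    /* Only at this point would unpopulated pfns be given back. */
    return count;
}
```

With max_pages set to SP_PAGES and a batch {16, 3}, standing in for B and A-1, the first pfn consumes the whole allowance through its superpage and the second pfn fails at index 1, although only two pages of guest memory were actually received.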