On Fri, Jul 21, 2017 at 04:41:58PM +0200, Igor Mammedov wrote:
> On Wed, 19 Jul 2017 18:52:56 +0300
> "Michael S. Tsirkin" <m...@redhat.com> wrote:
> 
> > On Wed, Jul 19, 2017 at 03:24:27PM +0200, Igor Mammedov wrote:
> > > On Wed, 19 Jul 2017 12:46:13 +0100
> > > "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote:
> > > 
> > > > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > > > On Wed, 19 Jul 2017 23:17:32 +0800
> > > > > Peng Hao <peng.h...@zte.com.cn> wrote:
> > > > > 
> > > > > > When a guest that has several hotplugged dimms is migrated, it
> > > > > > fails to resume on the destination host. The vhost regions of
> > > > > > the dimms are merged on the source host, but during restore the
> > > > > > destination host checks against the vhost slot limit before the
> > > > > > regions of the dimms are merged.
> > > > > could you provide a more detailed description of the problem,
> > > > > including the command line + device_add commands used on the
> > > > > source, and the command line on the destination?
> > > > 
> > > > (ccing in Marc Andre and Maxime)
> > > > 
> > > > Hmm, I'd like to understand the situation where you get merging
> > > > between RAMBlocks; that complicates some stuff for postcopy.
> > > and probably inconsistent merging breaks vhost as well
> > > 
> > > merging might happen if regions are adjacent or overlap,
> > > but for that to happen the merged regions must have an equal
> > > distance between their GPA:HVA pairs, so that the following
> > > translation still works:
> > > 
> > >   if gpa in regionX[gpa_start, len, hva_start]
> > >       hva = hva_start + gpa - gpa_start
> > > 
> > > while the GPA of a region is under QEMU control and deterministic,
> > > the HVA is not, so in the migration case merging might happen on
> > > the source side but not on the destination, resulting in different
> > > memory maps.
> > > 
> > > Maybe Michael knows the details of why migration works in the vhost
> > > use case, but I don't see vhost sending any vmstate data.
> > 
> > We aren't merging ramblocks at all.
> > When we are passing blocks A and B to vhost, if we see that
> > 
> >   hvaB = hvaA + lenA
> >   gpaB = gpaA + lenA
> > 
> > then we can improve performance a bit by passing a single
> > chunk to vhost: hvaA, gpaA, lenA + lenB
> The kernel used to maintain a flat array map for lookups, where such
> an optimization could give some benefit, but that benefit is
> negligible: in practice merging reduces the array size by only ~5
> entries.
> 
> In addition, the kernel backend has been converted to an interval
> tree because the flat array doesn't scale, so merging doesn't really
> matter there anymore.
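For illustration, here is a minimal C sketch of the adjacency check
described above (the struct and function names are made up for this
example; this is not the actual QEMU vhost code):

#include <stdbool.h>
#include <stdint.h>

/* Illustrative region descriptor: guest physical address, host virtual
 * address and length, as passed to vhost. */
struct region {
    uint64_t gpa;
    uint64_t hva;
    uint64_t len;
};

/* Two entries can be handed to vhost as one chunk only if they are
 * contiguous in both GPA and HVA space, i.e. the GPA->HVA offset is the
 * same for both. */
static bool can_merge(const struct region *a, const struct region *b)
{
    return b->gpa == a->gpa + a->len &&
           b->hva == a->hva + a->len;
}

/* The merged entry is then simply (a->gpa, a->hva, a->len + b->len), and
 * GPA->HVA translation still works:
 *   hva = hva_start + (gpa - gpa_start). */

Since the HVA half of the pair depends on where the host happens to map
the memory, this check can pass on the source and fail on the
destination, which is the non-determinism discussed above.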
In my opinion not merging slots is an obvious waste - I think there
were patches that added a cache, and that showed some promise. A cache
will be more effective if the regions are bigger.

> If we can get rid of merging on the QEMU side, the resulting memory
> map will be the same size regardless of the order in which entries
> are added or of a lucky random allocation that happens to allow
> region merging (i.e. the size will become deterministic).

It seems somehow wrong to avoid doing (even minor) optimizations just
to make error handling simpler.

> Looking at vhost_user_set_mem_table(), it sends the actual number of
> entries to the backend over the wire, so it shouldn't break a backend
> that was written correctly (i.e. one that uses
> msg.payload.memory.nregions instead of VHOST_MEMORY_MAX_NREGIONS from
> QEMU); if it breaks, then it's the backend's fault and it should be
> fixed.
> 
> Another thing that could break is the too-low limit
> VHOST_MEMORY_MAX_NREGIONS = 8: QEMU started with default options
> already takes up to 7 entries in the map unmerged, so any
> configuration that consumes additional slots won't start after an
> upgrade. We could counter most of these issues by raising the
> VHOST_MEMORY_MAX_NREGIONS limit and/or by teaching the vhost-user
> protocol to fetch the limit from the backend, similar to
> vhost_kernel_memslots_limit().

I absolutely agree we should fix vhost-user to raise the slot limit,
along the lines you suggest. Care looking into it? (A rough sketch of
the backend-side nregions handling is appended at the end of this
mail.)

> > > so it does not affect migration normally.
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Peng Hao <peng.h...@zte.com.cn>
> > > > > > Signed-off-by: Wang Yechao <wang.yechao...@zte.com.cn>
> > > > > > ---
> > > > > >  hw/mem/pc-dimm.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > > > > index ea67b46..bb0fa08 100644
> > > > > > --- a/hw/mem/pc-dimm.c
> > > > > > +++ b/hw/mem/pc-dimm.c
> > > > > > @@ -101,7 +101,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> > > > > >          goto out;
> > > > > >      }
> > > > > > 
> > > > > > -    if (!vhost_has_free_slot()) {
> > > > > > +    if (!vhost_has_free_slot() && runstate_is_running()) {
> > > > > >          error_setg(&local_err, "a used vhost backend has no free"
> > > > > >                     " memory slots left");
> > > > > >          goto out;
> > > > 
> > > > Even this produces the wrong error message in this case, and it
> > > > also makes me wonder whether the existing code should undo a lot
> > > > of the object_property_set()s that happen.
> > > > 
> > > > Dave
> > > > 
> > > 
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
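For reference, a minimal sketch of the backend-side handling discussed
above, i.e. a backend that honours the nregions count actually sent
over the wire instead of assuming the regions array is full. The
structs mirror the vhost-user memory-table layout but are redeclared
here for illustration, and backend_set_mem_table() is a made-up name;
this is not actual QEMU or backend code:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define VHOST_MEMORY_MAX_NREGIONS 8   /* the compile-time limit discussed above */

typedef struct {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
} VhostUserMemoryRegion;

typedef struct {
    uint32_t nregions;                /* actual number of entries sent */
    uint32_t padding;
    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
} VhostUserMemory;

/* A correctly written backend iterates only over the nregions entries
 * the front-end actually sent, bounded by the protocol maximum. */
static int backend_set_mem_table(const VhostUserMemory *mem)
{
    if (mem->nregions > VHOST_MEMORY_MAX_NREGIONS) {
        return -1;  /* malformed message */
    }
    for (uint32_t i = 0; i < mem->nregions; i++) {
        const VhostUserMemoryRegion *r = &mem->regions[i];
        printf("region %" PRIu32 ": gpa=0x%" PRIx64 " size=0x%" PRIx64
               " hva=0x%" PRIx64 "\n",
               i, r->guest_phys_addr, r->memory_size, r->userspace_addr);
        /* mapping of the region would happen here */
    }
    return 0;
}

A backend that instead assumes the regions[] array is always fully
populated will misbehave as soon as fewer entries are sent, which is
the kind of breakage described above as the backend's fault.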