On Mon, Oct 03, 2011 at 10:00:48AM -0500, Anthony Liguori wrote: > On 10/03/2011 09:41 AM, Michael S. Tsirkin wrote: > >On Mon, Oct 03, 2011 at 08:51:10AM -0500, Anthony Liguori wrote: > >>On 10/03/2011 08:38 AM, Michael S. Tsirkin wrote: > >>>On Mon, Oct 03, 2011 at 07:55:48AM -0500, Anthony Liguori wrote: > >>>>On 10/02/2011 04:08 PM, Michael S. Tsirkin wrote: > >>>>>On Sun, Oct 02, 2011 at 04:21:47PM -0400, Stefan Berger wrote: > >>>>>> > >>>>>>>4) Implement the BERVisitor and make this the default migration > >>>>>>>protocol. > >>>>>>> > >>>>>>>Most of the work will be in 1), though with the implementation in this > >>>>>>>series we should be able to do it incrementally. I'm not sure if the > >>>>>>>best approach is doing the mechanical phase 1 conversion, then doing > >>>>>>>phase 2 sometime after 4), doing phase 1 + 2 as part of 1), or just > >>>>>>>doing VMState conversions which gives basically the same capabilities > >>>>>>>as phase 1 + 2. > >>>>>>> > >>>>>>>Thoughts? > >>>>>>Is anyone working on this? If not I may give it a shot (tomorrow++) > >>>>>>for at least some of the primitives... for enabling vNVRAM metadata > >>>>>>of course. Indefinite length encoding of constructed data types I > >>>>>>suppose won't be used otherwise the visitor interface seems wrong > >>>>>>for parsing and skipping of extra data towards the end of a > >>>>>>structure if version n wrote the stream and appended some of its > >>>>>>version n data and now version m< n is trying to read the struct > >>>>>>and needs to skip the version [m+1, n ] data fields ... in that case > >>>>>>the de-serialization of the stream should probably be stream-driven > >>>>>>rather than structure-driven. > >>>>>> > >>>>>> Stefan > >>>>> > >>>>>Yes I've been struggling with that exactly. > >>>>>Anthony, any thoughts? > >>>> > >>>>It just depends on how you write your visitor. If you used > >>>>sequences, you'd probably do something like this: > >>>> > >>>>start_struct -> > >>>> check for sequence tag, push starting offset and size onto stack > >>>> increment offset to next tag > >>>> > >>>>type_int (et al) -> > >>>> check for explicit type, parse data > >>>> increment offset to next tag > >>>> > >>>>end_struct -> > >>>> pop starting offset and size to temp variables > >>>> set offset to starting offset + size > >>>> > >>>>This is roughly how the QMP input marshaller works FWIW. > >>>> > >>>>Regards, > >>>> > >>>>Anthony Liguori > >>> > >>>One thing I worry about is enabling zero copy for > >>>large string types (e.g. memory migration). > >> > >>Memory shouldn't be done through Visitors. It should be handled as a > >>special case. > > > >OK, that's fine then. > > > >>>So we need to be able to see a tag for memory page + address, > >>>read that from socket directly at the correct virtual address. > >>> > >>>Probably, we can avoid using visitors for memory, and hope > >>>everything else can stand an extra copy since it's small. > >>> > >>>But then, why do we worry about the size of > >>>encoded device state as Anthony seems to do? > >> > >>There's a significant difference between the cost of something on > >>the wire and the cost of doing a memcpy. The cost of the data on > >>the wire is directly proportional to downtime. So if we increase > >>the size of the device state by a factor of 10, we increase the > >>minimum downtime by a factor of 10. > >> > >>Of course, *if* the size of device state is already negligible with > >>respect to the minimum downtime, then it doesn't matter. This is > >>easy to quantify though. For a normal migration session today, > >>what's the total size of the device state in relation to the > >>calculated bandwidth of the minimum downtime? > >> > >>If it's very small, then we can add names and not worry about it. > >> > >>Regards, > >> > >>Anthony Liguori > > > >Yes, it's easy to quantify. I think the following gives us > >the offset before and after, so the difference is the size > >we seek, right? > > Yeah, you'll also want: > > diff --git a/arch_init.c b/arch_init.c > index a6c69c7..0d64200 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -334,6 +334,10 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, > voi > > expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth; > > + if (stage == 2 && expected_time <= migrate_max_downtime()) { > + fprintf(stderr, "max bwidth: %lld\n", (long)(expected_time * > bwidth)); > + } > + > return (stage == 2) && (expected_time <= migrate_max_downtime()); > } > > You'll want to compare the size to max bwidth.
Well that depends on how guest behaves etc. I'm guessing just full memory size is a sane thing to compare against. I don't have a problem sticking this fprintf in as well. > BTW, putting this info properly into migration stats would probably > be pretty useful. > > Regards, > > Anthony Liguori Problem is adding anything to monitor makes me worry about future compatibility so much I usually just give up. IMO we really need a namespace for in-development experimental commands, like "unsupported-XXX", this would belong. -- MST