* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > Orchestrating Migrations
> > > ------------------------
> > > In order to migrate a device a *migration parameter list* must first be
> > > built on the source. Each migration parameter is added to the list if it
> > > is in effect. For example, the migration parameter list for a device with
> > > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature
> > > migration parameter was introduced with the off value disabling its
> > > effect.
> >
> > What component builds that list (i.e. what component needs to know the
> > history that new-feature=off was the default - ah I think you answer
> > that below).
>
> Yep. Thanks for noting this. I'll need to reorder things so it is clear.
>
> > > The following conditions must be met to establish migration compatibility:
> > >
> > > 1. The source and destination device model strings match.
> > >
> > > 2. Each migration parameter name from the migration parameter list is
> > >    supported by the destination. For example, the destination supports
> > >    the num-queues migration parameter.
> > >
> > > 3. Each migration parameter value from the migration parameter list is
> > >    supported by the destination. For example, the destination supports
> > >    num-queues=4.
> >
> > Hmm, are combinations of parameter checks needed - i.e. is it possible
> > that a destination supports num-queues=4 and new-feature=on/off -
> > but only supports new-feature=on when num-queues>2 ?
>
> Yes, it's possible but cannot be expressed in the migration info JSON.
>
> We need to choose a level of expressiveness that will be useful enough
> without being complex. In the extreme the migration info would contain
> Turing complete validation expressions (e.g. JavaScript) so that any
> relationship can be expressed, but I doubt that complexity is needed.
> The other extreme is just booleans and (opaque) strings for maximum
> simplicity.
>
> If the syntax is not expressive enough then it's impossible to check
> migration compatibility without actually creating a new device instance
> on the destination. Daniel Berrange raised the requirement of checking
> migration compatibility without creating the device since this helps
> with selecting a migration destination.
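
To make the three conditions quoted above concrete, here is a minimal
sketch of a per-parameter check. It is only an illustration: the function
name check_compat, the dst_supported mapping and the sample values are
all made up for this example and are not part of the proposal.

```python
# Hypothetical sketch; none of these names come from the proposal.
from typing import Any, Dict, List, Set


def check_compat(src_model: str,
                 param_list: Dict[str, Any],
                 dst_model: str,
                 dst_supported: Dict[str, Set[Any]]) -> List[str]:
    """Return the reasons migration is not possible (empty list = compatible)."""
    problems = []
    # Condition 1: the source and destination device model strings match.
    if src_model != dst_model:
        problems.append(f"device model mismatch: {src_model} != {dst_model}")
    for name, value in param_list.items():
        # Condition 2: the parameter name is supported by the destination.
        if name not in dst_supported:
            problems.append(f"unsupported parameter: {name}")
        # Condition 3: the parameter value is supported by the destination.
        elif value not in dst_supported[name]:
            problems.append(f"unsupported value: {name}={value}")
    return problems


# Assumed destination capabilities, for illustration only.
dst = {"num-queues": {1, 2, 4}, "new-feature": {"on", "off"}}
print(check_compat("my-device", {"num-queues": 4}, "my-device", dst))
print(check_compat("my-device", {"num-queues": 8}, "my-device", dst))
```

Each parameter is checked on its own here, so a constraint such as
"new-feature=on only when num-queues>2" passes this check even when the
destination would refuse the combination.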

Right, but my worry isn't the JSON description, it's the set of 3
conditions above; they need to state that only some combinations need to
be valid.

>
> > > The migration compatibility check can be performed without initiating a
> > > migration. Therefore, this process can be used to select the migration
> > > destination.
> > >
> > > The following steps perform the migration:
> > >
> > > 1. Configure the destination so it is prepared to load the device state,
> > >    including applying the migration parameter list. This may involve
> > >    instantiating a new device instance or resetting an existing device
> > >    instance to a configuration that is compatible with the source.
> > >
> > >    The details of how to do this for VFIO/mdev drivers and vfio-user
> > >    device backend programs are described below.
> > >
> > > 2. Save the device state on the source and load it on the destination.
> >
> > Which is true for almost everything, unless it turned out to have
> > significant amounts of RAM on board; do we have a way to deal with that
> > for vfio/vhost-user - where it needs to be iterative? (Let's just ignore
> > this for now)
>
> Step 2 includes iterative migration. I should have mentioned that in the
> document.

OK.

> > > "allowed_values"
> > >   The list of all values that the device implementation accepts for this
> > >   migration parameter. Integer ranges can be described using
> > >   "<min>-<max>" strings.
> > >
> > >   Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'],
> > >   [true]
> > >
> > >   This member is optional. When absent, any value suitable for the type
> > >   may be given but the device implementation may refuse certain values.
> >
> > JSON isn't a great choice for specifying ranges of integers
>
> Agreed :)
>
> > > The device is instantiated by launching the destination process with the
> > > migration parameter list from the source:
> > >
> > > .. code:: bash
> > >
> > >   $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]
> > >
> > > This example shows how to instantiate the device with migration parameters
> > > ``param1`` and ``param2``. Both ``--m-<param>=<value>`` and ``--m-<param>
> > > <value>`` option formats are accepted.
> > >
> > > The ``--m-`` prefix is used to allow the device emulation program to
> > > implement device implementation-specific command-line options without
> > > conflicting with the migration parameter namespace.
> >
> > That feels like an odd syntax to me.
>
> Unfortunately we cannot use --<param>. I also considered using a JSON
> input file but that makes it harder to invoke the device emulation
> program manually for testing/development. I bet I'd have to look up the
> JSON syntax every time whereas it's easy to remember how to format a
> command-line parameter.
>
> The other one I considered was using '--' or another marker to separate
> device implementation-specific command-line arguments from migration
> parameters. However, doing so places requirements on the device
> emulation program's command-line parsing library and I think people will
> be unhappy if their favorite Go, Rust, Python, etc library cannot handle
> the command-line options due to our weird syntax.
>
> Any ideas for a better syntax?

I'd be happy with a --param name=value repeatedly, but also know that
some option parsers don't like that.

> > > When preparing for migration on the source, each migration parameter from
> > > the migration info JSON is added to the migration parameter list if its
> > > value differs from "off_value". If a migration parameter in the list is
> > > not available on the destination, then migration is not possible. If a
> > > migration parameter value is not in the destination "allowed_values"
> > > migration_info.json then migration is not possible.
> > >
> > > On the destination, a command-line is generated from the migration
> > > parameter list. For each destination migration parameter missing from the
> > > migration parameter list a command-line option is added with the
> > > destination "off_value".
> > > The device emulation program prints an error message to standard error and
> > > terminates with exit status 1 if the device could not be instantiated.
> >
> > I still don't think this revision answers the question of how a VM
> > management program picks a sane set of parameter values for a new VM
> > it's creating, especially if it wants it to be migratable. That's
> > something your version stuff in V1 seemed nice for.
>
> Good point. If we're creating a VM and expect to migrate between two
> device implementations, how do we choose the migration parameters?
>
> I can see a solution for that: grab the set of "init_values" from both
> device implementations and use the one that both accept. This is O(N^2)
> so it's not great when there are many device implementations involved.
> It's O(N) with version numbers because you can keep an intersection set
> of supported version numbers.

Which is actually more complex if there's only some combinations that
work.

> This point definitely needs to be included in the document. Is my answer
> acceptable or do you think versions are really needed?
>
> It's also hard to answer "which of these two migration parameter lists
> is better/more modern?" without versions when non-bool migration
> parameters are involved.

Dave

> Stefan
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
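
For reference, the source-side and destination-side steps quoted in this
thread can be sketched roughly as follows. This is a sketch under
assumptions, not the proposed implementation: only the "off_value" and
"allowed_values" member names and the --m- prefix come from the quoted
text. The dict layout, the function names, the use of str() to format
values, and the restriction to non-negative "<min>-<max>" ranges are all
assumptions made for the example.

```python
# Sketch only; the dict layout is an assumption, not the proposed JSON schema.
from typing import Any, Dict, List


def build_param_list(info: Dict[str, dict], config: Dict[str, Any]) -> Dict[str, Any]:
    """Source side: keep each parameter whose value differs from "off_value"."""
    return {name: config[name]
            for name, desc in info.items()
            if name in config and config[name] != desc["off_value"]}


def value_allowed(value: Any, allowed: List[Any]) -> bool:
    """Match a value against "allowed_values", including "<min>-<max>" strings."""
    for item in allowed:
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if isinstance(item, str) and is_int:
            lo, sep, hi = item.partition("-")
            # Assumption: ranges are non-negative decimal integers.
            if sep and lo.isdigit() and hi.isdigit():
                if int(lo) <= value <= int(hi):
                    return True
                continue
        # Compare types too, so that the integer 1 does not match true.
        if type(item) is type(value) and item == value:
            return True
    return False


def dest_command_line(program: str, param_list: Dict[str, Any],
                      dst_info: Dict[str, dict]) -> List[str]:
    """Destination side: return argv, or raise if migration is not possible."""
    for name, value in param_list.items():
        if name not in dst_info:
            raise ValueError(f"parameter not available on destination: {name}")
        allowed = dst_info[name].get("allowed_values")
        # "allowed_values" is optional; when absent any value may be tried.
        if allowed is not None and not value_allowed(value, allowed):
            raise ValueError(f"value not allowed on destination: {name}={value}")
    argv = [program]
    for name, desc in dst_info.items():
        # Parameters missing from the list get the destination "off_value".
        value = param_list.get(name, desc["off_value"])
        argv.append(f"--m-{name}={value}")  # str() formatting is an assumption
    return argv


# Made-up migration info for two implementations of "my-device".
src_info = {
    "num-queues": {"off_value": 1, "allowed_values": ["1-8"]},
    "new-feature": {"off_value": "off", "allowed_values": ["on", "off"]},
}
dst_info = {
    "num-queues": {"off_value": 1, "allowed_values": ["1-4"]},
    "new-feature": {"off_value": "off", "allowed_values": ["on", "off"]},
}
params = build_param_list(src_info, {"num-queues": 4, "new-feature": "off"})
argv = dest_command_line("my-device", params, dst_info)
print(params)
print(" ".join(argv))
```

With these made-up inputs the parameter list is just num-queues=4, which
matches the new-feature=off,num-queues=4 example quoted earlier. Like the
conditions themselves, the sketch checks each parameter independently and
does nothing about combinations that only work together.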