Re: Configuring migration

Juan Quintela Tue, 14 Nov 2023 02:21:44 -0800

Markus Armbruster <arm...@redhat.com> wrote:
D> Cc: Paolo for QOM expertise.
>
> Peter Xu <pet...@redhat.com> writes:
>
>> On Thu, Nov 02, 2023 at 03:25:25PM +0100, Markus Armbruster wrote:
>
> [...]
>
>>> Migration has its own idiosyncratic configuration interface, even though
>>> its configuration needs are not special at all.  This is due to a long
>>> history of decisions that made sense at the time.
>>> 
>>> What kind of interface would we choose if we could start over now?
>>> 
>>> Let's have a look at what I consider the two most complex piece of
>>> configuration to date, namely block backends and QOM objects.
>>> 
>>> In both cases, configuration is a QAPI object type: BlockdevOptions and
>>> ObjectOptions.
>>> 
>>> The common members are the configuration common to all block backends /
>>> objects.  One of them is the type of block backend ("driver" in block
>>> parlance) or QOM object ("qom-type").
>>> 
>>> A type's variant members are the configuration specific to that type.
>>> 
>>> This is suitably expressive.
>>> 
>>> We create a state object for a given configuration object with
>>> blockdev-add / object-add.
>>> 
>>> For block devices, we even have a way to modify a state object's
>>> configuration: blockdev-reopen.  For QOM objects, there's qom-set, but I
>>> don't expect that to work in the general case.  Where "not work" can
>>> range from "does nothing" to "explodes".
>>> 
>>> Now let's try to apply this to migration.
>>> 
>>> As long as we can have just one migration, we need just one QAPI object
>>> to configure it.
>>> 
>>> We could create the object with -object / object_add.  For convenience,
>>> we'd probably want to create one with default configuration
>>> automatically on demand.
>>> 
>>> We could use qom-set to change configuration.  If we're not comfortable
>>> with using qom-set for production, we could do something like
>>> blockdev-reopen instead.
>>> 
>>> Could we move towards such a design?  Turn the existing ad hoc interface
>>> into compatibility sugar for it?
>>
>> Sounds doable to me.
>>
>> I'm not familiar with BlockdevOptions, it looks like something setup once
>> and for all for all relevant parameters need to be set in the same request?
>
> Yes, but you can "reopen", which replaces the entire configuration.
>
> blockdev-add creates a new block backend device, and blockdev-reopen
> reopens a set of existing ones.  Both take the same arguments for each
> device.
>
>> Migration will require each cap/parameter to be set separately anytime,
>> e.g., the user can adjust downtime / bandwidth even during migration in
>> progress.
>
> "Replace entire configuration" isn't a good fit then, because users
> would have to repeat the entire configuration just to tweak one thing.


We have two types of parameters:
- multifd_channels type: we need to set them before the migration start
- downtime: We can change it at any time, even during migration

So I think we need two types of parameters.
For the parameters that one can't change during migration, it could be a
good idea to set all of them at once, i.e.

(qemu) migration_set multifd on multifd-channels 6 multifd_compression none

Whatever syntax we see fit.  That would make it easier to:
- document what parameters make sense together (they need to be set at
  the same time)
- convert 
migrate_params_check()/migrate_params_test_apply()/migrate_params_apply()
  into a sane interface.
- this parameters don't need to be atomic, they are set by definition by
  the main thread before the migration starts.  The other parameters
  needs to be atomic.

>> Making all caps/parameters QOM objects, or one object containing both
>> attributes, sounds like a good fit.  object_property_* APIs allows setters,
>> I think that's good enough for migration to trigger whatever needed (e.g.
>> migration_rate_set() updates after bandwidth modifications).
>>
>> We can convert e.g. qmp set parameters into a loop of setting each
>> property, it'll be slightly slower because we'll need to do sanity check
>> for each property after each change, but that shouldn't be a hot path
>> anyway so seems fine.
>
> I figure doing initial configuration in one command is convenient.  The
> obvious existing command for that is object-add.
>
> The obvious interface for modifying configuration is a command to change
> just one parameter.  The obvious existing command for that is qom-set.

As said before, I think we need two commands:

- migrate_set_method method [list of method arguments with values]
  Values that need to be set before migration

- migrate_set_parameter parameter value
  Values that can be changed at any time

> Problem: qom-set is a death trap in general.  It can modify any QOM
> property with a setter, and we test basically none of them.  Using it
> for modifying migration configuration would signal it's okay to use
> elsewhere, too.  I'm not sure we want to send that message.  Maybe we
> want to do the opposite, and make it an unstable interface.
>
> Aside: I believe the root problem is our failure to tie "can write" to
> the object's state.  Just because a property can be set with object-add
> doesn't mean it can be validly changed at any time during the object's
> life.

Yeap.

> Problem: when several parameters together have to satisfy constraints,
> going from one valid configuration to another valid configuration may
> require changing several parameters at once, or else go through invalid
> intermediate configurations.

There are other things that "currently" we are not considering and that
make things really strange.

(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_parameter xbzrle_cache $foo
(qemu) migrate $bar

migration fails for whatever reason

we try again with another method, let's say multifd

(qemu) migrate_set_capability multifd on
(qemu) migrate_set_parameter multifd-channels $foo2
(qemu) migrate $bar2

At this point, xbrzrle_cache is still set, but it makes no sense.  So I
think that if we change the interface, the migrate_set_method that I
suggested would just clean-up all values to its defaults.


> This problem is not at all specific to the migration object.
>
> One solution is careful design to ensure that there's always a sequence
> of transitions through valid configuration.  Can become complicated as
> configuration evolves.  Possible even impractical or impossible.

This is a nightmare.  See migrate_params_check().  Basically everytime
that we add a capability/parameter we need to go through all the list to
see if anything is compatible/incompatible.  It is much better that we
set all that kind of parameters at once, so this is much easier to understand.

> Another solution is a command to modify multiple parameters together,
> leaving alone the others (unlike blockdev-reopen, which overwrites all
> of them).

I still think we need both methods because we have the two kinds of
parameters.

Actually, it is even worse than that, because there are parameters that
need to be set before the migration start but that are not related to
the migration method at all.



>> It'l still be a pity that we still cannot reduce the triplications of qapi
>> docs immediately even with that.  But with that, it seems doable if we will
>> obsolete QMP migrate-set-parameters after we can do QOM-set.

Trying to be practical, this are the capabilities:

{ 'enum': 'MigrationCapability',
  'data': [

'xbzrle'  <- how we compress, but it is used on top of anything else
             except multifd/rdma
'rdma-pin-all' <- this is a rdma parameter, but at the time there were
             not parameters, so here we are.

'auto-converge' <- this is obsolete.  Dirty limit features are better
                   for this.  But this again is independent of anything
                   else.  I will have to double check to see if it can
                   be set at any moment.
'zero-blocks' <- only for storage migration, clearly a parameter.

{ 'name': 'compress', 'features': [ 'deprecated' ] } <- migration
           compression method
'events' <- this should be default and make it a nop for backwards
           compatibility.  I think libvirt sets it always.

'postcopy-ram' <- we want to do postcopy.  I don't even remember why
                  this is a capability when we have to issue a command
                  to start it anyways

{ 'name': 'x-colo', 'features': [ 'unstable' ] }, <- colo is
                  experimental, not even enter here.

'release-ram' <- This only makes sense if:
                 * we are in postcopy
                 * we are migration with the guest stopped (and not sure
                   if in this case it even works)

{ 'name': 'block', 'features': [ 'deprecated' ] } <- deprecated, don't
           spent time here.

'return-path' <-  basically everything except exec should have this.
                  this is way from destination to give us back errors.
                  exec basically is not able to give any error.

'pause-before-switchover' <-  This is independent of anything else, but
                              it makes sense to require to set it before
                              migration start.  Used when you need to
                              "lock" things, like block devices to a
                              single user.  Think iscsi block devices
                              that can only be enabled at the same time
                              on source or destination, but not both.

'multifd' <- migration method

'dirty-bitmaps' <- block migration is weird.  I don't remember this one.


'postcopy-blocktime' <- don't even remember

'late-block-activate' <- I think this is somehow related to
                         pause-before-switchover, but I don't ever
                         remember the details.  I guess we did it wrong
                         the fist time.

{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] } <- we need to
           mark this stable.  Basically means that:
           * we are migratin on the same machine
           * RAM is mmaped in source/destination shared
           * so we don't need to migrate it.

'validate-uuid' <- making sure that we are migration to the right
                   destination.  Independent of anything else.  Should
                   be set before migration starts.

'background-snapshot' <- block devices are big and slow.  Migrating them
           is complilaceed.  Forgot about them.

'zero-copy-send' <- only valid for multifd, should be a paramter

'postcopy-preempt' <-  We create a new channel for postcopy, depends on
                       postcopy and needs to be set before migration starts.

'switchover-ack' <- Independent of anything else.  Needs to be set
           before migration starts.

'dirty-limit' <- Autoconverge was bad.  We create another way of doing
                 the same functionality.


And now we go with migration_parametres:

'announce-initial', 'announce-max',
'announce-rounds', 'announce-step',
        I think we should set all of them at the same time.
        Before migration ends.  This is SDN for you, sometimes we need
        to repeat the ARP packets to get they passed through routers.
        Hello openstack.


{ 'name': 'compress-level', 'features': [ 'deprecated' ] },
{ 'name': 'compress-threads', 'features': [ 'deprecated' ] },
{ 'name': 'decompress-threads', 'features': [ 'deprecated' ] },
{ 'name': 'compress-wait-thread', 'features': [ 'deprecated'] },

        Parameters for old compression method.  Should be set before we start.

'throttle-trigger-threshold',
'cpu-throttle-initial',
'cpu-throttle-increment',
'cpu-throttle-tailslow',
'max-cpu-throttle'
        Autoconverge parameters.  Make sense to change them after we start.
        Remember that autoconverge was a good idea and a bad
        implementation.

'tls-creds', 'tls-hostname', 'tls-authz',

        They are only needed if we are using TLS.  But we only need TLS
        depending on the URI we are using (tcp+tls).

'max-bandwidth',
        Can be changed at any time.

'avail-switchover-bandwidth',
        Can be changed at any time.  Better explained, it shouldn't be
        needed.  But we know that it is needed after migration starts,
        so ...

'downtime-limit',
        Can be changed at any point.

{ 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
        Colo parameter.  Needs to be set before migration starts.

{ 'name': 'block-incremental', 'features': [ 'deprecated' ] },
        deprecated, but needs to be set before migration starts.

'multifd-channels',
        Needs to be set before migration starts.

'xbzrle-cache-size'
        Needs to be set before migration starts.
        It *could* be changed after migration starts, but ...

'max-postcopy-bandwidth'
        When we are in postcopy stage, reasonable thing to do is to use
        all available bandwidth.  When that is not true, use this parameter.


'multifd-compression'
'multifd-zlib-level'
'multifd-zstd-level'
        multifd compression parameters.  Depending of
        multifd-compression value, the others make sense or not.

'block-bitmap-mapping',
        I don't understand/remember this new block migration, so I can't 
comment.

{ 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
        Needs to be set before migration starts.
'vcpu-dirty-limit',
        One could defend to change it after migration starts, but ....

'mode'
        set before migration starts


So you can see that we have lots of parameters to try to make sense of.

Later, Juan.

Re: Configuring migration

Reply via email to