On Thu, Nov 13, 2025 at 12:46:55PM -0500, Michael S. Tsirkin wrote:
> failing to start a perfectly good qemu which used to work
> because you changed kernels is better than failing to migrate how?
> 

I agree this is not pretty.

The very original proposal was to have extra features OFF by default, and
only allow enabling them by explicit selection, when the mgmt / user is
aware of the possible hosts to run on.  That would guarantee:

(1) explicit failure whenever some unsupported cap is chosen on boot,

(2) the default setup never assumes any kernel dependency, hence booting
should always be fine,

(3) since all features are either OFF by default or selected by the user
with explicit cmdlines, VM ABI is guaranteed so that migration will work.
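
To make the three points above a bit more concrete, here is a rough sketch
in Python.  It is illustration only: the feature names, the exception and
the resolve_features() helper are all made up and are not QEMU code or
options.

```python
# Hypothetical sketch: none of these names exist in QEMU.

class UnsupportedFeature(Exception):
    """Raised when an explicitly requested feature is missing on the host."""


def resolve_features(requested, host_supported):
    """Return the set of kernel-dependent features to enable for a VM.

    requested:      feature names explicitly selected by the mgmt / user
    host_supported: feature names the host kernel can provide
    """
    missing = sorted(set(requested) - set(host_supported))
    if missing:
        # (1) fail explicitly on boot, and tell the user what to turn off
        raise UnsupportedFeature(
            "host kernel lacks: " + ", ".join(missing)
            + "; turn these off to boot on this host"
        )
    # (2) nothing requested means nothing kernel-dependent is enabled
    # (3) the result depends only on 'requested', never on the host,
    #     so the VM ABI is the same on every host that can boot it
    return frozenset(requested)
```

The point of the sketch is that the host's capabilities only ever decide
between "boot" and "fail with a message"; they never change what the guest
sees.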

But unfortunately that proposal was rejected.

> 
> graceful downgrade with old kernels is the basics of good userspace
> behaviour and has been for decades.
> 
> 
> sure, let's work on a solution, just erroring out is more about blaming
> the user. what is the user supposed to do when qemu fails to start?

This is indeed a good question.  If we go with strict checks, we would at
least want to make sure we print explicit messages to let the user know
what to turn off.

> 
> 
> first, formulate what exactly do you want to enable.
> 
> 
> 
> for example, you have a set of boxes and you want a set of flags
> to supply to guarantee qemu can migrate between them. is that it?

Yes I think that's the case.

That's also why I think the very original proposal still makes sense
(having all defaults OFF when dependent on kernel): only the mgmt knows
the details about the cluster, so it makes more sense to select from the
top, which has the full knowledge base, and explicitly enable some sets of
features (not only network, but also CPU feature bits and so on).  Then
the mgmt boots the VM and also knows explicitly where it can migrate.
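
As a minimal sketch of what "select from the top" could look like on the
mgmt side (again hypothetical: the host names, feature names and the
common_features() helper are invented for illustration), the mgmt could
take the intersection of what every host in a migration group supports and
only ever enable features from that set:

```python
# Hypothetical sketch: not an existing mgmt or QEMU API.

def common_features(host_caps):
    """Return the features supported by every host in the group.

    host_caps: dict mapping host name -> set of supported feature names
    """
    sets = [set(caps) for caps in host_caps.values()]
    if not sets:
        # No hosts known: nothing is safe to enable.
        return set()
    return set.intersection(*sets)


# Example group: only features present on both hosts are safe to enable.
group = {
    "host-old-kernel": {"feat-a"},
    "host-new-kernel": {"feat-a", "feat-b"},
}
safe_to_enable = common_features(group)
```

A VM booted with only features from that set can then be migrated to any
host in the group.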

If all of this is hidden, then the mgmt has almost no control over it.

That was rejected because of the need to enable new features by default
whenever possible.  In that case, IMHO Jason's solution is spot on: it
sits in the middle ground, allowing both to happen (auto-enable of new
feats, while keeping VM ABI stability).

So IIUC there will be a cluster, and it may contain different groups of
hosts.  Each group should have similar setups so that VMs can freely
migrate between hosts within the same group (but may not be easily
migratable across groups?).  But I don't think I know that part well in
practice.

Dan might be a great source of input from that level.

Thanks,

-- 
Peter Xu

