On Thu, Nov 07, 2024 at 05:22:05PM +0530, Prasad Pandit wrote:
> On Wed, 6 Nov 2024 at 21:30, Peter Xu <pet...@redhat.com> wrote:
> > IIUC we can't simply change it to this.  We can do this with a compat
> > property and we can start sending a magic in the preempt channel, but then
> > the code still needs to keep working with old binaries of QEMU, so in all
> > cases we'll need to keep the old complexity for quite a while.
> 
> * I see...sigh.
> 
> > Handshake, in my mind, will use a totally separate path, then the hope is
> > we'll move to that with more machine types and finally obsolete / remove
> > this path.
> 
> * I need to check how 'separate path' works.

The plan is handshake will only happen on the main channel, and that
includes waiting all the channels to be established.  There, each channel
may need to have its own header, it could be a new handshake header that
whatever migration channel will start to establish, so it's named channels
and dest qemu handshake logic can "understand" which channel is which, and
properly assign those channels in the handshake.c logic, for example.

On src, now we kick off migration by migration_should_start_incoming()
returns true, only relying on "whether qemu have all the channels ready".
The handshake code can do more, so it checks more, then after all handshake
ready it can directly invoke migration_incoming_process() in the separate
path, to which stage it also needs to guarantee channel establishment.

We'll need to keep this path though for "if (!migrate_handshake())". 

> 
> > But note again that I still think your goal of enabling multifd + postcopy
> > may not require that new flag yet, simply because after 7.2 qemu will
> > connect preempt channel later than the main channel.  I think logically
> > QEMU can identify which channel is which: the preempt channel must be
> > established in this case after both main thread and multifd threads.
> 
> * You mean, instead of relying on magic bytes, we check if both 'main
> channel' and 'multifd channels' already exist, then the incoming
> connection is assumed to be for the 'postcopy preempt' channel?

Exactly.

So we keep the fact that we should only peek() if multifd is enabled at
least.  Then in your case even postcopy is also enabled, it won't connect
the preempt channel until very late (entry of postcopy_start()), and it'll
only connect if it receives a PONG first (migration_wait_main_channel()).
That means dest has all multifd+main channels ready otherwise no PONG will
generate.  This makes sure we'll never try to peek() on preempt channel
(which may hang) as long as it's always the latest.

I think 7.1 will keep working even if it behaves differently (it connects
preempt channel earlier, see preempt_pre_7_2), because the 1st requirement
in the new code (and also, in the old code) you will also only peek() if
multifd enabled first, while multifd cannot be enabled before with
postcopy/preempt or it was already a mess, so we can imagine old users only
enable either multifd or postcopy.

Thanks,

-- 
Peter Xu


Reply via email to