Hi,

On Mon, 3 Mar 2025 at 19:42, Peter Xu <pet...@redhat.com> wrote:
> On Mon, Mar 03, 2025 at 04:17:53PM +0530, Prasad Pandit wrote:
> > * I think we (you, me, Peter) are all looking at things differently.
> >     - In my view Patch-2 is the minimal change _required_  to enable
> > multifd & postcopy. In your view we are _revamping_ channel discovery
> > parts while _sneaking_ in a feature of enabling multifd & postcopy
> > together.
> >     - In my view Patch-5 in this series is an isolated change because
> > it adds a new migration command to allow multifd threads sync from
> > source side. But Peter thinks without that 'flush and sync' Patch-2 is
> > incomplete, so we should merge it back there.
>
> Just to mention, my suggestion does not conflict with splitting patch 2, as
> long as you keep every patch complete on its own.
>
> Patch 5 needs to be squashed to either patch 2 or a split patch out of
> patch 2, because current patch 2 (or any possible way to split it into
> smaller ones, then one of them which enables the feature) is buggy.

* I'll try to segregate different parts, then we can discuss how to
split them across patches:

Terminology:
    _requires_  => is without which migration shall not work at all.
    _essential_ => is without which there may be issues.

1. Enable Multifd and Postcopy together
    - It _requires_ removal of the Multifd capability check in
migration/options.c
    - It _requires_ identification of the CH_POSTCOPY connection when it arrives
        - so enum { CH_MAIN, CH_MULTIFD, CH_POSTCOPY } is defined
        - To identify channels, related changes are made to the
channel discovery part (if .. else conditionals) in
migration_ioc_process_incoming() function.
        - These changes help to establish all channel connections.
After that, the migration proceeds as usual until it's time to start
the Postcopy phase.
    - When time comes to start Postcopy phase, we shutdown multifd channels.
        - it _requires_ calling multifd_send_shutdown()
        - It _requires_ moving
file_cleanup_outgoing/socket_cleanup_outgoing calls to
migration_cleanup() function.
    - When Postcopy phase starts, we don't want ram_save_target
_page() function to call ram_save_multifd_page() function, because
migrate_multifd() is still true.
        - It _requires_ adding the !migration_in_postcopy() checks.

* Above changes are minimal _required_ to enable multifd and postcopy
together, while also ensuring that migration continues to work when
those options are not enabled together. With these changes, guest
migration across two machines works without any observed failures.

2. The multifd_ram_flush_and_sync() call/command and the
assert(!migration_in_postcopy()) call in multifd_recv_thread()
    - These calls help to ensure that no multifd data is left behind
when the Postcopy phase starts.
    - And that multifd_recv threads shall not be active when the
Postcopy is running.
    - They protect the guests from potential state corruption.

* It is up to us to decide whether (2) is _required_ OR _essential_
for the feature. Individual opinions can vary here.

3. Revamp of the channel discovery parts by moving those bits to
connection.c or other places.
    - This entails moving the channel discovery parts from
migration_ioc_process_incoming() function to somewhere else because it
makes more sense to move it there and maybe it reduces complexity and
makes the sources easier to understand.*

* It is up to us to decide whether (3) is _required_  OR  _essential_
for the feature. Individual opinions can vary here.

* IMHO (1) is _required_,  (2) is _essential_,  and (3) is neither
_required_ nor _essential_ for the - Enable multifd and postcopy
together - feature. (3) is a completely unrelated change to this
feature.

Since it is an individual opinion, we all can think differently here
and that is perfectly fine. Once we have some consensus, we can decide
how to split or merge patches and move forward.

Hope it helps. Thank you.
---
  - Prasad


Reply via email to