Re: [PATCH v1] migration: Wait for cpr.sock file to appear before connecting

JAEHOON KIM Mon, 09 Jun 2025 07:54:26 -0700


On 6/9/2025 8:50 AM, Daniel P. Berrangé wrote:

On Mon, Jun 09, 2025 at 09:39:48AM -0400, Steven Sistare wrote:

On 6/9/2025 9:20 AM, Daniel P. Berrangé wrote:

On Mon, Jun 09, 2025 at 09:12:27AM -0400, Steven Sistare wrote:

On 6/9/2025 4:06 AM, Daniel P. Berrangé wrote:

On Fri, Jun 06, 2025 at 03:37:56PM -0400, Steven Sistare wrote:

The easiest solution, with no interface changes, is adding wait_for_socket() in 
qtest,
with this addition from Daniel:


     "With the busy wait you risk looping forever if the child (target) QEMU
     already exited for some reason without ever creating the socket. You
     can mitigate this by using 'kill($PID, 0)' in the loop and looking
     for -ERSCH, but this only works if you know the pid involved."

Daniel also suggested:
     "For the tests, passing a pre-opened UNIX socket FD could work"

Note we can not use any of the standard chardev options to specify such a 
socket,
because the cpr socket is created before chardevs are created.

Perhaps we could specify the fd in an extension of the MigrationChannel 
MigrationAddress.
{ 'union': 'MigrationAddress',
     'base': { 'transport' : 'MigrationAddressType'},
     'discriminator': 'transport',
     'data': {
       'socket': 'SocketAddress',
       'exec': 'MigrationExecCommand',
       'rdma': 'InetSocketAddress',
       'file': 'FileMigrationArgs',
       'fd':   'FdMigrationArgs' } }           <-- add this

That would be useful for all clients, but this is asking a lot from you,
when you are just trying to fix the tests.

Note, 'SocketAddress' already has an option for declaring a FD that
represents a socket.

Yes, but if I understand, you proposed passing an fd that represents a
pre-listened socket, which requires target qemu to accept() first.  The
existing FdSocketAddress is ready to read.  We could add a boolean to enable
the new behavior.

It can do both actually - it depends on what APIs the QEMU uses the
SocketAddress with.

If it is used with qio_channel_socket_connect* the FD must be an
active peer connection.

If it is used with qio_channel_socket_listen*/qio_net_listener* the
FD must be listener socket.

Fair enough.  cpr currently listens here, and we could add a case for the FD:

   QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp)
   {
     MigrationAddress *addr = channel->addr;

     if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
         addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX) {
         ...
         g_autoptr(QIONetListener) listener = qio_net_listener_new();

Or to use my socketpair() suggestion, that function would also need changes,
calling qio_channel_socket_connect.

Which do you think is better for clients -- socketpair or pre-listened?

Please just use the existing SocketAddress functionality, as that's used
throughout QEMU - a special case with socketpair for migration is not
desirable.

The SocketAddress stuff is what libvirt's used for many years now to
address the race condition with QMP listeners.

With regards,
Daniel


Dear Daniel and Steve,

Thank you both for your valuable insights.

To clarify regarding the socket handling approach:
If I do not use socketpair() and instead pass a pre-listened FD to the target, 
which then calls accept(),
it seems this could mitigate some race condition. However, isn't there still a 
risk that the old QEMU might try to
connect before the target QEMU calls accept(), thereby resulting in the same 
race condition?

If I only consider the qtest environment, it seems to me that waiting for the 
target to create cpr.sock inside qtest framework
- along with checking the process status {kill(pid,0)} - might be the most 
efficient way to handle this.

I checked and found that the QTestState structure already has a "pid_t 
qemu_pid" field,
so it should be straightforward to check the target's PID.

From your experience,
which approach do you consider the most effective to solve the current race 
condition issue?

I would appreciate your thoughts.

- Jaehoon Kim

Re: [PATCH v1] migration: Wait for cpr.sock file to appear before connecting

Reply via email to