Fabiano Rosas <faro...@suse.de> wrote:
> Bring back the 'ready' semaphore, but this time make it per-channel,
> so that we can do true lockstep switching of MultiFDPages without
> taking the channel lock.
>
> Drop the channel lock as it now becomes useless. The rules for
> accessing the MultiFDSendParams are:
>
> - between sem and sem_done/ready if we're the channel
>
>     qemu_sem_post(&p->ready);
>     qemu_sem_wait(&p->sem);
>     <owns p>
>     qemu_sem_post(&p->sem_done);
>
> - between ready and sem if we're not the channel
>
>     qemu_sem_wait(&p->ready);
>     <owns p>
>     qemu_sem_post(&p->sem);
>
> Signed-off-by: Fabiano Rosas <faro...@suse.de>
> ---
> One issue I can see with this is that we might now have to wait at
> multifd_send_pages() if a channel takes too long to send its packet
> and come back to p->ready. We would need to find a way of ignoring a
> slow channel and skipping ahead to the next one in line.

See my 1st patch in the series.  That is exactly what the channels_ready
sem does.  If your network is faster than your multifd channels can
write, the main thread never waits on channels_ready, because its count
is always positive.

In case there are no channels ready, we sleep on the sem instead of
busy waiting.

And searching for the channel that is ready is really fast:

static int multifd_send_pages(QEMUFile *f)
{
    [...]

    qemu_sem_wait(&multifd_send_state->channels_ready);
    // taking a sem that is positive is basically 1 instruction

    next_channel %= migrate_multifd_channels();
    // we do crude load balancing here.
    for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i];  // next candidate channel
        qemu_mutex_lock(&p->mutex);  // 1 lock instruction
        if (p->quit) {               // 1 check
            ....
        }
        if (!p->pending_job) {       // 1 check
            p->pending_job++;
            next_channel = (i + 1) % migrate_multifd_channels();
            break;
        }
        qemu_mutex_unlock(&p->mutex);  // 1 unlock (another instruction)
    }
    ....

So checking a channel is:
- taking a mutex that is almost always free
- 2 checks
- unlocking the mutex (another instruction)

So I can't really see the need to optimize this.

Notice that your scheme only has real advantages if:
- all channels are busy
- the channel that becomes ready is the one just before next_channel
  in the scan order, so the round-robin has to walk almost all of them

And in that case, I think that the solution is to have more channels or
faster networking.

Later, Juan.

