Fabiano Rosas <faro...@suse.de> wrote: > Bring back the 'ready' semaphore, but this time make it per-channel, > so that we can do true lockstep switching of MultiFDPages without > taking the channel lock. > > Drop the channel lock as it now becomes useless. The rules for > accessing the MultiFDSendParams are: > > - between sem and sem_done/ready if we're the channel > > qemu_sem_post(&p->ready); > qemu_sem_wait(&p->sem); > <owns p> > qemu_sem_post(&p->sem_done); > > - between ready and sem if we're not the channel > > qemu_sem_wait(&p->ready); > <owns p> > qemu_sem_post(&p->sem); > > Signed-off-by: Fabiano Rosas <faro...@suse.de> > --- > One issue I can see with this is that we might now have to wait at > multifd_send_pages() if a channel takes too long to send it's packet > and come back to p->ready. We would need to find a way of ignoring a > slow channel and skipping ahead to the next one in line.
See my 1st patch in the series. That is exactly what the channels_ready sem does. In your network is faster than your multifd channels can write, main thread never waits on channels_ready, because it is always positive. In case that there are no channels ready, we wait in the sem instead of being busy waiting. And searching for the channel that is ready is really fast: static int multifd_send_pages(QEMUFile *f) { [...] qemu_sem_wait(&multifd_send_state->channels_ready); // taking a sem that is positive is basically 1 instruction next_channel %= migrate_multifd_channels(); // we do crude load balancing here. for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) { qemu_mutex_lock(&p->mutex); // 1 lock instruction if (p->quit) { // 1 check .... } if (!p->pending_job) { // 1 check p->pending_job++; next_channel = (i + 1) % migrate_multifd_channels(); break; } qemu_mutex_unlock(&p->mutex); // 1 unlock (another instruction) } .... So checking a channel is: - taking a mutex that is almost always free. - 2 checks - unlock the mutex So I can't really see the need to optimize this. Notice that your scheme only has real advantanges if: - all channels are busy - the channel that becomes ready is just the previous one to next_channel or so And in that case, I think that the solution is to have more channels or faster networking. Later, Juan.