Peter Xu <pet...@redhat.com> wrote:
> On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote:
>> Peter Xu <pet...@redhat.com> wrote:
>> > Fabiano,
>> >
>> > Sorry to look at this series late; I messed up my inbox after I reworked my
>> > arrangement methodology of emails. ;)
>> >
>> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
>> >> Fabiano Rosas <faro...@suse.de> wrote:
>> >> > The channels_ready semaphore is a global variable not linked to any
>> >> > single multifd channel. Waiting on it only means that "some" channel
>> >> > has become ready to send data. Since we need to address the channels
>> >> > by index (multifd_send_state->params[i]), that information adds
>> >> > nothing of value.
>> And that is what we do here.
>> We didn't have this last line (not needed for making sure the channels
>> are ready here).
>>
>> But needed to make sure that we are maintaining channels_ready exact.
>
> I didn't expect it to be exact, I think that's the major part of confusion.
> For example, I see this comment:
>
> static void *multifd_send_thread(void *opaque)
>     ...
>         } else {
>             qemu_mutex_unlock(&p->mutex);
>             /* sometimes there are spurious wakeups */
>         }

I put that there during development, and left it there just to be safe.
Years later I put an assert() there and did lots of migrations, and never
hit it.

> So do we have spurious wakeup anywhere for either p->sem or channels_ready?
> They are related, because if we got spurious p->sem wakeups, then we'll
> boost channels_ready one more time too there.

I think that we can change that to g_assert_not_reached().

> I think two ways to go here:
>
>   - If we want to make them all exact: we'd figure out where are spurious
>     wake ups and we fix all of them.  Or,

This one.

>   - IMHO we can also make them not exact.  It means they can allow
>     spurious, and code can actually also work like that.  One example is
>     e.g. what happens if we get spurious wakeup in multifd_send_pages() for
>     channels_ready?  We simply do some cpu loops as long as we double check
>     with each channel again, we can even do better that if looping over N
>     channels and see all busy, "goto retry" and wait on the sem again.
>
> What do you think?

Make sure that it is exact O:-)

Later, Juan.
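
For reference, the "not exact" alternative quoted above can be sketched in a few lines of C. This is only an illustration, not QEMU code: SendState, Channel, toy_sem_trywait() and pick_channel() are hypothetical names, and the semaphore is modelled as a plain counter so that the sketch runs single-threaded. Real code would block in qemu_sem_wait() where this sketch returns -1.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these names are QEMU's. */
#define N_CHANNELS 3

typedef struct {
    bool busy;              /* true while the channel still has work queued */
} Channel;

typedef struct {
    int tokens;             /* toy counting semaphore: number of pending posts */
    Channel ch[N_CHANNELS];
    int wasted_scans;       /* wakeups that found every channel busy */
} SendState;

/*
 * Non-blocking stand-in for a semaphore wait.
 * Returns false where a real wait would block.
 */
static bool toy_sem_trywait(SendState *s)
{
    if (s->tokens == 0) {
        return false;
    }
    s->tokens--;
    return true;
}

/*
 * Pick an idle channel while tolerating an inexact semaphore.
 * Returns the channel index, or -1 where real code would block.
 */
static int pick_channel(SendState *s)
{
    for (;;) {                          /* plays the role of "goto retry" */
        if (!toy_sem_trywait(s)) {
            return -1;
        }
        /* Double check every channel instead of trusting the token. */
        for (int i = 0; i < N_CHANNELS; i++) {
            if (!s->ch[i].busy) {
                s->ch[i].busy = true;   /* claim the idle channel */
                return i;
            }
        }
        s->wasted_scans++;              /* spurious token: wait again */
    }
}
```

With two spurious tokens and every channel busy, pick_channel() scans twice, finds nothing, and ends up where real code would go back to sleep. Under the exact approach preferred above, the all-busy branch would instead be unreachable, which is where a g_assert_not_reached() would fit.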