Peter Xu <pet...@redhat.com> wrote:
> On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote:
>> Peter Xu <pet...@redhat.com> wrote:
>> > Fabiano,
>> >
>> > Sorry to look at this series late; I messed up my inbox after I reworked my
>> > arrangement methodology of emails. ;)
>> >
>> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
>> >> Fabiano Rosas <faro...@suse.de> wrote:
>> >> > The channels_ready semaphore is a global variable not linked to any
>> >> > single multifd channel. Waiting on it only means that "some" channel
>> >> > has become ready to send data. Since we need to address the channels
>> >> > by index (multifd_send_state->params[i]), that information adds
>> >> > nothing of value.

>> And that is what we do here.
>> We didn't have this last line (not needed for making sure the channels
>> are ready here).
>> 
>> But it is needed to make sure that we keep channels_ready exact.
>
> I didn't expect it to be exact, I think that's the major part of confusion.
> For example, I see this comment:
>
> static void *multifd_send_thread(void *opaque)
>        ...
>         } else {
>             qemu_mutex_unlock(&p->mutex);
>             /* sometimes there are spurious wakeups */
>         }

I put that there during development, and left it there just to be safe.
Years later I put an assert() there and did lots of migrations, and never
hit it.

> So do we have spurious wakeups anywhere for either p->sem or channels_ready?
> They are related, because if we got spurious p->sem wakeups, then we'll
> boost channels_ready one more time too there.

I think that we can replace that with g_assert_not_reached().

> I think two ways to go here:
>
>   - If we want to make them all exact: we'd figure out where the spurious
>     wakeups are and fix all of them.  Or,

This one.

>   - IMHO we can also make them not exact.  It means they can allow
>     spurious wakeups, and the code can actually also work like that.  One
>     example is what happens if we get a spurious wakeup in
>     multifd_send_pages() for channels_ready?  We simply burn some CPU
>     loops, as long as we double check with each channel again.  We can even
>     do better: if we loop over N channels and see them all busy, "goto
>     retry" and wait on the sem again.
>
> What do you think?
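
For reference, the "not exact" option quoted above could be sketched
roughly as below. This is only an illustration, not the QEMU code: POSIX
semaphores and pthreads stand in for QEMU's own primitives, and
pick_channel(), busy[], worker(), demo_run() and N_CHANNELS are
hypothetical names. Only channels_ready is borrowed from the thread. The
waiter re-checks every channel after each wakeup and waits again if all
of them are busy, so an extra post on the semaphore costs a loop, not
correctness.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

#define N_CHANNELS 2                /* hypothetical channel count */

/* Hypothetical stand-ins for the multifd state; not the QEMU structures. */
static sem_t channels_ready;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool busy[N_CHANNELS];

/* Pick an idle channel, tolerating spurious posts on channels_ready. */
static int pick_channel(void)
{
    for (;;) {
        /* Wait for "some channel may be ready"; restart if interrupted. */
        while (sem_wait(&channels_ready) == -1 && errno == EINTR) {
            continue;
        }
        pthread_mutex_lock(&lock);
        for (int i = 0; i < N_CHANNELS; i++) {
            if (!busy[i]) {
                busy[i] = true;     /* claim the channel */
                pthread_mutex_unlock(&lock);
                return i;
            }
        }
        pthread_mutex_unlock(&lock);
        /* All channels busy: the wakeup was spurious, so wait again. */
    }
}

/* Simulated channel thread: becomes idle after a short delay. */
static void *worker(void *arg)
{
    int i = *(int *)arg;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 20 * 1000 * 1000 };

    nanosleep(&delay, NULL);
    pthread_mutex_lock(&lock);
    busy[i] = false;
    pthread_mutex_unlock(&lock);
    sem_post(&channels_ready);      /* the one "real" wakeup */
    return NULL;
}

/* Returns the index of the channel that pick_channel() ends up with. */
int demo_run(void)
{
    pthread_t tid;
    int idx = 1;
    int rc = sem_init(&channels_ready, 0, 0);

    assert(rc == 0);
    for (int i = 0; i < N_CHANNELS; i++) {
        busy[i] = true;             /* every channel starts busy */
    }
    sem_post(&channels_ready);      /* inject one spurious wakeup */

    rc = pthread_create(&tid, NULL, worker, &idx);
    assert(rc == 0);
    (void)rc;

    int got = pick_channel();
    pthread_join(tid, NULL);
    sem_destroy(&channels_ready);
    return got;
}
```

Calling demo_run() hands back channel 1, the only channel that ever
becomes idle, whether or not the injected post is consumed first. The
exact approach would instead guarantee that every successful wait finds
an idle channel, so the retry loop would not be needed.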

Make sure that it is exact O:-)

Later, Juan.