Hi

On Sun, Sep 4, 2022 at 10:24 AM Bin Meng <bmeng...@gmail.com> wrote:

> On Thu, Sep 1, 2022 at 8:58 PM Marc-André Lureau
> <marcandre.lur...@gmail.com> wrote:
> >
> > Hi
> >
> > On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng...@gmail.com> wrote:
> >>
> >> From: Bin Meng <bin.m...@windriver.com>
> >>
> >> Random failure was observed when running qtests on Windows due to
> >> "Broken pipe" detected by qmp_fd_receive(). What happened is that
> >> the qtest executable sends testing data over a socket to the QEMU
> >> under test but no response is received. The errno of the recv()
> >> call from the qtest executable indicates ETIMEOUT, due to the qmp
> >> chardev's tcp_chr_read() is never called to receive testing data
> >> hence no response is sent to the other side.
> >>
> >> tcp_chr_read() is registered as the callback of the socket watch
> >> GSource. The reason of the callback not being called by glib, is
> >> that the source check fails to indicate the source is ready. There
> >> are two socket watch sources created to monitor the same socket
> >> event object from the char-socket backend in update_ioc_handlers().
> >>
> >> During the source check phase, qio_channel_socket_source_check()
> >> calls WSAEnumNetworkEvents() to discovers occurrences of network
> >> events for the indicated socket, clear internal network event records,
> >> and reset the event object. Testing shows that if we don't reset the
> >> event object by not passing the event handle to WSAEnumNetworkEvents()
> >> the symptom goes away and qtest runs very stably.
> >>
> >> It looks we don't need to call WSAEnumNetworkEvents() at all, as we
> >> don't parse the result of WSANETWORKEVENTS returned from this API.
> >> We use select() to poll the socket status. Fix this instability by
> >> dropping the WSAEnumNetworkEvents() call.
> >>
> >> Signed-off-by: Bin Meng <bin.m...@windriver.com>
> >
> >
> > What clears the event then?
> >
>
> It seems we don't need to clear the event as everything still works as
> expected.
>

Well, it can "work" but are you sure it doesn't have a busy loop?

>>
> >> ---
> >> During the testing, I removed the following codes in
> update_ioc_handlers():
> >>
> >>     remove_hup_source(s);
> >>     s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
> >>     g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
> >>                           chr, NULL);
> >>     g_source_attach(s->hup_source, chr->gcontext);
> >>
> >> and such change also makes the symptom go away.
> >>
> >> And if I moved the above codes to the beginning, before the call to
> >> io_add_watch_poll(), the symptom also goes away.
> >>
> >> It seems two sources watching on the same socket event object is
> >> the key that leads to the instability. The order of adding a source
> >> watch seems to also play a role but I can't explain why.
> >> Hopefully a Windows and glib expert could explain this behavior.
> >>
> >
> > Feel free to leave that comment in the commit message.
>
> Sure, will add the above message into the commit in v2.
>
> >
> > This is strange, as both sources should have different events, clearing
> one shouldn't affect the other.
>
> Both sources have the same event, as one QIO channel only has one
> socket, and one socket can only be bound to one event.
>

 "one socket can only be bound to one event", is that a WSAEventSelect
limitation?


> >
> > I guess it's WSAEnumNetworkEvents clearing of the internal network event
> records that is problematic.
> >
> > Can you check if you replace the call with ResetEvent() everything works?
>
> No, ResetEvent() does not work, and is not recommended by MSDN [1]
> too, which says:
>

It probably works to some extent (I see glib is using it
https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/giowin32.c#L903), What
you mean is that it doesn't solve the issue, I guess.

>
> The proper way to reset the state of an event object used with the
> WSAEventSelect function is to pass the handle of the event object to
> the WSAEnumNetworkEvents function in the hEventObject parameter. This
> will reset the event object and adjust the status of active FD events
> on the socket in an atomic fashion.
>
>
This is not what you want though if you have multiple event objects for the
same socket.


> [1]
> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect
>
> >
> >
> >>
> >>  io/channel-watch.c | 4 ----
> >>  1 file changed, 4 deletions(-)
> >>
> >> diff --git a/io/channel-watch.c b/io/channel-watch.c
> >> index 89f3c8a88a..e34d86e810 100644
> >> --- a/io/channel-watch.c
> >> +++ b/io/channel-watch.c
> >> @@ -115,17 +115,13 @@ static gboolean
> >>  qio_channel_socket_source_check(GSource *source)
> >>  {
> >>      static struct timeval tv0;
> >> -
> >>      QIOChannelSocketSource *ssource = (QIOChannelSocketSource *)source;
> >> -    WSANETWORKEVENTS ev;
> >>      fd_set rfds, wfds, xfds;
> >>
> >>      if (!ssource->condition) {
> >>          return 0;
> >>      }
> >>
> >> -    WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev);
> >> -
> >>      FD_ZERO(&rfds);
> >>      FD_ZERO(&wfds);
> >>      FD_ZERO(&xfds);
> >
> >
> > Unrelated, after this chunk, there is
> >         FD_SET((SOCKET)ssource->socket, &rfds);
> >
> > it seems we can drop the cast. Feel free to send another patch.
> >
>
> Yeah, this cast is unnecessary. Will spin a new patch in v2. Thanks!
>
> Regards,
> Bin
>


-- 
Marc-André Lureau

Reply via email to