Hi On Sun, Sep 4, 2022 at 10:24 AM Bin Meng <bmeng...@gmail.com> wrote:
> On Thu, Sep 1, 2022 at 8:58 PM Marc-André Lureau > <marcandre.lur...@gmail.com> wrote: > > > > Hi > > > > On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng...@gmail.com> wrote: > >> > >> From: Bin Meng <bin.m...@windriver.com> > >> > >> Random failure was observed when running qtests on Windows due to > >> "Broken pipe" detected by qmp_fd_receive(). What happened is that > >> the qtest executable sends testing data over a socket to the QEMU > >> under test but no response is received. The errno of the recv() > >> call from the qtest executable indicates ETIMEOUT, due to the qmp > >> chardev's tcp_chr_read() is never called to receive testing data > >> hence no response is sent to the other side. > >> > >> tcp_chr_read() is registered as the callback of the socket watch > >> GSource. The reason of the callback not being called by glib, is > >> that the source check fails to indicate the source is ready. There > >> are two socket watch sources created to monitor the same socket > >> event object from the char-socket backend in update_ioc_handlers(). > >> > >> During the source check phase, qio_channel_socket_source_check() > >> calls WSAEnumNetworkEvents() to discovers occurrences of network > >> events for the indicated socket, clear internal network event records, > >> and reset the event object. Testing shows that if we don't reset the > >> event object by not passing the event handle to WSAEnumNetworkEvents() > >> the symptom goes away and qtest runs very stably. > >> > >> It looks we don't need to call WSAEnumNetworkEvents() at all, as we > >> don't parse the result of WSANETWORKEVENTS returned from this API. > >> We use select() to poll the socket status. Fix this instability by > >> dropping the WSAEnumNetworkEvents() call. > >> > >> Signed-off-by: Bin Meng <bin.m...@windriver.com> > > > > > > What clears the event then? > > > > It seems we don't need to clear the event as everything still works as > expected. > Well, it can "work" but are you sure it doesn't have a busy loop? >> > >> --- > >> During the testing, I removed the following codes in > update_ioc_handlers(): > >> > >> remove_hup_source(s); > >> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); > >> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, > >> chr, NULL); > >> g_source_attach(s->hup_source, chr->gcontext); > >> > >> and such change also makes the symptom go away. > >> > >> And if I moved the above codes to the beginning, before the call to > >> io_add_watch_poll(), the symptom also goes away. > >> > >> It seems two sources watching on the same socket event object is > >> the key that leads to the instability. The order of adding a source > >> watch seems to also play a role but I can't explain why. > >> Hopefully a Windows and glib expert could explain this behavior. > >> > > > > Feel free to leave that comment in the commit message. > > Sure, will add the above message into the commit in v2. > > > > > This is strange, as both sources should have different events, clearing > one shouldn't affect the other. > > Both sources have the same event, as one QIO channel only has one > socket, and one socket can only be bound to one event. > "one socket can only be bound to one event", is that a WSAEventSelect limitation? > > > > I guess it's WSAEnumNetworkEvents clearing of the internal network event > records that is problematic. > > > > Can you check if you replace the call with ResetEvent() everything works? > > No, ResetEvent() does not work, and is not recommended by MSDN [1] > too, which says: > It probably works to some extent (I see glib is using it https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/giowin32.c#L903), What you mean is that it doesn't solve the issue, I guess. > > The proper way to reset the state of an event object used with the > WSAEventSelect function is to pass the handle of the event object to > the WSAEnumNetworkEvents function in the hEventObject parameter. This > will reset the event object and adjust the status of active FD events > on the socket in an atomic fashion. > > This is not what you want though if you have multiple event objects for the same socket. > [1] > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect > > > > > > >> > >> io/channel-watch.c | 4 ---- > >> 1 file changed, 4 deletions(-) > >> > >> diff --git a/io/channel-watch.c b/io/channel-watch.c > >> index 89f3c8a88a..e34d86e810 100644 > >> --- a/io/channel-watch.c > >> +++ b/io/channel-watch.c > >> @@ -115,17 +115,13 @@ static gboolean > >> qio_channel_socket_source_check(GSource *source) > >> { > >> static struct timeval tv0; > >> - > >> QIOChannelSocketSource *ssource = (QIOChannelSocketSource *)source; > >> - WSANETWORKEVENTS ev; > >> fd_set rfds, wfds, xfds; > >> > >> if (!ssource->condition) { > >> return 0; > >> } > >> > >> - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > >> - > >> FD_ZERO(&rfds); > >> FD_ZERO(&wfds); > >> FD_ZERO(&xfds); > > > > > > Unrelated, after this chunk, there is > > FD_SET((SOCKET)ssource->socket, &rfds); > > > > it seems we can drop the cast. Feel free to send another patch. > > > > Yeah, this cast is unnecessary. Will spin a new patch in v2. Thanks! > > Regards, > Bin > -- Marc-André Lureau