Re: [PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd
On Thu, 2016-09-15 at 12:27 +0300, Pekka Paalanen wrote: > The blocking X11 calls from the Wayland compositor - are they all of > the nature that the X server will reply to them immediately when it > reads them, or could the reply be delayed allowing other X11 clients to > be serviced in the mean time? Replies to clients are buffered, and flushes happen whenever the server thinks it's a good idea [1]. But unlike wayland - or at least, unlike the existing implementation of wayland - when writes to the socket would block, xserver will just keep hold of the output buffer and retry later. > If all replies are immediate, it sounds like this would be a fairly > reliable fix after all. > They're not. But it's still an improvement. If Xwayland can't assume that its wm and wayland server don't block on each other, then the best it can do is try to limit the amount of wayland protocol it generates so that it never races too far ahead. I think to do that you end up wanting a journal of the X client actions that need to be propagated to the wayland server, doing dead-operation elimination on it to optimize away things like creating/destroying a popup window before it's seen (see gtkperf combobox for example), and syncing that to the wayland server on more or less frame boundaries. Except, of course, that since the wayland server is also the X window manager, it can see all those intermediate states... Really the whole thing would be a lot less fragile if the X server was just allowed to request window positioning of the wl server and used a normal window manager "inside" the X session. But noo. Can't let people know where their windows are on the screen, for... some reason. [1] - Including: when there are damage events pending, when an error is generated, when the reply being generated would overflow the existing write buffer, and when we switch which client we're servicing due to timeslice expiry or client death. - ajax ___ xorg-devel@lists.x.org: X.Org development Archives: http://lists.x.org/archives/xorg-devel Info: https://lists.x.org/mailman/listinfo/xorg-devel
Re: [PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd
Hi, On 15 September 2016 at 11:09, Olivier Fourdanwrote: >> > What this patch does is to avoid dispatching to the Wayland file >> > descriptor until it becomes available for writing again, while at the >> > same time continue processing X11 requests to release the deadlock. >> >> Could you expand on why the Wayland fd should not be dispatched while >> you wait for it to flush? >> >> Is it just because dispatching it is likely to cause more Wayland >> requests to be sent? I wonder how that holds up. Maybe it doesn't >> matter as the premise is that the Wayland compositor is stuck. > > Actually, the idea is to minimize the risk of breaking further down in > libwayland itself, either in copy_fds_to_connection() > ("request could not be marshaled: can't send file descriptor") or in > wl_proxy_marshal_array_constructor_versioned() ("Error sending request: > Resource temporarily unavailable") So, yes: you want to avoid sending more traffic down a socket which is already full. >> If all replies are immediate, it sounds like this would be a fairly >> reliable fix after all. > > To be honest, I wouldn't consider this a fix, merely a workaround, not to say > a ugly hack, but afaics from my testing here, it avoids the lock (even > though, with vlc fullscreen, the deadlock reoccur continuously when the > pointer hovers the timeline as the tooltip, a shaped window, gets > mapped/unmapped continuously which causes mutter to issue a lot of those > blocking rountrips) - Yet it offers a chance to get out of the deadlock > (eventually...) and not lose Xwayland, which for gnome-shell/mutter means > losing the entire user session. Indeed, it is a workaround: X11 requests can still provoke more activity. Consider where you ignore all Wayland events, but gtkperf continues rendering as an X11 client. Flushing that rendering to the upstream compositor (wl_surface_attach/damage/commit/etc) will cause more traffic on the Wayland connection. You might be able to do even better by calling OnlyListenToOneClient(clientInfo.clients[wm_client_index]) when transitioning wait_flush from 0 to 1, and ListenToAllClients() when it transitions back from 1 to 0. That being said, it's a good start, and I definitely believe you when you say it improves things for now. I think in the long run it does need to be extended so we avoid things like flushing more rendering, but with EINTR checked in xwl_display_pollout: Reviewed-by: Daniel Stone Cheers, Daniel ___ xorg-devel@lists.x.org: X.Org development Archives: http://lists.x.org/archives/xorg-devel Info: https://lists.x.org/mailman/listinfo/xorg-devel
Re: [PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd
Hi Pekka, > all the rationale in this commit message looks good to me, just some > things I'd like to ask more about. > > > What this patch does is to avoid dispatching to the Wayland file > > descriptor until it becomes available for writing again, while at the > > same time continue processing X11 requests to release the deadlock. > > Could you expand on why the Wayland fd should not be dispatched while > you wait for it to flush? > > Is it just because dispatching it is likely to cause more Wayland > requests to be sent? I wonder how that holds up. Maybe it doesn't > matter as the premise is that the Wayland compositor is stuck. Actually, the idea is to minimize the risk of breaking further down in libwayland itself, either in copy_fds_to_connection() ("request could not be marshaled: can't send file descriptor") or in wl_proxy_marshal_array_constructor_versioned() ("Error sending request: Resource temporarily unavailable") > > This is not perfect, as there is still the possibility of another X > > client hammering the connection and we'll still fail writing to the > > Wayland connection eventually, but this improves things enough to avoid > > a 100% repeatable crash with vlc and gtkperf. > > > > Also, it is worth considering that window managers and Wayland > > compositors such as mutter already have a higher priority than other > > regular X clients thanks to XSyncSetPriority(), mitigating the risk. > > The blocking X11 calls from the Wayland compositor - are they all of > the nature that the X server will reply to them immediately when it > reads them, or could the reply be delayed allowing other X11 clients to > be serviced in the mean time? Either requests or replies can be queued up while new ones come in, the previous version of that patch (one that I didn't send) was setting the Xserver's "dispatchException" (see ajax' reply) which has the effect of going straight to FlushAllOutput() to dequeue the replies without processing new incoming requests, but that would still leave the door open for a deadlock if the X11 requests wasn't read yet by the Xserver when the Wayland fd flooding occurs, as it would not process new incoming requests - I was able to reproduce this with gtkperf (another case of that issue). > If all replies are immediate, it sounds like this would be a fairly > reliable fix after all. To be honest, I wouldn't consider this a fix, merely a workaround, not to say a ugly hack, but afaics from my testing here, it avoids the lock (even though, with vlc fullscreen, the deadlock reoccur continuously when the pointer hovers the timeline as the tooltip, a shaped window, gets mapped/unmapped continuously which causes mutter to issue a lot of those blocking rountrips) - Yet it offers a chance to get out of the deadlock (eventually...) and not lose Xwayland, which for gnome-shell/mutter means losing the entire user session. Note: Reviewing my past logs, I think it's worth checking EINTR after the poll() returns... Cheers, Olivier ___ xorg-devel@lists.x.org: X.Org development Archives: http://lists.x.org/archives/xorg-devel Info: https://lists.x.org/mailman/listinfo/xorg-devel
Re: [PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd
On Wed, 14 Sep 2016 18:20:10 +0200 Olivier Fourdanwrote: > wl_display_flush() can fail with EAGAIN and Xwayland would make this a > fatal error. > > When this happens, it means that Xwayland has flooded the Wayland file > descriptor, either because the Wayland compositor cannot cope or more > likely because of a deadlock situation where the Wayland compositor is > blocking, waiting for an X reply while Xwayland tries to write data to > the Wayland file descriptor. > > The general consensus to avoid the deadlock is for the Wayland > compositor to never issue blocking X11 roundtrips, but in practice > blocking rountrips can occur in various places, including Xlib calls > themselves so this is not always achievable without major surgery in the > Wayland compositor/Window manager. Hi Olivier, all the rationale in this commit message looks good to me, just some things I'd like to ask more about. > What this patch does is to avoid dispatching to the Wayland file > descriptor until it becomes available for writing again, while at the > same time continue processing X11 requests to release the deadlock. Could you expand on why the Wayland fd should not be dispatched while you wait for it to flush? Is it just because dispatching it is likely to cause more Wayland requests to be sent? I wonder how that holds up. Maybe it doesn't matter as the premise is that the Wayland compositor is stuck. > This is not perfect, as there is still the possibility of another X > client hammering the connection and we'll still fail writing to the > Wayland connection eventually, but this improves things enough to avoid > a 100% repeatable crash with vlc and gtkperf. > > Also, it is worth considering that window managers and Wayland > compositors such as mutter already have a higher priority than other > regular X clients thanks to XSyncSetPriority(), mitigating the risk. The blocking X11 calls from the Wayland compositor - are they all of the nature that the X server will reply to them immediately when it reads them, or could the reply be delayed allowing other X11 clients to be serviced in the mean time? If all replies are immediate, it sounds like this would be a fairly reliable fix after all. Thanks, pq > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1278159 > Bugzilla: https://bugzilla.gnome.org/show_bug.cgi?id=763400 > Signed-off-by: Olivier Fourdan > --- > v2: oops, need to poll() on EAGAIN between retries > v3: Mostly a rewrite, non-blocking on poll() > > hw/xwayland/xwayland.c | 34 +++--- > hw/xwayland/xwayland.h | 1 + > 2 files changed, 32 insertions(+), 3 deletions(-) > > diff --git a/hw/xwayland/xwayland.c b/hw/xwayland/xwayland.c > index 847321e..2efbff8 100644 > --- a/hw/xwayland/xwayland.c > +++ b/hw/xwayland/xwayland.c > @@ -33,6 +33,7 @@ > #include > #include > #include > +#include > > #ifdef XF86VIDMODE > #include > @@ -470,6 +471,9 @@ xwl_read_events (struct xwl_screen *xwl_screen) > { > int ret; > > +if (xwl_screen->wait_flush) > +return; > + > ret = wl_display_read_events(xwl_screen->display); > if (ret == -1) > FatalError("failed to read Wayland events: %s\n", strerror(errno)); > @@ -481,10 +485,25 @@ xwl_read_events (struct xwl_screen *xwl_screen) > FatalError("failed to dispatch Wayland events: %s\n", > strerror(errno)); > } > > +static int > +xwl_display_pollout (struct xwl_screen *xwl_screen, int timeout) > +{ > +struct pollfd poll_fd; > + > +poll_fd.fd = wl_display_get_fd(xwl_screen->display); > +poll_fd.events = POLLOUT; > + > +return xserver_poll(_fd, 1, timeout); > +} > + > static void > xwl_dispatch_events (struct xwl_screen *xwl_screen) > { > -int ret; > +int ret = 0; > +int ready; > + > +if (xwl_screen->wait_flush) > +goto pollout; > > while (xwl_screen->prepare_read == 0 && > wl_display_prepare_read(xwl_screen->display) == -1) { > @@ -496,9 +515,18 @@ xwl_dispatch_events (struct xwl_screen *xwl_screen) > > xwl_screen->prepare_read = 1; > > -ret = wl_display_flush(xwl_screen->display); > -if (ret == -1) > +pollout: > +ready = xwl_display_pollout(xwl_screen, 5); > +if (ready == -1) > +FatalError("error polling on XWayland fd: %s\n", strerror(errno)); > + > +if (ready > 0) > +ret = wl_display_flush(xwl_screen->display); > + > +if (ret == -1 && errno != EAGAIN) > FatalError("failed to write to XWayland fd: %s\n", strerror(errno)); > + > +xwl_screen->wait_flush = (ready == 0 || ret == -1); > } > > static void > diff --git a/hw/xwayland/xwayland.h b/hw/xwayland/xwayland.h > index db3dd0b..3ce7a63 100644 > --- a/hw/xwayland/xwayland.h > +++ b/hw/xwayland/xwayland.h > @@ -83,6 +83,7 @@ struct xwl_screen { > #define XWL_FORMAT_RGB565 (1 << 2) > > int prepare_read; > +int wait_flush; > > char
[PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd
wl_display_flush() can fail with EAGAIN and Xwayland would make this a fatal error. When this happens, it means that Xwayland has flooded the Wayland file descriptor, either because the Wayland compositor cannot cope or more likely because of a deadlock situation where the Wayland compositor is blocking, waiting for an X reply while Xwayland tries to write data to the Wayland file descriptor. The general consensus to avoid the deadlock is for the Wayland compositor to never issue blocking X11 roundtrips, but in practice blocking rountrips can occur in various places, including Xlib calls themselves so this is not always achievable without major surgery in the Wayland compositor/Window manager. What this patch does is to avoid dispatching to the Wayland file descriptor until it becomes available for writing again, while at the same time continue processing X11 requests to release the deadlock. This is not perfect, as there is still the possibility of another X client hammering the connection and we'll still fail writing to the Wayland connection eventually, but this improves things enough to avoid a 100% repeatable crash with vlc and gtkperf. Also, it is worth considering that window managers and Wayland compositors such as mutter already have a higher priority than other regular X clients thanks to XSyncSetPriority(), mitigating the risk. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1278159 Bugzilla: https://bugzilla.gnome.org/show_bug.cgi?id=763400 Signed-off-by: Olivier Fourdan--- v2: oops, need to poll() on EAGAIN between retries v3: Mostly a rewrite, non-blocking on poll() hw/xwayland/xwayland.c | 34 +++--- hw/xwayland/xwayland.h | 1 + 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/hw/xwayland/xwayland.c b/hw/xwayland/xwayland.c index 847321e..2efbff8 100644 --- a/hw/xwayland/xwayland.c +++ b/hw/xwayland/xwayland.c @@ -33,6 +33,7 @@ #include #include #include +#include #ifdef XF86VIDMODE #include @@ -470,6 +471,9 @@ xwl_read_events (struct xwl_screen *xwl_screen) { int ret; +if (xwl_screen->wait_flush) +return; + ret = wl_display_read_events(xwl_screen->display); if (ret == -1) FatalError("failed to read Wayland events: %s\n", strerror(errno)); @@ -481,10 +485,25 @@ xwl_read_events (struct xwl_screen *xwl_screen) FatalError("failed to dispatch Wayland events: %s\n", strerror(errno)); } +static int +xwl_display_pollout (struct xwl_screen *xwl_screen, int timeout) +{ +struct pollfd poll_fd; + +poll_fd.fd = wl_display_get_fd(xwl_screen->display); +poll_fd.events = POLLOUT; + +return xserver_poll(_fd, 1, timeout); +} + static void xwl_dispatch_events (struct xwl_screen *xwl_screen) { -int ret; +int ret = 0; +int ready; + +if (xwl_screen->wait_flush) +goto pollout; while (xwl_screen->prepare_read == 0 && wl_display_prepare_read(xwl_screen->display) == -1) { @@ -496,9 +515,18 @@ xwl_dispatch_events (struct xwl_screen *xwl_screen) xwl_screen->prepare_read = 1; -ret = wl_display_flush(xwl_screen->display); -if (ret == -1) +pollout: +ready = xwl_display_pollout(xwl_screen, 5); +if (ready == -1) +FatalError("error polling on XWayland fd: %s\n", strerror(errno)); + +if (ready > 0) +ret = wl_display_flush(xwl_screen->display); + +if (ret == -1 && errno != EAGAIN) FatalError("failed to write to XWayland fd: %s\n", strerror(errno)); + +xwl_screen->wait_flush = (ready == 0 || ret == -1); } static void diff --git a/hw/xwayland/xwayland.h b/hw/xwayland/xwayland.h index db3dd0b..3ce7a63 100644 --- a/hw/xwayland/xwayland.h +++ b/hw/xwayland/xwayland.h @@ -83,6 +83,7 @@ struct xwl_screen { #define XWL_FORMAT_RGB565 (1 << 2) int prepare_read; +int wait_flush; char *device_name; int drm_fd; -- 2.9.3 ___ xorg-devel@lists.x.org: X.Org development Archives: http://lists.x.org/archives/xorg-devel Info: https://lists.x.org/mailman/listinfo/xorg-devel