Re: [PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd

2016-09-15 Thread Adam Jackson
On Thu, 2016-09-15 at 12:27 +0300, Pekka Paalanen wrote:

> The blocking X11 calls from the Wayland compositor - are they all of
> the nature that the X server will reply to them immediately when it
> reads them, or could the reply be delayed allowing other X11 clients to
> be serviced in the mean time?

Replies to clients are buffered, and flushes happen whenever the server
thinks it's a good idea [1]. But unlike wayland - or at least, unlike the 
existing implementation of wayland - when writes to the socket would block, 
xserver will just keep hold of the output buffer and retry later.

> If all replies are immediate, it sounds like this would be a fairly
> reliable fix after all.
> 

They're not. But it's still an improvement.

If Xwayland can't assume that its wm and wayland server don't block on
each other, then the best it can do is try to limit the amount of
wayland protocol it generates so that it never races too far ahead. I
think to do that you end up wanting a journal of the X client actions
that need to be propagated to the wayland server, doing dead-operation
elimination on it to optimize away things like creating/destroying a
popup window before it's seen (see gtkperf combobox for example), and
syncing that to the wayland server on more or less frame boundaries.
Except, of course, that since the wayland server is also the X window
manager, it can see all those intermediate states...

Really the whole thing would be a lot less fragile if the X server was
just allowed to request window positioning of the wl server and used a
normal window manager "inside" the X session. But noo. Can't let
people know where their windows are on the screen, for... some reason.

[1] - Including: when there are damage events pending, when an error is
generated, when the reply being generated would overflow the existing
write buffer, and when we switch which client we're servicing due to
timeslice expiry or client death.

- ajax
___
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel

Re: [PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd

2016-09-15 Thread Daniel Stone
Hi,

On 15 September 2016 at 11:09, Olivier Fourdan  wrote:
>> > What this patch does is to avoid dispatching to the Wayland file
>> > descriptor until it becomes available for writing again, while at the
>> > same time continue processing X11 requests to release the deadlock.
>>
>> Could you expand on why the Wayland fd should not be dispatched while
>> you wait for it to flush?
>>
>> Is it just because dispatching it is likely to cause more Wayland
>> requests to be sent? I wonder how that holds up. Maybe it doesn't
>> matter as the premise is that the Wayland compositor is stuck.
>
> Actually, the idea is to minimize the risk of breaking further down in 
> libwayland itself, either in copy_fds_to_connection()
> ("request could not be marshaled: can't send file descriptor") or in 
> wl_proxy_marshal_array_constructor_versioned() ("Error sending request: 
> Resource temporarily unavailable")

So, yes: you want to avoid sending more traffic down a socket which is
already full.

>> If all replies are immediate, it sounds like this would be a fairly
>> reliable fix after all.
>
> To be honest, I wouldn't consider this a fix, merely a workaround, not to say 
> a ugly hack, but afaics from my testing here, it avoids the lock (even 
> though, with vlc fullscreen, the deadlock reoccur continuously when the 
> pointer hovers the timeline as the tooltip, a shaped window, gets 
> mapped/unmapped continuously which causes mutter to issue a lot of those 
> blocking rountrips) - Yet it offers a chance to get out of the deadlock 
> (eventually...) and not lose Xwayland, which for gnome-shell/mutter means 
> losing the entire user session.

Indeed, it is a workaround: X11 requests can still provoke more
activity. Consider where you ignore all Wayland events, but gtkperf
continues rendering as an X11 client. Flushing that rendering to the
upstream compositor (wl_surface_attach/damage/commit/etc) will cause
more traffic on the Wayland connection. You might be able to do even
better by calling
OnlyListenToOneClient(clientInfo.clients[wm_client_index]) when
transitioning wait_flush from 0 to 1, and ListenToAllClients() when it
transitions back from 1 to 0.

That being said, it's a good start, and I definitely believe you when
you say it improves things for now. I think in the long run it does
need to be extended so we avoid things like flushing more rendering,
but with EINTR checked in xwl_display_pollout:
Reviewed-by: Daniel Stone 

Cheers,
Daniel
___
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel

Re: [PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd

2016-09-15 Thread Olivier Fourdan
Hi Pekka,

> all the rationale in this commit message looks good to me, just some
> things I'd like to ask more about.
> 
> > What this patch does is to avoid dispatching to the Wayland file
> > descriptor until it becomes available for writing again, while at the
> > same time continue processing X11 requests to release the deadlock.
> 
> Could you expand on why the Wayland fd should not be dispatched while
> you wait for it to flush?
> 
> Is it just because dispatching it is likely to cause more Wayland
> requests to be sent? I wonder how that holds up. Maybe it doesn't
> matter as the premise is that the Wayland compositor is stuck.

Actually, the idea is to minimize the risk of breaking further down in 
libwayland itself, either in copy_fds_to_connection()
("request could not be marshaled: can't send file descriptor") or in 
wl_proxy_marshal_array_constructor_versioned() ("Error sending request: 
Resource temporarily unavailable")

> > This is not perfect, as there is still the possibility of another X
> > client hammering the connection and we'll still fail writing to the
> > Wayland connection eventually, but this improves things enough to avoid
> > a 100% repeatable crash with vlc and gtkperf.
> > 
> > Also, it is worth considering that window managers and Wayland
> > compositors such as mutter already have a higher priority than other
> > regular X clients thanks to XSyncSetPriority(), mitigating the risk.
> 
> The blocking X11 calls from the Wayland compositor - are they all of
> the nature that the X server will reply to them immediately when it
> reads them, or could the reply be delayed allowing other X11 clients to
> be serviced in the mean time?

Either requests or replies can be queued up while new ones come in, the 
previous version of that patch (one that I didn't send) was setting the 
Xserver's  "dispatchException" (see ajax' reply) which has the effect of going 
straight to FlushAllOutput() to dequeue the replies without processing new 
incoming requests, but that would still leave the door open for a deadlock if 
the X11 requests wasn't read yet by the Xserver when the Wayland fd flooding 
occurs, as it would not process new incoming requests - I was able to reproduce 
this with gtkperf (another case of that issue).
 
> If all replies are immediate, it sounds like this would be a fairly
> reliable fix after all.

To be honest, I wouldn't consider this a fix, merely a workaround, not to say a 
ugly hack, but afaics from my testing here, it avoids the lock (even though, 
with vlc fullscreen, the deadlock reoccur continuously when the pointer hovers 
the timeline as the tooltip, a shaped window, gets mapped/unmapped continuously 
which causes mutter to issue a lot of those blocking rountrips) - Yet it offers 
a chance to get out of the deadlock (eventually...) and not lose Xwayland, 
which for gnome-shell/mutter means losing the entire user session.

Note: Reviewing my past logs, I think it's worth checking EINTR after the 
poll() returns...

Cheers,
Olivier
___
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel

Re: [PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd

2016-09-15 Thread Pekka Paalanen
On Wed, 14 Sep 2016 18:20:10 +0200
Olivier Fourdan  wrote:

> wl_display_flush() can fail with EAGAIN and Xwayland would make this a
> fatal error.
> 
> When this happens, it means that Xwayland has flooded the Wayland file
> descriptor, either because the Wayland compositor cannot cope or more
> likely because of a deadlock situation where the Wayland compositor is
> blocking, waiting for an X reply while Xwayland tries to write data to
> the Wayland file descriptor.
> 
> The general consensus to avoid the deadlock is for the Wayland
> compositor to never issue blocking X11 roundtrips, but in practice
> blocking rountrips can occur in various places, including Xlib calls
> themselves so this is not always achievable without major surgery in the
> Wayland compositor/Window manager.

Hi Olivier,

all the rationale in this commit message looks good to me, just some
things I'd like to ask more about.

> What this patch does is to avoid dispatching to the Wayland file
> descriptor until it becomes available for writing again, while at the
> same time continue processing X11 requests to release the deadlock.

Could you expand on why the Wayland fd should not be dispatched while
you wait for it to flush?

Is it just because dispatching it is likely to cause more Wayland
requests to be sent? I wonder how that holds up. Maybe it doesn't
matter as the premise is that the Wayland compositor is stuck.

> This is not perfect, as there is still the possibility of another X
> client hammering the connection and we'll still fail writing to the
> Wayland connection eventually, but this improves things enough to avoid
> a 100% repeatable crash with vlc and gtkperf.
> 
> Also, it is worth considering that window managers and Wayland
> compositors such as mutter already have a higher priority than other
> regular X clients thanks to XSyncSetPriority(), mitigating the risk.

The blocking X11 calls from the Wayland compositor - are they all of
the nature that the X server will reply to them immediately when it
reads them, or could the reply be delayed allowing other X11 clients to
be serviced in the mean time?

If all replies are immediate, it sounds like this would be a fairly
reliable fix after all.


Thanks,
pq

> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1278159
> Bugzilla: https://bugzilla.gnome.org/show_bug.cgi?id=763400
> Signed-off-by: Olivier Fourdan 
> ---
>  v2: oops, need to poll() on EAGAIN between retries
>  v3: Mostly a rewrite, non-blocking on poll()
> 
>  hw/xwayland/xwayland.c | 34 +++---
>  hw/xwayland/xwayland.h |  1 +
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/xwayland/xwayland.c b/hw/xwayland/xwayland.c
> index 847321e..2efbff8 100644
> --- a/hw/xwayland/xwayland.c
> +++ b/hw/xwayland/xwayland.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef XF86VIDMODE
>  #include 
> @@ -470,6 +471,9 @@ xwl_read_events (struct xwl_screen *xwl_screen)
>  {
>  int ret;
>  
> +if (xwl_screen->wait_flush)
> +return;
> +
>  ret = wl_display_read_events(xwl_screen->display);
>  if (ret == -1)
>  FatalError("failed to read Wayland events: %s\n", strerror(errno));
> @@ -481,10 +485,25 @@ xwl_read_events (struct xwl_screen *xwl_screen)
>  FatalError("failed to dispatch Wayland events: %s\n", 
> strerror(errno));
>  }
>  
> +static int
> +xwl_display_pollout (struct xwl_screen *xwl_screen, int timeout)
> +{
> +struct pollfd poll_fd;
> +
> +poll_fd.fd = wl_display_get_fd(xwl_screen->display);
> +poll_fd.events = POLLOUT;
> +
> +return xserver_poll(_fd, 1, timeout);
> +}
> +
>  static void
>  xwl_dispatch_events (struct xwl_screen *xwl_screen)
>  {
> -int ret;
> +int ret = 0;
> +int ready;
> +
> +if (xwl_screen->wait_flush)
> +goto pollout;
>  
>  while (xwl_screen->prepare_read == 0 &&
> wl_display_prepare_read(xwl_screen->display) == -1) {
> @@ -496,9 +515,18 @@ xwl_dispatch_events (struct xwl_screen *xwl_screen)
>  
>  xwl_screen->prepare_read = 1;
>  
> -ret = wl_display_flush(xwl_screen->display);
> -if (ret == -1)
> +pollout:
> +ready = xwl_display_pollout(xwl_screen, 5);
> +if (ready == -1)
> +FatalError("error polling on XWayland fd: %s\n", strerror(errno));
> +
> +if (ready > 0)
> +ret = wl_display_flush(xwl_screen->display);
> +
> +if (ret == -1 && errno != EAGAIN)
>  FatalError("failed to write to XWayland fd: %s\n", strerror(errno));
> +
> +xwl_screen->wait_flush = (ready == 0 || ret == -1);
>  }
>  
>  static void
> diff --git a/hw/xwayland/xwayland.h b/hw/xwayland/xwayland.h
> index db3dd0b..3ce7a63 100644
> --- a/hw/xwayland/xwayland.h
> +++ b/hw/xwayland/xwayland.h
> @@ -83,6 +83,7 @@ struct xwl_screen {
>  #define XWL_FORMAT_RGB565   (1 << 2)
>  
>  int prepare_read;
> +int wait_flush;
>  
>  char 

[PATCH xserver v3] xwayland: handle EAGAIN on Wayland fd

2016-09-14 Thread Olivier Fourdan
wl_display_flush() can fail with EAGAIN and Xwayland would make this a
fatal error.

When this happens, it means that Xwayland has flooded the Wayland file
descriptor, either because the Wayland compositor cannot cope or more
likely because of a deadlock situation where the Wayland compositor is
blocking, waiting for an X reply while Xwayland tries to write data to
the Wayland file descriptor.

The general consensus to avoid the deadlock is for the Wayland
compositor to never issue blocking X11 roundtrips, but in practice
blocking rountrips can occur in various places, including Xlib calls
themselves so this is not always achievable without major surgery in the
Wayland compositor/Window manager.

What this patch does is to avoid dispatching to the Wayland file
descriptor until it becomes available for writing again, while at the
same time continue processing X11 requests to release the deadlock.

This is not perfect, as there is still the possibility of another X
client hammering the connection and we'll still fail writing to the
Wayland connection eventually, but this improves things enough to avoid
a 100% repeatable crash with vlc and gtkperf.

Also, it is worth considering that window managers and Wayland
compositors such as mutter already have a higher priority than other
regular X clients thanks to XSyncSetPriority(), mitigating the risk.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1278159
Bugzilla: https://bugzilla.gnome.org/show_bug.cgi?id=763400
Signed-off-by: Olivier Fourdan 
---
 v2: oops, need to poll() on EAGAIN between retries
 v3: Mostly a rewrite, non-blocking on poll()

 hw/xwayland/xwayland.c | 34 +++---
 hw/xwayland/xwayland.h |  1 +
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/hw/xwayland/xwayland.c b/hw/xwayland/xwayland.c
index 847321e..2efbff8 100644
--- a/hw/xwayland/xwayland.c
+++ b/hw/xwayland/xwayland.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef XF86VIDMODE
 #include 
@@ -470,6 +471,9 @@ xwl_read_events (struct xwl_screen *xwl_screen)
 {
 int ret;
 
+if (xwl_screen->wait_flush)
+return;
+
 ret = wl_display_read_events(xwl_screen->display);
 if (ret == -1)
 FatalError("failed to read Wayland events: %s\n", strerror(errno));
@@ -481,10 +485,25 @@ xwl_read_events (struct xwl_screen *xwl_screen)
 FatalError("failed to dispatch Wayland events: %s\n", strerror(errno));
 }
 
+static int
+xwl_display_pollout (struct xwl_screen *xwl_screen, int timeout)
+{
+struct pollfd poll_fd;
+
+poll_fd.fd = wl_display_get_fd(xwl_screen->display);
+poll_fd.events = POLLOUT;
+
+return xserver_poll(_fd, 1, timeout);
+}
+
 static void
 xwl_dispatch_events (struct xwl_screen *xwl_screen)
 {
-int ret;
+int ret = 0;
+int ready;
+
+if (xwl_screen->wait_flush)
+goto pollout;
 
 while (xwl_screen->prepare_read == 0 &&
wl_display_prepare_read(xwl_screen->display) == -1) {
@@ -496,9 +515,18 @@ xwl_dispatch_events (struct xwl_screen *xwl_screen)
 
 xwl_screen->prepare_read = 1;
 
-ret = wl_display_flush(xwl_screen->display);
-if (ret == -1)
+pollout:
+ready = xwl_display_pollout(xwl_screen, 5);
+if (ready == -1)
+FatalError("error polling on XWayland fd: %s\n", strerror(errno));
+
+if (ready > 0)
+ret = wl_display_flush(xwl_screen->display);
+
+if (ret == -1 && errno != EAGAIN)
 FatalError("failed to write to XWayland fd: %s\n", strerror(errno));
+
+xwl_screen->wait_flush = (ready == 0 || ret == -1);
 }
 
 static void
diff --git a/hw/xwayland/xwayland.h b/hw/xwayland/xwayland.h
index db3dd0b..3ce7a63 100644
--- a/hw/xwayland/xwayland.h
+++ b/hw/xwayland/xwayland.h
@@ -83,6 +83,7 @@ struct xwl_screen {
 #define XWL_FORMAT_RGB565   (1 << 2)
 
 int prepare_read;
+int wait_flush;
 
 char *device_name;
 int drm_fd;
-- 
2.9.3

___
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel