On 15 September 2016 at 11:09, Olivier Fourdan <ofour...@redhat.com> wrote:
>> > What this patch does is to avoid dispatching to the Wayland file
>> > descriptor until it becomes available for writing again, while at the
>> > same time continue processing X11 requests to release the deadlock.
>> Could you expand on why the Wayland fd should not be dispatched while
>> you wait for it to flush?
>> Is it just because dispatching it is likely to cause more Wayland
>> requests to be sent? I wonder how that holds up. Maybe it doesn't
>> matter as the premise is that the Wayland compositor is stuck.
> Actually, the idea is to minimize the risk of breaking further down in
> libwayland itself, either in copy_fds_to_connection()
> ("request could not be marshaled: can't send file descriptor") or in
> wl_proxy_marshal_array_constructor_versioned() ("Error sending request:
> Resource temporarily unavailable")
So, yes: you want to avoid sending more traffic down a socket which is
>> If all replies are immediate, it sounds like this would be a fairly
>> reliable fix after all.
> To be honest, I wouldn't consider this a fix, merely a workaround, not to say
> a ugly hack, but afaics from my testing here, it avoids the lock (even
> though, with vlc fullscreen, the deadlock reoccur continuously when the
> pointer hovers the timeline as the tooltip, a shaped window, gets
> mapped/unmapped continuously which causes mutter to issue a lot of those
> blocking rountrips) - Yet it offers a chance to get out of the deadlock
> (eventually...) and not lose Xwayland, which for gnome-shell/mutter means
> losing the entire user session.
Indeed, it is a workaround: X11 requests can still provoke more
activity. Consider where you ignore all Wayland events, but gtkperf
continues rendering as an X11 client. Flushing that rendering to the
upstream compositor (wl_surface_attach/damage/commit/etc) will cause
more traffic on the Wayland connection. You might be able to do even
better by calling
transitioning wait_flush from 0 to 1, and ListenToAllClients() when it
transitions back from 1 to 0.
That being said, it's a good start, and I definitely believe you when
you say it improves things for now. I think in the long run it does
need to be extended so we avoid things like flushing more rendering,
but with EINTR checked in xwl_display_pollout:
Reviewed-by: Daniel Stone <dani...@collabora.com>
firstname.lastname@example.org: X.Org development