> all the rationale in this commit message looks good to me, just some
> things I'd like to ask more about.
> > What this patch does is to avoid dispatching to the Wayland file
> > descriptor until it becomes available for writing again, while at the
> > same time continue processing X11 requests to release the deadlock.
> Could you expand on why the Wayland fd should not be dispatched while
> you wait for it to flush?
> Is it just because dispatching it is likely to cause more Wayland
> requests to be sent? I wonder how that holds up. Maybe it doesn't
> matter as the premise is that the Wayland compositor is stuck.
Actually, the idea is to minimize the risk of breaking further down in
libwayland itself, either in copy_fds_to_connection()
("request could not be marshaled: can't send file descriptor") or in
wl_proxy_marshal_array_constructor_versioned() ("Error sending request:
Resource temporarily unavailable")
> > This is not perfect, as there is still the possibility of another X
> > client hammering the connection and we'll still fail writing to the
> > Wayland connection eventually, but this improves things enough to avoid
> > a 100% repeatable crash with vlc and gtkperf.
> > Also, it is worth considering that window managers and Wayland
> > compositors such as mutter already have a higher priority than other
> > regular X clients thanks to XSyncSetPriority(), mitigating the risk.
> The blocking X11 calls from the Wayland compositor - are they all of
> the nature that the X server will reply to them immediately when it
> reads them, or could the reply be delayed allowing other X11 clients to
> be serviced in the mean time?
Either requests or replies can be queued up while new ones come in, the
previous version of that patch (one that I didn't send) was setting the
Xserver's "dispatchException" (see ajax' reply) which has the effect of going
straight to FlushAllOutput() to dequeue the replies without processing new
incoming requests, but that would still leave the door open for a deadlock if
the X11 requests wasn't read yet by the Xserver when the Wayland fd flooding
occurs, as it would not process new incoming requests - I was able to reproduce
this with gtkperf (another case of that issue).
> If all replies are immediate, it sounds like this would be a fairly
> reliable fix after all.
To be honest, I wouldn't consider this a fix, merely a workaround, not to say a
ugly hack, but afaics from my testing here, it avoids the lock (even though,
with vlc fullscreen, the deadlock reoccur continuously when the pointer hovers
the timeline as the tooltip, a shaped window, gets mapped/unmapped continuously
which causes mutter to issue a lot of those blocking rountrips) - Yet it offers
a chance to get out of the deadlock (eventually...) and not lose Xwayland,
which for gnome-shell/mutter means losing the entire user session.
Note: Reviewing my past logs, I think it's worth checking EINTR after the
firstname.lastname@example.org: X.Org development