On Tue, 13 Sep 2016 06:13:16 -0400 (EDT) Olivier Fourdan <[email protected]> wrote:
> Hi all
>
> ----- Original Message -----
> > wl_display_flush() can fail with EAGAIN and Xwayland would make this a
> > fatal error.
> >
> > Handle the usual EAGAIN and EINTR gracefully so that Xwayland doesn't
> > die for so little.
>
> Right, I am running out of ideas...
>
> So the approach of using poll() to wait for the Wayland file descriptor to
> become writeable again leads straight to a deadlock apparently...
>
> Reason for this is the compositor (gnome-shell/mutter) is itself waiting for
> data on the X file descriptor:
>
> Backtrace of gnome-shell while we hit the EAGAIN case on the Wayland fd on
> the Xwayland side:
>
> #0  0x00007f86d1cd400d in poll () at /lib64/libc.so.6
> #1  0x00007f86d1537d10 in _xcb_conn_wait () at /lib64/libxcb.so.1
> #2  0x00007f86d1539aa9 in xcb_wait_for_event () at /lib64/libxcb.so.1
> #3  0x00007f86d21fe03b in _XReadEvents (dpy=dpy@entry=0x55f956633000) at
>     xcb_io.c:401
> #4  0x00007f86d21e562e in XIfEvent (dpy=0x55f956633000, event=0x7ffe30c28eb0,
>     predicate=<find_timestamp_predicate>, arg=0x55f956761100)
>     at IfEvent.c:68
> #5  0x00007f86d8031ddb in meta_display_get_current_time_roundtrip () at
>     /lib64/libmutter.so.0
> #6  0x00007f86d805ac49 in handle_other_xevent () at /lib64/libmutter.so.0
> #7  0x00007f86d805b95b in xevent_filter () at /lib64/libmutter.so.0
> #8  0x00007f86d73b98f1 in gdk_event_apply_filters () at /lib64/libgdk-3.so.0
> #9  0x00007f86d73b9cf2 in _gdk_x11_display_queue_events () at
>     /lib64/libgdk-3.so.0
> #10 0x00007f86d7380f19 in gdk_display_get_event () at /lib64/libgdk-3.so.0
> #11 0x00007f86d73b9962 in gdk_event_source_dispatch () at /lib64/libgdk-3.so.0
> #12 0x00007f86d37d0f22 in g_main_context_dispatch () at
>     /lib64/libglib-2.0.so.0
> #13 0x00007f86d37d12a0 in g_main_context_iterate.isra () at
>     /lib64/libglib-2.0.so.0
> #14 0x00007f86d37d15c2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
> #15 0x00007f86d803c00c in meta_run () at /lib64/libmutter.so.0
> #16 0x000055f953220657 in main ()
>
> i.e
> gnome-shell is stuck in meta_display_get_current_time_roundtrip():
>
> https://git.gnome.org/browse/mutter/tree/src/core/display.c#n1300
>
> While at the same time, Xwayland is trying to write to the Wayland file
> descriptor with wl_display_flush() and gets an EAGAIN in the block_handler():
>
> https://cgit.freedesktop.org/xorg/xserver/tree/hw/xwayland/xwayland.c?h=server-1.18-branch#n483
>
> I tried to poll() the Wayland fd with a timeout prior to wl_display_flush()
> to make sure to wl_display_flush() only when writable, to see if that would
> help unblocking mutter waiting for its PropertyNotify event but that did not
> work, the Wayland fd still remains in EAGAIN forever and gnome-shell/mutter
> remains stuck waiting for the PropertyNotify event...
>
> I am a bit puzzled, why is gnome-shell/mutter/xcb waiting for the
> PropertyNotify, where is that event gone?

Hi Olivier,

I don't have any solution for you. The interactions between the Wayland
compositor and Xwayland are known to be very easily deadlockable IIRC.
I believe the only thing you can do is ensure no such case can ever
occur, which is very painful. That is, never do a blocking roundtrip at
least from one side.

Have the recent modifications caused a significant increase of Wayland
requests from Xwayland?

If Xwayland needs to send an amount of data bigger than bufferable,
*any* blocking roundtrip via X11 from the Wayland compositor is prone
to deadlock. It will be waiting for a reply via X11, while Xwayland is
blocked on flushing, since the Wayland compositor is not consuming
requests.

It can also trivially happen if both sides do a blocking roundtrip at
the same time. Or just a wait for an event.

Either server needs to be able to return to its main loop to process
the protocol stream it is the server for. Preferably both, I think.

You could check how Weston's XWM works. I highly suspect that after
Xwayland launch it avoids doing any blocking roundtrips via X11.
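
To make the "bigger than bufferable" case concrete, here is a rough,
untested-in-Xwayland sketch using plain POSIX sockets. It is not Xwayland
or libwayland code: the socket pair merely stands in for the Wayland
connection, and the helper names fill_until_eagain() and
make_nonblocking_pair() are made up for illustration. When the peer never
reads, a non-blocking writer queues a finite amount of data and then gets
EAGAIN, which is the state the flush in the block handler ends up in.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a connected AF_UNIX stream pair with both
 * ends non-blocking. Returns 0 on success, -1 on failure. */
int make_nonblocking_pair(int sv[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    for (int i = 0; i < 2; i++) {
        int flags = fcntl(sv[i], F_GETFL, 0);
        if (flags < 0 || fcntl(sv[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }
    return 0;
}

/* Hypothetical helper: keep writing to fd until the kernel refuses more.
 * Returns the number of bytes queued; *hit_eagain is set to 1 if writing
 * stopped because the send buffer was full (EAGAIN/EWOULDBLOCK). */
ssize_t fill_until_eagain(int fd, int *hit_eagain)
{
    char chunk[4096];
    ssize_t total = 0;

    memset(chunk, 'x', sizeof chunk);
    *hit_eagain = 0;

    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue; /* interrupted by a signal, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            *hit_eagain = 1; /* peer is not consuming, buffer is full */
        return total;
    }
}
```

If the writer blocked at that point instead of returning, and the reader
was itself blocked waiting for a reply from the writer on another
connection, neither side could make progress.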

I'd assume Xwayland also tries to avoid blocking on Wayland events, but
if nothing else, I believe Mesa via GLAMOR may block on
wl_buffer.release events... or maybe not if GLAMOR is smart with its
throttling. Anyway, since your flush is hitting EAGAIN, that doesn't
seem to be the cause.

I wonder if making wl_display_flush() block immediately like in your
patch could be replaced by adding the wl_display fd to the main poll
loop, so that it would get flushed ASAP but still service X11 requests
in the meantime? It does run the risk of overflowing the Wayland send
buffer in Xwayland.

Any way to prioritize the Wayland compositor's X11 connection in
Xwayland?


Thanks,
pq
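
For illustration only, the "flush from the main poll loop" idea could be
sketched roughly as below. This is generic POSIX code under stated
assumptions, not the X server's actual main loop: struct out_queue,
try_flush() and loop_iteration() are hypothetical names, and the plain
write() stands in for wl_display_flush() on the fd that
wl_display_get_fd() would return. The point is that the output fd is only
watched for POLLOUT while data is pending, and the loop keeps noticing
readable X11 clients instead of blocking on the flush.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the pending Wayland request buffer. */
struct out_queue {
    const char *data; /* bytes waiting to be sent */
    size_t len;       /* total number of bytes in data */
    size_t sent;      /* how many of them were already written */
};

/* Write as much pending data as possible.
 * Returns 1 when the queue is drained, 0 when the fd would block
 * (retry on the next POLLOUT), -1 on any other error. */
int try_flush(int fd, struct out_queue *q)
{
    while (q->sent < q->len) {
        ssize_t n = write(fd, q->data + q->sent, q->len - q->sent);
        if (n > 0) {
            q->sent += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        return -1;
    }
    return 1;
}

/* One iteration of a loop that watches both connections.
 * Sets *x11_readable to 1 if x11_fd has input the caller should service.
 * Returns 0 on success (including EINTR and timeout), -1 on error. */
int loop_iteration(int x11_fd, int out_fd, struct out_queue *q,
                   int *x11_readable, int timeout_ms)
{
    struct pollfd fds[2];

    fds[0].fd = x11_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = out_fd;
    /* Only ask for POLLOUT while something is actually pending. */
    fds[1].events = (q->sent < q->len) ? POLLOUT : 0;
    fds[1].revents = 0;

    *x11_readable = 0;

    if (poll(fds, 2, timeout_ms) < 0)
        return errno == EINTR ? 0 : -1;

    if (fds[0].revents & POLLIN)
        *x11_readable = 1;

    if ((fds[1].revents & POLLOUT) && try_flush(out_fd, q) < 0)
        return -1;

    return 0;
}
```

Note that this sketch does nothing about the overflow risk mentioned
above: requests generated while the fd is not writable still have to be
queued somewhere.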
_______________________________________________
[email protected]: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel
