I tracked it down to _XSERVTransWritev returning -1 with errno=EAGAIN within FlushClient(). I was able to workaround the issue by immediately retrying (usually once, sometimes twice).
The problem is with the existing EAGAIN error handling. FlushClient() re-adds the client to the pending list and returns, but FlushClient() doesn't get called again. We would expect it to be called via FlushAllOutput() via WaitForSomething(), but that only happens when NewOutputPending is set. FlushAllOutput() also early-exits if NewOutputPending is not set. The only place that is setting NewOutputPending is ClientReady(). The problem there is that ClientReady() is called based on an edge trigger and not a level trigger. I'll be sending a patch for that in a moment, but I have a followup question from all of this... should we even have NewOutputPending any more? Should we just check any_output_pending()? The following two patches for consideration address these points. Thanks, Jeremy > On Sep 16, 2016, at 23:16, Jeremy Huddleston Sequoia <jerem...@apple.com> > wrote: > > I bisected the issue to: > > f993091e7db81b0420e23c485378cba112278839 is the first bad commit > commit f993091e7db81b0420e23c485378cba112278839 > Author: Keith Packard <kei...@keithp.com> > Date: Thu May 26 10:40:44 2016 -0700 > > os: Switch server to poll(2) [v3] > > Eliminates all of the fd_set mangling in the server main thread > > v2: Listen for POLLOUT while writes are blocked. > > v3: Only mark client not ready on EAGAIN return from read > > Signed-off-by: Keith Packard <kei...@keithp.com> > Reviewed-by: Adam Jackson <a...@redhat.com> > > --- > > I'll see if I can dig into it a bit later, but my time is a bit thin. If > anyone else has a hunch as to what might be going on to narrow my search, I'd > appreciate it. If not, I'll probably start by taking a closer look at this > particular change to understand what's going wrong. > > Thanks, > Jeremy > >> On Sep 11, 2016, at 16:54, Jeremy Huddleston Sequoia <jerem...@apple.com> >> wrote: >> >> Upon a bit more digging, it looks like the clients are listed in >> output_pending_clients, but it's not getting delivered. >> >> This situation reveals another issue with a use-after-free during >> CloseDownClient() in which a client will get re-added to the >> output_pending_clients during FlushClient() as part of CloseDownConnection(). >> >> See https://bugs.freedesktop.org/show_bug.cgi?id=97770 for more details on >> that. >> >>> On Sep 11, 2016, at 12:24, Jeremy Huddleston Sequoia <jerem...@apple.com> >>> wrote: >>> >>> Using current master plus the various patches I submitted to the list last >>> night, I'm able to launch the server fairly reliably under ASan again, but >>> some clients are getting wedged. Most notably, xterm gets stuck waiting >>> for a reply from XLoadQueryFont(). Looking at the state of the server >>> threads, it looks fine. It's processing other requests just fine from >>> other clients. >>> >>> Has anyone noticed anything odd like this or have some hunch as to where I >>> might start looking other than to do yet another bisect? =/ >>> >>> Thread 0x344e96 DispatchQueue 1 1000 samples (1-1000) >>> priority 31 (base 31) >>> 1000 start + 52 (xterm + 6312) [0x10d3bf8a8] >>> 1000 main + 3752 (xterm + 146228) [0x10d3e1b34] >>> 1000 spawnXTerm + 977 (xterm + 148776) [0x10d3e2528] >>> 1000 VTInit + 22 (xterm + 68248) [0x10d3cea98] >>> 1000 XtRealizeWidget + 135 (libXt.6.dylib + 84572) [0x10d4ffa5c] >>> 1000 RealizeWidget + 871 (libXt.6.dylib + 85805) [0x10d4fff2d] >>> 1000 RealizeWidget + 365 (libXt.6.dylib + 85299) [0x10d4ffd33] >>> 1000 VTRealize + 324 (xterm + 82404) [0x10d3d21e4] >>> 1000 SetVTFont + 335 (xterm + 121728) [0x10d3dbb80] >>> 1000 xtermLoadFont + 514 (xterm + 114081) [0x10d3d9da1] >>> 1000 xtermOpenFont + 86 (xterm + 112342) [0x10d3d96d6] >>> 1000 XLoadQueryFont + 311 (libX11.6.dylib + 29419) >>> [0x10d54f2eb] >>> 1000 _XQueryFont + 163 (libX11.6.dylib + 32127) >>> [0x10d54fd7f] >>> 1000 _XReply + 279 (libX11.6.dylib + 143484) >>> [0x10d56b07c] >>> 1000 xcb_wait_for_reply + 103 (libxcb.1.dylib + >>> 8247) [0x10d711037] >>> 1000 wait_for_reply + 251 (libxcb.1.dylib + >>> 8521) [0x10d711149] >>> 1000 _xcb_conn_wait + 466 (libxcb.1.dylib + >>> 4166) [0x10d710046] >>> 1000 _xcb_in_read + 1051 (libxcb.1.dylib >>> + 12750) [0x10d7121ce] >>> 1000 __select + 10 >>> (libsystem_kernel.dylib + 106318) [0x7fffc5e2df4e] >>> *1000 _sleep_continue + 0 >>> (kernel.development + 7459248) [0xffffff800091d1b0] >>> >>> >> >> _______________________________________________ >> email@example.com: X.Org development >> Archives: http://lists.x.org/archives/xorg-devel >> Info: https://lists.x.org/mailman/listinfo/xorg-devel > > _______________________________________________ > firstname.lastname@example.org: X.Org development > Archives: http://lists.x.org/archives/xorg-devel > Info: https://lists.x.org/mailman/listinfo/xorg-devel
Description: S/MIME cryptographic signature
_______________________________________________ email@example.com: X.Org development Archives: http://lists.x.org/archives/xorg-devel Info: https://lists.x.org/mailman/listinfo/xorg-devel