I tracked it down to _XSERVTransWritev returning -1 with errno=EAGAIN within 
FlushClient().  I was able to workaround the issue by immediately retrying 
(usually once, sometimes twice).

The problem is with the existing EAGAIN error handling.  FlushClient() re-adds 
the client to the pending list and returns, but FlushClient() doesn't get 
called again.  We would expect it to be called via FlushAllOutput() via 
WaitForSomething(), but that only happens when NewOutputPending is set.  
FlushAllOutput() also early-exits if NewOutputPending is not set.

The only place that is setting NewOutputPending is ClientReady().  The problem 
there is that ClientReady() is called based on an edge trigger and not a level 
trigger.  I'll be sending a patch for that in a moment, but I have a followup 
question from all of this...  should we even have NewOutputPending any more?  
Should we just check any_output_pending()?

The following two patches for consideration address these points.

Thanks,
Jeremy


> On Sep 16, 2016, at 23:16, Jeremy Huddleston Sequoia <jerem...@apple.com> 
> wrote:
> 
> I bisected the issue to:
> 
> f993091e7db81b0420e23c485378cba112278839 is the first bad commit
> commit f993091e7db81b0420e23c485378cba112278839
> Author: Keith Packard <kei...@keithp.com>
> Date:   Thu May 26 10:40:44 2016 -0700
> 
>    os: Switch server to poll(2) [v3]
> 
>    Eliminates all of the fd_set mangling in the server main thread
> 
>    v2: Listen for POLLOUT while writes are blocked.
> 
>    v3: Only mark client not ready on EAGAIN return from read
> 
>    Signed-off-by: Keith Packard <kei...@keithp.com>
>    Reviewed-by: Adam Jackson <a...@redhat.com>
> 
> ---
> 
> I'll see if I can dig into it a bit later, but my time is a bit thin.  If 
> anyone else has a hunch as to what might be going on to narrow my search, I'd 
> appreciate it.  If not, I'll probably start by taking a closer look at this 
> particular change to understand what's going wrong.
> 
> Thanks,
> Jeremy
> 
>> On Sep 11, 2016, at 16:54, Jeremy Huddleston Sequoia <jerem...@apple.com> 
>> wrote:
>> 
>> Upon a bit more digging, it looks like the clients are listed in 
>> output_pending_clients, but it's not getting delivered.
>> 
>> This situation reveals another issue with a use-after-free during 
>> CloseDownClient() in which a client will get re-added to the 
>> output_pending_clients during FlushClient() as part of CloseDownConnection().
>> 
>> See https://bugs.freedesktop.org/show_bug.cgi?id=97770 for more details on 
>> that.
>> 
>>> On Sep 11, 2016, at 12:24, Jeremy Huddleston Sequoia <jerem...@apple.com> 
>>> wrote:
>>> 
>>> Using current master plus the various patches I submitted to the list last 
>>> night, I'm able to launch the server fairly reliably under ASan again, but 
>>> some clients are getting wedged.  Most notably, xterm gets stuck waiting 
>>> for a reply from XLoadQueryFont().  Looking at the state of the server 
>>> threads, it looks fine.  It's processing other requests just fine from 
>>> other clients.
>>> 
>>> Has anyone noticed anything odd like this or have some hunch as to where I 
>>> might start looking other than to do yet another bisect? =/
>>> 
>>> Thread 0x344e96           DispatchQueue 1           1000 samples (1-1000)   
>>>   priority 31 (base 31)
>>> 1000  start + 52 (xterm + 6312) [0x10d3bf8a8]
>>>  1000  main + 3752 (xterm + 146228) [0x10d3e1b34]
>>>    1000  spawnXTerm + 977 (xterm + 148776) [0x10d3e2528]
>>>      1000  VTInit + 22 (xterm + 68248) [0x10d3cea98]
>>>        1000  XtRealizeWidget + 135 (libXt.6.dylib + 84572) [0x10d4ffa5c]
>>>          1000  RealizeWidget + 871 (libXt.6.dylib + 85805) [0x10d4fff2d]
>>>            1000  RealizeWidget + 365 (libXt.6.dylib + 85299) [0x10d4ffd33]
>>>              1000  VTRealize + 324 (xterm + 82404) [0x10d3d21e4]
>>>                1000  SetVTFont + 335 (xterm + 121728) [0x10d3dbb80]
>>>                  1000  xtermLoadFont + 514 (xterm + 114081) [0x10d3d9da1]
>>>                    1000  xtermOpenFont + 86 (xterm + 112342) [0x10d3d96d6]
>>>                      1000  XLoadQueryFont + 311 (libX11.6.dylib + 29419) 
>>> [0x10d54f2eb]
>>>                        1000  _XQueryFont + 163 (libX11.6.dylib + 32127) 
>>> [0x10d54fd7f]
>>>                          1000  _XReply + 279 (libX11.6.dylib + 143484) 
>>> [0x10d56b07c]
>>>                            1000  xcb_wait_for_reply + 103 (libxcb.1.dylib + 
>>> 8247) [0x10d711037]
>>>                              1000  wait_for_reply + 251 (libxcb.1.dylib + 
>>> 8521) [0x10d711149]
>>>                                1000  _xcb_conn_wait + 466 (libxcb.1.dylib + 
>>> 4166) [0x10d710046]
>>>                                  1000  _xcb_in_read + 1051 (libxcb.1.dylib 
>>> + 12750) [0x10d7121ce]
>>>                                    1000  __select + 10 
>>> (libsystem_kernel.dylib + 106318) [0x7fffc5e2df4e]
>>>                                     *1000  _sleep_continue + 0 
>>> (kernel.development + 7459248) [0xffffff800091d1b0]
>>> 
>>> 
>> 
>> _______________________________________________
>> xorg-devel@lists.x.org: X.Org development
>> Archives: http://lists.x.org/archives/xorg-devel
>> Info: https://lists.x.org/mailman/listinfo/xorg-devel
> 
> _______________________________________________
> xorg-devel@lists.x.org: X.Org development
> Archives: http://lists.x.org/archives/xorg-devel
> Info: https://lists.x.org/mailman/listinfo/xorg-devel

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel

Reply via email to