Hi,

I am tracking down a fairly sporadic bug in our software, which uses Tomcat
8.0.38. Long story short, we sometimes make a series of Basic.sendBinary()
calls, first with full buffers and then with a small one (e.g. 3 x 8192 bytes
followed by 444 bytes). The 8192-byte sends succeed, but occasionally the
final 444-byte send 'fails' in that we never see it reach the network, so
the client waits for the bytes and eventually times out. We notice that if we
close the connection remotely, the bytes immediately get sent.
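
For reference, the send pattern described above looks roughly like the
sketch below. This is not our production code: BinarySink is a hypothetical
stand-in for RemoteEndpoint.Basic (whose real sendBinary() also throws
IOException), and sendInChunks() is a name made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class ChunkedSendSketch {

    /** Hypothetical stand-in for RemoteEndpoint.Basic#sendBinary(ByteBuffer). */
    interface BinarySink {
        void sendBinary(ByteBuffer data);
    }

    /** Sends payload in slices of at most chunkSize bytes; returns the slice sizes. */
    static List<Integer> sendInChunks(byte[] payload, int chunkSize, BinarySink sink) {
        List<Integer> sizes = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += chunkSize) {
            // The last slice is whatever is left over (444 bytes in our case).
            int length = Math.min(chunkSize, payload.length - offset);
            sink.sendBinary(ByteBuffer.wrap(payload, offset, length));
            sizes.add(length);
        }
        return sizes;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[3 * 8192 + 444];
        // A no-op sink; in the real application this would be the WebSocket endpoint.
        List<Integer> sizes = sendInChunks(payload, 8192, data -> { });
        System.out.println(sizes); // prints [8192, 8192, 8192, 444]
    }
}
```

It is the final, short slice that occasionally never reaches the client.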

This led me to believe something was not getting flushed properly. This URL
indicates that there were some recent conversations about something similar:

http://tomcat.10.x6.nabble.com/Tomcat-WebSocket-does-not-always-send-asynchronous-messages-td5060965.html

I decided to dig further and tried sending a ping between the binary sends. It
seems to alleviate the problem, but it still doesn't tell me what is going on.
Taking a suggestion from Mark T. about a 'possible race condition in the
web socket code', I stepped through the Tomcat code looking for race
conditions, and one source file and function (doWrite()) immediately stood
out, because it modifies state and then calls another public function to act
on that state:

https://github.com/apache/tomcat/blob/TOMCAT_8_0_0_RC10/java/org/apache/tomcat/websocket/server/WsRemoteEndpointImplServer.java
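
To make the concern concrete, here is a deliberately simplified sketch of
the general "store state in a field, then act on it in a separate method"
hazard. It is not Tomcat's code and not a diagnosis: the class, the stage()
method, and the hand-written interleaving are all hypothetical, and the two
"threads" are simulated sequentially so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

class SharedStateWriteSketch {
    // State shared by every caller of this writer.
    private String pending;
    // Records what was actually written out.
    final List<String> wire = new ArrayList<>();

    // Step 1 of the pattern: remember what should be written.
    void stage(String message) {
        pending = message;
    }

    // Step 2 of the pattern: act on whatever the field holds right now.
    void onWritePossible() {
        if (pending != null) {
            wire.add(pending);
            pending = null;
        }
    }

    // Simulates two callers whose two-step writes interleave without locking.
    static List<String> interleaved() {
        SharedStateWriteSketch writer = new SharedStateWriteSketch();
        writer.stage("first");    // caller A, step 1
        writer.stage("second");   // caller B, step 1 overwrites A's data
        writer.onWritePossible(); // caller A, step 2 writes B's data
        writer.onWritePossible(); // caller B, step 2 finds nothing left
        return writer.wire;
    }

    public static void main(String[] args) {
        System.out.println(interleaved()); // prints [second]
    }
}
```

In this toy version one message is silently lost. Whether anything like
this interleaving can actually happen in the 8.0.x doWrite() path is exactly
what I have not been able to confirm.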

Further up, the doWrite() caller in the endpoint was moved out of a sync
block to prevent a deadlock (there was a specific comment around this),
which leads me to believe that something was calling doWrite() on multiple
threads, but I have not tracked that down yet.

Anyway, there was a recent code change in the 8.5.x series to the doWrite()
implementation: it now checks whether the call is blocking and, if so, sends
immediately to the socket and flushes without using class level state. I have
not tested this yet to see if it solves the issue, as we are tied to 8.0.x for
now, but we are working on migrating our code to 8.5.x.

Most of the work on these files seems to have been done by Mark T. (awesome
work, we rely on this functionality heavily!) so I figured I would reach out
and ask about the doWrite() change that adds an else block for blocking
sockets. Is this intended to fix the issue I am describing above?

I would check the history but I cannot seem to find the source for the
initial commit that introduces the else block for 8.5.x.

Thanks,
  -Rob
