Thanks Mark. I will take a look at the test you linked in (seems like Clint already is).
I have a question regarding your previous note "The short version is that it is possible that there are two threads". On 8.0.38, doWrite() sets it's scoped handler and buffers to the class level instance, then proceeds to call onWritePossible(true). onWritePossible creates a new local variable that seems like it copies the class level buffers state (not sure if this is a deep clone or not, i'll have to test this), but it does not replicate the handler reference. If onWritePossible() is busy working writing the buffers to the socket and another thread calls doWrite(), that class level state will be swapped before onWritePossible is finished, resulting in a possible race condition for a swapped out buffer or handler. Not claiming to know the code as well as those who maintain it, but it would be nice to know if someone thinks this could happen resulting in unexpected behavior (eg. we are writing to a socket and the the read thread responds with a write on another thread). Based on the link I posted before ( http://tomcat.10.x6.nabble.com/Tomcat-WebSocket-does-not-always-send-asynchronous-messages-td5060965.html), Harri mentions that in Tomcat 8.5, blocking sockets do not seem to exhibit the same failure behavior as async sockets, which leads me to believe that the else block (scoped state, write to socket, flush) fixes the problem, and async may still have a race given class level state across two functions that are not synchronized. We are using 8.0.38 and are seeing async and sync cases not flushing in rare situations. Curious why onWritePossible(true) is not called with the doWrite() state? (eg. onWritePossible(true, buffers, handler))? This is just an observation at this point so looking for an opinion on whether or not something described above could happen. If what was stated (that a write and a read / response write) can operate on two threads, I don't see why not. Continuing to run off and play with the tests you have linked. Thanks, -Rob On Tue, Mar 28, 2017 at 3:46 PM, Mark Thomas <ma...@apache.org> wrote: > On 28/03/17 00:30, Robert Lewis wrote: > > Hi, > > > > I am tracking down a fairly sporadic bug in our software that uses Tomcat > > 8.0.38. Long story short, sometimes calls to Basic.sendBinary() to a full > > buffer then to a small buffer (eg. 8192x3 then 444 bytes). The first 8192 > > sends will succeed and occasionally we see the last 444 byte send 'fail' > in > > a way that we never see it leave the network, resulting in the client > > waiting for bytes and eventually timing out. We notice that if we close > the > > the connection remotely, the bytes immediately get sent. > > > > This led me to believe something was not getting flushed properly. This > URL > > indicates that there were some recent conversations about something > similar: > > > > http://tomcat.10.x6.nabble.com/Tomcat-WebSocket-does-not- > > always-send-asynchronous-messages-td5060965.html > > > > I decided to dig further and tried to send a ping between sending bytes, > it > > seems to alleviate the problem, but still doesn't tell me what is going > on. > > Taking a suggestion from Mark T. \around a 'possible race condition in > the > > web socket code', I debugged through tomcat code looking for race > > conditions, and immediately a source file and function (doWrite()) stood > > out, it is modifying state then calling to another public function to act > > on that state: > > > > https://github.com/apache/tomcat/blob/TOMCAT_8_0_0_RC10/ > > java/org/apache/tomcat/websocket/server/WsRemoteEndpointImplServer.java > > That is quite an old version of the code. You'd be better off looking at > the master copy which is in svn at the ASF: > > https://svn.apache.org/viewvc/tomcat/tc8.5.x/trunk/java/org/ > apache/tomcat/websocket/server/WsRemoteEndpointImplServer.java?view=log > > > Further up, the doWrite() caller in the endpoint was moved out of a sync > > block to prevent a deadlock (there was a specific comment around this), > > which leads me to believe that something was calling doWrite() on > multiple > > threads, but I have not tracked that down yet. > > This is part of the long story to do with trying to run on the Servlet > 3.1 API (see below). For full details see: > http://people.apache.org/~markt/presentations/2013-09- > XX-WebSocket-on-Servlet31.pdf > > starting around slide 37. > > The short version is that it is possible that there are two threads > trying to write in parallel: > - one 'write' thread responding to the poller > - one 'read' thread in application code that is trying to write a > response to what it has just read > > > Anyway, there was a recent code change on the 8.5.x series to the > doWrite() > > implementation which checks to see if it is a blocking call, then sends > > immediately to the socket and flushes without class level state. I have > not > > tested this yet to see if it solves the issue as we are tied to 8.0.x for > > now, but working on migrating our code to work with 8.5.x. > > > > Most of the work on the files seem to be done by Mark T. (awesome work, > we > > rely on this functionality heavily!) > > Flattery always welcome ;) > > > so I figured I would reach out and ask > > about the doWrite() change to have a else block for blocking sockets. Is > > this intended to fix the issue I am describing above? > > No. The major change between 8.0.x and 8.5.x was giving up on the idea > of building WebSocket solely on top of the Servlet API. To cut a long > story short, it isn't possible. This is primarily because WebSocket > requires a connection to switch back and forth between blocking and > non-blocking but Servlet 3.1 non-blocking I/O doesn't allow upgraded > connections to do that. > > The else block was the result of being able to write simpler code once > we bypassed the Servlet API and went directly to Tomcat's low level I/O > API. > > > I would check the history but I cannot seem to find the source for the > > initial commit that introduces the else block for 8.5.x. > > https://svn.apache.org/viewvc?view=revision&revision=1662698 > > You might want to explore the test case I wrote for the previous report: > https://svn.apache.org/viewvc/tomcat/trunk/test/org/apache/ > tomcat/websocket/server/TestAsyncMessages.java?view=log > > If you can modify that so it fails at least some of the time - ideally > all the time - fixing it should be fairly simple. > > Mark > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > >