I ported your tests back and I have run them using 8.0.x trunk. It fails pretty 
reliably for me. It will fail this assertion: 
Assert.assertFalse(handler.hasFailed());

It's failed 4/4 times. I had to port a few classes back to test with but it 
wasn't hard. I'll see if I can track it down. Any pointers on where I should 
look?

Run 2:
    Expected diff < 500,000 but was [627399], count [2]
    Expected diff < 500,000 but was [543746], count [1304]
    
Run 3:
    Expected diff < 500,000 but was [666658], count [2]
    Expected diff < 500,000 but was [594378], count [17]
    Expected diff < 500,000 but was [543379], count [62]
    Expected diff < 500,000 but was [527235], count [524]

Run 4: 
    Expected diff < 500,000 but was [679132], count [2]
    Expected diff < 500,000 but was [503753], count [317]
    Expected diff < 500,000 but was [548883], count [380]
    Expected diff < 500,000 but was [580069], count [575]
    Expected diff < 500,000 but was [538242], count [911]
    Expected diff < 500,000 but was [1060708], count [1109]




-----Original Message-----
From: Mark Thomas [mailto:ma...@apache.org] 
Sent: Tuesday, March 28, 2017 3:47 PM
To: Tomcat Users List <users@tomcat.apache.org>
Subject: Re: Tracking down a Basic.sendBinary() issue

On 28/03/17 00:30, Robert Lewis wrote:
> Hi,
> 
> I am tracking down a fairly sporadic bug in our software that uses 
> Tomcat 8.0.38. Long story short, sometimes calls to Basic.sendBinary() 
> to a full buffer then to a small buffer (eg. 8192x3 then 444 bytes). 
> The first 8192 sends will succeed and occasionally we see the last 444 
> byte send 'fail' in a way that we never see it leave the network, 
> resulting in the client waiting for bytes and eventually timing out. 
> We notice that if we close the the connection remotely, the bytes immediately 
> get sent.
> 
> This led me to believe something was not getting flushed properly. 
> This URL indicates that there were some recent conversations about something 
> similar:
> 
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ftomcat
> .10.x6.nabble.com%2FTomcat-WebSocket-does-not-&data=02%7C01%7CClint.Do
> vholuk%40thingworx.com%7C0903ad7ab2844af4ef2a08d476133a82%7Cb9921086ff
> 774d0d828acb3381f678e2%7C0%7C0%7C636263272334173532&sdata=6tzUErBSP8lH
> guvbjrWAhV4Xzf%2BN5wId97oKmaG%2Fd5g%3D&reserved=0
> always-send-asynchronous-messages-td5060965.html
> 
> I decided to dig further and tried to send a ping between sending 
> bytes, it seems to alleviate the problem, but still doesn't tell me what is 
> going on.
> Taking a suggestion from Mark T.  \around a 'possible race condition 
> in the web socket code', I debugged through tomcat code looking for 
> race conditions, and immediately a source file and function 
> (doWrite()) stood out, it is modifying state then calling to another 
> public function to act on that state:
> 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> b.com%2Fapache%2Ftomcat%2Fblob%2FTOMCAT_8_0_0_RC10%2F&data=02%7C01%7CC
> lint.Dovholuk%40thingworx.com%7C0903ad7ab2844af4ef2a08d476133a82%7Cb99
> 21086ff774d0d828acb3381f678e2%7C0%7C0%7C636263272334173532&sdata=93Hlg
> L%2FI3SmJwGnTT%2FGKqiyL%2FY1HmwT68qZcbQsR1Io%3D&reserved=0
> java/org/apache/tomcat/websocket/server/WsRemoteEndpointImplServer.jav
> a

That is quite an old version of the code. You'd be better off looking at the 
master copy which is in svn at the ASF:

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsvn.apache.org%2Fviewvc%2Ftomcat%2Ftc8.5.x%2Ftrunk%2Fjava%2Forg%2Fapache%2Ftomcat%2Fwebsocket%2Fserver%2FWsRemoteEndpointImplServer.java%3Fview%3Dlog&data=02%7C01%7CClint.Dovholuk%40thingworx.com%7C0903ad7ab2844af4ef2a08d476133a82%7Cb9921086ff774d0d828acb3381f678e2%7C0%7C0%7C636263272334173532&sdata=s4ybkkpV4OSYONyTO5IjV4nJ%2B6%2FMNFl3WOU33YAa%2B0U%3D&reserved=0

> Further up, the doWrite() caller in the endpoint was moved out of a 
> sync block to prevent a deadlock (there was a specific comment around 
> this), which leads me to believe that something was calling doWrite() 
> on multiple threads, but I have not tracked that down yet.

This is part of the long story to do with trying to run on the Servlet
3.1 API (see below). For full details see:
https://na01.safelinks.protection.outlook.com/?url=http:%2F%2Fpeople.apache.org%2F~markt%2Fpresentations%2F2013-09-XX-WebSocket-on-Servlet31.pdf&data=02%7C01%7CClint.Dovholuk%40thingworx.com%7C0903ad7ab2844af4ef2a08d476133a82%7Cb9921086ff774d0d828acb3381f678e2%7C0%7C1%7C636263272334173532&sdata=i9Bhm2jpMqjkseKJwmiw3Bcf9UaJWNvf1SDaK1g6TPM%3D&reserved=0

starting around slide 37.

The short version is that it is possible that there are two threads trying to 
write in parallel:
- one 'write' thread responding to the poller
- one 'read' thread in application code that is trying to write a
  response to what it has just read

> Anyway, there was a recent code change on the 8.5.x series to the 
> doWrite() implementation which checks to see if it is a blocking call, 
> then sends immediately to the socket and flushes without class level 
> state. I have not tested this yet to see if it solves the issue as we 
> are tied to 8.0.x for now, but working on migrating our code to work with 
> 8.5.x.
> 
> Most of the work on the files seem to be done by Mark T. (awesome 
> work, we rely on this functionality heavily!)

Flattery always welcome ;)

> so I figured I would reach out and ask about the doWrite() change to 
> have a else block for blocking sockets. Is this intended to fix the 
> issue I am describing above?

No. The major change between 8.0.x and 8.5.x was giving up on the idea of 
building WebSocket solely on top  of the Servlet API. To cut a long story 
short, it isn't possible. This is primarily because WebSocket requires a 
connection to switch back and forth between blocking and non-blocking but 
Servlet 3.1 non-blocking I/O doesn't allow upgraded connections to do that.

The else block was the result of being able to write simpler code once we 
bypassed the Servlet API and went directly to Tomcat's low level I/O API.

> I would check the history but I cannot seem to find the source for the 
> initial commit that introduces the else block for 8.5.x.

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsvn.apache.org%2Fviewvc%3Fview%3Drevision%26revision%3D1662698&data=02%7C01%7CClint.Dovholuk%40thingworx.com%7C0903ad7ab2844af4ef2a08d476133a82%7Cb9921086ff774d0d828acb3381f678e2%7C0%7C0%7C636263272334173532&sdata=u9d9t1LhXS5T8W9BM%2Bp6dgd0x%2BGFJkqJVRsWbKgWTrI%3D&reserved=0

You might want to explore the test case I wrote for the previous report:
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsvn.apache.org%2Fviewvc%2Ftomcat%2Ftrunk%2Ftest%2Forg%2Fapache%2Ftomcat%2Fwebsocket%2Fserver%2FTestAsyncMessages.java%3Fview%3Dlog&data=02%7C01%7CClint.Dovholuk%40thingworx.com%7C0903ad7ab2844af4ef2a08d476133a82%7Cb9921086ff774d0d828acb3381f678e2%7C0%7C0%7C636263272334173532&sdata=41%2B0YuJQP6V%2B0Z6ryk4xPVb6E5Y5Pn5fDGtuImy%2FW%2Bk%3D&reserved=0

If you can modify that so it fails at least some of the time - ideally all the 
time - fixing it should be fairly simple.

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to