TL;DR: We are having this weird, non-reproducable problem in a complex system
with lots of moving parts. I posted this question to Atmosphere mailing list
but didn’t get useful answers. I would appreciate if anyone can tell me
whether there is any chance what we observe can be a bug with Tomcat/TomEE.
Any other help/tips would be greatly appreciated. Thanks
Problem:
We experience some weird problems with WebSockets under Atmosphere, running
under TomEE 7.0.0 — the whole stack is listed below.
One of our JAX-RS API endpoints (something like /twilio/xxxx) is called by
Twilio. Twilio expects the following XML response:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response/>
That system was working fine for months. We were receiving Text Messages with
no errors from Twilio.
Then, we added WebSockets at a different endpoint (something like /ws/yyyy).
Using our AngularJS app, our users started communicating with our backend in
JSON. A sample message our backend wrote to a WebSocket client:
{"thread_id":"Server-2702","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1126,"task_id":10134},"time":1465409434004}
After a while, Twilio started sending us "Schema validation warning” emails.
When I went to Twilio Dashboard, I saw that they received the following
response to their API call:
{"thread_id":”Server-1732","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id”:734,"task_id”:9345},"time":
1465409122781}<?xml version="1.0" encoding="UTF-8"
standalone="yes"?><Response/>
So Twilio received a message that was meant to be sent to a WebSocket user,
prepended to the usual Twilio response! This looks crazy, and not unacceptable
of course.
This happens a few times, sometimes a dozen times a day. We put together a
test system but could not reproduce the problem. (Our test system was not
exactly the same system as our production in many respects unfortunately)
Sometimes Twilio receives more than one WebSocket message, again an actual
example (redacted just a little for brevity):
{"thread_id":"Server-2722","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1966,"task_id":11112},"time":1465409571004}{"thread_id":"Server-2723","session_id":null,"command":"new_caleasy_item","data":{"text_message_id":1453,"text_message_created_at":null,"meal_type_id":6,"body":"Edamame","report_date":"2016-06-03","caleasy_media_list":[],"task_id":11114,"expiration_millis":180000},"time":1465409582010}<?xml
version="1.0" encoding="UTF-8" standalone="yes"?><Response/>
In this example, our Backend wrote 2 WebSocket messages, probably from 2
separate threads, intended for 2 separate WebSocket users, and then it wrote
the response to Twilio. It looks like we are writing to a connection nobody is
reading from, and appending more messages to it, and then the whole thing is
returned to Twilio when they make the API call…
I don’t understand how this could happening. In nginx? (I really doubt nginx
can have such a huge bug) In Tomcat? A problem between nginx and Tomcat
that only affects WebSockets when regular API calls are also used? I have no
idea and I don’t know where to start to investigate ;-(
Here is our stack:
*) AWS Linux 4.1.10-17.31, TomEE 7.0.0, nginx as proxy server on the same AWS
instance
*) SSL ends on nginx. nginx is configured for WebSockets following online
instructions.
*) We serve JAX-RS API calls via Apache CXF (version 3.1.6), using standard
@Path("/....") annotations.
*) We have just one WebSocket service using Atmosphere (version 2.4.2), defined
as @ManagedService(path = "/caleasy”)
*) WebSocket service is used by our AngularJS based UI
*) No other specific setup with CXF and Atmosphere
In our web.xml, we configure Atmosphere as follows:
<servlet>
<description>AtmosphereServlet</description>
<servlet-name>AtmosphereServlet</servlet-name>
<servlet-class>org.atmosphere.cpr.AtmosphereServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>AtmosphereServlet</servlet-name>
<url-pattern>/ws/*</url-pattern>
</servlet-mapping>
<listener>
<listener-class>org.atmosphere.cpr.SessionSupport</listener-class>
</listener>
PS: Our TomEE logs have the following Exceptions, but I am not sure whether
they are normal or suspicious:
2016-06-09 13:52:41,832 ERROR [nio-8080-exec-9] o.a.c.JSR356Endpoint -
java.io.EOFException: null
at
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.fillReadBuffer(NioEndpoint.java:1267)
at
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.isReadyForRead(NioEndpoint.java:1176)
at
org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:58)
at
org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.upgradeDispatch(WsHttpUpgradeHandler.java:148)
at
org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:54)
at
org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53)
at
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:788)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1485)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)