2016-06-16 4:09 GMT+02:00 Ihsan Ecemis <[email protected]>: > > Thanks for your answer Romain. I would appreciate any links to inspire me > how to do that, especially for Twilio API calls. I don’t know how to use > sessionIds to track connections. > > Idea was to use websocket ot http session, I don't know if you activated it (atmosphere can need session support for long polling).
> > Here are 2 new pieces of information: > > (1) WebSocket connections were shaky for some users and Atmosphere > javascript client had a failover to long-polling. > > So yesterday, we disabled long-polling failover and since then we didn't > have any errors from Twilio. Probably it is too early to claim victory but > I think it is an interesting observation. > > If we don’t see any other errors, I will think that the client was > switching from websocket to long-polling, and then getting disconnected. > The server (Atmosphere?) didn’t realize this disconnect and kept on writing > to the same connection. And at some point that same connection was handed > over to the Twilio API call… (which should not happen! must be prevented > by Tomcat? nginx? CXF? Atmosphere?) > > I don’t know how the Socket Connection Pool works at low-level so this is > only my speculation, both from my observation there is something really > wrong here. > > > (2) I looked at Twilio error messages more carefully, and each error I > looked so far contained WebSockets messages addressed to the same user > (This is very manual process, I check our logs for each task_id posted to > Twilio in JSON). > > In one case there were 5 WebSockets messages addressed to the same user, > prepended to Twilio XML. These 5 messages spanned 576 seconds, i.e. > almost 10 minutes! (Each WebSockets messages has a timestamp). So we > kept on writing to the same connection for 10 minutes after the WebSocket > was disconnected ;-( > > Can it be a bad mobile connection? > > > > On Jun 15, 2016, at 7:26 PM, Romain Manni-Bucau <[email protected]> > wrote: > > > > Is debugging using the sessionId to track connections an option? > > Le 16 juin 2016 01:12, "Ihsan Ecemis" <[email protected]> a écrit : > > > >> > >> TL;DR: We are having this weird, non-reproducable problem in a complex > >> system with lots of moving parts. I posted this question to Atmosphere > >> mailing list but didn’t get useful answers. I would appreciate if > anyone > >> can tell me whether there is any chance what we observe can be a bug > with > >> Tomcat/TomEE. Any other help/tips would be greatly appreciated. Thanks > >> > >> > >> Problem: > >> > >> We experience some weird problems with WebSockets under Atmosphere, > >> running under TomEE 7.0.0 — the whole stack is listed below. > >> > >> > >> One of our JAX-RS API endpoints (something like /twilio/xxxx) is called > by > >> Twilio. Twilio expects the following XML response: > >> > >> <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response/> > >> > >> That system was working fine for months. We were receiving Text > Messages > >> with no errors from Twilio. > >> > >> > >> Then, we added WebSockets at a different endpoint (something like > >> /ws/yyyy). Using our AngularJS app, our users started communicating > with > >> our backend in JSON. A sample message our backend wrote to a WebSocket > >> client: > >> > >> > >> > {"thread_id":"Server-2702","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1126,"task_id":10134},"time":1465409434004} > >> > >> > >> > >> After a while, Twilio started sending us "Schema validation warning” > >> emails. When I went to Twilio Dashboard, I saw that they received the > >> following response to their API call: > >> > >> > {"thread_id":”Server-1732","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id”:734,"task_id”:9345},"time": > >> 1465409122781}<?xml version="1.0" encoding="UTF-8" > >> standalone="yes"?><Response/> > >> > >> > >> So Twilio received a message that was meant to be sent to a WebSocket > >> user, prepended to the usual Twilio response! This looks crazy, and not > >> unacceptable of course. > >> > >> This happens a few times, sometimes a dozen times a day. We put > together > >> a test system but could not reproduce the problem. (Our test system was > not > >> exactly the same system as our production in many respects > unfortunately) > >> > >> Sometimes Twilio receives more than one WebSocket message, again an > actual > >> example (redacted just a little for brevity): > >> > >> > {"thread_id":"Server-2722","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1966,"task_id":11112},"time":1465409571004}{"thread_id":"Server-2723","session_id":null,"command":"new_caleasy_item","data":{"text_message_id":1453,"text_message_created_at":null,"meal_type_id":6,"body":"Edamame","report_date":"2016-06-03","caleasy_media_list":[],"task_id":11114,"expiration_millis":180000},"time":1465409582010}<?xml > >> version="1.0" encoding="UTF-8" standalone="yes"?><Response/> > >> > >> > >> In this example, our Backend wrote 2 WebSocket messages, probably from 2 > >> separate threads, intended for 2 separate WebSocket users, and then it > >> wrote the response to Twilio. It looks like we are writing to a > connection > >> nobody is reading from, and appending more messages to it, and then the > >> whole thing is returned to Twilio when they make the API call… > >> > >> > >> I don’t understand how this could happening. In nginx? (I really doubt > >> nginx can have such a huge bug) In Tomcat? A problem between nginx > and > >> Tomcat that only affects WebSockets when regular API calls are also > used? > >> I have no idea and I don’t know where to start to investigate ;-( > >> > >> > >> Here is our stack: > >> > >> *) AWS Linux 4.1.10-17.31, TomEE 7.0.0, nginx as proxy server on the > same > >> AWS instance > >> *) SSL ends on nginx. nginx is configured for WebSockets following > online > >> instructions. > >> *) We serve JAX-RS API calls via Apache CXF (version 3.1.6), using > >> standard @Path("/....") annotations. > >> *) We have just one WebSocket service using Atmosphere (version 2.4.2), > >> defined as @ManagedService(path = "/caleasy”) > >> *) WebSocket service is used by our AngularJS based UI > >> *) No other specific setup with CXF and Atmosphere > >> > >> In our web.xml, we configure Atmosphere as follows: > >> > >> <servlet> > >> <description>AtmosphereServlet</description> > >> <servlet-name>AtmosphereServlet</servlet-name> > >> <servlet-class>org.atmosphere.cpr.AtmosphereServlet</servlet-class> > >> </servlet> > >> > >> <servlet-mapping> > >> <servlet-name>AtmosphereServlet</servlet-name> > >> <url-pattern>/ws/*</url-pattern> > >> </servlet-mapping> > >> > >> <listener> > >> <listener-class>org.atmosphere.cpr.SessionSupport</listener-class> > >> </listener> > >> > >> > >> > >> PS: Our TomEE logs have the following Exceptions, but I am not sure > >> whether they are normal or suspicious: > >> > >> 2016-06-09 13:52:41,832 ERROR [nio-8080-exec-9] o.a.c.JSR356Endpoint > >> - > >> java.io.EOFException: null > >> at > >> > org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.fillReadBuffer(NioEndpoint.java:1267) > >> at > >> > org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.isReadyForRead(NioEndpoint.java:1176) > >> at > >> > org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:58) > >> at > >> > org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.upgradeDispatch(WsHttpUpgradeHandler.java:148) > >> at > >> > org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:54) > >> at > >> > org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53) > >> at > >> > org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:788) > >> at > >> > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1485) > >> at > >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > >> at > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > >> at > >> > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) > >> at java.lang.Thread.run(Thread.java:745) > >> > >> > >> > >> > >
