Hmm, you should have onDisconnect() or onThrowable() hook called. Question is: is it an atmosphere bug or was tomcat missing an event (there was a rare case where onClose() of websocket can be missed but atmosphere should protect itself against it wrapping the calls to ensure it detects it lazily too.
Romain Manni-Bucau @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://blog-rmannibucau.rhcloud.com> | Old Wordpress Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Tomitriber <http://www.tomitribe.com> | JavaEE Factory <https://javaeefactory-rmannibucau.rhcloud.com> 2016-06-16 9:05 GMT+02:00 Ihsan Ecemis <[email protected]>: > > > On Jun 16, 2016, at 2:37 AM, Romain Manni-Bucau <[email protected]> > wrote: > > > > 2016-06-16 4:09 GMT+02:00 Ihsan Ecemis <[email protected] <mailto: > [email protected]>>: > > > >> > >> Thanks for your answer Romain. I would appreciate any links to inspire > me > >> how to do that, especially for Twilio API calls. I don’t know how to > use > >> sessionIds to track connections. > >> > >> > > Idea was to use websocket ot http session, I don't know if you activated > it > > (atmosphere can need session support for long polling). > > We have this in our web.xml: > > <listener> > <listener-class>org.atmosphere.cpr.SessionSupport</listener-class> > </listener> > > <context-param> > <param-name>org.atmosphere.cpr.sessionSupport</param-name> > <param-value>true</param-value> > </context-param> > > as documented here: > https://github.com/Atmosphere/atmosphere/wiki/Enabling-HttpSession-Support > < > https://github.com/Atmosphere/atmosphere/wiki/Enabling-HttpSession-Support > > > > And we can get sessionId with AtmosphereResource.session(). > > But apparently since Atmosphere doesn’t notify us about the disconnect we > can still write to the underlying resource, which is not cleared when > another producer starts using it. > > BTW, we are using @ManagedService ( > https://github.com/Atmosphere/atmosphere/wiki/Getting-Started-with-the-ManagedService-Annotation#managedservice > < > https://github.com/Atmosphere/atmosphere/wiki/Getting-Started-with-the-ManagedService-Annotation#managedservice>) > which comes with HeartbeatInterceptor, so I don’t understand how the Server > doesn’t detect the disconnect for 10 minutes... > > > >> > >> Here are 2 new pieces of information: > >> > >> (1) WebSocket connections were shaky for some users and Atmosphere > >> javascript client had a failover to long-polling. > >> > >> So yesterday, we disabled long-polling failover and since then we didn't > >> have any errors from Twilio. Probably it is too early to claim victory > but > >> I think it is an interesting observation. > >> > >> If we don’t see any other errors, I will think that the client was > >> switching from websocket to long-polling, and then getting disconnected. > >> The server (Atmosphere?) didn’t realize this disconnect and kept on > writing > >> to the same connection. And at some point that same connection was > handed > >> over to the Twilio API call… (which should not happen! must be > prevented > >> by Tomcat? nginx? CXF? Atmosphere?) > >> > >> I don’t know how the Socket Connection Pool works at low-level so this > is > >> only my speculation, both from my observation there is something really > >> wrong here. > >> > >> > >> (2) I looked at Twilio error messages more carefully, and each error I > >> looked so far contained WebSockets messages addressed to the same user > >> (This is very manual process, I check our logs for each task_id posted > to > >> Twilio in JSON). > >> > >> In one case there were 5 WebSockets messages addressed to the same user, > >> prepended to Twilio XML. These 5 messages spanned 576 seconds, i.e. > >> almost 10 minutes! (Each WebSockets messages has a timestamp). So we > >> kept on writing to the same connection for 10 minutes after the > WebSocket > >> was disconnected ;-( > >> > >> > > Can it be a bad mobile connection? > > The user who had lots of disconnects the other day was working in a dorm > room over WiFi. Not mobile, but not great connection either. > > Maybe the user was continuously connecting and disconnecting, and > Atmosphere was trying to keep the underlying resource alive? > > > > > >> > >> > >>> On Jun 15, 2016, at 7:26 PM, Romain Manni-Bucau <[email protected] > > > >> wrote: > >>> > >>> Is debugging using the sessionId to track connections an option? > >>> Le 16 juin 2016 01:12, "Ihsan Ecemis" <[email protected]> a écrit : > >>> > >>>> > >>>> TL;DR: We are having this weird, non-reproducable problem in a > complex > >>>> system with lots of moving parts. I posted this question to Atmosphere > >>>> mailing list but didn’t get useful answers. I would appreciate if > >> anyone > >>>> can tell me whether there is any chance what we observe can be a bug > >> with > >>>> Tomcat/TomEE. Any other help/tips would be greatly appreciated. > Thanks > >>>> > >>>> > >>>> Problem: > >>>> > >>>> We experience some weird problems with WebSockets under Atmosphere, > >>>> running under TomEE 7.0.0 — the whole stack is listed below. > >>>> > >>>> > >>>> One of our JAX-RS API endpoints (something like /twilio/xxxx) is > called > >> by > >>>> Twilio. Twilio expects the following XML response: > >>>> > >>>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response/> > >>>> > >>>> That system was working fine for months. We were receiving Text > >> Messages > >>>> with no errors from Twilio. > >>>> > >>>> > >>>> Then, we added WebSockets at a different endpoint (something like > >>>> /ws/yyyy). Using our AngularJS app, our users started communicating > >> with > >>>> our backend in JSON. A sample message our backend wrote to a > WebSocket > >>>> client: > >>>> > >>>> > >>>> > >> > {"thread_id":"Server-2702","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1126,"task_id":10134},"time":1465409434004} > >>>> > >>>> > >>>> > >>>> After a while, Twilio started sending us "Schema validation warning” > >>>> emails. When I went to Twilio Dashboard, I saw that they received > the > >>>> following response to their API call: > >>>> > >>>> > >> > {"thread_id":”Server-1732","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id”:734,"task_id”:9345},"time": > >>>> 1465409122781}<?xml version="1.0" encoding="UTF-8" > >>>> standalone="yes"?><Response/> > >>>> > >>>> > >>>> So Twilio received a message that was meant to be sent to a WebSocket > >>>> user, prepended to the usual Twilio response! This looks crazy, and > not > >>>> unacceptable of course. > >>>> > >>>> This happens a few times, sometimes a dozen times a day. We put > >> together > >>>> a test system but could not reproduce the problem. (Our test system > was > >> not > >>>> exactly the same system as our production in many respects > >> unfortunately) > >>>> > >>>> Sometimes Twilio receives more than one WebSocket message, again an > >> actual > >>>> example (redacted just a little for brevity): > >>>> > >>>> > >> > {"thread_id":"Server-2722","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1966,"task_id":11112},"time":1465409571004}{"thread_id":"Server-2723","session_id":null,"command":"new_caleasy_item","data":{"text_message_id":1453,"text_message_created_at":null,"meal_type_id":6,"body":"Edamame","report_date":"2016-06-03","caleasy_media_list":[],"task_id":11114,"expiration_millis":180000},"time":1465409582010}<?xml > >>>> version="1.0" encoding="UTF-8" standalone="yes"?><Response/> > >>>> > >>>> > >>>> In this example, our Backend wrote 2 WebSocket messages, probably > from 2 > >>>> separate threads, intended for 2 separate WebSocket users, and then it > >>>> wrote the response to Twilio. It looks like we are writing to a > >> connection > >>>> nobody is reading from, and appending more messages to it, and then > the > >>>> whole thing is returned to Twilio when they make the API call… > >>>> > >>>> > >>>> I don’t understand how this could happening. In nginx? (I really > doubt > >>>> nginx can have such a huge bug) In Tomcat? A problem between nginx > >> and > >>>> Tomcat that only affects WebSockets when regular API calls are also > >> used? > >>>> I have no idea and I don’t know where to start to investigate ;-( > >>>> > >>>> > >>>> Here is our stack: > >>>> > >>>> *) AWS Linux 4.1.10-17.31, TomEE 7.0.0, nginx as proxy server on the > >> same > >>>> AWS instance > >>>> *) SSL ends on nginx. nginx is configured for WebSockets following > >> online > >>>> instructions. > >>>> *) We serve JAX-RS API calls via Apache CXF (version 3.1.6), using > >>>> standard @Path("/....") annotations. > >>>> *) We have just one WebSocket service using Atmosphere (version > 2.4.2), > >>>> defined as @ManagedService(path = "/caleasy”) > >>>> *) WebSocket service is used by our AngularJS based UI > >>>> *) No other specific setup with CXF and Atmosphere > >>>> > >>>> In our web.xml, we configure Atmosphere as follows: > >>>> > >>>> <servlet> > >>>> <description>AtmosphereServlet</description> > >>>> <servlet-name>AtmosphereServlet</servlet-name> > >>>> <servlet-class>org.atmosphere.cpr.AtmosphereServlet</servlet-class> > >>>> </servlet> > >>>> > >>>> <servlet-mapping> > >>>> <servlet-name>AtmosphereServlet</servlet-name> > >>>> <url-pattern>/ws/*</url-pattern> > >>>> </servlet-mapping> > >>>> > >>>> <listener> > >>>> <listener-class>org.atmosphere.cpr.SessionSupport</listener-class> > >>>> </listener> > >>>> > >>>> > >>>> > >>>> PS: Our TomEE logs have the following Exceptions, but I am not sure > >>>> whether they are normal or suspicious: > >>>> > >>>> 2016-06-09 13:52:41,832 ERROR [nio-8080-exec-9] o.a.c.JSR356Endpoint > >>>> - > >>>> java.io.EOFException: null > >>>> at > >>>> > >> > org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.fillReadBuffer(NioEndpoint.java:1267) > >>>> at > >>>> > >> > org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.isReadyForRead(NioEndpoint.java:1176) > >>>> at > >>>> > >> > org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:58) > >>>> at > >>>> > >> > org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.upgradeDispatch(WsHttpUpgradeHandler.java:148) > >>>> at > >>>> > >> > org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:54) > >>>> at > >>>> > >> > org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53) > >>>> at > >>>> > >> > org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:788) > >>>> at > >>>> > >> > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1485) > >>>> at > >>>> > >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > >>>> at > >>>> > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > >>>> at > >>>> > >> > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) > >>>> at java.lang.Thread.run(Thread.java:745) > >
