Hi Dave,

Thanks for sharing this. To verify that I got it right, can I dumb your suggestions down by summarizing them as:

- Increase the timeout after which a connection is considered unrecoverably dead (to ... how many minutes?)
- After a period of inactivity that's a lot shorter than the timeout mentioned above (presumably around the existing timeout value), start generating push notifications

Regards,
Guus

On Wed, 4 Nov 2020 at 12:48, Dave Cridland <[email protected]> wrote:

> Hey all,
>
> We (that is, myself and others from Forward Clinical Ltd, my employer)
> have been doing some extensive work to support high latency networks such
> as Satellite Links, in relation to our work with UK Defence Medical
> Services. Our "long thin" links cover the C2S link.
>
> We believe these findings are more generally useful than just SATCOM - in
> particular, we think these will help with the adverse network conditions
> found in hospitals (where people keep putting in lifts and lots of cables,
> giving lots of blackspots), and that they apply generally to mobile use of
> XMPP.
>
> TL;DR: When the session has a ping timeout, do push notifications, but
> otherwise leave it open - mobile clients will often recover after several
> minutes have passed.
>
> We assume that established sessions may be in several connectivity states
> from the point of view of the server, typically:
>
> "Live" - a session is genuinely live and can be used for communication.
> "Unresponsive" - the session has a TCP connection associated with it, but
> it is unresponsive to pings etc.
> "Resumable" - the session has no TCP session, but XEP-0198 resumption was
> negotiated and the session remains available.
>
> We expect that the majority of servers will immediately move a session
> detected as unresponsive into the resumable state by closing the TCP
> session, and starting a (relatively short) timeout.
>
> In the process of doing so, unacknowledged stanzas will be processed for
> push notifications etc as needed, and errors will be sent as appropriate.
>
> Through network analysis (and "thanks" to a bug in the server which caused
> some useful logging), we were able to examine not only when sessions went
> into the unresponsive state, but also when the client subsequently sent
> traffic on that session. This often happened well after the session had
> fallen into the resumable state - this resulted in an error, as the session
> had been closed.
>
> Having seen the result of this in the logging of the server, we followed
> up by looking for the same logging output on the production system, where
> the majority of users are using WiFi or 4G within hospitals. Coverage is
> often poor, and the WiFi overused, so clinicians often operate on a weak 4G
> signal, or highly contended WiFi. Think FOSDEM.
>
> Again, we observed clients recovering sometimes well after the ping
> timeout had triggered. Had these clients been able to, they could have
> continued to use the same TCP session without any disruption (or, for that
> matter, any additional RTTs re-establishing).
>
> The usual approach here seems to be to increase the timeout required to
> move a session from "live" to "unresponsive" when pinged. However, this has
> the effect of delaying push notifications while the session is, in effect,
> in limbo.
>
> Our proposal is that when a session is found to be unresponsive, the
> server starts sending push notifications for unacknowledged (and future)
> messages, but otherwise leaves the session live and resumable. Only after
> a significantly longer timeout should the TCP session be terminated (at
> which point the session is destroyed entirely).
>
> This means that a client recovering network after several minutes will
> find the connection still live (in effect), whereas if it never recovers,
> it will still get the push notifications in a timely manner.
>
> There are likely to be downsides with this approach; in particular, presence
> state will be badly affected. PSA could help here.
> Overall, though, we believe that this will substantially improve the
> effective performance of C2S over high latency, high contention links.
>
> I hope this is useful!
>
> Dave.
> _______________________________________________
> Standards mailing list
> Info: https://mail.jabber.org/mailman/listinfo/standards
> Unsubscribe: [email protected]
> _______________________________________________
>
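
To make the proposed behaviour concrete, here is a rough sketch of the session lifecycle as I read it from Dave's message. It is not taken from any server implementation. The class and method names, the stanza ids, and both timeout values (30 and 600 seconds) are assumptions for illustration only, and the ping timeout is simplified to "no client traffic for that long".

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    LIVE = "live"
    UNRESPONSIVE = "unresponsive"  # TCP session kept open, push enabled
    DESTROYED = "destroyed"        # TCP session closed, session gone


@dataclass
class Session:
    ping_timeout: float = 30.0    # assumed value, in seconds
    dead_timeout: float = 600.0   # assumed value, "significantly longer"
    state: State = State.LIVE
    last_seen: float = 0.0        # time of the last traffic from the client
    unacked: list = field(default_factory=list)  # stanza ids awaiting ack
    pushed: list = field(default_factory=list)   # stanza ids already pushed

    def _push(self, stanza_id):
        # Stand-in for generating a real push notification; pushes once per id.
        if stanza_id not in self.pushed:
            self.pushed.append(stanza_id)

    def send(self, stanza_id):
        """Queue a stanza; also push it if the session is unresponsive."""
        if self.state is State.DESTROYED:
            raise RuntimeError("session destroyed")
        self.unacked.append(stanza_id)
        if self.state is State.UNRESPONSIVE:
            self._push(stanza_id)

    def on_client_traffic(self, now):
        """Any traffic from the client makes a surviving session live again."""
        if self.state is not State.DESTROYED:
            self.state = State.LIVE
            self.last_seen = now

    def on_ack(self, count):
        """The client acknowledged the oldest `count` stanzas."""
        del self.unacked[:count]

    def tick(self, now):
        """Periodic check, called by the server's timer."""
        silent_for = now - self.last_seen
        if self.state is State.LIVE and silent_for >= self.ping_timeout:
            # Short timeout: start pushing, but leave the TCP session open.
            self.state = State.UNRESPONSIVE
            for stanza_id in self.unacked:
                self._push(stanza_id)
        if self.state is State.UNRESPONSIVE and silent_for >= self.dead_timeout:
            # Long timeout: only now give up on the session entirely.
            self.state = State.DESTROYED
```

The point of the sketch is that the short timeout only switches push notifications on, while the long timeout is the only thing that ends the session, so a client that comes back in between carries on over the same connection.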
