On Wednesday, 04.11.2020 at 11:46 +0000, Dave Cridland wrote:
>
> Due to network analysis (and "thanks" to a bug in the server which
> caused some useful logging), we were able to examine not only when
> sessions went into the unresponsive state, but also when the client
> subsequently sent traffic on that session. This often happened well
> after the session had fallen into the resumable state - this resulted
> in an error, as the session had been closed.
>
> Having seen the result of this in the logging of the server, we
> followed up by looking for the same logging output on the production
> system, where the majority of users are using WiFi or 4G within
> hospitals. Coverage is often poor, and the WiFi overused, so
> clinicians often operate on a weak 4G signal, or highly contended
> WiFi. Think FOSDEM.
>
> Again, we observed clients recovering sometimes well after the ping
> timeout had triggered. Had these clients been able to, they could
> have continued to use the same TCP session without any disruption
> (or, for that matter, any additional RTTs re-establishing).
>
> The usual approach here seems to be to increase the timeout required
> to move a session from "live" to "unresponsive" when pinged. However,
> this has the effect of delaying push notifications while the session
> is, in effect, in limbo.
>
> Our proposal is that when a session is found to be unresponsive, the
> server starts sending push notifications for unacknowledged (and
> future) messages, but otherwise leaves the session live when
> resumable. Only after a significantly longer timeout should the TCP
> session be terminated (and at that point destroy the session
> entirely).
>
Matches my observations [1] as well. If the session is not too active,
TCP recovery is instant: all the snd/rcv buffers are flushed, then the
queues are flushed, and everything carries on as if nothing happened.

> This means that a client recovering network after several minutes
> will find the connection still live (in effect), whereas if it never
> recovers, it will still get the push notifications in a timely
> manner.
>
> There are likely to be downsides with this approach; particularly
> presence state will be badly affected. PSA could help here. Overall,
> though, we believe that this will substantially improve the effective
> performance of C2S over high latency, high contention links.

I'm leaning towards ignoring the timers altogether and caring only
about how this affects UX. If TCP is still holding up, let it be; if it
gets EOF/EOS/timeout (from whichever side), let's just do a resumption
reconnect. We're reconnecting continuously anyway.

[1] https://github.com/TelepathyIM/wocky/issues/14#issuecomment-720091807
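
For illustration only, the two-stage timeout quoted above could be
sketched roughly as below. This is not taken from any server: the class
and method names and both timeout values are assumptions, and "no
traffic for ping_timeout seconds" stands in for an unanswered ping.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    LIVE = "live"
    UNRESPONSIVE = "unresponsive"  # ping timed out; TCP kept open, push enabled
    CLOSED = "closed"              # long timeout expired; session destroyed


@dataclass
class Session:
    ping_timeout: float = 30.0    # assumed value, seconds
    close_timeout: float = 600.0  # assumed value, "significantly longer"
    state: State = State.LIVE
    last_seen: float = 0.0
    unacked: list = field(default_factory=list)  # written to TCP, not yet acked
    pushed: list = field(default_factory=list)   # sent as push notifications

    def tick(self, now: float) -> None:
        """Advance the state machine based on time since last client traffic."""
        idle = now - self.last_seen
        if self.state is State.LIVE and idle >= self.ping_timeout:
            self.state = State.UNRESPONSIVE
            # Push everything still unacknowledged, skipping earlier pushes.
            new = [m for m in self.unacked if m not in self.pushed]
            self.pushed.extend(new)
        if self.state is State.UNRESPONSIVE and idle >= self.close_timeout:
            self.state = State.CLOSED

    def deliver(self, msg: str, now: float) -> None:
        """Route a message according to the current session state."""
        self.tick(now)
        if self.state is State.CLOSED:
            self.pushed.append(msg)      # no session left: push only
            return
        self.unacked.append(msg)         # still written to the TCP session
        if self.state is State.UNRESPONSIVE:
            self.pushed.append(msg)      # and pushed, so nothing is delayed

    def on_client_traffic(self, now: float) -> None:
        """Inbound traffic revives the session unless it was destroyed."""
        self.tick(now)
        if self.state is State.CLOSED:
            raise ConnectionError("session destroyed; client must reconnect")
        self.last_seen = now
        self.state = State.LIVE
```

The point of the sketch is that the UNRESPONSIVE state changes only
where messages are sent, not whether the TCP session survives; a client
that recovers before close_timeout simply carries on.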
