I think Christian has explained in detail why we need not, and should not, touch the application logic itself. I think Usama has similar concerns.
And to Eric's question:

-------
"You had said earlier that the TLS stack should also be buffering and re-sending `make-payment(1/2)`. Is that still your view?"

What I intended to express is that if the TLS stack receives a send command from the application layer during its switchover, it should buffer the data (but it need not be aware that the data is 'make-payment(1/2)'). Once the switchover process has finished, it should send the buffered data immediately, because it is the TLS layer's task to accomplish what it is told to do.

Once the buffered data has been sent out successfully (as indicated by the TCP layer's ACK mechanism), the TLS layer can notify the application layer that its work is done. The application layer should have its own timer to wait for that notification from the TLS layer before it sends 'make-payment(1/2)' again.

Best Regards

Aijun Wang
China Telecom

-----Original Message-----
From: Christian Huitema [mailto:[email protected]]
Sent: Monday, January 5, 2026 3:28 AM
To: Eric Rescorla <[email protected]>; Aijun Wang <[email protected]>
Cc: [email protected]
Subject: Re: [TLS] Re: 【Reply to the comments after the presentation in Montreal】RE: Re: FW: New Version Notification for draft-wang-tls-service-affinity-00.txt

This discussion of "session continuity" reminds me a lot of similar discussions about the 7-layer OSI model. That model included a hypothetical "session" layer, which included the functions required to safely recover from a break in communication. That "session" layer disappeared in modern architectures, for a variety of reasons. As we seem to be revisiting the issue, let's get a reminder of what we learned in those years.

The main lesson is that recovery from interruptions requires a form of check-pointing, and that such check-pointing often involves several connections, not just one. The classic example of such check-pointing is the family of "two phase commit" algorithms, whose primary goal is to ensure that transactions either succeed or fail across all systems involved.
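To make the two-phase commit idea concrete, here is a minimal sketch, assuming in-memory participants and no failures of the coordinator itself. The `Participant` class and the `two_phase_commit` function are hypothetical names chosen for illustration only; nothing like them appears in the draft or in TLS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical party to a transaction (e.g. customer bank, vendor bank)."""
    name: str
    can_commit: bool = True   # assumption: how this party will vote
    state: str = "idle"       # idle -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # Phase 1: tentatively reserve the change and cast a vote.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        # Phase 2 (all voted yes): make the prepared change permanent.
        self.state = "committed"

    def abort(self) -> None:
        # Phase 2 (someone voted no): discard any tentative change.
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Return True if everyone committed, False if everyone aborted."""
    # Phase 1: collect votes; all() stops asking at the first refusal.
    all_yes = all(p.prepare() for p in participants)
    # Phase 2: apply the same outcome to every participant.
    for p in participants:
        if all_yes:
            p.commit()
        else:
            p.abort()
    return all_yes
```

The point of the sketch is that the commit-or-abort decision spans several parties and is taken by application-level logic; none of it is visible to a transport or TLS layer carrying the bytes.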
In the banking example, the "make payment" transaction typically involves the customer, the vendor and their banks, and the goal is to ensure that the payment either takes place everywhere or not at all.

The next lesson is that such check-pointing mostly requires visibility by the application. Ultimately, the application is in charge of ensuring that databases remain coherent. It may choose different algorithms depending on its requirements. Two-phase commit is just an example; other systems based on journals are also popular. Some applications may be satisfied with "eventually converging" instead of "always in sync".

Yet another lesson is that this synchronization is definitely not a "transport" issue. That's why previous attempts to create a "session" layer between "transport" and "application" have failed. The application decides whether to recover or roll back a transaction, for example canceling a transaction if it took too long. The transport cannot really do that, and a simple TLS layer on top of the transport cannot do that either.

One could design transport extensions that deal with very simple cases, such as deciding how to restart a file transfer, but even those have limited value, and are easily superseded by application-level solutions like "get the bytes of file X starting at offset Y". In real life, such "simple" solutions will have qualifiers, such as "get the bytes of file X starting at offset Y but only if the file did not change before time T".

All that to say that I am not convinced at all by a proposition to insert a "session data continuity" mechanism at the TLS layer. Such functionality is application dependent; it belongs to the application, not to the transport.

The current design of the session resume mechanism correctly focuses on just one piece of the puzzle: spend less CPU restarting a TLS session if the peer remembers the keying material of a previous session. That's an optional mechanism, and it narrowly focuses on TLS specific data.
Adding complexity there would be counter-productive.

-- Christian Huitema

On 1/4/2026 6:38 AM, Eric Rescorla wrote:
>
> On Sun, Jan 4, 2026 at 1:27 AM Aijun Wang <[email protected]> wrote:
>
>                                                         Server
>
> <------------------ First TCP/TLS Connection ---------------->
> POST /make-payment (1/2) ---\ /---------------- Switch servers
>                              X
> <---------------------------/ \------------------------------>
> [Buffer /make-payment (2/2)]
> <-------------------------------------------------------- ACK
>
> <-------------------- New TCP/TLS Connection ---------------->
>
> [If, at the application layer, the client side doesn't receive the
> response to make-payment, it needs to send make-payment(1) again,
> and also make-payment(2)]
>
> /make-payment (1/2)----------------------------------------->
>
> You had said earlier that the TLS stack should also be buffering and re-sending `make-payment(1/2)`. Is that still your view?
>
> -Ekr
>
> ==========================================================
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Eric Rescorla
> *Sent:* Sunday, January 4, 2026 10:12 AM
> *To:* Aijun Wang <[email protected]>
> *Cc:* [email protected]
> *Subject:* [TLS] Re: 【Reply to the comments after the presentation in Montreal】RE: Re: FW: New Version Notification for draft-wang-tls-service-affinity-00.txt
>
> On Sat, Jan 3, 2026 at 5:42 PM Aijun Wang <[email protected]> wrote:
>
> Hi, Eric:
>
> What we want is similar to "Resumption and Pre-Shared Key (PSK)", which is described in https://datatracker.ietf.org/doc/html/rfc8446#section-2.2
>
> From this section, we can see that the application layer is not aware of such session resumption; the TLS layer handles the whole procedure. Right?
>
> Not necessarily. The TLS specification takes no position on when (1) clients should attempt resumption and (2) servers should allow it.
>
> What you described in the previous examples can all happen in the resumption process, and the application layer should have its own additional confirmation/retry logic.
>
> I'm not sure that's in fact true. The purpose of the examples was to explore that, which is why I asked you to provide your own ladder diagrams showing how you thought this worked. Again, can you please do that?
>
> For the mentioned draft (https://datatracker.ietf.org/doc/draft-wang-tls-service-affinity/), the additional exchange signals are there to transfer the new server address securely after the initial connection.
>
> What the client and server need to do is correlate the corresponding cryptographic context with the new underlying TCP connection.
>
> Do you have any suggestions for making the above intention clearer in https://datatracker.ietf.org/doc/draft-wang-tls-service-affinity/?
>
> As I said, I think this is the wrong design, so my suggestion is you don't do it.
>
> To the extent to which you are trying to make the case otherwise, you really need to show your work, which this message does not do.
>
> -Ekr
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* [External account] Eric Rescorla
> *Sent:* Tuesday, December 30, 2025 10:41 PM
> *To:* Aijun Wang <[email protected]>
> *Cc:* [email protected]; [email protected]; Mohit Sahni <[email protected]>; Aijun Wang <[email protected]>
> *Subject:* Re: [TLS] Re: 【Reply to the comments after the presentation in Montreal】RE: Re: FW: New Version Notification for draft-wang-tls-service-affinity-00.txt
>
> On Tue, Dec 30, 2025 at 2:10 AM Aijun Wang <[email protected]> wrote:
>
> Hi, Eric:
>
> Contrary to your conclusions, I think the application layer and the TLS/TCP layer should (and already do) have their own mechanisms to assure data integrity,
>
> Yes, which might or might not work correctly, because they are rarely tested.
>
> there is no need to consider them again at the protocol layer; we just need some guidance for the implementation of the client and server sides themselves.
>
> If data arrives during the switchover, the internal implementation logic is that the application layer will call the TLS/TCP API to send the data, with the same session identifier.
>
> I don't know what you mean by "The same session identifier". There is no concept in TLS that two different TCP connections are somehow the same conceptual flow of data. PSK identifiers solely identify keys.
>
> In this case, the client doesn't know what has happened. You need mechanisms either at the HTTP layer--or more typically at the REST API layer--to do the right thing, which might be an idempotency layer combined with client-side retransmit. This is all just a straightforward application of the end-to-end argument, and there's no real way around it as long as systems might asynchronously fail, but it's also a source of defects (think about how many times sites tell you not to press the submit button twice) because these mechanisms may not have been exercised or tested. For instance, if the server is high reliability and the client just assumes that anything it sent works, that will be good enough a very large fraction of the time, but not if the server has a high failure rate.
>
> [WAJ] From the example, we can see that each application has its own confirmation mechanism, because most of them are asynchronous.
>
> The application knows that the server may crash, or that the underlying connection may break.
>
> Yes. I said exactly this, but again, they're not always going to be implemented correctly, and that's largely OK because most connections don't fail. You're talking about making an exceptional condition routine.
>
> Unfortunately, these transaction semantics only exist at the HTTP layer, not the TLS layer, so the TLS layer has no way of knowing to wait for the 200 OK, it just knows that the client sent some data, but not whether that reflects an outstanding request or something else; recall that TLS doesn't even know about the HTTP request/response semantics, because it's just a dumb pipe.
>
> [WAJ] TLS need not be aware of the 200 OK signal; that is the job of the application layer.
>
> TLS/TCP only needs to transmit the data from the application layer correctly to the other side.
>
> So you're saying that in the example above, the TLS layer ought to inform the HTTP layer that the connection has failed and trust the HTTP layer to retry in a safe fashion?
>
> In your email, you suggest that the client ought to:
>
> 1. Wait for the server's TCP ACK of all transmitted data, with the implied semantics being that once the message is ACKed it will be reliably delivered to the server, not just to the TCP stack.
>
> [WAJ] No. I emphasize only the TCP ACK and the TCP stack, not the application stack. That is to say, receiving the TCP ACK does not represent an application-layer ACK.
>
> 2. Buffer any data it receives from the client while waiting for the ACK and retransmit it on the new connections.
>
> [WAJ] Buffer any data it receives but cannot transmit immediately during the switchover process; it is not waiting for the application ACK.
>
> I don't understand what you're saying here. Can you please provide:
>
> 1. A concrete description of what you believe the rules are that the TLS stack should be following.
>
> 2. New versions of my ladder diagrams that show what you believe the correct behavior is.
>
> -Ekr
>
> _______________________________________________
> TLS mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
