Over the last few weeks we have created a number of JIRAs related to SSL session resumption, redirects, and so on. All of these allude to a master redirect scenario, but the scenario itself perhaps wasn't clear. Hopefully this note gives folks more context on the motivation and details of that scenario.

We have a multi-tenant cloud messaging service that supports AMQP 1.0. One of our customers is building bandwidth-constrained devices which require long-lived connections to our service with the minimum amount of protocol chatter. There are potentially a large number of devices involved, so byte overhead needs to be minimized; even an empty 8-byte keep-alive frame every 5 minutes is quite prohibitive. We have recommended that the customer use AMQP 1.0 with Proton-C in their Linux-based devices.

Now, our service uses our standard cloud (Azure) infrastructure components and hence inherits some of the constraints of these components. Specifically, the Azure load balancer, which this service sits behind, aggressively kills connections if there is no traffic for more than 5 minutes. The service does, however, have direct IP endpoints which are connectable but dynamic.

To satisfy the customer scenario given our constraints, we need the client to connect to the well-known DNS name for the load balancer VIP and then be redirected to the direct IP address of the node it reached, which works around the load balancer's idle connection limitation. AMQP redirection is a perfect fit for this.

Given this background, here is how the complete redirect scenario works out:

1. Client establishes a TCP connection to the AMQP endpoint.
2. Regular SSL/TLS session negotiation happens, which involves a number of round trips between client and server.
3. Client and server do the SASL-PLAIN dance.
4. Client sends an AMQP open frame with open.hostname set to the AMQP endpoint address from (1).
5. Server sends back a close frame with an AMQP redirect error. This redirect error contains the direct IP address of the same node.
6. Client establishes a TCP connection to the direct IP.
7. SSL resume is used to re-establish the TLS session.
8. Client and server do the SASL-PLAIN dance.
9. Client sends an AMQP open frame with open.hostname set to the AMQP endpoint address from (1) above.
10. Server sends back a successful open.
11. At this point AMQP connection establishment is complete.

The client will periodically disconnect for various reasons beyond our control. The key here is to let it reconnect with minimal byte overhead:

1. Client should reconnect to the direct IP using steps 6-11 above.
2. If the connection to the direct IP fails N times, the client should fall back to the standard load balancer address. SSL session resume may be possible in some scenarios, but full SSL negotiation may be required.

HTH!

Thanks
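
P.S. In case it helps make the reconnect policy concrete, here is a minimal Python sketch of the "try the direct IP N times, then fall back to the load balancer" logic. It is only an illustration under stated assumptions and not Proton-C code: `reconnect`, its parameters, and the `connect` callable are hypothetical names, `connect` is assumed to perform the full connection sequence (TCP, TLS, SASL, AMQP open) and to raise OSError on failure, and the default of 3 attempts is a placeholder since N has not been decided.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def reconnect(
    connect: Callable[[str], T],
    direct_ip: str,
    lb_host: str,
    max_direct_attempts: int = 3,  # placeholder for "N"; not a decided value
) -> T:
    """Reconnect with minimal overhead, preferring the direct IP.

    `connect` is a hypothetical callable that opens a complete AMQP
    connection to the given address and raises OSError on failure.
    """
    # Reconnect step 1: retry the direct IP (steps 6-11 of the scenario).
    for _ in range(max_direct_attempts):
        try:
            return connect(direct_ip)
        except OSError:
            # Direct endpoint unreachable; try again up to the limit.
            pass

    # Reconnect step 2: fall back to the well-known load balancer address.
    # Any failure here propagates to the caller.
    return connect(lb_host)
```

A real client would presumably also cache the TLS session so that the direct IP attempts can use SSL resume, and would replace the stored direct IP whenever the load balancer path yields a new redirect.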
