[Discussion about cluster singleton or not for the ContainerManager]

fwiw, I believe for Kubernetes we do not need to attempt to deal with fault
tolerance for the ContainerManager state ourselves.  We can use labels to
replicate all the persistent metadata for a container (prewarm or not, the
ContainerRouter it is assigned to) in the Kube objects representing the
pods in Kube's etcd metadata server.  If we need to restart a
ContainerManager, the new instance can come up "instantly" and start
servicing requests while recovering the state of the previous instance via
label-selector queries against the Kube API server (backed by etcd) to
discover the pre-existing containers it owned.

We'll need to validate that the performance of this is acceptable (it
should be, since it is just some asynchronous labeling operations (a) when
the container is created and (b) on the initial transition from stemcell
to warm), but it is going to be pretty simple to implement and makes good
use of the underlying platform's capabilities.
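
To make the idea concrete, here is a rough sketch of the label scheme and
the recovery step. It is in Python purely for brevity and does not talk to
a real cluster: the label keys, the "cm-0"/"router-1" style ids, and all
function names are made up for illustration, and pods are modeled as plain
dicts shaped like the metadata a pod list call would return.

```python
from collections import defaultdict

# Hypothetical label keys and values; nothing in this thread fixes the names.
LABEL_OWNER = "openwhisk/container-manager"
LABEL_STATE = "openwhisk/state"            # "stemcell" or "warm"
LABEL_ROUTER = "openwhisk/container-router"


def labels_at_creation(manager_id):
    """(a) Labels attached when the container (pod) is created as a stemcell."""
    return {LABEL_OWNER: manager_id, LABEL_STATE: "stemcell"}


def labels_on_warm(router_id):
    """(b) Label patch applied on the first stemcell -> warm transition."""
    return {LABEL_STATE: "warm", LABEL_ROUTER: router_id}


def owner_selector(manager_id):
    """Equality-based label selector matching the pods a manager owned."""
    return f"{LABEL_OWNER}={manager_id}"


def recover_state(pods, manager_id):
    """Rebuild a manager's view from pod metadata.

    `pods` is a list of dicts shaped like {"metadata": {"name", "labels"}}.
    Returns (prewarm_names, {router_id: [container_names]}).
    """
    prewarm, assigned = [], defaultdict(list)
    for pod in pods:
        meta = pod["metadata"]
        labels = meta.get("labels", {})
        if labels.get(LABEL_OWNER) != manager_id:
            continue  # defensive: ignore pods owned by another manager
        if labels.get(LABEL_STATE) == "warm" and LABEL_ROUTER in labels:
            assigned[labels[LABEL_ROUTER]].append(meta["name"])
        else:
            prewarm.append(meta["name"])
    return prewarm, dict(assigned)
```

Against a real cluster, the string from owner_selector() would be passed
as the label selector on a pod list request, so the restarted instance
only sees its own containers; the filtering in recover_state() is then
just a safety net.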

--dave
