On Thu, Jul 6, 2017 at 6:34 AM, Nicola Ferraro <[email protected]> wrote:
> Hi,
> I've read some discussions on fencing and pod guarantees. Most of them are
> related to stateful sets, e.g.
> https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-safety.md
> and related threads.
> Anyway, I couldn't find an answer to the following questions...
>
> Suppose I create a DeploymentConfig (so, no statefulsets) with replicas=1.
> After a pod is scheduled on some node, that node is disconnected from the
> cluster (I block all communications with the master).
> After some time, the DC/RC tries to delete that pod and reschedule a new
> pod on another node.

The RC doesn't delete the pod, but the node controller will (after X
minutes). A new pod is created - the RC does *not* block waiting for old
pods to be deleted before creating new ones. If the pod references a PV
that supports locking innately (GCE, AWS, Azure, Ceph, Gluster), then the
second pod will *not* start up, because the volume can't be attached to
the new node. But this behavior depends on the storage service itself,
not on Kube.

> For what I've understood, if now I reconnect the failing node, the Kubelet
> will read the cluster status and effectively delete the old pod, but,
> before that moment, both pods were running in their respective nodes and
> the old pod was allowed to access external resources (e.g. if the network
> still allowed communication with them).

Yes.

> Is this scenario possible?
> Is there a mechanism by which a disconnected node can tear down its pods
> automatically after a certain timeout?

Run a daemonset that shuts down the instance if it loses contact with the
master API / health check for more than X seconds (a rough sketch is at
the end of this mail). Even this is best effort. You can also run a
daemonset that uses sanlock or another tool based on a shared RWX
(read-write-many) volume, and then self-terminate if you lose the lock.
Keep in mind these solutions aren't perfect, and it's always possible
that a bug in sanlock or another node error prevents that daemon process
from running to completion.

> Is fencing implemented/going-to-be-implemented for normal pods, even if
> they don't belong to stateful sets?

It's possible that we will add attach/detach controller support to control
attachment of volumes that are RWO but don't have innate locking. It's
also possible that someone will implement a fencer. It should be easy to
implement a fencer today.

> Thanks,
> Nicola
>
> _______________________________________________
> users mailing list
> [email protected]
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
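
To make the self-fencing daemonset idea above concrete, here is a rough
sketch of what such an agent could look like. Nothing in it is a supported
Kube/OpenShift feature: the names, the centos:7 image, the 10-second poll,
the 30-failure threshold and the sysrq reboot are all illustrative choices
you'd want to adapt. It assumes the node allows privileged containers and
that the default serviceaccount token is mounted so the health check can
authenticate against the apiserver.

---
# Illustrative self-fencing daemonset (not a supported feature).
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: self-fencer
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        app: self-fencer
    spec:
      containers:
      - name: fencer
        image: centos:7              # any image with bash and curl will do
        securityContext:
          privileged: true           # needed to trigger the host reboot below
        command:
        - /bin/bash
        - -c
        - |
          # Poll the apiserver health endpoint; after 30 consecutive
          # failures (~5 minutes at one attempt every 10s), reboot the
          # node so its pods are guaranteed to stop.
          TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
          failures=0
          while true; do
            if curl -ks --max-time 5 \
                 -H "Authorization: Bearer ${TOKEN}" \
                 "https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/healthz" >/dev/null; then
              failures=0
            else
              failures=$((failures + 1))
            fi
            if [ "$failures" -ge 30 ]; then
              echo "lost contact with the master, fencing this node"
              sync
              echo b > /proc/sysrq-trigger   # immediate reboot of the host kernel
            fi
            sleep 10
          done

The sysrq reboot is deliberately blunt: a graceful shutdown can hang on the
same condition that caused the partition, and the whole point of fencing is
to guarantee the pods stop. As said above, even this is best effort.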
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
