Hello,

We just had our whole OpenShift cluster go down hard due to a "feature" in the 
OpenShift master that deletes all pods from a node if the node doesn't report back to the 
master on a regular basis.

Turns out we're not the only ones who have been bitten by this "feature":
https://github.com/kubernetes/kubernetes/issues/30972#issuecomment-241077740
https://github.com/kubernetes/kubernetes/issues/24200

I am writing here to find out whether it is possible to disable this feature 
completely. We don't need it and we don't want our master to ever do something 
like that again.

Please note how easily this feature can be abused: at the moment, anyone can 
bring down your whole OpenShift cluster just by DDoSing the master(s) for a few 
minutes.

The logs (they were the same for all nodes):
Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.804666  919215 nodecontroller.go:697] node openshiftnode.com hasn't been updated for 5m17.169004459s. Last out of disk condition is: &{Type:OutOfDisk Status:Unknown LastHeartbeatTime:2016-10-04 21:41:53 +0200 CEST LastTransitionTime:2016-10-04 21:42:33 +0200 CEST Reason:NodeStatusUnknown Message:Kubelet stopped posting node status.}
Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.804742  919215 nodecontroller.go:451] Evicting pods on node openshiftnode.com: 2016-10-04 21:47:10.80472667 +0200 CEST is later than 2016-10-04 21:42:33.779813315 +0200 CEST + 4m20s
Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.945766  919215 nodecontroller.go:540] Recording Deleting all Pods from Node openshiftnode.com. event message for node openshiftnode.com
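
As far as I can tell from the second log line, the master evicts once the current time is later than the node's LastTransitionTime plus a timeout of 4m20s. Below is a minimal sketch of that comparison as I read it from the log. It is not the actual nodecontroller.go code; the names should_evict and POD_EVICTION_TIMEOUT are made up for illustration, and the timestamps are copied from the log and truncated to microseconds.

```python
from datetime import datetime, timedelta

# Assumption: 4m20s is the eviction timeout, taken from the "+ 4m20s" in the log.
POD_EVICTION_TIMEOUT = timedelta(minutes=4, seconds=20)


def should_evict(now, last_transition, timeout=POD_EVICTION_TIMEOUT):
    """Hypothetical helper: True once `now` is later than last_transition + timeout."""
    return now > last_transition + timeout


# Timestamps from the "Evicting pods on node" log line (both +0200 CEST,
# so naive datetimes are enough for the comparison).
last_transition = datetime(2016, 10, 4, 21, 42, 33, 779813)
now = datetime(2016, 10, 4, 21, 47, 10, 804726)

# Point in time after which the node's pods become eligible for eviction.
deadline = last_transition + POD_EVICTION_TIMEOUT

print(deadline)                            # 2016-10-04 21:46:53.779813
print(should_evict(now, last_transition))  # True
```

In other words, the node was only about 17 seconds past the deadline when the master started deleting its pods.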

Regards
v

_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
