Hello,

it seems that manual intervention (logging on and evacuating the node) is the price we 
have to pay if we don't want our master to wreak havoc in our cluster when it 
has connectivity problems.

Maybe this whole mechanism could be built in a more defensive way. What is 
missing for us is an option to re-create the pods that were on that node 
somewhere else if the node can't be reached for 5 minutes, and to only evacuate 
the node itself after, say, 4 hours. The node might still be working properly 
and serving requests; it might just not be reachable from the master, as was 
the case for us.

Such an option would be great to have, because all our services are built so 
that multiple instances of them can exist in the network at the same time.

Best regards & thanks for your support, Clayton!
v


On 2016-10-12 at 17:44, Clayton Coleman wrote:
Yeah, if you make this change you'll be responsible for triggering evacuation of down 
nodes.  You can do that via "oadm manage-node NODE_NAME --evacuate"
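
For example, a rough sketch of a script that does this (the NotReady check, the 
5-minute interval, and the assumption of an active cluster-admin kubeconfig are 
illustrative, not prescribed values):

    #!/bin/bash
    # Periodically evacuate nodes that the master reports as NotReady.
    # Assumes oc/oadm run with cluster-admin credentials.
    while true; do
        for node in $(oc get nodes --no-headers | awk '$2 ~ /NotReady/ {print $1}'); do
            oadm manage-node "$node" --evacuate
        done
        sleep 300
    done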

On Mon, Oct 10, 2016 at 8:06 AM, v <vekt...@gmx.net> wrote:

    Hello Clayton,

    thank you for replying!
    I'm not sure whether changing the node failure detection threshold is the 
right way to go. I have found this:

    https://docs.openshift.com/enterprise/3.1/install_config/master_node_configuration.html

        masterIP: 10.0.2.15
        podEvictionTimeout: 5m
        schedulerConfigFile: ""

    I think that podEvictionTimeout is the thing that bit us. After changing 
that to "24h" I don't see any "Evicting pods on node" or "Recording Deleting 
all Pods from Node" messages in the master logs any more.

    Regards
    v

    On 2016-10-10 at 15:21, Clayton Coleman wrote:
    Network segmentation mode is in 1.3.  In 1.1 or 1.2 you can also
    increase the node failure detection threshold (80s by default) as high
    as you want by setting the extended controller argument for it, which
    will delay evictions (you could set 24h and use external tooling to
    handle node down).
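
    For example, something along these lines in master-config.yaml 
    (node-monitor-grace-period is shown as one plausible controller argument 
    for the detection threshold; double-check the exact flag name for your 
    version):

        kubernetesMasterConfig:
          controllerArguments:
            node-monitor-grace-period:
              - "24h"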

    If you are concerned about external traffic causing DDoS, add a proxy
    configuration for your masters that rate limits traffic by cookie or
    source ip.
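
    As a rough illustration, with nginx in front of the master API something 
    like this caps per-source-IP request rates (hostnames, ports, paths, and 
    rates are placeholders, and a real setup also has to account for TLS 
    client certificates and websocket traffic):

        # inside the http {} context of nginx.conf
        limit_req_zone $binary_remote_addr zone=osapi:10m rate=20r/s;

        server {
            listen 443 ssl;
            server_name openshiftmaster.com;                # placeholder hostname
            ssl_certificate     /etc/nginx/master.crt;      # placeholder cert paths
            ssl_certificate_key /etc/nginx/master.key;

            location / {
                limit_req zone=osapi burst=40 nodelay;      # per-source-IP rate limit
                proxy_pass https://10.0.2.15:8443;          # masterIP from the config above
            }
        }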



    On Oct 10, 2016, at 2:56 AM, v <vekt...@gmx.net> wrote:

    Hello,

    We just had our whole OpenShift cluster go down hard due to a "feature" in 
the OpenShift master that deletes all pods from a node if the node doesn't 
report back to the master on a regular basis.

    Turns out we're not the only ones who have been bitten by this "feature":
    https://github.com/kubernetes/kubernetes/issues/30972#issuecomment-241077740
    https://github.com/kubernetes/kubernetes/issues/24200

    I am writing here to find out whether it is possible to disable this 
feature completely. We don't need it and we don't want our master to ever do 
something like that again.

    Please note how easily this feature can be abused: at the moment, anyone 
can bring down your whole OpenShift cluster just by DDoSing the master(s) for a 
few minutes.

    The logs (they were the same for all nodes):

    Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.804666  919215 nodecontroller.go:697] node openshiftnode.com hasn't been updated for 5m17.169004459s. Last out of disk condition is: &{Type:OutOfDisk Status:Unknown LastHeartbeatTime:2016-10-04 21:41:53 +0200 CEST LastTransitionTime:2016-10-04 21:42:33 +0200 CEST Reason:NodeStatusUnknown Message:Kubelet stopped posting node status.}
    Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.804742  919215 nodecontroller.go:451] Evicting pods on node openshiftnode.com: 2016-10-04 21:47:10.80472667 +0200 CEST is later than 2016-10-04 21:42:33.779813315 +0200 CEST + 4m20s
    Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.945766  919215 nodecontroller.go:540] Recording Deleting all Pods from Node openshiftnode.com. event message for node openshiftnode.com

    Regards
    v

_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
