Yeah, if you make this change you'll be responsible for triggering
evacuation of down nodes.  You can do that via "oadm manage-node NODE_NAME
--evacuate"
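For reference, a sketch of the full workflow (the node name is the one from the logs below; `--schedulable` and `--dry-run` are flags documented for `oadm manage-node` in 3.x, but verify with `oadm manage-node --help` on your release):

```shell
# Mark the node unschedulable so no new pods land on it
oadm manage-node openshiftnode.com --schedulable=false

# Preview what would be evacuated, then actually evacuate
oadm manage-node openshiftnode.com --evacuate --dry-run
oadm manage-node openshiftnode.com --evacuate

# Once the node is healthy again, allow scheduling
oadm manage-node openshiftnode.com --schedulable=true
```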

On Mon, Oct 10, 2016 at 8:06 AM, v <vekt...@gmx.net> wrote:

> Hello Clayton,
>
> thank you for replying!
> I'm not sure whether changing the node failure detection threshold is the
> right way to go. I have found this:
>
> https://docs.openshift.com/enterprise/3.1/install_config/master_node_configuration.html
>
>   masterIP: 10.0.2.15
>   podEvictionTimeout: 5m
>   schedulerConfigFile: ""
>
> I think that podEvictionTimeout is the thing that bit us. After changing
> that to "24h" I don't see any "Evicting pods on node" or "Recording
> Deleting all Pods from Node" messages in the master logs any more.
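For anyone finding this thread later, the change above would look roughly like this in /etc/origin/master/master-config.yaml (a sketch: the placement under `kubernetesMasterConfig` follows the docs page quoted above; verify against your release before relying on it):

```yaml
kubernetesMasterConfig:
  masterIP: 10.0.2.15
  # Default is 5m; how long the master waits before evicting
  # pods from a node that has stopped reporting status.
  podEvictionTimeout: 24h
  schedulerConfigFile: ""
```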
>
> Regards
> v
>
> Am 2016-10-10 um 15:21 schrieb Clayton Coleman:
>
> Network segmentation mode is in 1.3.  In 1.1 or 1.2 you can also
> increase the node failure detection threshold (80s by default) as high
> as you want by setting the extended controller argument for it, which
> will delay evictions (you could set 24h and use external tooling to
> handle node down).
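On 1.1/1.2 the extended controller argument Clayton mentions would be set via `kubernetesMasterConfig.controllerArguments` in master-config.yaml. A sketch, assuming the upstream kube-controller-manager flag names `node-monitor-grace-period` and `pod-eviction-timeout` (check the flag reference for your release):

```yaml
kubernetesMasterConfig:
  controllerArguments:
    # How long the controller waits before marking an
    # unresponsive node NotReady.
    node-monitor-grace-period:
    - "24h"
    # How long after NotReady before pods are evicted.
    pod-eviction-timeout:
    - "24h"
```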
>
> If you are concerned about external traffic causing a DDoS, add a proxy
> configuration for your masters that rate-limits traffic by cookie or
> source IP.
>
>
>
>
> On Oct 10, 2016, at 2:56 AM, v <vekt...@gmx.net> wrote:
>
> Hello,
>
> we just had our whole OpenShift cluster go down hard due to a "feature" in
> the OpenShift master that deletes all pods from a node if the node doesn't
> report back to the master on a regular basis.
>
> Turns out we're not the only ones who have been bitten by this "feature":
>
> https://github.com/kubernetes/kubernetes/issues/30972#issuecomment-241077740
> https://github.com/kubernetes/kubernetes/issues/24200
>
> I am writing here to find out whether it is possible to disable this feature 
> completely. We don't need it and we don't want our master to ever do 
> something like that again.
>
> Please note how easily this feature can be abused: at the moment anyone can
> bring down your whole OpenShift cluster just by DDoSing the master(s) for a
> few minutes.
>
> The logs (they were the same for all nodes):
>
> Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.804666  919215 nodecontroller.go:697] node openshiftnode.com hasn't been updated for 5m17.169004459s. Last out of disk condition is: &{Type:OutOfDisk Status:Unknown LastHeartbeatTime:2016-10-04 21:41:53 +0200 CEST LastTransitionTime:2016-10-04 21:42:33 +0200 CEST Reason:NodeStatusUnknown Message:Kubelet stopped posting node status.}
> Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.804742  919215 nodecontroller.go:451] Evicting pods on node openshiftnode.com: 2016-10-04 21:47:10.80472667 +0200 CEST is later than 2016-10-04 21:42:33.779813315 +0200 CEST + 4m20s
> Okt 09 21:47:10 openshiftmaster.com origin-master[919215]: I1004 21:47:10.945766  919215 nodecontroller.go:540] Recording Deleting all Pods from Node openshiftnode.com. event message for node openshiftnode.com
>
> Regards
> v
>
> _______________________________________________
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
>
>
>
>
>