[ https://issues.apache.org/jira/browse/AIRFLOW-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarek Potiuk resolved AIRFLOW-6014.
-----------------------------------
    Fix Version/s: 1.10.10
       Resolution: Fixed

> Kubernetes executor - handle preempted deleted pods - queued tasks
> ------------------------------------------------------------------
>
>                 Key: AIRFLOW-6014
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6014
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: executor-kubernetes
>    Affects Versions: 1.10.6
>            Reporter: afusr
>            Assignee: Daniel Imberman
>            Priority: Minor
>             Fix For: 1.10.10
>
>
> We have encountered an issue whereby when using the kubernetes executor, and
> using autoscaling, airflow pods are preempted and airflow never attempts to
> rerun these pods.
> This is partly as a result of having the following set on the pod spec:
>     restartPolicy: Never
> This makes sense as if a pod fails when running a task, we don't want
> kubernetes to retry it, as this should be controlled by airflow.
> What we believe happens is that when a new node is added by autoscaling,
> kubernetes schedules a number of airflow pods onto the new node, as well as
> any pods required by k8s/daemon sets. As these are higher priority, the
> Airflow pods are preempted, and deleted. You see messages such as:
>
>     Preempted by kube-system/ip-masq-agent-xz77q on node
>     gke-some--airflow-00000000-node-1ltl
>
> Within the kubernetes executor, these pods end up in a status of pending and
> an event of deleted is received but not handled.
> The end result is tasks remain in a queued state forever.
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
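
The unhandled case described in the issue (a pod still in the Pending phase when a DELETED watch event arrives) can be illustrated with a minimal sketch. This is not the patch that resolved AIRFLOW-6014 and does not use Airflow's or the Kubernetes client's real classes: `PodEvent`, `TaskOutcome`, and `classify` are hypothetical names, and reporting the task as failed is only one assumed way of handing control back to Airflow's own retry logic.

```python
from dataclasses import dataclass
from enum import Enum


class TaskOutcome(Enum):
    """Hypothetical outcomes an executor could report back to the scheduler."""
    NO_CHANGE = "no_change"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass(frozen=True)
class PodEvent:
    """Simplified stand-in for a Kubernetes watch event (not the real client type)."""
    event_type: str  # watch event type: "ADDED", "MODIFIED" or "DELETED"
    phase: str       # pod status.phase: "Pending", "Running", "Succeeded", "Failed"
    pod_name: str


def classify(event: PodEvent) -> TaskOutcome:
    """Map a pod event to a task outcome, including the preempted-pod case."""
    if event.phase == "Pending":
        # A pod deleted while still Pending never ran its task (for example,
        # it was preempted). Assumption: report it as failed so the task does
        # not stay queued forever and Airflow can decide whether to retry.
        if event.event_type == "DELETED":
            return TaskOutcome.FAILED
        return TaskOutcome.NO_CHANGE
    if event.phase == "Running":
        return TaskOutcome.RUNNING
    if event.phase == "Succeeded":
        return TaskOutcome.SUCCESS
    if event.phase == "Failed":
        return TaskOutcome.FAILED
    # Unknown or unexpected phase: leave the task state untouched.
    return TaskOutcome.NO_CHANGE
```

With this sketch, `classify(PodEvent("DELETED", "Pending", "example-pod"))` yields `TaskOutcome.FAILED`, whereas an ordinary `MODIFIED` event for a Pending pod leaves the task state unchanged.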