[jira] [Created] (AIRFLOW-4854) Env Var passed to K8 worker pod should be case sensitive
raman created AIRFLOW-4854: -- Summary: Env Var passed to K8 worker pod should be case sensitive Key: AIRFLOW-4854 URL: https://issues.apache.org/jira/browse/AIRFLOW-4854 Project: Apache Airflow Issue Type: Improvement Components: executors Affects Versions: 1.10.2 Reporter: raman We are using K8Executor and there is a use case where users might provide env variable in lower case which is read by their custom operator in K8 Pod. But currently it seems that in K8 executor env variables are transformed in to upper case [https://github.com/apache/airflow/blob/f520d02cc1f41f9861f479f984bb52bda3860d30/airflow/kubernetes/secret.py#L43]. These should be passed in case sensitive manner -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4789) Add Mysql Exception Handling while pushing data to XCom
raman created AIRFLOW-4789: -- Summary: Add Mysql Exception Handling while pushing data to XCom Key: AIRFLOW-4789 URL: https://issues.apache.org/jira/browse/AIRFLOW-4789 Project: Apache Airflow Issue Type: Improvement Components: xcom Affects Versions: 1.10.2 Reporter: raman We have seen mysql exceptions while setting up xcom data. It would be better to add exception handling there -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-4586) Task getting stuck in Queued State
[ https://issues.apache.org/jira/browse/AIRFLOW-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] raman reassigned AIRFLOW-4586: -- Assignee: raman > Task getting stuck in Queued State > -- > > Key: AIRFLOW-4586 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4586 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.10.0 >Reporter: raman >Assignee: raman >Priority: Major > > We are observing intermittently that Tasks get stuck in queued state and > never get executed by Airflow. On debugging it we found that one of the > queued dependency was not met due to which task did not move from queued to > running state. So task remained in queued state. (are_dependencies_met > function returned false for QUEUE_DEPS inside > _check_and_change_state_before_execution). > By looking into scheduler code it seems that scheduler does not reschedule > the queued state tasks due to which task never got added to executor queue > again and remained stuck in queued state. There is a logic inside > _check_and_change_state_before_execution function to move the task from > queued to None state(which gets picked by scheduler for rescheduling) if > RUN_DEPS are not met but this logic seems to be missing for QUEUE_DEPS. It > seems that task should be moved to None state even if QUEUE_DEPS are not met. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4586) Task getting stuck in Queued State
[ https://issues.apache.org/jira/browse/AIRFLOW-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852848#comment-16852848 ] raman commented on AIRFLOW-4586: Inside _process_task_instances(self, dag, queue, session=None) function scheduler is not considering task instances in queued state(rightly so otherwise it might add duplicate tasks to executor queue) so task remained in queued state if queued dep are not met. Below is the code snippet for same "{color:#205081}for run in active_dag_runs:{color} {color:#205081} self.log.debug("Examining active DAG run: %s", run){color} {color:#205081} # this needs a fresh session sometimes tis get detached{color} {color:#205081} tis = run.get_task_instances(state=(State.NONE,{color} {color:#205081} State.UP_FOR_RETRY,{color} {color:#205081} State.UP_FOR_RESCHEDULE)){color}" > Task getting stuck in Queued State > -- > > Key: AIRFLOW-4586 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4586 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.10.0 >Reporter: raman >Priority: Major > > We are observing intermittently that Tasks get stuck in queued state and > never get executed by Airflow. On debugging it we found that one of the > queued dependency was not met due to which task did not move from queued to > running state. So task remained in queued state. (are_dependencies_met > function returned false for QUEUE_DEPS inside > _check_and_change_state_before_execution). > By looking into scheduler code it seems that scheduler does not reschedule > the queued state tasks due to which task never got added to executor queue > again and remained stuck in queued state. There is a logic inside > _check_and_change_state_before_execution function to move the task from > queued to None state(which gets picked by scheduler for rescheduling) if > RUN_DEPS are not met but this logic seems to be missing for QUEUE_DEPS. It > seems that task should be moved to None state even if QUEUE_DEPS are not met. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4594) Kubernetes watcher updating Task state based on Pod state
raman created AIRFLOW-4594: -- Summary: Kubernetes watcher updating Task state based on Pod state Key: AIRFLOW-4594 URL: https://issues.apache.org/jira/browse/AIRFLOW-4594 Project: Apache Airflow Issue Type: Bug Components: executors Affects Versions: 1.10.1 Reporter: raman Currently KubernetesJobWatcher updates the task state based on the pod state inside process_status function. If it gets the pod state as failed it marks the task state to failed which is not the correct behaviour because task might move to retry state or reschedule state. Pod might be failed because of various reasons like machine/network issue, it might crash. But its state might not reflect the task state -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4586) Task getting stuck in Queued State
raman created AIRFLOW-4586: -- Summary: Task getting stuck in Queued State Key: AIRFLOW-4586 URL: https://issues.apache.org/jira/browse/AIRFLOW-4586 Project: Apache Airflow Issue Type: Bug Components: scheduler Affects Versions: 1.10.0 Reporter: raman We are observing intermittently that Tasks get stuck in queued state and never get executed by Airflow. On debugging it we found that one of the queued dependency was not met due to which task did not move from queued to running state. So task remained in queued state. (are_dependencies_met function returned false for QUEUE_DEPS inside _check_and_change_state_before_execution). By looking into scheduler code it seems that scheduler does not reschedule the queued state tasks due to which task never got added to executor queue again and remained stuck in queued state. There is a logic inside _check_and_change_state_before_execution function to move the task from queued to None state(which gets picked by scheduler for rescheduling) if RUN_DEPS are not met but this logic seems to be missing for QUEUE_DEPS. It seems that task should be moved to None state even if QUEUE_DEPS are not met. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-4091) Support for providing kube client arguments through airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] raman reassigned AIRFLOW-4091: -- Assignee: raman > Support for providing kube client arguments through airflow.cfg > --- > > Key: AIRFLOW-4091 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4091 > Project: Apache Airflow > Issue Type: Improvement > Components: configuration, executor, kubernetes >Affects Versions: 1.10.2 >Reporter: raman >Assignee: raman >Priority: Major > > KubeClient supports various config options like async, _reques__timeout etc. > But there there is no way to provide these through airflow.cfg. > So add the functionality to provide different kubeclient config options > through cfg. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4091) Support for providing kube client arguments through airflow.cfg
raman created AIRFLOW-4091: -- Summary: Support for providing kube client arguments through airflow.cfg Key: AIRFLOW-4091 URL: https://issues.apache.org/jira/browse/AIRFLOW-4091 Project: Apache Airflow Issue Type: Improvement Components: configuration, executor, kubernetes Affects Versions: 1.10.2 Reporter: raman KubeClient supports various config options like async, _reques__timeout etc. But there there is no way to provide these through airflow.cfg. So add the functionality to provide different kubeclient config options through cfg. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3917) Support for running Airflow scheduler(with KubernetesExecutor) outside of K8 cluster
[ https://issues.apache.org/jira/browse/AIRFLOW-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] raman reassigned AIRFLOW-3917: -- Assignee: raman > Support for running Airflow scheduler(with KubernetesExecutor) outside of K8 > cluster > > > Key: AIRFLOW-3917 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3917 > Project: Apache Airflow > Issue Type: Improvement > Components: executor, kubernetes, scheduler >Affects Versions: 1.10.1, 1.10.2 >Reporter: raman >Assignee: raman >Priority: Major > > Currently It seems that Airflow Scheduler needs to be run inside K8 cluster > as a long running pod. > Though there is "in_cluster" config exposed through airflow.cfg but there > does not seem to be a provision to pass required config file while creating > kube_client. > Following is the code snippet from kubernetes_executor.py's start function > where kube_client is configured without passing any config > self.kube_client = get_kube_client() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3936) Support for emptyDir volume in KubernetesExecutor
raman created AIRFLOW-3936: -- Summary: Support for emptyDir volume in KubernetesExecutor Key: AIRFLOW-3936 URL: https://issues.apache.org/jira/browse/AIRFLOW-3936 Project: Apache Airflow Issue Type: Improvement Components: executor, kubernetes, scheduler Affects Versions: 1.10.1 Reporter: raman Currently It seems that K8 Executor expects the dags_volume_claim or git_repo to be always defined through airflow.cfg. Otherwise it does not come up. Though there is support for "emptyDir" volume in worker_configuration.py but kubernetes_executor fails in _validate function if these configs are not defined. Our dag files are stored in some remote location which can be synced to worker pod through init/side-car container. We are exploring if it makes sense to allow K8 executor to come up for cases where dags_volume_claim are git_repo are not defined. In such cases worker pod would look for the dags in emptyDir and worker_airflow_dags path (like it does for git-sync). Dag files can be made available in worker_airflow_dags path through init/side-car container. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3917) Support for running Airflow scheduler(with KubernetesExecutor) outside of K8 cluster
raman created AIRFLOW-3917: -- Summary: Support for running Airflow scheduler(with KubernetesExecutor) outside of K8 cluster Key: AIRFLOW-3917 URL: https://issues.apache.org/jira/browse/AIRFLOW-3917 Project: Apache Airflow Issue Type: Improvement Components: executor, kubernetes, scheduler Affects Versions: 1.10.2, 1.10.1 Reporter: raman Currently It seems that Airflow Scheduler needs to be run inside K8 cluster as a long running pod. Though there is "in_cluster" config exposed through airflow.cfg but there does not seem to be a provision to pass required config file while creating kube_client. Following is the code snippet from kubernetes_executor.py's start function where kube_client is configured without passing any config self.kube_client = get_kube_client() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3916) Remove Metastore dependency for K8 Worker pod heartbeat
raman created AIRFLOW-3916: -- Summary: Remove Metastore dependency for K8 Worker pod heartbeat Key: AIRFLOW-3916 URL: https://issues.apache.org/jira/browse/AIRFLOW-3916 Project: Apache Airflow Issue Type: Improvement Components: executor, kubernetes, worker Affects Versions: 1.10.2, 1.10.1 Reporter: raman Currently K8 Worker pod uses Metastore for continuous heartbeat which might make metastore a bottleneck for 1000(s) of running Pods. Airflow with K8 Executor mode uses KubernetesJobWatcher to track the status for submitted worker pod(s). It can be leveraged to remove the worker pod's dependency on Meta store for heartbeating. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-3882) Optimise APIs to create DAG model from DB instead of calling DagBag()
[ https://issues.apache.org/jira/browse/AIRFLOW-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] raman closed AIRFLOW-3882. -- Resolution: Not A Problem > Optimise APIs to create DAG model from DB instead of calling DagBag() > - > > Key: AIRFLOW-3882 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3882 > Project: Apache Airflow > Issue Type: Improvement > Components: api, webserver >Affects Versions: 1.10.2 >Reporter: raman >Priority: Minor > > Currently following APIs create the Dag object by calling DagBag() on all the > files. which is not an optimised way. There is an alternative to create a Dag > object from DB > -> get_dag_run_state > -> get_dag_runs > -> /dags//tasks/ > -> > /dags//dag_runs//tasks/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3882) Optimise APIs to create DAG model from DB instead of calling DagBag()
raman created AIRFLOW-3882: -- Summary: Optimise APIs to create DAG model from DB instead of calling DagBag() Key: AIRFLOW-3882 URL: https://issues.apache.org/jira/browse/AIRFLOW-3882 Project: Apache Airflow Issue Type: Improvement Components: api, webserver Affects Versions: 1.10.2 Reporter: raman Currently following APIs create the Dag object by calling DagBag() on all the files. which is not an optimised way. There is an alternative to create a Dag object from DB -> get_dag_run_state -> get_dag_runs -> /dags//tasks/ -> /dags//dag_runs//tasks/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3878) K8 Pod creation error if dag_id exceeds 63 characters
raman created AIRFLOW-3878: -- Summary: K8 Pod creation error if dag_id exceeds 63 characters Key: AIRFLOW-3878 URL: https://issues.apache.org/jira/browse/AIRFLOW-3878 Project: Apache Airflow Issue Type: Bug Components: executor, kubernetes Affects Versions: 1.10.2 Reporter: raman Getting following error while creating K8 worker Pod. Dag Id length is > 63 char and as per [https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/] lable value should be <=63 HTTP response body: \{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unable to parse requirement: invalid label value: \"\": must be no more than 63 characters","reason":"BadRequest","code":400} K8 Executor sets the dagId as the label value while creating k8pod. Avoid adding dag_id and task_id as pod labels if their size exceeds 63 char -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3865) Expose API to get Python code corresponding to a Dag Id
raman created AIRFLOW-3865: -- Summary: Expose API to get Python code corresponding to a Dag Id Key: AIRFLOW-3865 URL: https://issues.apache.org/jira/browse/AIRFLOW-3865 Project: Apache Airflow Issue Type: Improvement Components: api, webserver Affects Versions: 1.9.0, 2.0.0 Reporter: raman Assignee: raman Currently Airflow's experimental api does not expose an endpoint to download DAg's python code. It might be required for sharing and debugging purpose. Proposal here is to expose following GET endpoint to allow downloading of dag_id code. /dags//code -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3516) Improve K8 Executors Pod creation Rate
[ https://issues.apache.org/jira/browse/AIRFLOW-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] raman updated AIRFLOW-3516: --- Component/s: (was: scheduler) > Improve K8 Executors Pod creation Rate > -- > > Key: AIRFLOW-3516 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3516 > Project: Apache Airflow > Issue Type: Improvement > Components: executor, kubernetes >Affects Versions: 1.10.1 >Reporter: raman >Assignee: raman >Priority: Major > > It seems that Airflow scheduler is creating worker pod sequentially(one pod > per scheduler loop) in Kubernetes executors sync function which is called > from base executors.heartbeat() function. > One enhancement could be to create the k8 pod in batches where batch size can > be configured through airflow.cfg file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3516) Improve K8 Executors Pod creation Rate
[ https://issues.apache.org/jira/browse/AIRFLOW-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] raman reassigned AIRFLOW-3516: -- Assignee: raman > Improve K8 Executors Pod creation Rate > -- > > Key: AIRFLOW-3516 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3516 > Project: Apache Airflow > Issue Type: Improvement > Components: executor, kubernetes, scheduler >Affects Versions: 1.10.1 >Reporter: raman >Assignee: raman >Priority: Major > > It seems that Airflow scheduler is creating worker pod sequentially(one pod > per scheduler loop) in Kubernetes executors sync function which is called > from base executors.heartbeat() function. > One enhancement could be to create the k8 pod in batches where batch size can > be configured through airflow.cfg file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3613) Update Readme to add adobe as airflow user
[ https://issues.apache.org/jira/browse/AIRFLOW-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] raman reassigned AIRFLOW-3613: -- Assignee: raman > Update Readme to add adobe as airflow user > -- > > Key: AIRFLOW-3613 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3613 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.0, 1.10.1 >Reporter: raman >Assignee: raman >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3613) Update Readme to add adobe as airflow user
raman created AIRFLOW-3613: -- Summary: Update Readme to add adobe as airflow user Key: AIRFLOW-3613 URL: https://issues.apache.org/jira/browse/AIRFLOW-3613 Project: Apache Airflow Issue Type: Improvement Components: Documentation Affects Versions: 1.10.1, 1.10.0 Reporter: raman -- This message was sent by Atlassian JIRA (v7.6.3#76005)