I have an Airflow 1.9 cluster set up on Kubernetes, and I have an issue where a 
random DAG task shows as failed because Airflow appears to have lost track of 
it.  The cluster consists of a database, a Redis store, a scheduler, and 14 workers.


What happens is the task starts as normal, runs, and exits, but instead of 
the status being written, the Operator, Start Date, Job ID, and Hostname are 
erased.  Shortly thereafter an end time is added and the state is set to failed.
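
For reference, this is roughly how I inspect the task instance row in the 
metadata DB (a minimal sketch; the connection string, dag_id, task_id, and 
execution_date are placeholders for my setup, and it assumes the stock 1.9 
task_instance schema):

    from sqlalchemy import create_engine, text

    # Placeholder connection string for the metadata database.
    engine = create_engine("postgresql://airflow:***@airflow-db:5432/airflow")

    query = text("""
        SELECT state, operator, start_date, end_date, job_id, hostname
        FROM task_instance
        WHERE dag_id = :dag_id
          AND task_id = :task_id
          AND execution_date = :execution_date
    """)

    with engine.connect() as conn:
        row = conn.execute(query,
                           dag_id="example_dag",
                           task_id="example_task",
                           execution_date="2018-05-01 00:00:00").fetchone()

    # After the failure: state='failed' and end_date is set, but operator,
    # start_date, job_id, and hostname all come back NULL.
    print(dict(row))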


Given the hostname is erased, I have to brute-force find the logs of the worker 
that executed the task.  When I do find the task logs, they indicate the command 
(BashOperator) ran to completion and exited cleanly.  I don't see any errors in 
the Airflow scheduler or any of the workers that would indicate a problem.  I am 
not sure what else to debug.
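
In case it helps, this is roughly the loop I use to hunt for the logs (a rough 
sketch; the pod names, log path, and IDs are placeholders for my setup and 
assume the default 1.9 log layout of <dag_id>/<task_id>/<execution_date>/):

    import subprocess

    DAG_ID = "example_dag"                  # placeholder
    TASK_ID = "example_task"                # placeholder
    EXECUTION_DATE = "2018-05-01T00:00:00"  # placeholder

    log_dir = "/usr/local/airflow/logs/{}/{}/{}".format(
        DAG_ID, TASK_ID, EXECUTION_DATE)

    for i in range(14):
        pod = "airflow-worker-{}".format(i)  # placeholder pod naming
        # See whether this worker wrote a log file for the task instance.
        result = subprocess.run(
            ["kubectl", "exec", pod, "--", "ls", log_dir],
            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            print("Found task logs on {}: {}".format(
                pod, result.stdout.decode().strip()))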




- John K
