Tasks stay queued when they fail in celery

2017-07-28 Thread David Capwell
We noticed that in the past few days we keep seeing tasks stay in the queued state. Looking into celery, we see that the task had failed. Traceback (most recent call last): File "/python/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task R = retval = fun(*args,

Re: Completed tasks not being marked as completed

2017-07-28 Thread Marc Weil
Hey Max, Thanks for the suggestions. I believe it was a retry (I'm using remote logging so I can only check after the task completes), but the UI never reported it as such. The latest_heartbeat column is in the jobs table, and interestingly I do see some running jobs that haven't heartbeated for

Re: Airflow + Kubernetes Talk video

2017-07-28 Thread Dan Davydov
Thanks for organizing, and leading this effort in general! On Fri, Jul 28, 2017 at 2:40 PM, Daniel Imberman wrote: > Hi guys! > > Thank you again to everyone who attended the talk yesterday. I've posted > the video of the conversation to youtube, and will soon add the

Re: Completed tasks not being marked as completed

2017-07-28 Thread Maxime Beauchemin
Are you sure there hasn't been a retry at that point? [One of] the expected behaviors is the one I described, where if a task finished without reporting its success [or failure], it will stay marked as RUNNING, but will fail to emit a heartbeat (which is a timestamp updated in the task_instance
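
The heartbeat check described in this message can be illustrated with a small sketch. It only shows the idea of comparing a last-heartbeat timestamp against a cutoff; the function name `heartbeat_is_stale` and the five-minute default are assumptions made for illustration, not Airflow's actual code or settings.

```python
from datetime import datetime, timedelta


def heartbeat_is_stale(latest_heartbeat, now, max_age=timedelta(minutes=5)):
    """Hypothetical helper: True when the last heartbeat is older than max_age.

    The 5-minute default is an assumed value for illustration only.
    """
    return now - latest_heartbeat > max_age


# Example with fixed timestamps so the outcome does not depend on the clock.
now = datetime(2017, 7, 28, 12, 0, 0)
recent = datetime(2017, 7, 28, 11, 58, 0)   # 2 minutes old
old = datetime(2017, 7, 28, 11, 30, 0)      # 30 minutes old
```

A task still marked as RUNNING whose heartbeat is stale in this sense is the kind of case the message describes.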

Re: Email on last failed try

2017-07-28 Thread Andrew Maguire
Ah that must be what it is, have left that as default which I guess is true. Cheers Andy On Fri, 28 Jul 2017, 22:53 Maxime Beauchemin, wrote: > Maybe you don't have `email_on_retry=False`? > > On Fri, Jul 28, 2017 at 11:52 AM, Alex Guziel < >

Re: Email on last failed try

2017-07-28 Thread Maxime Beauchemin
Maybe you don't have `email_on_retry=False`? On Fri, Jul 28, 2017 at 11:52 AM, Alex Guziel < alex.guz...@airbnb.com.invalid> wrote: > Sounds like unintended behavior. That should be what email_on_retry does. > If you can repro, file a ticket. > > On Fri, Jul 28, 2017 at 11:44 AM, Andrew
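
For readers following this thread, the settings under discussion might be combined as below. This is a minimal sketch: the dictionary keys are the operator arguments quoted in the thread (`email_on_failure`, `email_on_retry`, `retries`), the recipient address is a placeholder, and `should_email` is a hypothetical helper that only models the intended outcome; it is not part of Airflow.

```python
# Illustrative default_args; the address is a placeholder.
default_args = {
    "email": ["alerts@example.com"],
    "email_on_failure": True,   # mail once the task has finally failed
    "email_on_retry": False,    # no mail for intermediate failed tries
    "retries": 4,               # 1 initial try + 4 retries = 5 tries
}


def should_email(try_number, args):
    """Hypothetical model of the intended behaviour (not Airflow code).

    try_number is 1-based and refers to a try that has just failed.
    """
    total_tries = args["retries"] + 1
    if try_number < total_tries:
        # A retry is still pending, so only email_on_retry applies.
        return args["email_on_retry"]
    # Final try failed: email_on_failure applies.
    return args["email_on_failure"]
```

With these values, failed tries 1 through 4 would send nothing and only the failure of try 5 would trigger a mail.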

Airflow + Kubernetes Talk video

2017-07-28 Thread Daniel Imberman
Hi guys! Thank you again to everyone who attended the talk yesterday. I've posted the video of the conversation to youtube, and will soon add the video and slides to the airflow Wiki Cheers, Daniel https://www.youtube.com/watch?v=5BU3YPYYRno

Re: Email on last failed try

2017-07-28 Thread Alex Guziel
Sounds like unintended behavior. That should be what email_on_retry does. If you can repro, file a ticket. On Fri, Jul 28, 2017 at 11:44 AM, Andrew Maguire wrote: > Yeah - i have: > > 'email_on_failure': True > 'retries': 4 > > So i get emails on every try: e.g. Try 1 out

Re: Email on last failed try

2017-07-28 Thread Andrew Maguire
Yeah - i have: 'email_on_failure': True 'retries': 4 So i get emails on every try: e.g. Try 1 out of 5 Really what i'm most worried about is the final failures then i have a problem, whereas if it fails 3 times and then succeeds i'm ok to be unaware of that. Maybe email routing via gmail might

Re: Sensor slots utilization

2017-07-28 Thread Alex Guziel
I'm concerned that we would be making the logic more complex, unless the new sensor 'pokeonce' case is just a high number of retries. And the other overhead of course. Running the poke method inline wouldn't be great for perf either since it's a blocking I/O and would need to be handled async in

Airflow declarative DAGs

2017-07-28 Thread Alexander Shorin
Hi everyone! Yesterday we released the airflow-declarative project, which allows you to define DAGs declaratively via YAML. https://github.com/rambler-digital-solutions/airflow-declarative TL;DR - Declarative DAGs in plain text YAML help a lot to understand what a DAG will look like. Made for humans,

Sensor slots utilization

2017-07-28 Thread Maxime Beauchemin
Thought this was interesting to bubble up to the mailing list. From: https://github.com/apache/incubator-airflow/pull/2423#issuecomment-318723842 This is about the issue around sensors utilizing a lot of worker slots. The context is a PR from @shaform introducing sensors that check once and give

Re: Completed tasks not being marked as completed

2017-07-28 Thread Marc Weil
It happens mostly when the scheduler is catching up. More specifically, when I load a brand new DAG with a start date in the past. Usually I have it set to run 5 DAG runs at the same time, and up to 16 tasks at the same time. What I've also noticed is that the tasks will sit completed in reality

Re: Completed tasks not being marked as completed

2017-07-28 Thread Maxime Beauchemin
By the time "INFO - Task exited with return code 0" gets logged, the task should have been marked as successful by the subprocess. I have no specific intuition as to what the issue may be. I'm guessing at that point the job stops emitting heartbeat and eventually the scheduler will handle it as a

Re: Completed tasks not being marked as completed

2017-07-28 Thread Marc Weil
From what I can tell, it only affects CeleryExecutor. I've never seen this behavior with LocalExecutor before. Max, do you know anything about this type of failure mode? -- Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement | New Relic On Fri, Jul 28, 2017 at 5:48 AM,

Re: Email on last failed try

2017-07-28 Thread Maxime Beauchemin
Wouldn't `email_on_failure=True` work for you? https://airflow.incubator.apache.org/code.html?highlight=email_on_failure#baseoperator On Fri, Jul 28, 2017 at 9:32 AM, Andrew Maguire wrote: > Hey, > > Just wondering if anyone knows if there might be a way to only send

Email on last failed try

2017-07-28 Thread Andrew Maguire
Hey, Just wondering if anyone knows if there might be a way to only send email on the last failed try of a task? Could I use a callable on failure to only send the mail on the last failed try? We are using BigQuery and getting lots of transient errors around limit of concurrent queries that

Custom BashSensor

2017-07-28 Thread Diogo Franco
Hi, I was looking for a sensor that allowed me to customize the sensing logic. I couldn't find any way of doing this and wrote a BashSensor, which returns True or False based on a command/script's return code. The implementation is similar to the BashOperator, and it seems to me that it would fit
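
The core idea described here, deciding True or False from a command's return code, can be sketched in plain Python. This is not the poster's implementation: `poke_command` is a hypothetical name, and the sketch assumes a POSIX-style shell is available and leaves out everything Airflow-specific.

```python
import subprocess


def poke_command(command):
    """Hypothetical sensing check: True if the shell command exits with 0.

    Any non-zero return code is treated as "condition not met yet".
    """
    return subprocess.call(command, shell=True) == 0
```

A sensor built on this idea would call such a check repeatedly until it returns True, for example with a command like `test -f /path/to/marker`.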

Re: Completed tasks not being marked as completed

2017-07-28 Thread Jonas Karlsson
We have the exact same problem. In our case, it's a bash operator starting a docker container. The container and process it ran exit, but the 'docker run' command is still showing up in the process table, waiting for an event. I'm trying to switch to LocalExecutor to see if that will help. _jonas