We noticed that in the past few days we keep seeing tasks stuck in the
queued state. Looking into Celery, we see that the task had failed.
Traceback (most recent call last):
  File "/python/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
    R = retval = fun(*args,
Hey Max,
Thanks for the suggestions. I believe it was a retry (I'm using remote
logging so I can only check after the task completes), but the UI never
reported it as such. The latest_heartbeat column is in the jobs table, and
interestingly I do see some running jobs that haven't heartbeated for
Thanks for organizing, and leading this effort in general!
On Fri, Jul 28, 2017 at 2:40 PM, Daniel Imberman wrote:
> Hi guys!
>
> Thank you again to everyone who attended the talk yesterday. I've posted
> the video of the conversation to YouTube, and will soon add the
Are you sure there hasn't been a retry at that point? The expected
behavior is the one I described: if a task finishes without reporting
its success or failure, it will stay marked as RUNNING, but will fail to
emit a heartbeat (which is a timestamp updated in the task_instance
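
(For anyone wanting to check this directly, here's a rough sketch against the metadata DB. The connection string is made up, it assumes a Postgres backend for the interval syntax, and the `job` table / state names may vary by Airflow version:

    from sqlalchemy import create_engine, text

    # Point this at your Airflow metadata DB (connection string is made up).
    engine = create_engine('postgresql://airflow:airflow@localhost/airflow')

    # Jobs still marked running whose latest_heartbeat is more than 5 minutes
    # old; per the explanation above, these will eventually be treated as failed.
    rows = engine.execute(text("""
        SELECT id, dag_id, job_type, latest_heartbeat
        FROM job
        WHERE state = 'running'
          AND latest_heartbeat < now() - interval '5 minutes'
    """))
    for row in rows:
        print(row)

)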
Ah, that must be what it is. I've left that as the default, which I guess is
True.
Cheers,
Andy
On Fri, 28 Jul 2017, 22:53 Maxime Beauchemin wrote:
> Maybe you don't have `email_on_retry=False`?
>
> On Fri, Jul 28, 2017 at 11:52 AM, Alex Guziel <
>
Maybe you don't have `email_on_retry=False`?
On Fri, Jul 28, 2017 at 11:52 AM, Alex Guziel <alex.guz...@airbnb.com.invalid> wrote:
> Sounds like unintended behavior. That should be what email_on_retry does.
> If you can repro, file a ticket.
>
> On Fri, Jul 28, 2017 at 11:44 AM, Andrew
Hi guys!
Thank you again to everyone who attended the talk yesterday. I've posted
the video of the conversation to YouTube, and will soon add the video and
slides to the Airflow wiki.
Cheers,
Daniel
https://www.youtube.com/watch?v=5BU3YPYYRno
Sounds like unintended behavior. That should be what email_on_retry does.
If you can repro, file a ticket.
On Fri, Jul 28, 2017 at 11:44 AM, Andrew Maguire wrote:
> Yeah - I have:
>
> 'email_on_failure': True
> 'retries': 4
>
> So I get emails on every try: e.g. Try 1 out
Yeah - I have:
'email_on_failure': True
'retries': 4
So I get emails on every try, e.g. Try 1 out of 5.
Really it's the final failures I'm most worried about; that's when I have a
problem, whereas if it fails 3 times and then succeeds I'm OK being unaware
of that.
Maybe email routing via gmail might
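
(For reference, a minimal sketch of the `email_on_retry=False` fix Max suggests elsewhere in the thread; the DAG name, email address, schedule, and bash command are made up. With this setting, only the terminal failure after all retries sends a mail:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2017, 7, 1),
        'email': ['alerts@example.com'],   # hypothetical address
        'email_on_failure': True,          # mail only when the final try fails
        'email_on_retry': False,           # no mail for tries that will be retried
        'retries': 4,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG('bq_example', default_args=default_args,
              schedule_interval='@daily')

    query = BashOperator(task_id='flaky_bq_query',
                         bash_command='bq query ...',
                         dag=dag)

)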
I'm concerned that we would be making the logic more complex, unless the
new sensor 'pokeonce' case is just a high number of retries. And there's the
other overhead, of course.
Running the poke method inline wouldn't be great for perf either, since it's
blocking I/O and would need to be handled async in
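
(To make the "just a high number of retries" framing concrete, here's a sketch; not from the PR, and the DAG, task, and check function are all made up. The task checks once and raises to go up for retry, so retry_delay plays the role of poke_interval and the worker slot is freed between tries:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.exceptions import AirflowException
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('poke_once_example', start_date=datetime(2017, 7, 1),
              schedule_interval='@daily')

    def condition_met():
        # Stand-in for the real check (e.g. a file or partition exists).
        return False

    def check_once():
        # Check exactly once; raising puts the task up for retry instead
        # of holding a worker slot while sleeping.
        if not condition_met():
            raise AirflowException('Condition not met yet, will retry')

    wait_for_condition = PythonOperator(
        task_id='wait_for_condition',
        python_callable=check_once,
        retries=60,                        # effectively: poke up to 60 times
        retry_delay=timedelta(minutes=5),  # effectively: the poke_interval
        dag=dag,
    )

)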
Hi everyone!
Yesterday we released the airflow-declarative project, which allows you to
define DAGs declaratively via YAML.
https://github.com/rambler-digital-solutions/airflow-declarative
TL;DR
- Declarative DAGs in plain-text YAML help a lot in understanding what a
DAG will look like. Made for humans,
Thought this was interesting to bubble up to the mailing list. From:
https://github.com/apache/incubator-airflow/pull/2423#issuecomment-318723842
This is about the issue of sensors tying up a lot of worker slots. The
context is a PR from @shaform introducing sensors that check once and give
It happens mostly when the scheduler is catching up. More specifically,
when I load a brand new DAG with a start date in the past. Usually I have
it set to run 5 DAG runs at the same time, and up to 16 tasks at the same
time.
What I've also noticed is that the tasks will sit completed in reality
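
(For context, the "5 DAG runs / 16 tasks at the same time" setup described above corresponds to roughly the following; the DAG name is made up, and the parameter names are the stock DAG arguments of that era:

    from datetime import datetime

    from airflow import DAG

    dag = DAG(
        'backfill_heavy_dag',
        start_date=datetime(2016, 1, 1),  # start date in the past, so the
                                          # scheduler catches up on old runs
        schedule_interval='@daily',
        max_active_runs=5,   # up to 5 DAG runs at the same time
        concurrency=16,      # up to 16 task instances at the same time
    )

)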
By the time "INFO - Task exited with return code 0" gets logged, the task
should have been marked as successful by the subprocess. I have no specific
intuition as to what the issue may be.
I'm guessing that at that point the job stops emitting heartbeats and eventually
the scheduler will handle it as a
From what I can tell, it only affects CeleryExecutor. I've never seen this
behavior with LocalExecutor before.
Max, do you know anything about this type of failure mode?
--
Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement |
New Relic
On Fri, Jul 28, 2017 at 5:48 AM,
Wouldn't `email_on_failure=True` work for you?
https://airflow.incubator.apache.org/code.html?highlight=email_on_failure#baseoperator
On Fri, Jul 28, 2017 at 9:32 AM, Andrew Maguire wrote:
> Hey,
>
> Just wondering if anyone knows if there might be a way to only send
Hey,
Just wondering if anyone knows if there might be a way to only send email
on the last failed try of a task?
Could I use a callable on failure to only send the mail on the last failed
try?
We are using BigQuery and getting lots of transient errors around the limit on
concurrent queries that
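
(A sketch of the callable idea, using on_failure_callback; the helper, address, and bash command are made up, and whether the try_number guard is needed depends on the Airflow version, since on_failure_callback may already fire only on the terminal failure:

    from airflow.operators.bash_operator import BashOperator
    from airflow.utils.email import send_email

    def notify_final_failure(context):
        # Mail only when the task has exhausted its retries.
        ti = context['task_instance']
        task = context['task']
        if ti.try_number <= task.retries:  # defensive, version-dependent
            return
        send_email(
            to='alerts@example.com',  # hypothetical address
            subject='Final failure: %s.%s' % (ti.dag_id, ti.task_id),
            html_content='Failed after %s tries.' % ti.try_number,
        )

    query = BashOperator(
        task_id='bq_query',
        bash_command='bq query ...',
        retries=4,
        email_on_failure=False,  # silence the stock email, use the callback
        on_failure_callback=notify_final_failure,
        dag=dag,  # assumes a dag defined elsewhere
    )

)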
Hi,
I was looking for a sensor that allowed me to customize the sensing logic.
I couldn't find any way of doing this and wrote a BashSensor, which returns
True or False based on a command/script's return code. The implementation
is similar to the BashOperator, and it seems to me that it would fit
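
(Roughly what that looks like; a sketch along the lines described, not the exact code from the patch:

    from subprocess import call

    from airflow.operators.sensors import BaseSensorOperator
    from airflow.utils.decorators import apply_defaults

    class BashSensor(BaseSensorOperator):
        """Pokes by running a bash command; condition is met when it exits 0."""

        @apply_defaults
        def __init__(self, bash_command, *args, **kwargs):
            super(BashSensor, self).__init__(*args, **kwargs)
            self.bash_command = bash_command

        def poke(self, context):
            # True (stop poking, succeed) iff the command's return code is 0.
            return call(['bash', '-c', self.bash_command]) == 0

    # Usage, assuming a dag defined elsewhere:
    # wait = BashSensor(task_id='wait_for_file',
    #                   bash_command='test -f /tmp/ready',
    #                   poke_interval=60, dag=dag)

)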
We have the exact same problem. In our case, it's a bash operator starting
a docker container. The container and the process it ran both exit, but the
'docker run' command is still showing up in the process table, waiting for an event.
I'm trying to switch to LocalExecutor to see if that will help.
_jonas