update DAG and the

2016-05-07 Thread Jason Chen
Hi I have two questions: (1) Is it possible to update the DAG ? Say, the task name in the DAG, the start_date of the DAG and the dependencies of the involved tasks ? (2) For the airflow UI: "Tree view", it lists the tasks along with the time highlighted in the top (say, 08:30; 09:00, etc). What

Time zone used in "Tree view" and task order

2016-05-16 Thread Jason Chen
I have two questions: (1) In the Airflow UI "Tree view", the tasks are listed along with times highlighted at the top (say, 08:30, 09:00, etc.). What do these times mean? They do not look like the UTC times at which the tasks ran. I know that, overall, Airflow uses UTC time. (2) I have a DAG with two tasks

Re: Time zone used in "Tree view" and task order

2016-05-27 Thread Jason Chen
'll have two runs > going at the same time. If you want to prevent this, you can set > max_active_runs to 1 in your DAG. > > Cheers, > Chris > > On Mon, May 16, 2016 at 1:09 PM, Jason Chen > wrote: > > > I have two questions > > > > (1) For the airflow

Re: Time zone used in "Tree view" and task order

2016-05-31 Thread Jason Chen
DAGs. This will allow multiple >(different) DAGs to run in parallel, but only one DAG of each type can run >at the same time. > >Cheers, >Chris > >On Fri, May 27, 2016 at 11:42 PM, Jason Chen >wrote: > >> Hi Chris, >> Thanks for your reply. After setting it up, I

Re: Time zone used in "Tree view" and task order

2016-05-31 Thread Jason Chen
rent) DAGs to run in parallel, but only one DAG of each type can run > at the same time. > > Cheers, > Chris > > On Fri, May 27, 2016 at 11:42 PM, Jason Chen > wrote: > > > Hi Chris, > > Thanks for your reply. After setting it up, I observed how it works for

Re: Time zone used in "Tree view" and task order

2016-05-31 Thread Jason Chen
Chris, I am running SequentialExecutor. Thanks. Jason On Tue, May 31, 2016 at 1:36 PM, Chris Riccomini wrote: > Hey Jason, > > Are you running the SerialExecutor? This is the default out-of-the-box > executor. > > Cheers, > Chris > > On Tue, May 31, 2016 at 12

Re: Time zone used in "Tree view" and task order

2016-05-31 Thread Jason Chen
ing purposes. Try switching to the LocalExecutor. > > Cheers, > Chris > > On Tue, May 31, 2016 at 3:31 PM, Jason Chen > wrote: > >> Chris, >> I am running SequentialExecutor. >> >> Thanks. >> Jason >> >> >> On Tue, May 31, 201

Upgrade from 1.7.0 to 1.7.1.2

2016-06-10 Thread Jason Chen
Hi, I upgraded airflow from 1.7.0 to 1.7.1.2 (using "pip install airflow --upgrade"). Do I need to run any "post-upgrade" procedures? For example, are there any DB schema changes (I am using MySQL)? Thanks. Jason Chen

Re: Upgrade from 1.7.0 to 1.7.1.2

2016-06-10 Thread Jason Chen
so any new migrations > > will be applied. > > > > > > On Fri, Jun 10, 2016 at 12:35 PM, Jason Chen > > wrote: > > > > > Hi, > > > > > > I upgraded airflow from 1.7.0 to 1.7.1.2 (using "pip install airflow > > > --upgrade

airflow v1.7.1.3 not sending email

2016-06-14 Thread Jason Chen
Hi I am using airflow v1.7.1.3 and it seems email is not working and I got the following error message. Any suggestions ? Thanks. Jason Chen (1) Error message [2016-06-15 00:21:15,506] {models.py:1304} INFO - All retries failed; marking task as FAILED [2016-06-15 00:21:15,509] {models.py

Re: airflow v1.7.1.3 not sending email

2016-06-14 Thread Jason Chen
Cool, it's working fine! On Tue, Jun 14, 2016 at 5:58 PM, Rob Froetscher wrote: > I just addressed this issue in UPDATING.md under 1.7.1.2 > https://github.com/apache/incubator-airflow/blob/master/UPDATING.md > > On Tue, Jun 14, 2016 at 5:56 PM, Jason Chen > wrote: > >

Question about airflow scheduler

2016-06-19 Thread Jason Chen
Hi Airflow team, To run "airflow scheduler", you can specify an argument "SCHEDULER_RUNS", something like "airflow scheduler -n ${SCHEDULER_RUNS}". What is it used for? Are there side effects when setting it to unlimited? Thanks. Jason

Questions about upstreams

2016-07-01 Thread Jason Chen
Hi, It is great that Airflow allows setting multiple upstream tasks. Say, task3 can have [task1, task2] as upstreams. My understanding is that task3 will be triggered only if BOTH task1 and task2 are successful. Is that right? Thanks. Jason
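The behaviour described in this question matches what Airflow calls the `all_success` trigger rule, which is the default for operators. The sketch below illustrates that rule in plain Python; it is not Airflow's scheduler code, and `should_trigger` and the state strings are hypothetical names used only for illustration.

```python
def should_trigger(upstream_states):
    """Return True only when every upstream task finished successfully.

    `upstream_states` is a list of state strings, one per upstream task.
    This is an illustrative stand-in for the "all_success" rule, not
    Airflow's own implementation.
    """
    return all(state == "success" for state in upstream_states)


# task3 with upstreams [task1, task2]
both_succeeded = should_trigger(["success", "success"])
one_failed = should_trigger(["success", "failed"])
print(both_succeeded, one_failed)
```

Under this rule a single failed or still-running upstream is enough to hold task3 back.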

Running a task from the Airflow UI

2016-07-05 Thread Jason Chen
Hi Airflow team, I am using the "LocalExecutor" and it works very well to run the workflow I set up. I noticed that a task can be triggered to run from the UI. However, I got the error "Only works with the CeleryExecutor, sorry ". I can ssh into the airflow node and run the command line from there. Ho

Delete a dag ?

2016-08-25 Thread Jason Chen
Hi, How do I delete a DAG in Airflow (instead of turning it off)? Thanks. Jason

DAGs display together

2016-08-30 Thread Jason Chen
Hi team, The airflow main UI can display many DAGs. Is it possible to visually group these DAGs based on some grouping rules? Given that the names are ordered alphabetically, we can name them `group_name.function_name.major_version`, so we can see related jobs together more easily. But, I

Airflow webserver responses slow intermittently

2016-08-31 Thread Jason Chen
Hi airflow team, I am using airflow v1.7.1.3 (gunicorn 19.3.0). It seems the airflow webserver responds slowly intermittently. It sometimes takes a while just to load the home page of the airflow DAG UI. I tried to start the webserver in debug mode (`airflow webserver -d`) and can see the client/UI requests.

Re: Airflow webserver responses slow intermittently

2016-09-02 Thread Jason Chen
seem to have affected the web > session. This is just a bit of a wild guess though, haven't debugged this. > > HTH, > > Koen > > > > On Thu, Sep 1, 2016 at 7:49 AM, Jason Chen > wrote: > > > Hi airflow team, > > I am using airflow v1.7.1

About airflow scheduler

2016-09-03 Thread Jason Chen
Hi airflow team, We set up airflow as an upstart service using the suggestion here https://github.com/apache/incubator-airflow/blob/master/scripts/upstart/airflow-scheduler.conf#L33 We set SCHEDULER_RUNS=0 (unlimited). We notice that "ps aux | grep airflow" indicates several scheduler run

Name the DAG and the file for DAG ?

2016-09-04 Thread Jason Chen
Hi airflow team, I named my DAG file something like "abc.def.v1.py" and the ID "abc.def.v1". I noticed that when I try to view the code from the UI, it throws exceptions like "...No module named unusual_prefix_abc.def.v1..." with details as below. I am thinking it's because of the python hierarch

Re: Name the DAG and the file for DAG ?

2016-09-05 Thread Jason Chen
'.'s (dots) are not supported in DAG > names. May I recommend the use of underscores? > > Best, > Arthur > > On Sun, Sep 4, 2016 at 8:09 PM, Jason Chen > wrote: > > Hi airflow team, > > > > I named my DAG file as something like "abc.def.v1.py"
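The underscore recommendation in this reply can be applied mechanically. The sketch below assumes the only problem characters are dots; `safe_dag_id` and `safe_dag_filename` are hypothetical helper names and are not part of Airflow.

```python
import os


def safe_dag_id(dag_id):
    """Replace dots in a DAG ID with underscores."""
    return dag_id.replace(".", "_")


def safe_dag_filename(filename):
    """Replace dots in the file stem while keeping the final extension."""
    stem, ext = os.path.splitext(filename)
    return stem.replace(".", "_") + ext


print(safe_dag_id("abc.def.v1"))
print(safe_dag_filename("abc.def.v1.py"))
```

Keeping the file stem free of dots matters because a dotted module name would be read by Python as a package path, which is consistent with the "No module named ..." error quoted in the question.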

Dynamic numbers of tasks ?

2016-09-13 Thread Jason Chen
Hi airflow team, I have a data pipeline as below: task1 --> task2(parameter) --> task3(parameter) The parameters of task2 and task3 are based on the outputs from task1. When it runs, task1 will create a list of data, say list[a, b, c]. Then, I want the process to run as shown below. Is it

Usage of "on_failure_callback" ?

2016-10-18 Thread Jason Chen
Hi airflow team, Is there any sample code to use "on_failure_callback" ? I tried to use that as a callback to "post-process" when a task fails. However, I cannot make it work. The definition of my task is as below (just one task in my dag). It invokes "task0_python_callable" which executes a com

Re: Usage of "on_failure_callback" ?

2016-10-18 Thread Jason Chen
hon_callable(*self.op_args, **self.op_kwargs) > File "/home/oracle/airflow/dags/fail_callback.py", line 33, in > task0_python_callable > print 1/0 > ZeroDivisionError: integer division or modulo by zero > [2016-10-18 16:07:03,324] {models.py:1306} INFO - Marking task
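For readers following this thread, here is a minimal plain-Python sketch of the callback pattern being discussed. It does not import Airflow: `run_with_callback` is a hypothetical stand-in for the task runner, and the context keys `task_id` and `exception` are assumptions made for this illustration. The one detail taken from Airflow itself is that the callback receives a single context dictionary.

```python
def task0_python_callable():
    # Mirrors the failing task in the traceback above.
    return 1 / 0


def on_failure(context):
    # Airflow passes one context dictionary to the callback; the keys
    # read here are assumptions made for this sketch.
    print("Task %s failed: %r" % (context["task_id"], context["exception"]))


def run_with_callback(task_id, python_callable, on_failure_callback=None):
    """Hypothetical stand-in for a task runner; not Airflow's implementation."""
    try:
        return python_callable()
    except Exception as exc:
        if on_failure_callback is not None:
            on_failure_callback({"task_id": task_id, "exception": exc})
        # Re-raise so the task is still reported as failed.
        raise


try:
    run_with_callback("task0", task0_python_callable, on_failure)
except ZeroDivisionError:
    pass
```

The callback runs only after the callable has raised, and the original exception still propagates afterwards.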

Questions on Airflow with CeleryExecutor

2016-11-02 Thread Jason Chen
Hi Airflow team, We are using Airflow with LocalExecutor and it works great. We are moving toward using CeleryExecutor and have a couple of questions. I searched the posts and could not find answers. We have 3 airflow worker nodes and use Redis as the broker. (1) How airflow worker determines the

Airflow + Celery + SQS

2017-01-28 Thread Jason Chen
Hi Airflow team, Celery 4 supports AWS SQS http://docs.celeryproject.org/en/latest/getting-started/brokers/sqs.html We are using Airflow 1.7.1.3 Is there any problem, if we change config to use SQS for CeleryExecutor ? Thanks. Jason

Re: Airflow + Celery + SQS

2017-01-30 Thread Jason Chen
0, 2017 at 9:59 AM, Jeremiah Lowin wrote: > > > Jason, > > > > I don't believe Airflow cares about Celery's backend as long as the task > > API remains the same. You should be OK (though I haven't tested to > > confirm). > > > > J > >

Re: scheduler running on multiple nodes

2017-02-24 Thread Jason Chen
A side question related to this topic: I am running Airflow w/ celery executor in multiple nodes. Each node is running celery, worker, scheduler and webserver. These nodes are registered to a Redis for celery queue and these nodes are sharing the same dags, logs folder (and MySQL) It seems running

Re: scheduler running on multiple nodes

2017-02-24 Thread Jason Chen
> instance. > > Best, > Arthur > > On Fri, Feb 24, 2017 at 11:04 AM, Jason Chen > wrote: > > > A side question related to this topic: > > I am running Airflow w/ celery executor in multiple nodes. Each node is > > running celery, worker, scheduler and webse

High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
Hi team, We are using airflow v1.7.1.3 and schedule about 50 dags (each dag runs at about 10 to one hour intervals). It is with LocalExecutor. Recently, we noticed the RDS (MySQL 5.6.x on AWS) runs at ~100% CPU. I am wondering if airflow scheduler and webserver can cause high CPU load of MySQL, g

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
anks for opening this. I would love to hear on how people are working > around this. > > > > > > On Tue, Mar 7, 2017 at 9:42 AM, Jason Chen > wrote: > > > Hi team, > > > > We are using airflow v1.7.1.3 and schedule about 50 dags (each dags is > > abo

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
eat = 5 sec so > that we do not lose time when a task is ready to run (I think there is > another known bug here - tasks dont move from one queued -> running state > even after "job heartbeat" ) > > > On Tue, Mar 7, 2017 at 10:41 AM, Jason Chen > wrote: > &

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
, that cannot share memory. Curl can cache > memory itself as well. You probably have peak times and longer running > tasks so it is not evenly spread, then it starts adding up quickly? > > Bolke. > > > > On 7 Mar 2017, at 19:41, Jason Chen wrote: > > > > Hi Harish,

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
WHERE ((ti.start_date <= DATE_SUB(NOW(), INTERVAL 30 DAY) AND > ti.state != "running") OR >(ISNULL(ti.start_date) AND > ti.state = "failed")) AND > (ISNULL(dr.id) OR dr.state != "ru

Re: High load in CPU of MySQL when running airflow

2017-03-08 Thread Jason Chen
> Max > > On Tue, Mar 7, 2017 at 9:57 PM, Dan Davydov > wrote: > > > We will need to come up with a plan soon (better DB indexes and/or the > > ability to rotate out old task instances according to some policy). > Nothing > > concrete as of yet though. >

Re: delays in UI causing confusion

2017-03-08 Thread Jason Chen
Matt, In our use cases, we run airflow scheduler and webserver as services. Restarting the webserver service can reduce the delay. -Jason On Wed, Mar 8, 2017 at 5:56 PM, Matt Martin wrote: > Hello, > > We've been testing out airflow and the delay between when people update > their DAGs and when those

How to handle this case "Another instance is running, skipping" ?

2017-05-20 Thread Jason Chen
Hi Airflow team, I am using airflow with celery (2 nodes; i.e., two AWS instances). My dag looks like the one below (the python dag name is task_ABC.py). Note that in the dag python file, I set "max_active_runs=1" /-> TaskB1 ---> TaskC1-\ TaskA ---> TaskB2 --

Re: How to handle this case "Another instance is running, skipping" ?

2017-05-25 Thread Jason Chen
here are a handful of reasons it might be showing up in your logs. > > Which version of Airflow are you running? Is your scheduler set to restart > periodically? Are you running more than one scheduler? > > On Sat, May 20, 2017 at 6:53 PM Jason Chen > wrote: > > > Hi Airflow

Quick questions on Airflow with CeleryExecutor

2017-06-03 Thread Jason Chen
Hi Airflow team, (1) I am running airflow v1.7.1.3 with CeleryExecutor (2) I have two instances running with same Redis queue: (a) Instance1: is running webserver, scheduler, worker and flower (b) Instance2: is running webserver, worker and flower (3) In instance2, I noticed there is somethi

Re: Tasks Queued but never run

2017-06-07 Thread Jason Chen
I am using Airflow 1.7.1.3 with CeleryExecutor, but have not run into this issue. I am wondering if this issue only affects 1.8.x? On Wed, Jun 7, 2017 at 8:34 AM, Russell Pierce wrote: > Depending on how fast you can clear down your queue, -n can be harmful and > really stack up your celery queue. Ke

Exception "killed as zombie"

2017-06-12 Thread Jason Chen
Hi Airflow team, I am running airflow 1.7.1.3 with celery (3 workers) and Redis as queue. I use mostly default settings for the airflow config. One scheduler. It occasionally throws the exception "killed as zombie" for a particular task (a task that runs a shell script), but my scheduler is still runni

Re: Exception "killed as zombie"

2017-06-15 Thread Jason Chen
Hi, Is there any feedback/guidance on this? Thanks. -Jason On Mon, Jun 12, 2017 at 9:03 PM, Jason Chen wrote: > Hi Airflow team, > > I am running airflow 1.7.1.3 with celery (3 workers) and Redis as queue. > I use most default settings for airflow config. > One sch