Re: Experiences with 1.8.0

2017-01-25 Thread Chris Riccomini
Hey all, I have sent in a PR and JIRA here: https://github.com/apache/incubator-airflow/pull/2021 https://issues.apache.org/jira/browse/AIRFLOW-807 Please have a look. EDIT: I see Arthur just did haha. Cheers, Chris On Tue, Jan 24, 2017 at 9:41 PM, Chris Riccomini

Re: Experiences with 1.8.0

2017-01-23 Thread Maxime Beauchemin
Can you rebuild your indexes and recompute the table's stats and see if the optimizer is still off tracks? Assuming InnoDB and from memory: OPTIMIZE TABLE task_instances; ANALYZE TABLE task_instances; Max On Mon, Jan 23, 2017 at 3:45 PM, Arthur Wiedmer wrote: >

Re: Experiences with 1.8.0

2017-01-23 Thread Arthur Wiedmer
Maybe we can start with " .with_hint(TI, 'USE INDEX (PRIMARY)', dialect_name='mysql')" and see if other databases exhibit the same query plan issue ? Best, Arthur On Mon, Jan 23, 2017 at 3:27 PM, Chris Riccomini wrote: > With this patch: > > $ git diff > diff --git

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
With this patch: $ git diff diff --git a/airflow/jobs.py b/airflow/jobs.py index f1de333..9d08e75 100644 --- a/airflow/jobs.py +++ b/airflow/jobs.py @@ -544,6 +544,7 @@ class SchedulerJob(BaseJob): .query( TI.task_id,

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
OK, it's using `state` instead of PRIMARY. Using PRIMARY with a hint, query takes .47s. Without hint, 10s. Going to try and patch. On Mon, Jan 23, 2017 at 2:57 PM, Chris Riccomini wrote: > This inner query takes 10s: > > SELECT task_instance.task_id AS task_id,

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
This inner query takes 10s: SELECT task_instance.task_id AS task_id, max(task_instance.execution_date) AS max_ti FROM task_instance WHERE task_instance.dag_id = 'dag1' AND task_instance.state = 'success' AND task_instance.task_id IN ('t1', 't2') GROUP BY task_instance.task_id Explain seems OK:

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
Can confirm it's a slow query on task_instance table. Still digging. Unfortunately, the query is truncated in my UI right now: SELECT task_instance.task_id AS task_instance_... On Mon, Jan 23, 2017 at 1:56 PM, Chris Riccomini wrote: > Digging. Might be a bit. > > On Mon,

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
Digging. Might be a bit. On Mon, Jan 23, 2017 at 1:32 PM, Bolke de Bruin wrote: > Slow query log? Db load? > > B. > > Verstuurd vanaf mijn iPad > > > Op 23 jan. 2017 om 21:59 heeft Chris Riccomini > het volgende geschreven: > > > > Note: 6.5 million

Re: Experiences with 1.8.0

2017-01-23 Thread Bolke de Bruin
Slow query log? Db load? B. Verstuurd vanaf mijn iPad > Op 23 jan. 2017 om 21:59 heeft Chris Riccomini het > volgende geschreven: > > Note: 6.5 million TIs in the task_instance table. > > On Mon, Jan 23, 2017 at 12:58 PM, Chris Riccomini >

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
Note: 6.5 million TIs in the task_instance table. On Mon, Jan 23, 2017 at 12:58 PM, Chris Riccomini wrote: > Hey Bolke, > > Re: system usage, it's pretty quiet <5% CPU usage. Mem is almost all free > as well. > > I am thinking that this is DB related, given that it's

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
Hey Bolke, Re: system usage, it's pretty quiet <5% CPU usage. Mem is almost all free as well. I am thinking that this is DB related, given that it's pausing when executing an update. Was looking at the update_state method in models.py, which logs right before the 15s pause. Cheers, Chris On

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
Oops, yes, 15 seconds, sorry. Operating without much sleep. :P On Mon, Jan 23, 2017 at 12:35 PM, Arthur Wiedmer wrote: > Chris, > > Just double checking, you mean more than 15 seconds not 15 minutes, right? > > Best, > Arthur > > On Mon, Jan 23, 2017 at 12:27 PM,

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
Also, seeing this in EVERY task that runs: [2017-01-23 20:26:13,777] {jobs.py:2112} WARNING - State of this instance has been externally set to queued. Taking the poison pill. So long. [2017-01-23 20:26:13,841] {jobs.py:2051} INFO - Task exited with return code 0 All successful tasks are

Re: Experiences with 1.8.0

2017-01-23 Thread Arthur Wiedmer
Chris, Just double checking, you mean more than 15 seconds not 15 minutes, right? Best, Arthur On Mon, Jan 23, 2017 at 12:27 PM, Chris Riccomini wrote: > Hey all, > > I've upgraded on production. Things seem to be working so far (only been an > hour), but I am seeing

Re: Experiences with 1.8.0

2017-01-23 Thread Chris Riccomini
Hey all, I've upgraded on production. Things seem to be working so far (only been an hour), but I am seeing this in the scheduler logs: File Path PID RuntimeLast RuntimeLast Run

Re: Experiences with 1.8.0

2017-01-22 Thread Bolke de Bruin
I created: - AIRFLOW-791: At start up all running dag_runs are being checked, but not fixed - AIRFLOW-790: DagRuns do not exist for certain tasks, but don’t get fixed - AIRFLOW-788: Context unexpectedly added to hive conf - AIRFLOW-792: Allow fixing of schedule when wrong start_date / interval

Re: Experiences with 1.8.0

2017-01-20 Thread Dan Davydov
I'd be happy to lend a hand fixing these issues and hopefully some others are too. Do you mind creating jiras for these since you have the full context? I have created a JIRA for (1) and have assigned it to myself: https://issues.apache.org/jira/browse/AIRFLOW-780 On Fri, Jan 20, 2017 at 1:01 AM,