Hey all,
I have sent in a PR and JIRA here:
https://github.com/apache/incubator-airflow/pull/2021
https://issues.apache.org/jira/browse/AIRFLOW-807
Please have a look.
EDIT: I see Arthur just did haha.
Cheers,
Chris
On Tue, Jan 24, 2017 at 9:41 PM, Chris Riccomini
Can you rebuild your indexes and recompute the table's stats and see if the
optimizer is still off track?
Assuming InnoDB and from memory:
OPTIMIZE TABLE task_instance;
ANALYZE TABLE task_instance;
Max
On Mon, Jan 23, 2017 at 3:45 PM, Arthur Wiedmer
wrote:
Maybe we can start with
" .with_hint(TI, 'USE INDEX (PRIMARY)', dialect_name='mysql')"
and see if other databases exhibit the same query plan issue ?
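The reason for passing dialect_name='mysql' is that the hint is only emitted when the statement is compiled for MySQL, so other backends keep receiving the unhinted query. A minimal plain-Python sketch of that gating, without SQLAlchemy; with_index_hint and build_query are hypothetical helpers for illustration, not Airflow or SQLAlchemy functions:

```python
def with_index_hint(table, hint, dialect, hint_dialect="mysql"):
    """Return the FROM target, appending the hint only for hint_dialect."""
    if dialect == hint_dialect:
        return f"{table} {hint}"
    return table


def build_query(dialect):
    """Build the slow inner query from this thread for the given dialect name."""
    target = with_index_hint("task_instance", "USE INDEX (PRIMARY)", dialect)
    return (
        "SELECT task_id, max(execution_date) AS max_ti "
        f"FROM {target} "
        "WHERE dag_id = 'dag1' AND state = 'success' "
        "AND task_id IN ('t1', 't2') GROUP BY task_id"
    )


mysql_sql = build_query("mysql")        # carries the index hint
postgres_sql = build_query("postgresql")  # plain query, no hint
print(mysql_sql)
print(postgres_sql)
```

Only the MySQL variant contains USE INDEX (PRIMARY); whether other databases need a hint of their own is exactly the open question above.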
Best,
Arthur
On Mon, Jan 23, 2017 at 3:27 PM, Chris Riccomini
wrote:
With this patch:
$ git diff
diff --git a/airflow/jobs.py b/airflow/jobs.py
index f1de333..9d08e75 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -544,6 +544,7 @@ class SchedulerJob(BaseJob):
.query(
TI.task_id,
OK, the optimizer is using the `state` index instead of PRIMARY. With a hint
forcing PRIMARY, the query takes 0.47s; without the hint, 10s. Going to try
to patch.
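For anyone who wants to see the effect of forcing an index without a MySQL server at hand, here is a rough sketch using Python's built-in sqlite3. It is a stand-in, not the MySQL setup from this thread: SQLite's INDEXED BY plays the role of USE INDEX, and the index names ti_pk (standing in for PRIMARY) and idx_state are made up for the example. It only shows which index each plan uses and says nothing about timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance "
    "(task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)
# Hypothetical index names: ti_pk stands in for MySQL's PRIMARY key,
# idx_state for the single-column index on state.
conn.execute(
    "CREATE UNIQUE INDEX ti_pk "
    "ON task_instance (task_id, dag_id, execution_date)"
)
conn.execute("CREATE INDEX idx_state ON task_instance (state)")

QUERY = (
    "SELECT task_id, max(execution_date) AS max_ti "
    "FROM task_instance {hint} "
    "WHERE dag_id = 'dag1' AND state = 'success' "
    "AND task_id IN ('t1', 't2') GROUP BY task_id"
)


def plan(hint=""):
    """Return the EXPLAIN QUERY PLAN detail text for the query."""
    sql = "EXPLAIN QUERY PLAN " + QUERY.format(hint=hint)
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in conn.execute(sql).fetchall())


forced_pk = plan("INDEXED BY ti_pk")
forced_state = plan("INDEXED BY idx_state")
print("forced ti_pk:    ", forced_pk)
print("forced idx_state:", forced_state)
print("planner's choice:", plan())
```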
On Mon, Jan 23, 2017 at 2:57 PM, Chris Riccomini
wrote:
This inner query takes 10s:
SELECT task_instance.task_id AS task_id, max(task_instance.execution_date)
AS max_ti
FROM task_instance
WHERE task_instance.dag_id = 'dag1' AND task_instance.state = 'success' AND
task_instance.task_id IN ('t1', 't2') GROUP BY task_instance.task_id
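To make clear what this inner query computes (the latest successful execution_date per task), here is a small sketch that runs it against a handful of made-up rows in an in-memory SQLite table. The rows and the reduced four-column schema are assumptions for illustration, and an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance "
    "(task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)
# Made-up sample rows, not data from the production table.
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("t1", "dag1", "2017-01-20", "success"),
        ("t1", "dag1", "2017-01-21", "success"),
        ("t1", "dag1", "2017-01-22", "failed"),   # excluded: not a success
        ("t2", "dag1", "2017-01-21", "success"),
        ("t3", "dag1", "2017-01-22", "success"),  # excluded: not in the IN list
        ("t1", "dag2", "2017-01-23", "success"),  # excluded: different dag_id
    ],
)

result = conn.execute(
    """
    SELECT task_instance.task_id AS task_id,
           max(task_instance.execution_date) AS max_ti
    FROM task_instance
    WHERE task_instance.dag_id = 'dag1'
      AND task_instance.state = 'success'
      AND task_instance.task_id IN ('t1', 't2')
    GROUP BY task_instance.task_id
    ORDER BY task_instance.task_id
    """
).fetchall()
print(result)  # [('t1', '2017-01-21'), ('t2', '2017-01-21')]
```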
Explain seems OK:
Can confirm it's a slow query on task_instance table. Still digging.
Unfortunately, the query is truncated in my UI right now:
SELECT task_instance.task_id AS task_instance_...
On Mon, Jan 23, 2017 at 1:56 PM, Chris Riccomini
wrote:
Digging. Might be a bit.
On Mon, Jan 23, 2017 at 1:32 PM, Bolke de Bruin wrote:
Slow query log? Db load?
B.
Sent from my iPad
On 23 Jan 2017, at 21:59, Chris Riccomini wrote:
Note: 6.5 million TIs in the task_instance table.
On Mon, Jan 23, 2017 at 12:58 PM, Chris Riccomini
wrote:
Hey Bolke,
Re: system usage, it's pretty quiet: <5% CPU usage. Memory is almost all
free as well.
I am thinking that this is DB related, given that it's pausing when
executing an update. Was looking at the update_state method in models.py,
which logs right before the 15s pause.
Cheers,
Chris
Oops, yes, 15 seconds, sorry. Operating without much sleep. :P
On Mon, Jan 23, 2017 at 12:35 PM, Arthur Wiedmer
wrote:
> Chris,
>
> Just double checking, you mean more than 15 seconds not 15 minutes, right?
>
> Best,
> Arthur
Also, seeing this in EVERY task that runs:
[2017-01-23 20:26:13,777] {jobs.py:2112} WARNING - State of this instance has been externally set to queued. Taking the poison pill. So long.
[2017-01-23 20:26:13,841] {jobs.py:2051} INFO - Task exited with return code 0
All successful tasks are
On Mon, Jan 23, 2017 at 12:27 PM, Chris Riccomini
wrote:
Hey all,
I've upgraded on production. Things seem to be working so far (only been an
hour), but I am seeing this in the scheduler logs:
File Path    PID    Runtime    Last Runtime    Last Run
I created:
- AIRFLOW-791: At start up all running dag_runs are being checked, but not fixed
- AIRFLOW-790: DagRuns do not exist for certain tasks, but don’t get fixed
- AIRFLOW-788: Context unexpectedly added to hive conf
- AIRFLOW-792: Allow fixing of schedule when wrong start_date / interval
I'd be happy to lend a hand fixing these issues and hopefully some others
are too. Do you mind creating jiras for these since you have the full
context? I have created a JIRA for (1) and have assigned it to myself:
https://issues.apache.org/jira/browse/AIRFLOW-780
On Fri, Jan 20, 2017 at 1:01 AM,