[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-30 Thread Kengo Seki (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358482#comment-15358482
 ] 

Kengo Seki commented on AIRFLOW-246:


Thanks ;) I'm going to file a new issue as a follow-up and fix query on that.

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>Assignee: Kengo Seki
> Fix For: Airflow 1.8
>
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this synchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-30 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358077#comment-15358077
 ] 

Chris Riccomini commented on AIRFLOW-246:
-

I have no strong preference. Either JIRA or UPDATING.md or fixing query is fine 
with me. :) [~sekikn], since I think you'll end up doing the work--your call.

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>Assignee: Kengo Seki
> Fix For: Airflow 1.8
>
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this synchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-30 Thread Kengo Seki (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358041#comment-15358041
 ] 

Kengo Seki commented on AIRFLOW-246:


bq. it delivers different results than before, as it double counts the TI 
belonging to the last DAG run if it is still running?

Indeed. I totally missed that. Thanks for pointing out!

bq. please open a JIRA about updating UPDATES.md.

As another option, adding {{AND (state IS NULL OR state <> 'running')}} to the 
WHERE clause in the second subquery will fix this.

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>Assignee: Kengo Seki
> Fix For: Airflow 1.8
>
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this synchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-30 Thread Matthias Huschle (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356998#comment-15356998
 ] 

Matthias Huschle commented on AIRFLOW-246:
--

Do I understand the query correctly, that it delivers different results than 
before, as it double counts the TI belonging to the last DAG run if it is still 
running? Not that I'd consider this a bad trade-off, but it might be worth 
noting it.

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>Assignee: Kengo Seki
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this synchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-29 Thread Kengo Seki (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356332#comment-15356332
 ] 

Kengo Seki commented on AIRFLOW-246:


[~nhanlon] the patch seems to be merged into master at 
https://github.com/apache/incubator-airflow/commit/a2ed55f2c40857efaaf1876de5bc7929cbfb3166.
Thanks a lot [~bolke]!

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>Assignee: Kengo Seki
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this synchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-17 Thread Kengo Seki (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336365#comment-15336365
 ] 

Kengo Seki commented on AIRFLOW-246:


bq. Looks good to me. Query only takes ~4 seconds to run on our db (~750k rows).

I'm glad to hear that. Thanks for confirming ;)

bq. maybe it should also be looked at whether or not the whole thing should be 
rearchitected... both of the queries (original, and the one you wrote) are 
pretty ugly.

Well, I'm not sure whether it should be, since I'm a newbie to Airflow. I've 
just started evaluation and contribution to Airflow a month ago. I hope Airflow 
experts give us some advice, workarounds, etc.
I agree that these queries are not so readable, but I also think it's somewhat 
inevitable as far as using ORM such as SQLAlchemy. The original python code 
which generates the query in question is not so difficult to read to me.

bq. Let me know if you get a code fix for this and I'd be happy to test it.

Sure. I hope I can submit the first PR this weekend.

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this synchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-17 Thread Neil Hanlon (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336229#comment-15336229
 ] 

Neil Hanlon commented on AIRFLOW-246:
-

[~sekikn] Looks good to me. Query only takes ~4 seconds to run on our db (~750k 
rows).

Obviously with more task instances the actual size of this query will increase 
which will require database tuning to allow it to run (max_allowed_packet) and 
all that... so maybe it should also be looked at whether or not the whole thing 
should be rearchitected... both of the queries (original, and the one you 
wrote) are pretty ugly. I'm not saying all queries have to be pretty, but...

Let me know if you get a code fix for this and I'd be happy to test it.

(new query profile for posterity)

{pre}
airflow-rwdb01:airflow> show profile for query 1;
++--+
| Status | Duration |
++--+
| starting   | 0.33 |
| Waiting for query cache lock   | 0.05 |
| Waiting on query cache mutex   | 0.04 |
| checking query cache for query | 0.002673 |
| checking permissions   | 0.06 |
| checking permissions   | 0.04 |
| checking permissions   | 0.03 |
| checking permissions   | 0.06 |
| Opening tables | 0.25 |
| System lock| 0.001740 |
| optimizing | 0.05 |
| statistics | 0.17 |
| preparing  | 0.08 |
| executing  | 0.07 |
| Sorting result | 0.04 |
| Sending data   | 0.000855 |
| optimizing | 0.09 |
| statistics | 0.09 |
| preparing  | 0.10 |
| executing  | 0.04 |
| Sending data   | 0.000622 |
| optimizing | 0.73 |
| statistics | 1.371602 |
| preparing  | 0.000239 |
| executing  | 0.08 |
| Sending data   | 0.407134 |
| optimizing | 0.000145 |
| statistics | 1.176737 |
| preparing  | 0.000174 |
| executing  | 0.06 |
| Sending data   | 0.395361 |
| optimizing | 0.11 |
| statistics | 0.14 |
| preparing  | 0.12 |
| executing  | 0.05 |
| Sending data   | 0.000654 |
| removing tmp table | 0.21 |
| Sending data   | 0.13 |
| Waiting for query cache lock   | 0.05 |
| Waiting on query cache mutex   | 0.05 |
| Sending data   | 0.001048 |
| init   | 0.23 |
| optimizing | 0.06 |
| statistics | 0.07 |
| preparing  | 0.08 |
| Creating tmp table | 0.000386 |
| executing  | 0.06 |
| Copying to tmp table   | 0.005354 |
| Sorting result | 0.91 |
| Sending data   | 0.90 |
| end| 0.04 |
| removing tmp table | 0.000169 |
| end| 0.06 |
| query end  | 0.07 |
| closing tables | 0.04 |
| removing tmp table | 0.07 |
| closing tables | 0.04 |
| removing tmp table | 0.07 |
| closing tables | 0.04 |
| removing tmp table | 0.05 |
| closing tables | 0.13 |
| freeing items  | 0.000144 |
| Waiting for query cache lock   | 0.04 |
| Waiting on query cache mutex   | 0.03 |
| freeing items  | 0.57 |
| Waiting for query cache lock   | 0.04 |
| Waiting on query cache mutex   | 0.05 |
| freeing items  | 0.04 |
| storing result in query cache  | 0.07 |
| logging slow query | 0.04 |
| logging slow query | 0.000221 |
| cleaning up| 0.13 |
++--+
72 rows in set (0.00 sec)
{pre}

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that enti

[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-17 Thread Kengo Seki (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335724#comment-15335724
 ] 

Kengo Seki commented on AIRFLOW-246:


Multiple left outer joins seem to affect performance. I think we can rewrite 
the query in question by replacing left outer join with inner join and union, 
such as:

{code:sql}
SELECT
  dag_id AS task_instance_dag_id,
  state AS task_instance_state,
  count(*) as count_1
FROM (
  SELECT
task_instance.dag_id,
task_instance.state
  FROM
task_instance
  JOIN (
SELECT
  dag_run.dag_id AS dag_id,
  dag_run.execution_date AS execution_date
FROM
  dag_run
WHERE
  dag_run.state = 'running'
  ) AS running_dag_run
  ON
running_dag_run.dag_id = task_instance.dag_id
  AND
running_dag_run.execution_date = task_instance.execution_date
  WHERE
task_id IN ...

  UNION ALL

  SELECT
task_instance.dag_id,
task_instance.state
  FROM
task_instance
  JOIN (
SELECT
  dag_run.dag_id AS dag_id,
  max(dag_run.execution_date) AS execution_date
FROM
  dag_run
GROUP BY
  dag_run.dag_id
  ) AS last_dag_run
  ON
last_dag_run.dag_id = task_instance.dag_id
  AND
last_dag_run.execution_date = task_instance.execution_date
  WHERE
task_id IN ...
) t
GROUP BY
  dag_id,
  state;
{code}

I compared these queries with some dummy data, and got x3-4 improvement.

{code}
mysql> select count(*) from dag_run;
+--+
| count(*) |
+--+
| 3417 |
+--+
1 row in set (0.00 sec)

mysql> select count(*) from task_instance;
+--+
| count(*) |
+--+
|   229089 |
+--+
1 row in set (0.00 sec)

mysql> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state 
AS task_instance_state, count(task_instance.task_id) AS count_1 FROM 
task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
'running') AS running_dag_run ON running_dag_run.dag_id = task_instance.dag_id 
AND running_dag_run.execution_date = task_instance.execution_date LEFT OUTER 
JOIN (SELECT dag_run.dag_id AS dag_id, max(dag_run.execution_date) AS 
execution_date FROM dag_run GROUP BY dag_run.dag_id) AS last_dag_run ON 
last_dag_run.dag_id = task_instance.dag_id AND last_dag_run.execution_date = 
task_instance.execution_date WHERE task_instance.task_id IN ('all_success', 
'also_run_this', 'always_true_1', 'always_true_2', 'bash_task', 'branching', 
'branch_a', 'branch_b', 'branch_c', 'branch_d', 'condition', 
'condition_is_False', 'condition_is_True', 'del_op', 'end', 'false_1', 
'false_2', 'final_1', 'final_2', 'follow_branch_a', 'follow_branch_b', 
'follow_branch_c', 'follow_branch_d', 'get_op', 'http_sensor_check', 'join', 
'one_success', 'oper_1', 'oper_2', 'post_op', 'post_op_formenc', 'print_date', 
'puller', 'push', 'push_by_returning', 'put_op', 'runme_0', 'runme_1', 
'runme_2', 'run_after_loop', 'run_this', 'run_this_first', 'run_this_last', 
'section-1', 'section-1-task-1', 'section-1-task-2', 'section-1-task-3', 
'section-1-task-4', 'section-1-task-5', 'section-2', 'section-2-task-1', 
'section-2-task-2', 'section-2-task-3', 'section-2-task-4', 'section-2-task-5', 
'skip_operator_1', 'skip_operator_2', 'sleep', 'some-other-task', 'start', 
'templated', 'test_trigger_dagrun', 'true_1', 'true_2') AND 
(running_dag_run.dag_id IS NOT NULL OR last_dag_run.dag_id IS NOT NULL) GROUP 
BY task_instance.dag_id, task_instance.state; 
+-+-+-+
| task_instance_dag_id| task_instance_state | count_1 |
+-+-+-+
| example_bash_operator   | success |   6 |
| example_branch_dop_operator_v3  | NULL|   3 |
| example_branch_operator | skipped |   6 |
| example_branch_operator | success |   5 |
| example_http_operator   | failed  |   1 |
| example_http_operator   | upstream_failed |   5 |
| example_passing_params_via_test_command | success |   2 |
| example_short_circuit_operator  | skipped |   2 |
| example_short_circuit_operator  | success |   4 |
| example_skip_dag| skipped |   4 |
| example_skip_dag| success |   4 |
| example_subdag_operator | success |   5 |
| example_trigger_controller_dag  | success |   1 |
| example_trigger_target_dag  | success |   2 |
| example_xcom| success |   3 |
| tutorial