[jira] [Updated] (AIRFLOW-32) Remove deprecated features prior to releasing Airflow 2.0
[ https://issues.apache.org/jira/browse/AIRFLOW-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremiah Lowin updated AIRFLOW-32:
----------------------------------
Description:
A number of features have been marked for deprecation in Airflow 2.0. They need to be deleted prior to release. Usually the error message or comments will mention Airflow 2.0 with either a #TODO or #FIXME.

Tracking list (not necessarily complete!):

JIRA:
AIRFLOW-31
AIRFLOW-200

GitHub:
https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
https://github.com/airbnb/airflow/pull/1219
https://github.com/airbnb/airflow/pull/1285

  was:
A number of features have been marked for deprecation in Airflow 2.0. They need to be deleted prior to release. Usually the error message or comments will mention Airflow 2.0 with either a #TODO or #FIXME.

Tracking list (not necessarily complete!):

JIRA:
AIRFLOW-31

GitHub:
https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
https://github.com/airbnb/airflow/pull/1219
https://github.com/airbnb/airflow/pull/1285

> Remove deprecated features prior to releasing Airflow 2.0
> ----------------------------------------------------------
>
>          Key: AIRFLOW-32
>          URL: https://issues.apache.org/jira/browse/AIRFLOW-32
>      Project: Apache Airflow
>   Issue Type: Improvement
>     Reporter: Jeremiah Lowin
>       Labels: deprecated
>      Fix For: Airflow 2.0
>
> A number of features have been marked for deprecation in Airflow 2.0. They need to be deleted prior to release.
> Usually the error message or comments will mention Airflow 2.0 with either a #TODO or #FIXME.
> Tracking list (not necessarily complete!):
> JIRA:
> AIRFLOW-31
> AIRFLOW-200
> GitHub:
> https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
> https://github.com/airbnb/airflow/pull/1219
> https://github.com/airbnb/airflow/pull/1285

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (AIRFLOW-303) EMR Operator job_flow_overrides allow for templating
[ https://issues.apache.org/jira/browse/AIRFLOW-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358097#comment-15358097 ] Rob Froetscher commented on AIRFLOW-303: Ah no worries, figured it was a bot because it was on the github PR > EMR Operator job_flow_overrides allow for templating > > > Key: AIRFLOW-303 > URL: https://issues.apache.org/jira/browse/AIRFLOW-303 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Rob Froetscher >Assignee: Rob Froetscher >Priority: Minor > > See discussion here: https://github.com/apache/incubator-airflow/pull/1630 > Idea is that we want to allow templating on a dictionary argument to an > operator. This is probably best done by introspecting what are the strings in > the dictionary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
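The "templating on a dictionary argument" idea quoted above can be sketched as follows. This is a minimal illustration, not Airflow's actual implementation; `render` here is a stand-in for a real template engine, and the `job_flow_overrides` keys are illustrative:

```python
def render_nested(obj, render):
    """Recursively apply `render` to every string found in a nested
    dict/list structure, leaving non-string values untouched."""
    if isinstance(obj, str):
        return render(obj)
    if isinstance(obj, dict):
        return {k: render_nested(v, render) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(render_nested(v, render) for v in obj)
    return obj

# Stand-in "template engine" that substitutes a single macro:
render = lambda s: s.replace("{{ ds }}", "2016-06-30")

job_flow_overrides = {
    "Name": "cluster-{{ ds }}",
    "Instances": {"InstanceCount": 3},
}
rendered = render_nested(job_flow_overrides, render)
```

Introspecting the value types (rather than declaring which keys are templated) is what keeps the operator signature unchanged.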
[jira] [Commented] (AIRFLOW-303) EMR Operator job_flow_overrides allow for templating
[ https://issues.apache.org/jira/browse/AIRFLOW-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358079#comment-15358079 ] Chris Riccomini commented on AIRFLOW-303: - Derp, sorry. Mistakenly closed this. > EMR Operator job_flow_overrides allow for templating > > > Key: AIRFLOW-303 > URL: https://issues.apache.org/jira/browse/AIRFLOW-303 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Rob Froetscher >Assignee: Rob Froetscher >Priority: Minor > > See discussion here: https://github.com/apache/incubator-airflow/pull/1630 > Idea is that we want to allow templating on a dictionary argument to an > operator. This is probably best done by introspecting what are the strings in > the dictionary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query
[ https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358077#comment-15358077 ] Chris Riccomini commented on AIRFLOW-246: - I have no strong preference. Either JIRA or UPDATING.md or fixing query is fine with me. :) [~sekikn], since I think you'll end up doing the work--your call. > dag_stats endpoint has a terrible query > --- > > Key: AIRFLOW-246 > URL: https://issues.apache.org/jira/browse/AIRFLOW-246 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: Airflow 1.7.1 > Environment: MySQL Backend through sqlalchemy >Reporter: Neil Hanlon >Assignee: Kengo Seki > Fix For: Airflow 1.8 > > > Hitting this endpoint creates a series of queries on the database which take > over 20 seconds to run, causing the page to not load for that entire time. > Luckily the main page (which includes this under "Recent Statuses") loads > this synchronously, but still... waiting almost half a minute (at times more) > to see the statuses for dags is really not fun. > We have less than a million rows in the task_instance table--so it's not even > a problem with that. > Here's a query profile for the query: > https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf > We've done some optimizations on the database, but to no avail. 
> The query: > {code:sql} > SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS > task_instance_state, count(task_instance.task_id) AS count_1 FROM > task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, > dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = > 'running') AS running_dag_run ON running_dag_run.dag_id = > task_instance.dag_id AND running_dag_run.execution_date = > task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS > dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY > dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id > AND last_dag_run.execution_date = task_instance.execution_date WHERE > task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR > last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, > task_instance.state; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-304) Disable connection pooling for `airflow run`
Paul Yang created AIRFLOW-304:
------------------------------

         Summary: Disable connection pooling for `airflow run`
             Key: AIRFLOW-304
             URL: https://issues.apache.org/jira/browse/AIRFLOW-304
         Project: Apache Airflow
      Issue Type: Bug
      Components: cli
Affects Versions: Airflow 1.8, Airflow 1.7.1
        Reporter: Paul Yang
        Assignee: Paul Yang
        Priority: Minor
         Fix For: Airflow 1.8

When airflow runs a task through the `airflow run` command, it keeps around a connection to the DB due to the connection pool settings and the way that sessions are managed. To reduce the number of connections to the DB, the run command should create connections as needed. This is dependent on the job heart rate configuration, but by default, this is only once every 5 seconds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
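In SQLAlchemy terms the likely fix is a non-pooling pool class such as `NullPool`, so each checkout opens and closes a real connection. As a library-agnostic sketch of the same idea (stdlib `sqlite3` standing in for the metadata DB), the connection is scoped to one unit of work instead of the process lifetime:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def short_lived_connection(dsn):
    """Open a DB connection only for one unit of work (e.g. a single
    heartbeat), rather than holding a pooled connection for the whole
    lifetime of the `airflow run` process."""
    conn = sqlite3.connect(dsn)
    try:
        yield conn
    finally:
        conn.close()

with short_lived_connection(":memory:") as conn:
    row = conn.execute("SELECT 1").fetchone()
# conn is closed here; the process holds no idle DB connection.
```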
[jira] [Resolved] (AIRFLOW-100) Add flexibility to ExternalTaskSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin resolved AIRFLOW-100. Resolution: Fixed Issue resolved by pull request #1641 [https://github.com/apache/incubator-airflow/pull/1641] > Add flexibility to ExternalTaskSensor > - > > Key: AIRFLOW-100 > URL: https://issues.apache.org/jira/browse/AIRFLOW-100 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin >Priority: Minor > Labels: operator > > The ExternalTaskSensor defaults to sensing tasks with the same > {{execution_date}} as it does, and has an {{execution_delta}} parameter for > looking back farther in time. However, this doesn't support the case where > the sensing task has a smaller schedule_interval than the target task. > For example, if the ETS were run every hour, one couldn't use a fixed > {{execution_delta}} to sense a task that only ran daily (since each instance > of the ETS would need a different execution_delta). > However, a Daily task can wait for multiple hourly tasks, because it knows in > advance that it needs 24 ETS's with deltas == range(24). > Concrete suggestion: > - add a param ({{execution_delta_fn}}?) that takes in the current > execution_date and is expected to return the desired sense date (for example, > it could always return midnight of the previous day, no matter what the ETS > was executed). > cc [~criccomini] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
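The proposed callable (named `execution_delta_fn` in the ticket, name tentative) takes the sensor's current execution_date and returns the date to sense. A hypothetical example matching the "always return midnight of the previous day" suggestion:

```python
from datetime import datetime, time, timedelta

def previous_midnight(execution_date):
    """Hypothetical execution-date callable: whichever hourly instance
    of the sensor is running, always sense the daily target task's run
    at midnight of the previous day."""
    midnight = datetime.combine(execution_date.date(), time.min)
    return midnight - timedelta(days=1)

# Proposed wiring (illustrative, per the ticket):
# ExternalTaskSensor(..., execution_delta_fn=previous_midnight)
```

Because the callable sees the execution_date, each hourly sensor instance can compute its own target date, which a fixed `execution_delta` cannot express.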
[jira] [Commented] (AIRFLOW-301) Unit test always fails on the last day of the month
[ https://issues.apache.org/jira/browse/AIRFLOW-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358055#comment-15358055 ] ASF subversion and git services commented on AIRFLOW-301: - Commit f49e238b72c1701df8ad88a0be232b6619f03edb in incubator-airflow's branch refs/heads/master from jlowin [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=f49e238 ] [AIRFLOW-301] Fix broken unit test This unit tests always fails on the last day of the month, since it tries to access a nonexistent day (like June 31st). > Unit test always fails on the last day of the month > --- > > Key: AIRFLOW-301 > URL: https://issues.apache.org/jira/browse/AIRFLOW-301 > Project: Apache Airflow > Issue Type: Bug > Components: tests >Affects Versions: Airflow 1.7.1.3 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin > > The following line in the unit test for clearing XComs fails on the last day > of the month: > https://github.com/apache/incubator-airflow/blob/7980f7771ab6b6f84259ea9a52e78e4f5f690e42/tests/models.py#L560 > {code} > == > ERROR: tests xcom fetch behavior with different execution dates, using > -- > Traceback (most recent call last): > File "/home/travis/build/apache/incubator-airflow/tests/models.py", line > 560, in test_xcom_pull_different_execution_date > exec_date = exec_date.replace(day=exec_date.day + 1) > ValueError: day is out of range for month > >> begin captured logging << > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[4/4] incubator-airflow git commit: Merge pull request #1641 from jlowin/external-task
Merge pull request #1641 from jlowin/external-task

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0a94cae4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0a94cae4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0a94cae4
Branch: refs/heads/master
Commit: 0a94cae4b7f3824278702fddb8bac672fc01c894
Parents: 9f49f12 ddbcd88
Author: jlowin
Authored: Thu Jun 30 19:42:38 2016 -0400
Committer: jlowin
Committed: Thu Jun 30 19:42:38 2016 -0400

----------------------------------------------------------------------
 .../versions/211e584da130_add_ti_state_index.py | 14 +++
 airflow/operators/sensors.py                    | 17 -
 airflow/utils/tests.py                          |  5 ++-
 tests/core.py                                   | 39
 tests/models.py                                 |  2 +-
 5 files changed, 74 insertions(+), 3 deletions(-)
----------------------------------------------------------------------
[jira] [Resolved] (AIRFLOW-301) Unit test always fails on the last day of the month
[ https://issues.apache.org/jira/browse/AIRFLOW-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin resolved AIRFLOW-301. Resolution: Fixed Issue resolved by pull request #1641 [https://github.com/apache/incubator-airflow/pull/1641] > Unit test always fails on the last day of the month > --- > > Key: AIRFLOW-301 > URL: https://issues.apache.org/jira/browse/AIRFLOW-301 > Project: Apache Airflow > Issue Type: Bug > Components: tests >Affects Versions: Airflow 1.7.1.3 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin > > The following line in the unit test for clearing XComs fails on the last day > of the month: > https://github.com/apache/incubator-airflow/blob/7980f7771ab6b6f84259ea9a52e78e4f5f690e42/tests/models.py#L560 > {code} > == > ERROR: tests xcom fetch behavior with different execution dates, using > -- > Traceback (most recent call last): > File "/home/travis/build/apache/incubator-airflow/tests/models.py", line > 560, in test_xcom_pull_different_execution_date > exec_date = exec_date.replace(day=exec_date.day + 1) > ValueError: day is out of range for month > >> begin captured logging << > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (AIRFLOW-303) EMR Operator job_flow_overrides allow for templating
[ https://issues.apache.org/jira/browse/AIRFLOW-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Froetscher reopened AIRFLOW-303: > EMR Operator job_flow_overrides allow for templating > > > Key: AIRFLOW-303 > URL: https://issues.apache.org/jira/browse/AIRFLOW-303 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Rob Froetscher >Assignee: Rob Froetscher >Priority: Minor > > See discussion here: https://github.com/apache/incubator-airflow/pull/1630 > Idea is that we want to allow templating on a dictionary argument to an > operator. This is probably best done by introspecting what are the strings in > the dictionary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query
[ https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358041#comment-15358041 ] Kengo Seki commented on AIRFLOW-246: bq. it delivers different results than before, as it double counts the TI belonging to the last DAG run if it is still running? Indeed. I totally missed that. Thanks for pointing out! bq. please open a JIRA about updating UPDATES.md. As another option, adding {{AND (state IS NULL OR state <> 'running')}} to the WHERE clause in the second subquery will fix this. > dag_stats endpoint has a terrible query > --- > > Key: AIRFLOW-246 > URL: https://issues.apache.org/jira/browse/AIRFLOW-246 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: Airflow 1.7.1 > Environment: MySQL Backend through sqlalchemy >Reporter: Neil Hanlon >Assignee: Kengo Seki > Fix For: Airflow 1.8 > > > Hitting this endpoint creates a series of queries on the database which take > over 20 seconds to run, causing the page to not load for that entire time. > Luckily the main page (which includes this under "Recent Statuses") loads > this synchronously, but still... waiting almost half a minute (at times more) > to see the statuses for dags is really not fun. > We have less than a million rows in the task_instance table--so it's not even > a problem with that. > Here's a query profile for the query: > https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf > We've done some optimizations on the database, but to no avail. 
> The query: > {code:sql} > SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS > task_instance_state, count(task_instance.task_id) AS count_1 FROM > task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, > dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = > 'running') AS running_dag_run ON running_dag_run.dag_id = > task_instance.dag_id AND running_dag_run.execution_date = > task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS > dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY > dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id > AND last_dag_run.execution_date = task_instance.execution_date WHERE > task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR > last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, > task_instance.state; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
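Kengo's suggested predicate can be sanity-checked with a minimal sqlite sketch (table and data are illustrative, not the real schema): with the extra condition, a still-running latest DAG run drops out of the `last_dag_run` subquery, leaving it to be matched only by `running_dag_run`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT);
    INSERT INTO dag_run VALUES
        ('example_dag', '2016-06-29', 'success'),
        ('example_dag', '2016-06-30', 'running');
""")

# Original second subquery: picks the running 2016-06-30 run, which the
# running_dag_run subquery already covers.
original = conn.execute(
    "SELECT dag_id, max(execution_date) FROM dag_run GROUP BY dag_id"
).fetchall()

# With the suggested predicate, the running run is excluded here:
fixed = conn.execute(
    "SELECT dag_id, max(execution_date) FROM dag_run "
    "WHERE state IS NULL OR state <> 'running' GROUP BY dag_id"
).fetchall()
```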
[jira] [Commented] (AIRFLOW-247) EMR Hook, Operators, Sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357993#comment-15357993 ] ASF subversion and git services commented on AIRFLOW-247: - Commit 9f49f12853d83dd051f0f1ed58b5df20bfcfe087 in incubator-airflow's branch refs/heads/master from [~rfroetscher] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=9f49f12 ] [AIRFLOW-247] Add EMR hook, operators and sensors. Add AWS base hook Closes #1630 from rfroetscher/emr > EMR Hook, Operators, Sensor > --- > > Key: AIRFLOW-247 > URL: https://issues.apache.org/jira/browse/AIRFLOW-247 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Rob Froetscher >Assignee: Rob Froetscher >Priority: Minor > > Substory of https://issues.apache.org/jira/browse/AIRFLOW-115. It would be > nice to have an EMR hook and operators. > Hook to generally interact with EMR. > Operators to: > * setup and start a job flow > * add steps to an existing jobflow > A sensor to: > * monitor completion and status of EMR jobs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-303) EMR Operator job_flow_overrides allow for templating
Rob Froetscher created AIRFLOW-303:
-----------------------------------

     Summary: EMR Operator job_flow_overrides allow for templating
         Key: AIRFLOW-303
         URL: https://issues.apache.org/jira/browse/AIRFLOW-303
     Project: Apache Airflow
  Issue Type: Improvement
    Reporter: Rob Froetscher
    Assignee: Rob Froetscher
    Priority: Minor

See discussion here: https://github.com/apache/incubator-airflow/pull/1630
Idea is that we want to allow templating on a dictionary argument to an operator. This is probably best done by introspecting what are the strings in the dictionary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
incubator-airflow git commit: [AIRFLOW-282] Remove PR Tool logic that depends on version formatting
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 7980f7771 -> 096564848

[AIRFLOW-282] Remove PR Tool logic that depends on version formatting

Closes #1625 from jlowin/pr-tool

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/09656484
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/09656484
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/09656484
Branch: refs/heads/master
Commit: 09656484806672b8ffaa133dffaf755f45c98307
Parents: 7980f77
Author: jlowin
Authored: Thu Jun 30 18:14:55 2016 -0400
Committer: jlowin
Committed: Thu Jun 30 18:14:55 2016 -0400

----------------------------------------------------------------------
 dev/airflow-pr | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/09656484/dev/airflow-pr
----------------------------------------------------------------------
diff --git a/dev/airflow-pr b/dev/airflow-pr
index 937ce28..7f10a89 100755
--- a/dev/airflow-pr
+++ b/dev/airflow-pr
@@ -503,17 +503,20 @@ def resolve_jira_issue(comment=None, jira_id=None, merge_branches=None):
     else:
         default_fix_versions = []

-    for v in default_fix_versions:
-        # Handles the case where we have forked a release branch but not yet made the release.
-        # In this case, if the PR is committed to the master branch and the release branch, we
-        # only consider the release branch to be the fix version. E.g. it is not valid to have
-        # both 1.1.0 and 1.0.0 as fix versions.
-        (major, minor, patch) = v.split(".")
-        if patch == "0":
-            previous = "%s.%s.%s" % (major, int(minor) - 1, 0)
-            if previous in default_fix_versions:
-                default_fix_versions = list(filter(
-                    lambda x: x != v, default_fix_versions))
+    # TODO Airflow versions vary from two to four decimal places (2.0, 1.7.1.3)
+    # The following logic can be reintroduced if/when a standard emerges.
+
+    # for v in default_fix_versions:
+    #     # Handles the case where we have forked a release branch but not yet made the release.
+    #     # In this case, if the PR is committed to the master branch and the release branch, we
+    #     # only consider the release branch to be the fix version. E.g. it is not valid to have
+    #     # both 1.1.0 and 1.0.0 as fix versions.
+    #     (major, minor, patch) = v.split(".")
+    #     if patch == "0":
+    #         previous = "%s.%s.%s" % (major, int(minor) - 1, 0)
+    #         if previous in default_fix_versions:
+    #             default_fix_versions = list(filter(
+    #                 lambda x: x != v, default_fix_versions))

     default_fix_versions = ",".join(default_fix_versions)
     fix_versions = click.prompt(
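The disabled block assumed every fix version splits into exactly three components; real Airflow versions do not, which is why the three-way unpack was unsafe. A quick sketch of the failure mode:

```python
def split_release(v):
    # The removed PR-tool logic did exactly this unpacking:
    major, minor, patch = v.split(".")
    return major, minor, patch

# Fine for a three-component version:
ok = split_release("1.1.0")

# But Airflow versions range from two components ("2.0") to four
# ("1.7.1.3"), and the three-way unpack raises ValueError for both:
failures = []
for version in ("2.0", "1.7.1.3"):
    try:
        split_release(version)
    except ValueError:
        failures.append(version)
```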
[jira] [Updated] (AIRFLOW-302) The Bolke Correction: take squash commit messages from individual commit messages
[ https://issues.apache.org/jira/browse/AIRFLOW-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremiah Lowin updated AIRFLOW-302:
-----------------------------------
Description:
Currently, squash commit subjects are taken from the PR title (which should reference all JIRA issues). However this can be redundant in the case where there is just one commit, since it will be in the PR title and the commit message. Add an option (possibly default, or maybe default when there is only one referenced issue or one commit) to use the first commit message as the squash commit message.

cc [~bolke] for suggesting it

  was:
Currently, squash commit subjects are taken from the PR title (which should reference all JIRA issues). However this can be redundant in the case where there is just one commit, since it will be in the PR title and the commit message. Add an option (possibly default, or maybe default when there is only one referenced issue or one commit) to use the first commit message as the squash commit message.

> The Bolke Correction: take squash commit messages from individual commit messages
> ----------------------------------------------------------------------------------
>
>          Key: AIRFLOW-302
>          URL: https://issues.apache.org/jira/browse/AIRFLOW-302
>      Project: Apache Airflow
>   Issue Type: Bug
>   Components: PR tool
>     Reporter: Jeremiah Lowin
>     Assignee: Jeremiah Lowin
>
> Currently, squash commit subjects are taken from the PR title (which should reference all JIRA issues). However this can be redundant in the case where there is just one commit, since it will be in the PR title and the commit message. Add an option (possibly default, or maybe default when there is only one referenced issue or one commit) to use the first commit message as the squash commit message.
> cc [~bolke] for suggesting it

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (AIRFLOW-302) The Bolke Correction: take squash commit messages from individual commit messages
Jeremiah Lowin created AIRFLOW-302:
-----------------------------------

     Summary: The Bolke Correction: take squash commit messages from individual commit messages
         Key: AIRFLOW-302
         URL: https://issues.apache.org/jira/browse/AIRFLOW-302
     Project: Apache Airflow
  Issue Type: Bug
  Components: PR tool
    Reporter: Jeremiah Lowin
    Assignee: Jeremiah Lowin

Currently, squash commit subjects are taken from the PR title (which should reference all JIRA issues). However this can be redundant in the case where there is just one commit, since it will be in the PR title and the commit message. Add an option (possibly default, or maybe default when there is only one referenced issue or one commit) to use the first commit message as the squash commit message.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (AIRFLOW-301) Unit test always fails on the last day of the month
Jeremiah Lowin created AIRFLOW-301:
-----------------------------------

         Summary: Unit test always fails on the last day of the month
             Key: AIRFLOW-301
             URL: https://issues.apache.org/jira/browse/AIRFLOW-301
         Project: Apache Airflow
      Issue Type: Bug
      Components: tests
Affects Versions: Airflow 1.7.1.3
        Reporter: Jeremiah Lowin
        Assignee: Jeremiah Lowin

The following line in the unit test for clearing XComs fails on the last day of the month:
https://github.com/apache/incubator-airflow/blob/7980f7771ab6b6f84259ea9a52e78e4f5f690e42/tests/models.py#L560

{code}
======================================================================
ERROR: tests xcom fetch behavior with different execution dates, using
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/apache/incubator-airflow/tests/models.py", line 560, in test_xcom_pull_different_execution_date
    exec_date = exec_date.replace(day=exec_date.day + 1)
ValueError: day is out of range for month
>> begin captured logging <<
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
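One straightforward fix (a sketch; not necessarily the exact committed patch) is to add a `timedelta` instead of calling `replace`, since `timedelta` arithmetic rolls over month boundaries:

```python
from datetime import datetime, timedelta

exec_date = datetime(2016, 6, 30)

# replace(day=day + 1) asks for June 31st, which does not exist:
try:
    exec_date.replace(day=exec_date.day + 1)
except ValueError:
    pass  # "day is out of range for month"

# timedelta arithmetic handles the rollover to July 1st:
next_day = exec_date + timedelta(days=1)
```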
[jira] [Closed] (AIRFLOW-291) Add an index for task_instance.state
[ https://issues.apache.org/jira/browse/AIRFLOW-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-291. --- Resolution: Fixed Fix Version/s: Airflow 1.8 > Add an index for task_instance.state > > > Key: AIRFLOW-291 > URL: https://issues.apache.org/jira/browse/AIRFLOW-291 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Dan Davydov >Assignee: Dan Davydov > Labels: database, scalability > Fix For: Airflow 1.8 > > > Add an index for task_instance.state to speed up querying for currently > queued/running tasks. > From my experimentation at Airbnb on our mysql DB this has shown about a > 15-40% decrease in the number of mysql connections depending on qps and has > brought down CPU usage of the DB from constant spikes of 100% to 2%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-291) Add an index for task_instance.state
[ https://issues.apache.org/jira/browse/AIRFLOW-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357893#comment-15357893 ] ASF subversion and git services commented on AIRFLOW-291: - Commit 7980f7771ab6b6f84259ea9a52e78e4f5f690e42 in incubator-airflow's branch refs/heads/master from [~aoen] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=7980f77 ] [AIRFLOW-291] Add index for state in TI table Closes #1635 from aoen/ddavydov/add_index_to_task_instance_state > Add an index for task_instance.state > > > Key: AIRFLOW-291 > URL: https://issues.apache.org/jira/browse/AIRFLOW-291 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Dan Davydov >Assignee: Dan Davydov > Labels: database, scalability > > Add an index for task_instance.state to speed up querying for currently > queued/running tasks. > From my experimentation at Airbnb on our mysql DB this has shown about a > 15-40% decrease in the number of mysql connections depending on qps and has > brought down CPU usage of the DB from constant spikes of 100% to 2%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-100) Add flexibility to ExternalTaskSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin updated AIRFLOW-100: --- External issue URL: https://github.com/apache/incubator-airflow/pull/1641 > Add flexibility to ExternalTaskSensor > - > > Key: AIRFLOW-100 > URL: https://issues.apache.org/jira/browse/AIRFLOW-100 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin >Priority: Minor > Labels: operator > > The ExternalTaskSensor defaults to sensing tasks with the same > {{execution_date}} as it does, and has an {{execution_delta}} parameter for > looking back farther in time. However, this doesn't support the case where > the sensing task has a smaller schedule_interval than the target task. > For example, if the ETS were run every hour, one couldn't use a fixed > {{execution_delta}} to sense a task that only ran daily (since each instance > of the ETS would need a different execution_delta). > However, a Daily task can wait for multiple hourly tasks, because it knows in > advance that it needs 24 ETS's with deltas == range(24). > Concrete suggestion: > - add a param ({{execution_delta_fn}}?) that takes in the current > execution_date and is expected to return the desired sense date (for example, > it could always return midnight of the previous day, no matter what the ETS > was executed). > cc [~criccomini] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-296) Getting TemplateNotFound Error while using QuboleOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357775#comment-15357775 ] ASF subversion and git services commented on AIRFLOW-296: - Commit 24d41b8909840451c1ef7d70c1c7671e6d87528c in incubator-airflow's branch refs/heads/master from [~msumit] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=24d41b8 ] [AIRFLOW-296] template_ext is being treated as a string rather than a tuple in qubole operator Closes #1638 from msumit/AIRFLOW-296 > Getting TemplateNotFound Error while using QuboleOperator > - > > Key: AIRFLOW-296 > URL: https://issues.apache.org/jira/browse/AIRFLOW-296 > Project: Apache Airflow > Issue Type: Bug >Reporter: Sumit Maheshwari >Assignee: Sumit Maheshwari > > Getting following error while using Qubole operator. > {{jinja2.exceptions.TemplateNotFound: select count(*) from default.payment}} > Found the error and fix, will open a PR soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-299) Evaluate a plugin architecture to replace contrib
Dan Davydov created AIRFLOW-299:
--------------------------------

     Summary: Evaluate a plugin architecture to replace contrib
         Key: AIRFLOW-299
         URL: https://issues.apache.org/jira/browse/AIRFLOW-299
     Project: Apache Airflow
  Issue Type: Task
  Components: contrib
    Reporter: Dan Davydov
    Priority: Minor

We should take a look at the usefulness of a plugin architecture similar to tools like Jenkins for the contrib folder. This way we can delegate/scale committers for contrib, keep the git history/repo size/tests of the repo smaller, allow users to publish operators without waiting for their commits to be merged by a committer etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
incubator-airflow git commit: [AIRFLOW-296] template_ext is being treated as a string rather than a tuple in qubole operator
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 002cf85bd -> 24d41b890

[AIRFLOW-296] template_ext is being treated as a string rather than a tuple in qubole operator

Closes #1638 from msumit/AIRFLOW-296

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/24d41b89
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/24d41b89
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/24d41b89
Branch: refs/heads/master
Commit: 24d41b8909840451c1ef7d70c1c7671e6d87528c
Parents: 002cf85
Author: Sumit Maheshwari
Authored: Thu Jun 30 13:16:02 2016 -0700
Committer: Chris Riccomini
Committed: Thu Jun 30 13:16:02 2016 -0700

----------------------------------------------------------------------
 airflow/contrib/operators/qubole_operator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/24d41b89/airflow/contrib/operators/qubole_operator.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/operators/qubole_operator.py b/airflow/contrib/operators/qubole_operator.py
index cbf15c4..c462dca 100755
--- a/airflow/contrib/operators/qubole_operator.py
+++ b/airflow/contrib/operators/qubole_operator.py
@@ -95,7 +95,7 @@ class QuboleOperator(BaseOperator):
     template_fields = ('query', 'script_location', 'sub_command', 'script', 'files',
                        'archives', 'program', 'cmdline', 'sql', 'where_clause', 'extract_query',
                        'boundary_query', 'macros', 'tags', 'name', 'parameters')
-    template_ext = ('.txt')
+    template_ext = ('.txt',)
     ui_color = '#3064A1'
     ui_fgcolor = '#fff'
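The one-character diff above fixes a classic Python gotcha: parentheses alone do not create a tuple, only the trailing comma does. `('.txt')` is therefore just the string `'.txt'`, and code that iterates `template_ext` sees single characters instead of extensions:

```python
not_a_tuple = ('.txt')   # parenthesized string -> type str
real_tuple = ('.txt',)   # the trailing comma makes it a tuple

# str.endswith accepts either a string or a tuple of suffixes, which is
# one reason the bug can hide. Iteration exposes the difference:
chars = list(not_a_tuple)       # single characters
extensions = list(real_tuple)   # one extension
```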
[jira] [Commented] (AIRFLOW-286) Improve FTPHook to implement context manager interface
[ https://issues.apache.org/jira/browse/AIRFLOW-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357768#comment-15357768 ] ASF subversion and git services commented on AIRFLOW-286: - Commit 002cf85bdbbe62d04cec741b078e1600e0fd6c0a in incubator-airflow's branch refs/heads/master from [~skudriashev] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=002cf85 ] [AIRFLOW-286] Improve FTPHook to implement context manager interface Closes #1632 from skudriashev/airflow-286 > Improve FTPHook to implement context manager interface > -- > > Key: AIRFLOW-286 > URL: https://issues.apache.org/jira/browse/AIRFLOW-286 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Stanislav Kudriashev >Assignee: Stanislav Kudriashev >Priority: Minor > Fix For: Airflow 1.8 > > > It would be very nice to use FTPHook as a context manager: > {code} > with FTPHook(...) as hook: > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (AIRFLOW-286) Improve FTPHook to implement context manager interface
[ https://issues.apache.org/jira/browse/AIRFLOW-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-286. --- Resolution: Fixed Fix Version/s: Airflow 1.8 Merged, thanks! > Improve FTPHook to implement context manager interface > -- > > Key: AIRFLOW-286 > URL: https://issues.apache.org/jira/browse/AIRFLOW-286 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Stanislav Kudriashev >Assignee: Stanislav Kudriashev >Priority: Minor > Fix For: Airflow 1.8 > > > It would be very nice to use FTPHook as a context manager: > {code} > with FTPHook(...) as hook: > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
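The requested interface can be sketched with a simplified stand-in class (illustrative names, not the actual FTPHook code): implementing `__enter__`/`__exit__` so the connection is released when the `with` block exits, even on error.

```python
class FTPHookLike:
    """Simplified stand-in for FTPHook showing the context-manager protocol."""

    def __init__(self):
        self.conn = None
        self.closed = False

    def get_conn(self):
        # In the real hook this would be ftplib.FTP(...); a placeholder here.
        if self.conn is None:
            self.conn = object()
        return self.conn

    def close_conn(self):
        self.conn = None
        self.closed = True

    # These two methods are all that `with` support requires:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close_conn()


with FTPHookLike() as hook:
    conn = hook.get_conn()

assert hook.closed  # connection released when the block exits
```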
[jira] [Updated] (AIRFLOW-286) Improve FTPHook to implement context manager interface
[ https://issues.apache.org/jira/browse/AIRFLOW-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-286: External issue URL: https://github.com/apache/incubator-airflow/pull/1632 > Improve FTPHook to implement context manager interface > -- > > Key: AIRFLOW-286 > URL: https://issues.apache.org/jira/browse/AIRFLOW-286 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Stanislav Kudriashev >Assignee: Stanislav Kudriashev >Priority: Minor > > It would be very nice to use FTPHook as a context manager: > {code} > with FTPHook(...) as hook: > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-293) Task execution independent of heartrate
[ https://issues.apache.org/jira/browse/AIRFLOW-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-293: Affects Version/s: Airflow 1.7.1.3 External issue URL: https://github.com/apache/incubator-airflow/pull/1637 > Task execution independent of heartrate > --- > > Key: AIRFLOW-293 > URL: https://issues.apache.org/jira/browse/AIRFLOW-293 > Project: Apache Airflow > Issue Type: Bug > Components: cli >Affects Versions: Airflow 1.8, Airflow 1.7.1.3 >Reporter: Paul Yang >Assignee: Paul Yang >Priority: Minor > Fix For: Airflow 1.8 > > > When a task runs through the `airflow run` command, it currently does not > return until at least `job_heartbeat_sec` seconds have elapsed. > When the job heartrate is a high value (for example, to reduce DB load), this > makes short task instances run for much longer than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-291) Add an index for task_instance.state
[ https://issues.apache.org/jira/browse/AIRFLOW-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-291: External issue URL: https://github.com/apache/incubator-airflow/pull/1635 > Add an index for task_instance.state > > > Key: AIRFLOW-291 > URL: https://issues.apache.org/jira/browse/AIRFLOW-291 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Dan Davydov >Assignee: Dan Davydov > Labels: database, scalability > > Add an index for task_instance.state to speed up querying for currently > queued/running tasks. > From my experimentation at Airbnb on our mysql DB this has shown about a > 15-40% decrease in the number of mysql connections depending on qps and has > brought down CPU usage of the DB from constant spikes of 100% to 2%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
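The proposed index amounts to a single piece of DDL. In this sketch sqlite3 stands in for the MySQL backend described above, and the index name and column list are illustrative, not taken from the actual PR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, state TEXT)")
# The proposed index on state, so scheduler queries for queued/running
# task instances no longer need a full table scan.
conn.execute("CREATE INDEX ti_state ON task_instance (state)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM task_instance WHERE state = 'running'"
).fetchall()
# With the index in place, the planner reports an index-driven lookup.
assert "ti_state" in str(plan)
```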
[jira] [Closed] (AIRFLOW-243) Use a more efficient Thrift call for HivePartitionSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-243. --- Resolution: Fixed Fix Version/s: (was: Airflow 2.0) Airflow 1.8 > Use a more efficient Thrift call for HivePartitionSensor > > > Key: AIRFLOW-243 > URL: https://issues.apache.org/jira/browse/AIRFLOW-243 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: Airflow 1.7.1.3 >Reporter: Paul Yang >Assignee: Li Xuanji >Priority: Minor > Fix For: Airflow 1.8 > > > The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call > that can result in some expensive SQL queries for tables that have many > partitions and are partitioned by multiple keys. We've seen our metastore DB > get hammered by these sensors, resulting in service degradation for other > metastore users. > The {{MetastorePartitionSensor}} is efficient, but it can result in too many > connections to the metastore DB. > An alternative is to use the `get_partition_by_name` Thrift call that > translates into more efficient SQL queries. Because connections will be > pooled on the Thrift server, the DB won't get overloaded as with the > {{MetastorePartitionSensor}}. The semantics of the arguments will change, so > either a new argument needs to be introduced, or a new operator needs to be > created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-243) Use a more efficient Thrift call for HivePartitionSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-243: Affects Version/s: (was: Airflow 2.0) Airflow 1.7.1.3 > Use a more efficient Thrift call for HivePartitionSensor > > > Key: AIRFLOW-243 > URL: https://issues.apache.org/jira/browse/AIRFLOW-243 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: Airflow 1.7.1.3 >Reporter: Paul Yang >Assignee: Li Xuanji >Priority: Minor > Fix For: Airflow 1.8 > > > The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call > that can result in some expensive SQL queries for tables that have many > partitions and are partitioned by multiple keys. We've seen our metastore DB > get hammered by these sensors, resulting in service degradation for other > metastore users. > The {{MetastorePartitionSensor}} is efficient, but it can result in too many > connections to the metastore DB. > An alternative is to use the `get_partition_by_name` Thrift call that > translates into more efficient SQL queries. Because connections will be > pooled on the Thrift server, the DB won't get overloaded as with the > {{MetastorePartitionSensor}}. The semantics of the arguments will change, so > either a new argument needs to be introduced, or a new operator needs to be > created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (AIRFLOW-189) Highlighting of Parent/Child nodes in Graphs
[ https://issues.apache.org/jira/browse/AIRFLOW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-189. --- Resolution: Fixed Fix Version/s: Airflow 1.8 > Highlighting of Parent/Child nodes in Graphs > > > Key: AIRFLOW-189 > URL: https://issues.apache.org/jira/browse/AIRFLOW-189 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.7.1.3 >Reporter: Paul Rhodes >Assignee: Paul Rhodes >Priority: Minor > Fix For: Airflow 1.8 > > Attachments: Screenshot from 2016-05-29 15-19-39.png > > > Sometimes in large graphs it's difficult to see the direct > ancestors/descendants. This change would use mouseover events to highlight > these tasks to make it easier to interpret the graphs. > An example can be seen here in the attachment > !Screenshot from 2016-05-29 15-19-39.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-189) Highlighting of Parent/Child nodes in Graphs
[ https://issues.apache.org/jira/browse/AIRFLOW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-189: Affects Version/s: Airflow 1.7.1.3 > Highlighting of Parent/Child nodes in Graphs > > > Key: AIRFLOW-189 > URL: https://issues.apache.org/jira/browse/AIRFLOW-189 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.7.1.3 >Reporter: Paul Rhodes >Assignee: Paul Rhodes >Priority: Minor > Fix For: Airflow 1.8 > > Attachments: Screenshot from 2016-05-29 15-19-39.png > > > Sometimes in large graphs it's difficult to see the direct > ancestors/descendants. This change would use mouseover events to highlight > these tasks to make it easier to interpret the graphs. > An example can be seen here in the attachment > !Screenshot from 2016-05-29 15-19-39.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-255) schedule_dag shouldn't return early if dagrun_timeout is given
[ https://issues.apache.org/jira/browse/AIRFLOW-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-255: Affects Version/s: Airflow 1.7.1.3 > schedule_dag shouldn't return early if dagrun_timeout is given > -- > > Key: AIRFLOW-255 > URL: https://issues.apache.org/jira/browse/AIRFLOW-255 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1.3 >Reporter: Kevin Lin >Assignee: Kevin Lin > Fix For: Airflow 1.8 > > > In > https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L396, > schedule_dag returns as long as `len(active_runs) >= dag.max_active_runs`. It > should also take into account dag.dagrun_timeout -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (AIRFLOW-255) schedule_dag shouldn't return early if dagrun_timeout is given
[ https://issues.apache.org/jira/browse/AIRFLOW-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-255. --- Resolution: Fixed Fix Version/s: Airflow 1.8 > schedule_dag shouldn't return early if dagrun_timeout is given > -- > > Key: AIRFLOW-255 > URL: https://issues.apache.org/jira/browse/AIRFLOW-255 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1.3 >Reporter: Kevin Lin >Assignee: Kevin Lin > Fix For: Airflow 1.8 > > > In > https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L396, > schedule_dag returns as long as `len(active_runs) >= dag.max_active_runs`. It > should also take into account dag.dagrun_timeout -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-289) Use datetime.utcnow() to keep airflow system independent
[ https://issues.apache.org/jira/browse/AIRFLOW-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-289: Affects Version/s: Airflow 1.7.1.3 > Use datetime.utcnow() to keep airflow system independent > > > Key: AIRFLOW-289 > URL: https://issues.apache.org/jira/browse/AIRFLOW-289 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.7.1.3 >Reporter: Vineet Goel >Assignee: Vineet Goel >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-298) incubator disclaimer isn't proper on documentation website
[ https://issues.apache.org/jira/browse/AIRFLOW-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxime Beauchemin updated AIRFLOW-298: -- External issue URL: https://github.com/apache/incubator-airflow/pull/1640 > incubator disclaimer isn't proper on documentation website > -- > > Key: AIRFLOW-298 > URL: https://issues.apache.org/jira/browse/AIRFLOW-298 > Project: Apache Airflow > Issue Type: Bug >Reporter: Maxime Beauchemin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-297) Exponential Backoff Retry Delay
[ https://issues.apache.org/jira/browse/AIRFLOW-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-297: Summary: Exponential Backoff Retry Delay (was: Exponential Backoff) > Exponential Backoff Retry Delay > --- > > Key: AIRFLOW-297 > URL: https://issues.apache.org/jira/browse/AIRFLOW-297 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Minor > > The retry delay time is currently fixed. It would be a useful option to > support progressively longer waits between retries via exponential backoff. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-297) Exponential Backoff
Joy Gao created AIRFLOW-297: --- Summary: Exponential Backoff Key: AIRFLOW-297 URL: https://issues.apache.org/jira/browse/AIRFLOW-297 Project: Apache Airflow Issue Type: Improvement Reporter: Joy Gao Assignee: Joy Gao Priority: Minor The retry delay time is currently fixed. It would be a useful option to support progressively longer waits between retries via exponential backoff. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
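A minimal sketch of what such an option could compute, where the fixed `retry_delay` becomes the base of an exponential schedule (the function name and the cap are illustrative, not Airflow's actual API):

```python
import datetime


def backoff_delay(retry_delay, try_number, max_delay=datetime.timedelta(hours=1)):
    """Delay before retry `try_number` (1-based): base * 2**(n-1), capped."""
    delay = retry_delay * (2 ** (try_number - 1))
    return min(delay, max_delay)


base = datetime.timedelta(minutes=5)
assert backoff_delay(base, 1) == datetime.timedelta(minutes=5)
assert backoff_delay(base, 2) == datetime.timedelta(minutes=10)
assert backoff_delay(base, 4) == datetime.timedelta(minutes=40)
assert backoff_delay(base, 10) == datetime.timedelta(hours=1)  # capped
```

Capping the delay keeps a task with many retries from waiting effectively forever.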
[jira] [Work started] (AIRFLOW-290) Implement ODBC Hook
[ https://issues.apache.org/jira/browse/AIRFLOW-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-290 started by Jason Damiani. - > Implement ODBC Hook > --- > > Key: AIRFLOW-290 > URL: https://issues.apache.org/jira/browse/AIRFLOW-290 > Project: Apache Airflow > Issue Type: New Feature > Components: hooks >Reporter: Jason Damiani >Assignee: Jason Damiani >Priority: Minor > > Add support for hooking into ODBC data sources -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query
[ https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356998#comment-15356998 ] Matthias Huschle commented on AIRFLOW-246: -- Do I understand the query correctly, that it delivers different results than before, as it double counts the TI belonging to the last DAG run if it is still running? Not that I'd consider this a bad trade-off, but it might be worth noting it. > dag_stats endpoint has a terrible query > --- > > Key: AIRFLOW-246 > URL: https://issues.apache.org/jira/browse/AIRFLOW-246 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: Airflow 1.7.1 > Environment: MySQL Backend through sqlalchemy >Reporter: Neil Hanlon >Assignee: Kengo Seki > > Hitting this endpoint creates a series of queries on the database which take > over 20 seconds to run, causing the page to not load for that entire time. > Luckily the main page (which includes this under "Recent Statuses") loads > this synchronously, but still... waiting almost half a minute (at times more) > to see the statuses for dags is really not fun. > We have less than a million rows in the task_instance table--so it's not even > a problem with that. > Here's a query profile for the query: > https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf > We've done some optimizations on the database, but to no avail. 
> The query: > {code:sql} > SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS > task_instance_state, count(task_instance.task_id) AS count_1 FROM > task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, > dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = > 'running') AS running_dag_run ON running_dag_run.dag_id = > task_instance.dag_id AND running_dag_run.execution_date = > task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS > dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY > dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id > AND last_dag_run.execution_date = task_instance.execution_date WHERE > task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR > last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, > task_instance.state; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-296) Getting TemplateNotFound Error while using QuboleOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Maheshwari updated AIRFLOW-296: - Description: Getting following error while using Qubole operator. {{jinja2.exceptions.TemplateNotFound: select count(*) from default.payment}} Found the error and fix, will open a PR soon. was: Getting following error while using Qubole operator. {{ jinja2.exceptions.TemplateNotFound: select count(*) from default.payment }} Found the error and fix, will open a PR soon. > Getting TemplateNotFound Error while using QuboleOperator > - > > Key: AIRFLOW-296 > URL: https://issues.apache.org/jira/browse/AIRFLOW-296 > Project: Apache Airflow > Issue Type: Bug >Reporter: Sumit Maheshwari >Assignee: Sumit Maheshwari > > Getting following error while using Qubole operator. > {{jinja2.exceptions.TemplateNotFound: select count(*) from default.payment}} > Found the error and fix, will open a PR soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-296) Getting TemplateNotFound Error while using QuboleOperator
Sumit Maheshwari created AIRFLOW-296: Summary: Getting TemplateNotFound Error while using QuboleOperator Key: AIRFLOW-296 URL: https://issues.apache.org/jira/browse/AIRFLOW-296 Project: Apache Airflow Issue Type: Bug Reporter: Sumit Maheshwari Assignee: Sumit Maheshwari Getting following error while using Qubole operator. {{ jinja2.exceptions.TemplateNotFound: select count(*) from default.payment }} Found the error and fix, will open a PR soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-188) Exception on /admin/airflow/clear
[ https://issues.apache.org/jira/browse/AIRFLOW-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356786#comment-15356786 ] peter pang commented on AIRFLOW-188: Hi, I hit the same Oops when I press the [Clear] button in the task dialog window. The airflow version is 1.7.1.3, which I installed through pip. The "airflow clear dag_name" command works fine. The Oops happens on all the dags listed on the webpage: click on a dag, then in the [ Graph View ] press any task that ran successfully to bring up the task operation dialog; pressing [ Clear ] triggers the Oops. > Exception on /admin/airflow/clear > - > > Key: AIRFLOW-188 > URL: https://issues.apache.org/jira/browse/AIRFLOW-188 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.0 > Environment: - CeleryExecutor > - broker Redis > - backend PostgreSQL >Reporter: Cyril Scetbon >Assignee: Siddharth Anand > > When I try to clear the past of a task I get an Oops. Here is the output I get > : http://pastebin.com/yTYsekyB > The code uses 2 files : > http://pastebin.com/Uyp41wEh (subdags) > http://pastebin.com/wiGP964w (maindag that uses subdags) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-295) Beeline called into HiveCliHook.run() reads unclosed file and skips last statement
[ https://issues.apache.org/jira/browse/AIRFLOW-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Sanko updated AIRFLOW-295: - Description: If the hql for a HiveOperator that uses a beeline connection contains multiple statements and doesn't end with an extra blank line, beeline skips the last statement. As I understand it, beeline reads the still-open file directly (which cannot be closed because of the NamedTemporaryFile class usage) and gets an unexpected EOF. {code} hive = HiveOperator( dag = hive_dag, start_date = datetime(2016, 1, 1), task_id='asanko_cli_remote_test', hql = """ use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}'; desc asanko.test_airflow_dual; """, hive_cli_conn_id='asanko_hive_cli_beeline', schema='asanko', default_args=args, run_as_owner=True) {code} Log: {code} [2016-06-29 03:01:51,346] {models.py:1041} INFO - Executingon 2016-01-01 00:00:00 [2016-06-29 03:01:51,354] {hive_operator.py:63} INFO - Executing: use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; desc asanko.test_airflow_dual; [2016-06-29 03:01:51,357] {base_hook.py:53} INFO - Using connection to: asanko_hive [2016-06-29 03:01:51,358] {hive_hooks.py:105} INFO - beeline -f /tmp/airflow_hiveop_EDQ7kE/tmpuSq2NR -u jdbc:hive2://asanko_hive:1/default;auth=none -n asanko -p pwd [2016-06-29 03:01:52,119] {hive_hooks.py:116} INFO - scan complete in 3ms [2016-06-29 03:01:52,120] {hive_hooks.py:116} INFO - Connecting to jdbc:hive2://asanko_hive:1/default;auth=none [2016-06-29 03:01:52,375] {hive_hooks.py:116} INFO - Connected to: Apache Hive (version 0.12.0-cdh5.1.3) [2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 0.12.0-cdh5.1.3) [2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ [2016-06-29 03:01:52,385] {hive_hooks.py:116} 
INFO - Beeline version 0.12.0-cdh5.1.3 by Apache Hive [2016-06-29 03:01:52,386] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> USE asanko; [2016-06-29 03:01:52,428] {hive_hooks.py:116} INFO - No rows affected (0.041 seconds) [2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> [2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> use asanko; [2016-06-29 03:01:52,451] {hive_hooks.py:116} INFO - No rows affected (0.01 seconds) [2016-06-29 03:01:52,452] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual; [2016-06-29 03:01:52,463] {hive_hooks.py:116} INFO - No rows affected (0.009 seconds) [2016-06-29 03:01:52,465] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; [2016-06-29 03:01:55,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:00,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:05,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:10,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:15,010] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:20,003] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:25,011] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:30,010] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:35,007] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:40,007] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:45,007] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:46,575] {hive_hooks.py:116} INFO - No rows affected (54.109 seconds) [2016-06-29 03:02:46,578] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;Closing: org.apache.hive.jdbc.HiveConnection {code} If You create file manually with the same hql and pass it to beeline all statements will be called. 
If You manually add empty row last statement successfully run: {code} hive = HiveOperator( dag = hive_dag, start_date = datetime(2016, 1, 1), task_id='asanko_cli_remote_test', hql = """ use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}'; desc asanko.test_airflow_dual; """, hive_cli_conn_id='asanko_hive_cli_beeline', schema='asanko', default_args=args, run_as_owner=True) {code} Log: {code} [2016-06-29 03:04:01,378] {models.py:1041} INFO - Executing on 2016-01-01 00:00:00 [2016-06-29 03:04:01,386] {hive_operator.py:63} INFO - Executing: use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; desc asanko.test_airflow_dual; [2016-06-29 03:04:01,388]
[jira] [Updated] (AIRFLOW-295) Beeline called into HiveCliHook.run() reads unclosed file and skips last statement
[ https://issues.apache.org/jira/browse/AIRFLOW-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Sanko updated AIRFLOW-295: - Description: If the hql for a HiveOperator that uses a beeline connection contains multiple statements and doesn't end with an extra blank line, beeline skips the last statement. As I understand it, beeline reads the still-open file directly (which cannot be closed because of the NamedTemporaryFile class usage) and gets an unexpected EOF. {code} hive = HiveOperator( dag = hive_dag, start_date = datetime(2016, 1, 1), task_id='asanko_cli_remote_test', hql = """ use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}'; desc asanko.test_airflow_dual; """, hive_cli_conn_id='asanko_hive_cli_beeline', schema='asanko', default_args=args, run_as_owner=True) {code} Log: {code} [2016-06-29 03:01:51,346] {models.py:1041} INFO - Executingon 2016-01-01 00:00:00 [2016-06-29 03:01:51,354] {hive_operator.py:63} INFO - Executing: use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; desc asanko.test_airflow_dual; [2016-06-29 03:01:51,357] {base_hook.py:53} INFO - Using connection to: asanko_hive [2016-06-29 03:01:51,358] {hive_hooks.py:105} INFO - beeline -f /tmp/airflow_hiveop_EDQ7kE/tmpuSq2NR -u jdbc:hive2://asanko_hive:1/default;auth=none -n asanko -p pwd [2016-06-29 03:01:52,119] {hive_hooks.py:116} INFO - scan complete in 3ms [2016-06-29 03:01:52,120] {hive_hooks.py:116} INFO - Connecting to jdbc:hive2://asanko_hive:1/default;auth=none [2016-06-29 03:01:52,375] {hive_hooks.py:116} INFO - Connected to: Apache Hive (version 0.12.0-cdh5.1.3) [2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 0.12.0-cdh5.1.3) [2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ [2016-06-29 03:01:52,385] {hive_hooks.py:116} 
INFO - Beeline version 0.12.0-cdh5.1.3 by Apache Hive [2016-06-29 03:01:52,386] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> USE asanko; [2016-06-29 03:01:52,428] {hive_hooks.py:116} INFO - No rows affected (0.041 seconds) [2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> [2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> use asanko; [2016-06-29 03:01:52,451] {hive_hooks.py:116} INFO - No rows affected (0.01 seconds) [2016-06-29 03:01:52,452] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual; [2016-06-29 03:01:52,463] {hive_hooks.py:116} INFO - No rows affected (0.009 seconds) [2016-06-29 03:01:52,465] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; [2016-06-29 03:01:55,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:00,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:05,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:10,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:15,010] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:20,003] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:25,011] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:30,010] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:35,007] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:40,007] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:45,007] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:46,575] {hive_hooks.py:116} INFO - No rows affected (54.109 seconds) [2016-06-29 03:02:46,578] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;Closing: org.apache.hive.jdbc.HiveConnection {code} If You create file manually with the same hql and pass it to beeline all statements will be called. 
If we manually add empty row last statement successfully run: {code} hive = HiveOperator( dag = hive_dag, start_date = datetime(2016, 1, 1), task_id='asanko_cli_remote_test', hql = """ use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}'; desc asanko.test_airflow_dual; """, hive_cli_conn_id='asanko_hive_cli_beeline', schema='asanko', default_args=args, run_as_owner=True) {code} Log: {code} [2016-06-29 03:04:01,378] {models.py:1041} INFO - Executing on 2016-01-01 00:00:00 [2016-06-29 03:04:01,386] {hive_operator.py:63} INFO - Executing: use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; desc asanko.test_airflow_dual; [2016-06-29 03:04:01,388]
[jira] [Updated] (AIRFLOW-295) Beeline called into HiveCliHook.run() reads unclosed file and skips last statement
[ https://issues.apache.org/jira/browse/AIRFLOW-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Sanko updated AIRFLOW-295: - Description: If the hql for a HiveOperator that uses a beeline connection contains multiple statements and doesn't end with an extra blank line, beeline skips the last statement. As I understand it, beeline reads the still-open file directly (which cannot be closed because of the NamedTemporaryFile class usage) and gets an unexpected EOF. {code} hive = HiveOperator( dag = hive_dag, start_date = datetime(2016, 1, 1), task_id='asanko_cli_remote_test', hql = """ use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}'; desc asanko.test_airflow_dual; """, hive_cli_conn_id='asanko_hive_cli_beeline', schema='asanko', default_args=args, run_as_owner=True) {code} Log: {code} [2016-06-29 03:01:51,346] {models.py:1041} INFO - Executingon 2016-01-01 00:00:00 [2016-06-29 03:01:51,354] {hive_operator.py:63} INFO - Executing: use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; desc asanko.test_airflow_dual; [2016-06-29 03:01:51,357] {base_hook.py:53} INFO - Using connection to: asanko_hive [2016-06-29 03:01:51,358] {hive_hooks.py:105} INFO - beeline -f /tmp/airflow_hiveop_EDQ7kE/tmpuSq2NR -u jdbc:hive2://asanko_hive:1/default;auth=none -n asanko -p pwd [2016-06-29 03:01:52,119] {hive_hooks.py:116} INFO - scan complete in 3ms [2016-06-29 03:01:52,120] {hive_hooks.py:116} INFO - Connecting to jdbc:hive2://asanko_hive:1/default;auth=none [2016-06-29 03:01:52,375] {hive_hooks.py:116} INFO - Connected to: Apache Hive (version 0.12.0-cdh5.1.3) [2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 0.12.0-cdh5.1.3) [2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ [2016-06-29 03:01:52,385] {hive_hooks.py:116} 
INFO - Beeline version 0.12.0-cdh5.1.3 by Apache Hive [2016-06-29 03:01:52,386] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> USE asanko; [2016-06-29 03:01:52,428] {hive_hooks.py:116} INFO - No rows affected (0.041 seconds) [2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> [2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> use asanko; [2016-06-29 03:01:52,451] {hive_hooks.py:116} INFO - No rows affected (0.01 seconds) [2016-06-29 03:01:52,452] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual; [2016-06-29 03:01:52,463] {hive_hooks.py:116} INFO - No rows affected (0.009 seconds) [2016-06-29 03:01:52,465] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; [2016-06-29 03:01:55,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:00,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:05,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:10,006] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:15,010] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:20,003] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:25,011] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:30,010] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:35,007] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:40,007] {jobs.py:142} DEBUG - [heart] Boom. [2016-06-29 03:02:45,007] {jobs.py:142} DEBUG - [heart] Boom. 
[2016-06-29 03:02:46,575] {hive_hooks.py:116} INFO - No rows affected (54.109 seconds)
[2016-06-29 03:02:46,578] {hive_hooks.py:116} INFO - 0: jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;Closing: org.apache.hive.jdbc.HiveConnection
{code}
But if we manually add an empty line at the end, the last statement runs successfully:
{code}
hive = HiveOperator(
    dag=hive_dag,
    start_date=datetime(2016, 1, 1),
    task_id='asanko_cli_remote_test',
    hql="""
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}';
desc asanko.test_airflow_dual;

""",
    hive_cli_conn_id='asanko_hive_cli_beeline',
    schema='asanko',
    default_args=args,
    run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:04:01,378] {models.py:1041} INFO - Executing on 2016-01-01 00:00:00
[2016-06-29 03:04:01,386] {hive_operator.py:63} INFO - Executing: use asanko; drop table if exists test_airflow_dual; create table asanko.test_airflow_dual as select * from asanko.dual where x <> '2016-01-01'; desc asanko.test_airflow_dual;
[2016-06-29 03:04:01,388] {base_hook.py:53} INFO - Using connection to: asanko_hive
[2016-06-29 03:04:01,390] {hive_hooks.py:105} INFO
{code}
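A minimal sketch of the kind of fix this report points at: guarantee the temporary HQL script ends with a newline and is flushed to disk before beeline is pointed at it. The function names here are hypothetical, not the actual HiveCliHook code; only the beeline flags (-f, -u, -n, -p) are taken from the command line shown in the log above.

```python
import os
import subprocess
import tempfile


def prepare_hql_file(hql):
    """Write HQL to a temp file, guaranteeing a trailing newline so a
    consumer reading the file does not hit EOF mid-statement.
    Returns the file path; the caller is responsible for deleting it."""
    if not hql.endswith("\n"):
        hql += "\n"  # the missing newline is what makes beeline skip the last statement
    with tempfile.NamedTemporaryFile(mode="w", suffix=".hql", delete=False) as f:
        f.write(hql)
        f.flush()
        os.fsync(f.fileno())  # ensure the bytes reach disk before another process reads them
        return f.name


def run_hql_via_beeline(hql, jdbc_url, user, password):
    """Invoke beeline on the prepared script, mirroring the command line
    from the log above (beeline -f <file> -u <url> -n <user> -p <pwd>)."""
    path = prepare_hql_file(hql)
    try:
        return subprocess.run(
            ["beeline", "-f", path, "-u", jdbc_url, "-n", user, "-p", password],
            check=True)
    finally:
        os.remove(path)
```

Closing the file before launching the subprocess (delete=False plus an explicit os.remove) would also sidestep the unclosed-file behaviour described above, at the cost of having to clean the file up manually.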
[jira] [Created] (AIRFLOW-295) Beeline called into HiveCliHook.run() read unclosed file and skip last statement
Alexey Sanko created AIRFLOW-295: Summary: Beeline called into HiveCliHook.run() read unclosed file and skip last statement Key: AIRFLOW-295 URL: https://issues.apache.org/jira/browse/AIRFLOW-295 Project: Apache Airflow Issue Type: Bug Reporter: Alexey Sanko