[jira] [Updated] (AIRFLOW-32) Remove deprecated features prior to releasing Airflow 2.0

2016-06-30 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-32:
--
Description: 
A number of features have been marked for deprecation in Airflow 2.0. They need 
to be deleted prior to release. 

Usually the error message or comments will mention Airflow 2.0 with either a 
#TODO or #FIXME.

Tracking list (not necessarily complete!):
JIRA:
AIRFLOW-31
AIRFLOW-200

GitHub:
https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
https://github.com/airbnb/airflow/pull/1219
https://github.com/airbnb/airflow/pull/1285



  was:
A number of features have been marked for deprecation in Airflow 2.0. They need 
to be deleted prior to release. 

Usually the error message or comments will mention Airflow 2.0 with either a 
#TODO or #FIXME.

Tracking list (not necessarily complete!):
JIRA:
AIRFLOW-31

GitHub:
https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
https://github.com/airbnb/airflow/pull/1219
https://github.com/airbnb/airflow/pull/1285




> Remove deprecated features prior to releasing Airflow 2.0
> -
>
> Key: AIRFLOW-32
> URL: https://issues.apache.org/jira/browse/AIRFLOW-32
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>  Labels: deprecated
> Fix For: Airflow 2.0
>
>
> A number of features have been marked for deprecation in Airflow 2.0. They 
> need to be deleted prior to release. 
> Usually the error message or comments will mention Airflow 2.0 with either a 
> #TODO or #FIXME.
> Tracking list (not necessarily complete!):
> JIRA:
> AIRFLOW-31
> AIRFLOW-200
> GitHub:
> https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
> https://github.com/airbnb/airflow/pull/1219
> https://github.com/airbnb/airflow/pull/1285
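
A self-contained sketch of how such markers can be enumerated (the sample file, paths, and marker text below are fabricated for illustration):

```python
import os
import re
import tempfile

# Write one sample file carrying a deprecation marker, then scan for
# #TODO/#FIXME comments that mention Airflow 2.0, as described above.
root = tempfile.mkdtemp()
sample = os.path.join(root, "example.py")
with open(sample, "w") as f:
    f.write("def old_hook():\n"
            "    # TODO: remove in Airflow 2.0 (deprecated since 1.7)\n"
            "    pass\n")

marker = re.compile(r"#\s*(TODO|FIXME).*Airflow 2\.0")
hits = []
for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        if not name.endswith(".py"):
            continue
        path = os.path.join(dirpath, name)
        with open(path) as f:
            for lineno, line in enumerate(f, start=1):
                if marker.search(line):
                    hits.append((path, lineno, line.strip()))

# hits now holds one (file, line, text) entry per marker found.
```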



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-303) EMR Operator job_flow_overrides allow for templating

2016-06-30 Thread Rob Froetscher (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358097#comment-15358097
 ] 

Rob Froetscher commented on AIRFLOW-303:


Ah no worries, figured it was a bot because it was on the github PR

> EMR Operator job_flow_overrides allow for templating
> 
>
> Key: AIRFLOW-303
> URL: https://issues.apache.org/jira/browse/AIRFLOW-303
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Rob Froetscher
>Assignee: Rob Froetscher
>Priority: Minor
>
> See discussion here: https://github.com/apache/incubator-airflow/pull/1630
> Idea is that we want to allow templating on a dictionary argument to an 
> operator. This is probably best done by introspecting what are the strings in 
> the dictionary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-303) EMR Operator job_flow_overrides allow for templating

2016-06-30 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358079#comment-15358079
 ] 

Chris Riccomini commented on AIRFLOW-303:
-

Derp, sorry. Mistakenly closed this.

> EMR Operator job_flow_overrides allow for templating
> 
>
> Key: AIRFLOW-303
> URL: https://issues.apache.org/jira/browse/AIRFLOW-303
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Rob Froetscher
>Assignee: Rob Froetscher
>Priority: Minor
>
> See discussion here: https://github.com/apache/incubator-airflow/pull/1630
> Idea is that we want to allow templating on a dictionary argument to an 
> operator. This is probably best done by introspecting what are the strings in 
> the dictionary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-30 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358077#comment-15358077
 ] 

Chris Riccomini commented on AIRFLOW-246:
-

I have no strong preference. Either JIRA or UPDATING.md or fixing query is fine 
with me. :) [~sekikn], since I think you'll end up doing the work--your call.

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>Assignee: Kengo Seki
> Fix For: Airflow 1.8
>
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this asynchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-304) Disable connection pooling for `airflow run`

2016-06-30 Thread Paul Yang (JIRA)
Paul Yang created AIRFLOW-304:
-

 Summary: Disable connection pooling for `airflow run`
 Key: AIRFLOW-304
 URL: https://issues.apache.org/jira/browse/AIRFLOW-304
 Project: Apache Airflow
  Issue Type: Bug
  Components: cli
Affects Versions: Airflow 1.8, Airflow 1.7.1
Reporter: Paul Yang
Assignee: Paul Yang
Priority: Minor
 Fix For: Airflow 1.8


When airflow runs a task through the `airflow run` command, it keeps around a 
connection to the DB due to the connection pool settings and the way that 
sessions are managed.

To reduce the number of connections to the DB, the run command should create 
connections only as needed. How often they are needed depends on the job 
heartbeat configuration, but by default a heartbeat happens only once every 5 
seconds.
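
A minimal sketch of the intended behaviour, with `sqlite3` standing in for the metadata database; in the real code this would likely mean a SQLAlchemy engine configured with `NullPool` so connections do not linger between heartbeats. All names below are illustrative:

```python
import os
import sqlite3
import tempfile

DB_PATH = os.path.join(tempfile.mkdtemp(), "metadata.db")

def record_heartbeat(ts):
    """Connect, write one heartbeat row, and disconnect immediately."""
    conn = sqlite3.connect(DB_PATH)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS job_heartbeat (ts REAL)")
        conn.execute("INSERT INTO job_heartbeat VALUES (?)", (ts,))
        conn.commit()
    finally:
        # Connection released here; nothing stays open during the ~5 second
        # gap between heartbeats, unlike a pooled connection.
        conn.close()

for ts in (0.0, 5.0):  # default heartbeat interval is roughly 5 seconds
    record_heartbeat(ts)
```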



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (AIRFLOW-100) Add flexibility to ExternalTaskSensor

2016-06-30 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-100.

Resolution: Fixed

Issue resolved by pull request #1641
[https://github.com/apache/incubator-airflow/pull/1641]

> Add flexibility to ExternalTaskSensor
> -
>
> Key: AIRFLOW-100
> URL: https://issues.apache.org/jira/browse/AIRFLOW-100
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
>  Labels: operator
>
> The ExternalTaskSensor defaults to sensing tasks with the same 
> {{execution_date}} as it does, and has an {{execution_delta}} parameter for 
> looking back farther in time. However, this doesn't support the case where 
> the sensing task has a smaller schedule_interval than the target task.
> For example, if the ETS were run every hour, one couldn't use a fixed 
> {{execution_delta}} to sense a task that only ran daily (since each instance 
> of the ETS would need a different execution_delta). 
> However, a Daily task can wait for multiple hourly tasks, because it knows in 
> advance that it needs 24 ETS's with deltas == range(24).
> Concrete suggestion:
> - add a param ({{execution_delta_fn}}?) that takes in the current 
> execution_date and is expected to return the desired sense date (for example, 
> it could always return midnight of the previous day, no matter what the ETS 
> was executed).
> cc [~criccomini]
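
The concrete suggestion can be sketched as a plain function; `execution_delta_fn` is the name floated above, not a released API:

```python
from datetime import datetime, timedelta

def execution_delta_fn(execution_date):
    """Target midnight of the previous day, whatever the current hour."""
    return (execution_date - timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)

# Every hourly sensor run now points at the same daily task instance:
assert execution_delta_fn(datetime(2016, 6, 30, 5)) == datetime(2016, 6, 29)
assert execution_delta_fn(datetime(2016, 6, 30, 23)) == datetime(2016, 6, 29)
```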



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-301) Unit test always fails on the last day of the month

2016-06-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358055#comment-15358055
 ] 

ASF subversion and git services commented on AIRFLOW-301:
-

Commit f49e238b72c1701df8ad88a0be232b6619f03edb in incubator-airflow's branch 
refs/heads/master from jlowin
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=f49e238 ]

[AIRFLOW-301] Fix broken unit test

This unit test always fails on the last day of the month, since it
tries to access a nonexistent day (like June 31st).


> Unit test always fails on the last day of the month
> ---
>
> Key: AIRFLOW-301
> URL: https://issues.apache.org/jira/browse/AIRFLOW-301
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Affects Versions: Airflow 1.7.1.3
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> The following line in the unit test for clearing XComs fails on the last day 
> of the month:
> https://github.com/apache/incubator-airflow/blob/7980f7771ab6b6f84259ea9a52e78e4f5f690e42/tests/models.py#L560
> {code}
> ==
> ERROR: tests xcom fetch behavior with different execution dates, using
> --
> Traceback (most recent call last):
>   File "/home/travis/build/apache/incubator-airflow/tests/models.py", line 
> 560, in test_xcom_pull_different_execution_date
> exec_date = exec_date.replace(day=exec_date.day + 1)
> ValueError: day is out of range for month
>  >> begin captured logging << 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[4/4] incubator-airflow git commit: Merge pull request #1641 from jlowin/external-task

2016-06-30 Thread jlowin
Merge pull request #1641 from jlowin/external-task


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0a94cae4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0a94cae4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0a94cae4

Branch: refs/heads/master
Commit: 0a94cae4b7f3824278702fddb8bac672fc01c894
Parents: 9f49f12 ddbcd88
Author: jlowin 
Authored: Thu Jun 30 19:42:38 2016 -0400
Committer: jlowin 
Committed: Thu Jun 30 19:42:38 2016 -0400

--
 .../versions/211e584da130_add_ti_state_index.py | 14 +++
 airflow/operators/sensors.py| 17 -
 airflow/utils/tests.py  |  5 ++-
 tests/core.py   | 39 
 tests/models.py |  2 +-
 5 files changed, 74 insertions(+), 3 deletions(-)
--




[jira] [Resolved] (AIRFLOW-301) Unit test always fails on the last day of the month

2016-06-30 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-301.

Resolution: Fixed

Issue resolved by pull request #1641
[https://github.com/apache/incubator-airflow/pull/1641]

> Unit test always fails on the last day of the month
> ---
>
> Key: AIRFLOW-301
> URL: https://issues.apache.org/jira/browse/AIRFLOW-301
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Affects Versions: Airflow 1.7.1.3
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> The following line in the unit test for clearing XComs fails on the last day 
> of the month:
> https://github.com/apache/incubator-airflow/blob/7980f7771ab6b6f84259ea9a52e78e4f5f690e42/tests/models.py#L560
> {code}
> ==
> ERROR: tests xcom fetch behavior with different execution dates, using
> --
> Traceback (most recent call last):
>   File "/home/travis/build/apache/incubator-airflow/tests/models.py", line 
> 560, in test_xcom_pull_different_execution_date
> exec_date = exec_date.replace(day=exec_date.day + 1)
> ValueError: day is out of range for month
>  >> begin captured logging << 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (AIRFLOW-303) EMR Operator job_flow_overrides allow for templating

2016-06-30 Thread Rob Froetscher (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Froetscher reopened AIRFLOW-303:


> EMR Operator job_flow_overrides allow for templating
> 
>
> Key: AIRFLOW-303
> URL: https://issues.apache.org/jira/browse/AIRFLOW-303
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Rob Froetscher
>Assignee: Rob Froetscher
>Priority: Minor
>
> See discussion here: https://github.com/apache/incubator-airflow/pull/1630
> Idea is that we want to allow templating on a dictionary argument to an 
> operator. This is probably best done by introspecting what are the strings in 
> the dictionary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-30 Thread Kengo Seki (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358041#comment-15358041
 ] 

Kengo Seki commented on AIRFLOW-246:


bq. it delivers different results than before, as it double counts the TI 
belonging to the last DAG run if it is still running?

Indeed. I totally missed that. Thanks for pointing out!

bq. please open a JIRA about updating UPDATING.md.

As another option, adding {{AND (state IS NULL OR state <> 'running')}} to the 
WHERE clause in the second subquery will fix this.
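
The effect of that predicate can be checked in isolation with an in-memory database (`sqlite3` stands in for MySQL, and the schema is trimmed to the relevant columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT)")
conn.executemany("INSERT INTO dag_run VALUES (?, ?, ?)", [
    ("example", "2016-06-29", "success"),
    ("example", "2016-06-30", "running"),   # latest run is still running
])

# Unfiltered "last run" subquery: the running run is returned here as well,
# so its task instances get picked up by both subqueries.
last = conn.execute(
    "SELECT dag_id, max(execution_date) FROM dag_run GROUP BY dag_id"
).fetchall()
# last == [("example", "2016-06-30")]

# With the suggested predicate, the still-running run is excluded here and
# counted only once, via the running-run subquery.
fixed = conn.execute(
    "SELECT dag_id, max(execution_date) FROM dag_run "
    "WHERE (state IS NULL OR state <> 'running') GROUP BY dag_id"
).fetchall()
# fixed == [("example", "2016-06-29")]
```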

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>Assignee: Kengo Seki
> Fix For: Airflow 1.8
>
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this asynchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-247) EMR Hook, Operators, Sensor

2016-06-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357993#comment-15357993
 ] 

ASF subversion and git services commented on AIRFLOW-247:
-

Commit 9f49f12853d83dd051f0f1ed58b5df20bfcfe087 in incubator-airflow's branch 
refs/heads/master from [~rfroetscher]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=9f49f12 ]

[AIRFLOW-247] Add EMR hook, operators and sensors. Add AWS base hook

Closes #1630 from rfroetscher/emr


> EMR Hook, Operators, Sensor
> ---
>
> Key: AIRFLOW-247
> URL: https://issues.apache.org/jira/browse/AIRFLOW-247
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Rob Froetscher
>Assignee: Rob Froetscher
>Priority: Minor
>
> Substory of https://issues.apache.org/jira/browse/AIRFLOW-115. It would be 
> nice to have an EMR hook and operators.
> Hook to generally interact with EMR.
> Operators to:
> * setup and start a job flow
> * add steps to an existing jobflow 
> A sensor to:
> * monitor completion and status of EMR jobs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-303) EMR Operator job_flow_overrides allow for templating

2016-06-30 Thread Rob Froetscher (JIRA)
Rob Froetscher created AIRFLOW-303:
--

 Summary: EMR Operator job_flow_overrides allow for templating
 Key: AIRFLOW-303
 URL: https://issues.apache.org/jira/browse/AIRFLOW-303
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Rob Froetscher
Assignee: Rob Froetscher
Priority: Minor


See discussion here: https://github.com/apache/incubator-airflow/pull/1630

The idea is that we want to allow templating on a dictionary argument to an 
operator. This is probably best done by introspecting the dictionary and 
rendering whichever of its values are strings.
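
A sketch of that introspection approach, using plain `str.format` as a stand-in for Jinja rendering (the field names are illustrative, not the eventual Airflow API):

```python
def render_value(value, context):
    """Recursively render the string leaves of a nested structure."""
    if isinstance(value, str):
        return value.format(**context)   # stand-in for Jinja rendering
    if isinstance(value, dict):
        return {k: render_value(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [render_value(v, context) for v in value]
    return value  # numbers, bools, None, ... pass through untouched

job_flow_overrides = {
    "Name": "cluster-{ds}",
    "Instances": {"InstanceCount": 4, "Steps": ["step-{ds}", "static-step"]},
}
rendered = render_value(job_flow_overrides, {"ds": "2016-06-30"})
# rendered["Name"] == "cluster-2016-06-30"; InstanceCount stays 4
```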



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


incubator-airflow git commit: [AIRFLOW-282] Remove PR Tool logic that depends on version formatting

2016-06-30 Thread jlowin
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 7980f7771 -> 096564848


[AIRFLOW-282] Remove PR Tool logic that depends on version formatting

Closes #1625 from jlowin/pr-tool


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/09656484
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/09656484
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/09656484

Branch: refs/heads/master
Commit: 09656484806672b8ffaa133dffaf755f45c98307
Parents: 7980f77
Author: jlowin 
Authored: Thu Jun 30 18:14:55 2016 -0400
Committer: jlowin 
Committed: Thu Jun 30 18:14:55 2016 -0400

--
 dev/airflow-pr | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/09656484/dev/airflow-pr
--
diff --git a/dev/airflow-pr b/dev/airflow-pr
index 937ce28..7f10a89 100755
--- a/dev/airflow-pr
+++ b/dev/airflow-pr
@@ -503,17 +503,20 @@ def resolve_jira_issue(comment=None, jira_id=None, merge_branches=None):
     else:
         default_fix_versions = []
 
-    for v in default_fix_versions:
-        # Handles the case where we have forked a release branch but not yet made the release.
-        # In this case, if the PR is committed to the master branch and the release branch, we
-        # only consider the release branch to be the fix version. E.g. it is not valid to have
-        # both 1.1.0 and 1.0.0 as fix versions.
-        (major, minor, patch) = v.split(".")
-        if patch == "0":
-            previous = "%s.%s.%s" % (major, int(minor) - 1, 0)
-            if previous in default_fix_versions:
-                default_fix_versions = list(filter(
-                    lambda x: x != v, default_fix_versions))
+    # TODO Airflow versions vary from two to four decimal places (2.0, 1.7.1.3)
+    # The following logic can be reintroduced if/when a standard emerges.
+
+    # for v in default_fix_versions:
+    #     # Handles the case where we have forked a release branch but not yet made the release.
+    #     # In this case, if the PR is committed to the master branch and the release branch, we
+    #     # only consider the release branch to be the fix version. E.g. it is not valid to have
+    #     # both 1.1.0 and 1.0.0 as fix versions.
+    #     (major, minor, patch) = v.split(".")
+    #     if patch == "0":
+    #         previous = "%s.%s.%s" % (major, int(minor) - 1, 0)
+    #         if previous in default_fix_versions:
+    #             default_fix_versions = list(filter(
+    #                 lambda x: x != v, default_fix_versions))
     default_fix_versions = ",".join(default_fix_versions)
 
 fix_versions = click.prompt(

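If the commented-out logic is reinstated later, it will need to tolerate the two- and four-part version strings the TODO mentions. A defensive sketch, not part of the tool:

```python
def dedup_fix_versions(versions):
    """Drop an x.y.0 version when its predecessor x.(y-1).0 is also listed,
    skipping any version that does not have exactly three numeric parts
    (Airflow versions range from "2.0" to "1.7.1.3")."""
    result = list(versions)
    for v in versions:
        parts = v.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue  # tolerate "2.0" and "1.7.1.3" instead of crashing
        major, minor, patch = parts
        if patch == "0":
            previous = "%s.%s.0" % (major, int(minor) - 1)
            if previous in result:
                result = [x for x in result if x != v]
    return result

# The forked-release case keeps only the release branch's version:
assert dedup_fix_versions(["1.1.0", "1.0.0"]) == ["1.0.0"]
# Oddly shaped versions pass through untouched:
assert dedup_fix_versions(["2.0", "1.7.1.3"]) == ["2.0", "1.7.1.3"]
```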


[jira] [Updated] (AIRFLOW-302) The Bolke Correction: take squash commit messages from individual commit messages

2016-06-30 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-302:
---
Description: 
Currently, squash commit subjects are taken from the PR title (which should 
reference all JIRA issues). However this can be redundant in the case where 
there is just one commit, since it will be in the PR title and the commit 
message. Add an option (possibly default, or maybe default when there is only 
one referenced issue or one commit) to use the first commit message as the 
squash commit message.

cc [~bolke] for suggesting it

  was:Currently, squash commit subjects are taken from the PR title (which 
should reference all JIRA issues). However this can be redundant in the case 
where there is just one commit, since it will be in the PR title and the commit 
message. Add an option (possibly default, or maybe default when there is only 
one referenced issue or one commit) to use the first commit message as the 
squash commit message.


> The Bolke Correction: take squash commit messages from individual commit 
> messages
> -
>
> Key: AIRFLOW-302
> URL: https://issues.apache.org/jira/browse/AIRFLOW-302
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> Currently, squash commit subjects are taken from the PR title (which should 
> reference all JIRA issues). However this can be redundant in the case where 
> there is just one commit, since it will be in the PR title and the commit 
> message. Add an option (possibly default, or maybe default when there is only 
> one referenced issue or one commit) to use the first commit message as the 
> squash commit message.
> cc [~bolke] for suggesting it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-302) The Bolke Correction: take squash commit messages from individual commit messages

2016-06-30 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-302:
--

 Summary: The Bolke Correction: take squash commit messages from 
individual commit messages
 Key: AIRFLOW-302
 URL: https://issues.apache.org/jira/browse/AIRFLOW-302
 Project: Apache Airflow
  Issue Type: Bug
  Components: PR tool
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


Currently, squash commit subjects are taken from the PR title (which should 
reference all JIRA issues). However this can be redundant in the case where 
there is just one commit, since it will be in the PR title and the commit 
message. Add an option (possibly default, or maybe default when there is only 
one referenced issue or one commit) to use the first commit message as the 
squash commit message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-301) Unit test always fails on the last day of the month

2016-06-30 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-301:
--

 Summary: Unit test always fails on the last day of the month
 Key: AIRFLOW-301
 URL: https://issues.apache.org/jira/browse/AIRFLOW-301
 Project: Apache Airflow
  Issue Type: Bug
  Components: tests
Affects Versions: Airflow 1.7.1.3
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


The following line in the unit test for clearing XComs fails on the last day of 
the month:
https://github.com/apache/incubator-airflow/blob/7980f7771ab6b6f84259ea9a52e78e4f5f690e42/tests/models.py#L560

{code}
==
ERROR: tests xcom fetch behavior with different execution dates, using
--
Traceback (most recent call last):
  File "/home/travis/build/apache/incubator-airflow/tests/models.py", line 560, 
in test_xcom_pull_different_execution_date
exec_date = exec_date.replace(day=exec_date.day + 1)
ValueError: day is out of range for month
 >> begin captured logging << 
{code}
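
The failure and a boundary-safe alternative, reproduced in isolation (this shows the failure mode, not necessarily the exact fix shipped in PR #1641):

```python
from datetime import datetime, timedelta

# The failing pattern from the test: naive day arithmetic overflows whenever
# execution_date falls on the last day of a month (there is no June 31st).
exec_date = datetime(2016, 6, 30)
try:
    exec_date.replace(day=exec_date.day + 1)
    raise AssertionError("expected ValueError on a month boundary")
except ValueError:
    pass  # "day is out of range for month"

# Boundary-safe alternative: let timedelta carry over the month.
next_day = exec_date + timedelta(days=1)
assert next_day == datetime(2016, 7, 1)
```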



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (AIRFLOW-291) Add an index for task_instance.state

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-291.
---
   Resolution: Fixed
Fix Version/s: Airflow 1.8

> Add an index for task_instance.state
> 
>
> Key: AIRFLOW-291
> URL: https://issues.apache.org/jira/browse/AIRFLOW-291
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>  Labels: database, scalability
> Fix For: Airflow 1.8
>
>
> Add an index for task_instance.state to speed up querying for currently 
> queued/running tasks.
> From my experimentation at Airbnb on our mysql DB this has shown about a 
> 15-40% decrease in the number of mysql connections depending on qps and has 
> brought down CPU usage of the DB from constant spikes of 100% to 2%.
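
The shape of the change (the real migration is the Alembic revision `211e584da130_add_ti_state_index.py` listed in the merge commit; here `sqlite3` merely illustrates the planner effect, with the schema trimmed to a few columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, state TEXT)")
conn.execute("CREATE INDEX ti_state ON task_instance (state)")

# Queries for currently queued/running tasks can now use the index
# instead of scanning the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT count(*) FROM task_instance WHERE state = 'running'"
).fetchall()
# The plan's detail column mentions the ti_state index, not a full scan.
```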



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-291) Add an index for task_instance.state

2016-06-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357893#comment-15357893
 ] 

ASF subversion and git services commented on AIRFLOW-291:
-

Commit 7980f7771ab6b6f84259ea9a52e78e4f5f690e42 in incubator-airflow's branch 
refs/heads/master from [~aoen]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=7980f77 ]

[AIRFLOW-291] Add index for state in TI table

Closes #1635 from aoen/ddavydov/add_index_to_task_instance_state


> Add an index for task_instance.state
> 
>
> Key: AIRFLOW-291
> URL: https://issues.apache.org/jira/browse/AIRFLOW-291
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>  Labels: database, scalability
>
> Add an index for task_instance.state to speed up querying for currently 
> queued/running tasks.
> From my experimentation at Airbnb on our mysql DB this has shown about a 
> 15-40% decrease in the number of mysql connections depending on qps and has 
> brought down CPU usage of the DB from constant spikes of 100% to 2%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-100) Add flexibility to ExternalTaskSensor

2016-06-30 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-100:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/1641

> Add flexibility to ExternalTaskSensor
> -
>
> Key: AIRFLOW-100
> URL: https://issues.apache.org/jira/browse/AIRFLOW-100
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
>  Labels: operator
>
> The ExternalTaskSensor defaults to sensing tasks with the same 
> {{execution_date}} as it does, and has an {{execution_delta}} parameter for 
> looking back farther in time. However, this doesn't support the case where 
> the sensing task has a smaller schedule_interval than the target task.
> For example, if the ETS were run every hour, one couldn't use a fixed 
> {{execution_delta}} to sense a task that only ran daily (since each instance 
> of the ETS would need a different execution_delta). 
> However, a Daily task can wait for multiple hourly tasks, because it knows in 
> advance that it needs 24 ETS's with deltas == range(24).
> Concrete suggestion:
> - add a param ({{execution_delta_fn}}?) that takes in the current 
> execution_date and is expected to return the desired sense date (for example, 
> it could always return midnight of the previous day, no matter what the ETS 
> was executed).
> cc [~criccomini]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-296) Getting TemplateNotFound Error while using QuboleOperator

2016-06-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357775#comment-15357775
 ] 

ASF subversion and git services commented on AIRFLOW-296:
-

Commit 24d41b8909840451c1ef7d70c1c7671e6d87528c in incubator-airflow's branch 
refs/heads/master from [~msumit]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=24d41b8 ]

[AIRFLOW-296] template_ext is being treated as a string rather than a tuple in 
qubole operator

Closes #1638 from msumit/AIRFLOW-296


> Getting TemplateNotFound Error while using QuboleOperator
> -
>
> Key: AIRFLOW-296
> URL: https://issues.apache.org/jira/browse/AIRFLOW-296
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Sumit Maheshwari
>Assignee: Sumit Maheshwari
>
> Getting following error while using Qubole operator. 
> {{jinja2.exceptions.TemplateNotFound: select count(*) from default.payment}}
> Found the error and fix, will open a PR soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-296) Getting TemplateNotFound Error while using QuboleOperator

2016-06-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357774#comment-15357774
 ] 

ASF subversion and git services commented on AIRFLOW-296:
-

Commit 24d41b8909840451c1ef7d70c1c7671e6d87528c in incubator-airflow's branch 
refs/heads/master from [~msumit]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=24d41b8 ]

[AIRFLOW-296] template_ext is being treated as a string rather than a tuple in 
qubole operator

Closes #1638 from msumit/AIRFLOW-296


> Getting TemplateNotFound Error while using QuboleOperator
> -
>
> Key: AIRFLOW-296
> URL: https://issues.apache.org/jira/browse/AIRFLOW-296
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Sumit Maheshwari
>Assignee: Sumit Maheshwari
>
> Getting following error while using Qubole operator. 
> {{jinja2.exceptions.TemplateNotFound: select count(*) from default.payment}}
> Found the error and fix, will open a PR soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-299) Evaluate a plugin architecture to replace contrib

2016-06-30 Thread Dan Davydov (JIRA)
Dan Davydov created AIRFLOW-299:
---

 Summary: Evaluate a plugin architecture to replace contrib
 Key: AIRFLOW-299
 URL: https://issues.apache.org/jira/browse/AIRFLOW-299
 Project: Apache Airflow
  Issue Type: Task
  Components: contrib
Reporter: Dan Davydov
Priority: Minor


We should take a look at the usefulness of a plugin architecture similar to 
tools like Jenkins for the contrib folder. This way we can delegate/scale 
committers for contrib; keep the git history, repo size, and tests of the repo 
smaller; and allow users to publish operators without waiting for their commits 
to be merged by a committer.





incubator-airflow git commit: [AIRFLOW-296] template_ext is being treated as a string rather than a tuple in qubole operator

2016-06-30 Thread criccomini
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 002cf85bd -> 24d41b890


[AIRFLOW-296] template_ext is being treated as a string rather than a tuple in 
qubole operator

Closes #1638 from msumit/AIRFLOW-296


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/24d41b89
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/24d41b89
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/24d41b89

Branch: refs/heads/master
Commit: 24d41b8909840451c1ef7d70c1c7671e6d87528c
Parents: 002cf85
Author: Sumit Maheshwari 
Authored: Thu Jun 30 13:16:02 2016 -0700
Committer: Chris Riccomini 
Committed: Thu Jun 30 13:16:02 2016 -0700

--
 airflow/contrib/operators/qubole_operator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/24d41b89/airflow/contrib/operators/qubole_operator.py
--
diff --git a/airflow/contrib/operators/qubole_operator.py 
b/airflow/contrib/operators/qubole_operator.py
index cbf15c4..c462dca 100755
--- a/airflow/contrib/operators/qubole_operator.py
+++ b/airflow/contrib/operators/qubole_operator.py
@@ -95,7 +95,7 @@ class QuboleOperator(BaseOperator):
 
 template_fields = ('query', 'script_location', 'sub_command', 'script', 
'files', 'archives', 'program', 'cmdline',
'sql', 'where_clause', 'extract_query', 
'boundary_query', 'macros', 'tags', 'name', 'parameters')
-template_ext = ('.txt')
+template_ext = ('.txt',)
 ui_color = '#3064A1'
 ui_fgcolor = '#fff'
 

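For readers unfamiliar with the gotcha behind this one-character diff: in 
Python, parentheses alone do not create a tuple; the trailing comma does. A 
minimal illustration (plain Python semantics, not Airflow code):

```python
# A trailing comma is what makes a one-element tuple; parentheses alone do nothing.
as_string = ('.txt')    # just the string '.txt'
as_tuple = ('.txt',)    # a one-element tuple

assert as_string == '.txt'
assert as_tuple == ('.txt',)

# Iterating the "tuple" that is really a string yields single characters,
# so an extension check walks '.', 't', 'x', 't' instead of '.txt'.
assert list(as_string) == ['.', 't', 'x', 't']
assert list(as_tuple) == ['.txt']

# That is how a query like the one in this ticket can be mistaken for a
# template filename: it happens to end with the character 't'.
query = 'select count(*) from default.payment'
assert any(query.endswith(ext) for ext in as_string)      # matches 't'
assert not any(query.endswith(ext) for ext in as_tuple)   # no '.txt' suffix
```

With the comma added, only strings that genuinely end in `.txt` get treated as 
template files.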


[jira] [Commented] (AIRFLOW-286) Improve FTPHook to implement context manager interface

2016-06-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357768#comment-15357768
 ] 

ASF subversion and git services commented on AIRFLOW-286:
-

Commit 002cf85bdbbe62d04cec741b078e1600e0fd6c0a in incubator-airflow's branch 
refs/heads/master from [~skudriashev]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=002cf85 ]

[AIRFLOW-286] Improve FTPHook to implement context manager interface

Closes #1632 from skudriashev/airflow-286


> Improve FTPHook to implement context manager interface
> --
>
> Key: AIRFLOW-286
> URL: https://issues.apache.org/jira/browse/AIRFLOW-286
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Stanislav Kudriashev
>Assignee: Stanislav Kudriashev
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> It would be very nice to use FTPHook as a context manager:
> {code}
> with FTPHook(...) as hook:
> ...
> {code}
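For reference, the context-manager protocol being requested takes only two 
extra methods. A self-contained sketch with an illustrative class (not 
Airflow's actual FTPHook implementation):

```python
import ftplib

class FTPHookSketch:
    """Illustrative FTP hook supporting the `with` protocol (not Airflow's FTPHook)."""

    def __init__(self, host, user, password):
        self.host = host
        self.user = user
        self.password = password
        self.conn = None

    def get_conn(self):
        # Connect lazily, on first use.
        if self.conn is None:
            self.conn = ftplib.FTP(self.host, self.user, self.password)
        return self.conn

    def close_conn(self):
        if self.conn is not None:
            self.conn.quit()
            self.conn = None

    # These two methods are all it takes to enable `with FTPHookSketch(...) as hook:`
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Guarantees the connection is closed even if the body raised.
        self.close_conn()
```

The payoff is that the connection is released on any exit path, without the 
caller writing try/finally.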





[jira] [Closed] (AIRFLOW-286) Improve FTPHook to implement context manager interface

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-286.
---
   Resolution: Fixed
Fix Version/s: Airflow 1.8

Merged, thanks!

> Improve FTPHook to implement context manager interface
> --
>
> Key: AIRFLOW-286
> URL: https://issues.apache.org/jira/browse/AIRFLOW-286
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Stanislav Kudriashev
>Assignee: Stanislav Kudriashev
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> It would be very nice to use FTPHook as a context manager:
> {code}
> with FTPHook(...) as hook:
> ...
> {code}





[jira] [Updated] (AIRFLOW-286) Improve FTPHook to implement context manager interface

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-286:

External issue URL: https://github.com/apache/incubator-airflow/pull/1632

> Improve FTPHook to implement context manager interface
> --
>
> Key: AIRFLOW-286
> URL: https://issues.apache.org/jira/browse/AIRFLOW-286
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Stanislav Kudriashev
>Assignee: Stanislav Kudriashev
>Priority: Minor
>
> It would be very nice to use FTPHook as a context manager:
> {code}
> with FTPHook(...) as hook:
> ...
> {code}





[jira] [Updated] (AIRFLOW-293) Task execution independent of heartrate

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-293:

 Affects Version/s: Airflow 1.7.1.3
External issue URL: https://github.com/apache/incubator-airflow/pull/1637

> Task execution independent of heartrate
> ---
>
> Key: AIRFLOW-293
> URL: https://issues.apache.org/jira/browse/AIRFLOW-293
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: Airflow 1.8, Airflow 1.7.1.3
>Reporter: Paul Yang
>Assignee: Paul Yang
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> When a task runs through the `airflow run` command, it currently does not 
> return until at least `job_heartbeat_sec` seconds have elapsed.
> When the job heartrate is a high value (for example, to reduce DB load), this 
> makes short task instances run for much longer than necessary.





[jira] [Updated] (AIRFLOW-291) Add an index for task_instance.state

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-291:

External issue URL: https://github.com/apache/incubator-airflow/pull/1635

> Add an index for task_instance.state
> 
>
> Key: AIRFLOW-291
> URL: https://issues.apache.org/jira/browse/AIRFLOW-291
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>  Labels: database, scalability
>
> Add an index for task_instance.state to speed up querying for currently 
> queued/running tasks.
> From my experimentation at Airbnb on our mysql DB this has shown about a 
> 15-40% decrease in the number of mysql connections depending on qps and has 
> brought down CPU usage of the DB from constant spikes of 100% to 2%.
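The win comes from the database using an index lookup instead of a full table 
scan for state filters. A toy demonstration on SQLite (columns abbreviated; 
`ti_state` is a hypothetical index name, not necessarily what the migration 
uses):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE task_instance (
        task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT
    )
""")
# The index under discussion: a single-column index on task_instance.state.
conn.execute("CREATE INDEX ti_state ON task_instance (state)")

# With the index in place, a state filter shows up in the query plan as an
# index search rather than a scan of the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT count(*) FROM task_instance WHERE state = 'running'"
).fetchall()
print(plan)
```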





[jira] [Closed] (AIRFLOW-243) Use a more efficient Thrift call for HivePartitionSensor

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-243.
---
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   Airflow 1.8

> Use a more efficient Thrift call for HivePartitionSensor
> 
>
> Key: AIRFLOW-243
> URL: https://issues.apache.org/jira/browse/AIRFLOW-243
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: Airflow 1.7.1.3
>Reporter: Paul Yang
>Assignee: Li Xuanji
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call 
> that can result in some expensive SQL queries for tables that have many 
> partitions and are partitioned by multiple keys. We've seen our metastore DB 
> get hammered by these sensors resulting in service degradation for other 
> metastore users.
> The {{MetastorePartitionSensor}} is efficient, but it can result in too many 
> connections to the metastore DB.
> An alternative is to use the `get_partition_by_name` Thrift call that 
> translates into more efficient SQL queries. Because connections will be 
> pooled on the Thrift server, the DB won't get overloaded as with the 
> {{MetastorePartitionSensor}}. The semantics of the arguments will change, so 
> either a new argument needs to be introduced, or a new operator needs to be 
> created.





[jira] [Updated] (AIRFLOW-243) Use a more efficient Thrift call for HivePartitionSensor

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-243:

Affects Version/s: (was: Airflow 2.0)
   Airflow 1.7.1.3

> Use a more efficient Thrift call for HivePartitionSensor
> 
>
> Key: AIRFLOW-243
> URL: https://issues.apache.org/jira/browse/AIRFLOW-243
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: Airflow 1.7.1.3
>Reporter: Paul Yang
>Assignee: Li Xuanji
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call 
> that can result in some expensive SQL queries for tables that have many 
> partitions and are partitioned by multiple keys. We've seen our metastore DB 
> get hammered by these sensors resulting in service degradation for other 
> metastore users.
> The {{MetastorePartitionSensor}} is efficient, but it can result in too many 
> connections to the metastore DB.
> An alternative is to use the `get_partition_by_name` Thrift call that 
> translates into more efficient SQL queries. Because connections will be 
> pooled on the Thrift server, the DB won't get overloaded as with the 
> {{MetastorePartitionSensor}}. The semantics of the arguments will change, so 
> either a new argument needs to be introduced, or a new operator needs to be 
> created.





[jira] [Closed] (AIRFLOW-189) Highlighting of Parent/Child nodes in Graphs

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-189.
---
   Resolution: Fixed
Fix Version/s: Airflow 1.8

> Highlighting of Parent/Child nodes in Graphs
> 
>
> Key: AIRFLOW-189
> URL: https://issues.apache.org/jira/browse/AIRFLOW-189
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: Airflow 1.7.1.3
>Reporter: Paul Rhodes
>Assignee: Paul Rhodes
>Priority: Minor
> Fix For: Airflow 1.8
>
> Attachments: Screenshot from 2016-05-29 15-19-39.png
>
>
> Sometimes in large graphs it's difficult to see the direct 
> ancestors/descendants. This change would use mouseover events to highlight 
> these tasks to make it easier to interpret the graphs.
> An example can be seen here in the attachment
> !Screenshot from 2016-05-29 15-19-39.png!





[jira] [Updated] (AIRFLOW-189) Highlighting of Parent/Child nodes in Graphs

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-189:

Affects Version/s: Airflow 1.7.1.3

> Highlighting of Parent/Child nodes in Graphs
> 
>
> Key: AIRFLOW-189
> URL: https://issues.apache.org/jira/browse/AIRFLOW-189
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: Airflow 1.7.1.3
>Reporter: Paul Rhodes
>Assignee: Paul Rhodes
>Priority: Minor
> Fix For: Airflow 1.8
>
> Attachments: Screenshot from 2016-05-29 15-19-39.png
>
>
> Sometimes in large graphs it's difficult to see the direct 
> ancestors/descendants. This change would use mouseover events to highlight 
> these tasks to make it easier to interpret the graphs.
> An example can be seen here in the attachment
> !Screenshot from 2016-05-29 15-19-39.png!





[jira] [Updated] (AIRFLOW-255) schedule_dag shouldn't return early if dagrun_timeout is given

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-255:

Affects Version/s: Airflow 1.7.1.3

> schedule_dag shouldn't return early if dagrun_timeout is given
> --
>
> Key: AIRFLOW-255
> URL: https://issues.apache.org/jira/browse/AIRFLOW-255
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
>Reporter: Kevin Lin
>Assignee: Kevin Lin
> Fix For: Airflow 1.8
>
>
> In 
> https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L396, 
> schedule_dag returns as long as `len(active_runs) >= dag.max_active_runs`. It 
> should also take into account dag.dagrun_timeout
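The requested behaviour can be sketched as follows (a hypothetical helper, not 
the actual jobs.py logic): runs that have exceeded dagrun_timeout should stop 
counting against max_active_runs.

```python
from datetime import datetime, timedelta

def should_schedule(active_run_starts, max_active_runs, dagrun_timeout, now):
    """Hypothetical check: runs older than dagrun_timeout no longer count
    against max_active_runs, so the scheduler may still create a new run."""
    still_active = [
        start for start in active_run_starts
        if dagrun_timeout is None or now - start < dagrun_timeout
    ]
    return len(still_active) < max_active_runs

now = datetime(2016, 7, 1)
# A run started an hour ago with a 30-minute timeout has timed out, so a new
# run may be scheduled even with max_active_runs == 1.
assert should_schedule([datetime(2016, 6, 30, 23)], 1,
                       timedelta(minutes=30), now)
# Without a timeout, that same run blocks scheduling.
assert not should_schedule([datetime(2016, 6, 30, 23)], 1, None, now)
```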





[jira] [Closed] (AIRFLOW-255) schedule_dag shouldn't return early if dagrun_timeout is given

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-255.
---
   Resolution: Fixed
Fix Version/s: Airflow 1.8

> schedule_dag shouldn't return early if dagrun_timeout is given
> --
>
> Key: AIRFLOW-255
> URL: https://issues.apache.org/jira/browse/AIRFLOW-255
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
>Reporter: Kevin Lin
>Assignee: Kevin Lin
> Fix For: Airflow 1.8
>
>
> In 
> https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L396, 
> schedule_dag returns as long as `len(active_runs) >= dag.max_active_runs`. It 
> should also take into account dag.dagrun_timeout





[jira] [Updated] (AIRFLOW-289) Use datetime.utcnow() to keep airflow system independent

2016-06-30 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-289:

Affects Version/s: Airflow 1.7.1.3

> Use datetime.utcnow() to keep airflow system independent
> 
>
> Key: AIRFLOW-289
> URL: https://issues.apache.org/jira/browse/AIRFLOW-289
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: Airflow 1.7.1.3
>Reporter: Vineet Goel
>Assignee: Vineet Goel
>Priority: Minor
>






[jira] [Updated] (AIRFLOW-298) incubator disclaimer isn't proper on documentation website

2016-06-30 Thread Maxime Beauchemin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxime Beauchemin updated AIRFLOW-298:
--
External issue URL: https://github.com/apache/incubator-airflow/pull/1640

> incubator disclaimer isn't proper on documentation website
> --
>
> Key: AIRFLOW-298
> URL: https://issues.apache.org/jira/browse/AIRFLOW-298
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Maxime Beauchemin
>






[jira] [Updated] (AIRFLOW-297) Exponential Backoff Retry Delay

2016-06-30 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-297:

Summary: Exponential Backoff Retry Delay  (was: Exponential Backoff)

> Exponential Backoff Retry Delay
> ---
>
> Key: AIRFLOW-297
> URL: https://issues.apache.org/jira/browse/AIRFLOW-297
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Minor
>
> The retry delay time is currently fixed. It would be a useful option to 
> support progressively longer waits between retries via exponential backoff.





[jira] [Created] (AIRFLOW-297) Exponential Backoff

2016-06-30 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-297:
---

 Summary: Exponential Backoff
 Key: AIRFLOW-297
 URL: https://issues.apache.org/jira/browse/AIRFLOW-297
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Joy Gao
Assignee: Joy Gao
Priority: Minor


The retry delay time is currently fixed. It would be a useful option to 
support progressively longer waits between retries via exponential backoff.
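Exponential backoff is typically computed as a base delay doubled per retry, 
with a cap and optional jitter. A sketch of the idea (hypothetical helper, not 
an Airflow API):

```python
import random

def retry_delay(base_delay, retry_number, max_delay=300.0, jitter=False):
    """Hypothetical helper: delay grows as base_delay * 2**retry_number,
    capped at max_delay."""
    delay = min(base_delay * (2 ** retry_number), max_delay)
    if jitter:
        # Full jitter spreads retries out so many tasks don't retry in lockstep.
        delay = random.uniform(0, delay)
    return delay

# With a 5-second base, retries wait 5s, 10s, 20s, 40s, ... up to the 300s cap.
delays = [retry_delay(5, n) for n in range(7)]
print(delays)
```

The cap matters: without it, a handful of retries would push the delay past any 
sensible scheduling horizon.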





[jira] [Work started] (AIRFLOW-290) Implement ODBC Hook

2016-06-30 Thread Jason Damiani (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-290 started by Jason Damiani.
-
> Implement ODBC Hook
> ---
>
> Key: AIRFLOW-290
> URL: https://issues.apache.org/jira/browse/AIRFLOW-290
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks
>Reporter: Jason Damiani
>Assignee: Jason Damiani
>Priority: Minor
>
> Add support for hooking into ODBC data sources





[jira] [Commented] (AIRFLOW-246) dag_stats endpoint has a terrible query

2016-06-30 Thread Matthias Huschle (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356998#comment-15356998
 ] 

Matthias Huschle commented on AIRFLOW-246:
--

Do I understand the query correctly that it delivers different results than 
before, in that it double-counts the TIs belonging to the last DAG run if it 
is still running? Not that I'd consider this a bad trade-off, but it might be 
worth noting.

> dag_stats endpoint has a terrible query
> ---
>
> Key: AIRFLOW-246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-246
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: Airflow 1.7.1
> Environment: MySQL Backend through sqlalchemy
>Reporter: Neil Hanlon
>Assignee: Kengo Seki
>
> Hitting this endpoint creates a series of queries on the database which take 
> over 20 seconds to run, causing the page to not load for that entire time. 
> Luckily the main page (which includes this under "Recent Statuses") loads 
> this synchronously, but still... waiting almost half a minute (at times more) 
> to see the statuses for dags is really not fun.
> We have less than a million rows in the task_instance table--so it's not even 
> a problem with that.
> Here's a query profile for the query:
> https://gist.github.com/NeilHanlon/613f12724e802bc51c23fca7d46d28bf
> We've done some optimizations on the database, but to no avail.
> The query:
> {code:sql}
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS 
> task_instance_state, count(task_instance.task_id) AS count_1 FROM 
> task_instance LEFT OUTER JOIN (SELECT dag_run.dag_id AS dag_id, 
> dag_run.execution_date AS execution_date FROM dag_run WHERE dag_run.state = 
> 'running') AS running_dag_run ON running_dag_run.dag_id = 
> task_instance.dag_id AND running_dag_run.execution_date = 
> task_instance.execution_date LEFT OUTER JOIN (SELECT dag_run.dag_id AS 
> dag_id, max(dag_run.execution_date) AS execution_date FROM dag_run GROUP BY 
> dag_run.dag_id) AS last_dag_run ON last_dag_run.dag_id = task_instance.dag_id 
> AND last_dag_run.execution_date = task_instance.execution_date WHERE 
> task_instance.task_id IN ... AND (running_dag_run.dag_id IS NOT NULL OR 
> last_dag_run.dag_id IS NOT NULL) GROUP BY task_instance.dag_id, 
> task_instance.state;
> {code}





[jira] [Updated] (AIRFLOW-296) Getting TemplateNotFound Error while using QuboleOperator

2016-06-30 Thread Sumit Maheshwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Maheshwari updated AIRFLOW-296:
-
Description: 
Getting following error while using Qubole operator. 

{{jinja2.exceptions.TemplateNotFound: select count(*) from default.payment}}

Found the error and fix, will open a PR soon.

  was:
Getting following error while using Qubole operator. 

{{ jinja2.exceptions.TemplateNotFound: select count(*) from default.payment }}

Found the error and fix, will open a PR soon.


> Getting TemplateNotFound Error while using QuboleOperator
> -
>
> Key: AIRFLOW-296
> URL: https://issues.apache.org/jira/browse/AIRFLOW-296
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Sumit Maheshwari
>Assignee: Sumit Maheshwari
>
> Getting following error while using Qubole operator. 
> {{jinja2.exceptions.TemplateNotFound: select count(*) from default.payment}}
> Found the error and fix, will open a PR soon.





[jira] [Created] (AIRFLOW-296) Getting TemplateNotFound Error while using QuboleOperator

2016-06-30 Thread Sumit Maheshwari (JIRA)
Sumit Maheshwari created AIRFLOW-296:


 Summary: Getting TemplateNotFound Error while using QuboleOperator
 Key: AIRFLOW-296
 URL: https://issues.apache.org/jira/browse/AIRFLOW-296
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Sumit Maheshwari
Assignee: Sumit Maheshwari


Getting following error while using Qubole operator. 

{{ jinja2.exceptions.TemplateNotFound: select count(*) from default.payment }}

Found the error and fix, will open a PR soon.





[jira] [Commented] (AIRFLOW-188) Exception on /admin/airflow/clear

2016-06-30 Thread peter pang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356786#comment-15356786
 ] 

peter pang commented on AIRFLOW-188:


Hi, I hit the same Oops when I press the [Clear] button in the task dialog 
window. 

The Airflow version is 1.7.1.3, which I installed through pip.

The "airflow clear dag_name" command works fine. 

The Oops happens on all the DAGs listed on the webpage. 

Click one DAG, and in the [ Graph View ] press any task that ran successfully; 
the task operation dialog comes up, press [ Clear ], then Oops. 

> Exception on /admin/airflow/clear
> -
>
> Key: AIRFLOW-188
> URL: https://issues.apache.org/jira/browse/AIRFLOW-188
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.0
> Environment: - CeleryExecutor
> - broker Redis
> - backend PostgreSQL
>Reporter: Cyril Scetbon
>Assignee: Siddharth Anand
>
> When I try to clear the past of a task I get a Oops. Here is the output I get 
> : http://pastebin.com/yTYsekyB
> The code uses 2 files : 
> http://pastebin.com/Uyp41wEh (subdags)
> http://pastebin.com/wiGP964w (maindag that uses subdags)





[jira] [Updated] (AIRFLOW-295) Beeline called into HiveCliHook.run() reads unclosed file and skip last statement

2016-06-30 Thread Alexey Sanko (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Sanko updated AIRFLOW-295:
-
Description: 
If the hql of a HiveOperator that uses a beeline connection contains multiple 
statements and doesn't end with an extra blank line, beeline skips the last 
statement.
As I understand it, beeline reads the still-open file directly (which cannot be 
closed because of the NamedTemporaryFile usage) and gets an unexpected EOF.
{code}
hive = HiveOperator(
dag = hive_dag,
start_date = datetime(2016, 1, 1),
task_id='asanko_cli_remote_test',
hql = """
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'{{ ds }}';
desc asanko.test_airflow_dual;
""",
hive_cli_conn_id='asanko_hive_cli_beeline',
schema='asanko',
default_args=args,
run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:01:51,346] {models.py:1041} INFO - Executing 
 on 2016-01-01 00:00:00
[2016-06-29 03:01:51,354] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'2016-01-01';
desc asanko.test_airflow_dual;
[2016-06-29 03:01:51,357] {base_hook.py:53} INFO - Using connection to: 
asanko_hive
[2016-06-29 03:01:51,358] {hive_hooks.py:105} INFO - beeline -f 
/tmp/airflow_hiveop_EDQ7kE/tmpuSq2NR -u 
jdbc:hive2://asanko_hive:1/default;auth=none -n asanko -p pwd
[2016-06-29 03:01:52,119] {hive_hooks.py:116} INFO - scan complete in 3ms
[2016-06-29 03:01:52,120] {hive_hooks.py:116} INFO - Connecting to 
jdbc:hive2://asanko_hive:1/default;auth=none
[2016-06-29 03:01:52,375] {hive_hooks.py:116} INFO - Connected to: Apache Hive 
(version 0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 
0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Transaction isolation: 
TRANSACTION_REPEATABLE_READ
[2016-06-29 03:01:52,385] {hive_hooks.py:116} INFO - Beeline version 
0.12.0-cdh5.1.3 by Apache Hive
[2016-06-29 03:01:52,386] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> USE asanko;
[2016-06-29 03:01:52,428] {hive_hooks.py:116} INFO - No rows affected (0.041 
seconds)
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive>
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> use asanko;
[2016-06-29 03:01:52,451] {hive_hooks.py:116} INFO - No rows affected (0.01 
seconds)
[2016-06-29 03:01:52,452] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual;
[2016-06-29 03:01:52,463] {hive_hooks.py:116} INFO - No rows affected (0.009 
seconds)
[2016-06-29 03:01:52,465] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * 
from asanko.dual where x <> '2016-01-01';
[2016-06-29 03:01:55,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:00,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:05,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:10,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:15,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:20,003] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:25,011] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:30,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:35,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:40,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:45,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:46,575] {hive_hooks.py:116} INFO - No rows affected (54.109 
seconds)
[2016-06-29 03:02:46,578] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;Closing: 
org.apache.hive.jdbc.HiveConnection
{code}

If you create a file manually with the same hql and pass it to beeline, all 
statements are executed.
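Assuming the cause is an unflushed write buffer as described above, the usual 
remedy is to flush the temporary file before the external process opens it by 
name. A minimal sketch (not the actual HiveCliHook code; `write_script` is a 
hypothetical helper):

```python
import os
import tempfile

def write_script(hql):
    """Write hql to a named temp file and flush it so another process can
    read the whole script by path."""
    f = tempfile.NamedTemporaryFile(mode='w', suffix='.hql', delete=False)
    f.write(hql)
    # Without the flush, the tail of the script may still sit in Python's
    # write buffer when beeline opens the file -> the last statement is lost.
    f.flush()
    os.fsync(f.fileno())
    return f.name

hql = "use asanko;\ndesc asanko.test_airflow_dual;"
path = write_script(hql)
with open(path) as reader:          # stands in for beeline reading the file
    assert reader.read() == hql     # complete script, last statement included
os.remove(path)
```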

If you manually add an empty row, the last statement runs successfully:
{code}
hive = HiveOperator(
dag = hive_dag,
start_date = datetime(2016, 1, 1),
task_id='asanko_cli_remote_test',
hql = """
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'{{ ds }}';
desc asanko.test_airflow_dual;

""",
hive_cli_conn_id='asanko_hive_cli_beeline',
schema='asanko',
default_args=args,
run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:04:01,378] {models.py:1041} INFO - Executing 
 on 2016-01-01 00:00:00
[2016-06-29 03:04:01,386] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'2016-01-01';
desc asanko.test_airflow_dual;

[2016-06-29 03:04:01,388] 

[jira] [Updated] (AIRFLOW-295) Beeline called into HiveCliHook.run() reads unclosed file and skip last statement

2016-06-30 Thread Alexey Sanko (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Sanko updated AIRFLOW-295:
-
Description: 
If the hql of a HiveOperator that uses a beeline connection contains multiple 
statements and doesn't end with an extra blank line, beeline skips the last 
statement.
As I understand it, beeline reads the still-open file directly (which cannot be 
closed because of the NamedTemporaryFile usage) and gets an unexpected EOF.
{code}
hive = HiveOperator(
dag = hive_dag,
start_date = datetime(2016, 1, 1),
task_id='asanko_cli_remote_test',
hql = """
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'{{ ds }}';
desc asanko.test_airflow_dual;
""",
hive_cli_conn_id='asanko_hive_cli_beeline',
schema='asanko',
default_args=args,
run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:01:51,346] {models.py:1041} INFO - Executing 
 on 2016-01-01 00:00:00
[2016-06-29 03:01:51,354] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'2016-01-01';
desc asanko.test_airflow_dual;
[2016-06-29 03:01:51,357] {base_hook.py:53} INFO - Using connection to: 
asanko_hive
[2016-06-29 03:01:51,358] {hive_hooks.py:105} INFO - beeline -f 
/tmp/airflow_hiveop_EDQ7kE/tmpuSq2NR -u 
jdbc:hive2://asanko_hive:1/default;auth=none -n asanko -p pwd
[2016-06-29 03:01:52,119] {hive_hooks.py:116} INFO - scan complete in 3ms
[2016-06-29 03:01:52,120] {hive_hooks.py:116} INFO - Connecting to 
jdbc:hive2://asanko_hive:1/default;auth=none
[2016-06-29 03:01:52,375] {hive_hooks.py:116} INFO - Connected to: Apache Hive 
(version 0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 
0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Transaction isolation: 
TRANSACTION_REPEATABLE_READ
[2016-06-29 03:01:52,385] {hive_hooks.py:116} INFO - Beeline version 
0.12.0-cdh5.1.3 by Apache Hive
[2016-06-29 03:01:52,386] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> USE asanko;
[2016-06-29 03:01:52,428] {hive_hooks.py:116} INFO - No rows affected (0.041 
seconds)
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive>
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> use asanko;
[2016-06-29 03:01:52,451] {hive_hooks.py:116} INFO - No rows affected (0.01 
seconds)
[2016-06-29 03:01:52,452] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual;
[2016-06-29 03:01:52,463] {hive_hooks.py:116} INFO - No rows affected (0.009 
seconds)
[2016-06-29 03:01:52,465] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * 
from asanko.dual where x <> '2016-01-01';
[2016-06-29 03:01:55,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:00,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:05,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:10,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:15,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:20,003] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:25,011] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:30,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:35,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:40,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:45,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:46,575] {hive_hooks.py:116} INFO - No rows affected (54.109 
seconds)
[2016-06-29 03:02:46,578] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;Closing: 
org.apache.hive.jdbc.HiveConnection
{code}

If you create a file manually with the same hql and pass it to beeline, all 
statements are executed.

If we manually add an empty row, the last statement runs successfully:
{code}
hive = HiveOperator(
    dag=hive_dag,
    start_date=datetime(2016, 1, 1),
    task_id='asanko_cli_remote_test',
    hql="""
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}';
desc asanko.test_airflow_dual;

""",
    hive_cli_conn_id='asanko_hive_cli_beeline',
    schema='asanko',
    default_args=args,
    run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:04:01,378] {models.py:1041} INFO - Executing 
 on 2016-01-01 00:00:00
[2016-06-29 03:04:01,386] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'2016-01-01';
desc asanko.test_airflow_dual;

[2016-06-29 03:04:01,388] 

[jira] [Updated] (AIRFLOW-295) Beeline called into HiveCliHook.run() read unclosed file and skip last statement

2016-06-30 Thread Alexey Sanko (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Sanko updated AIRFLOW-295:
-
Description: 
If the HQL passed to a HiveOperator that uses a beeline connection contains
multiple statements and does not end with an additional blank line, beeline
skips the last statement.
As I understand it, beeline reads the file while it is still open (it cannot
be closed because NamedTemporaryFile is used) and therefore hits an unexpected EOF.
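A minimal sketch of a hook-level workaround (not the actual Airflow code; the helper names are hypothetical): guarantee a trailing newline and flush the temporary file before handing its path to beeline, so the last statement is not lost.
{code}
import subprocess
import tempfile


def normalize_hql(hql):
    """Append a trailing newline if missing, so beeline does not drop the last statement."""
    return hql if hql.endswith("\n") else hql + "\n"


def run_hql(hql, beeline_cmd):
    """Write the script to a temp file, flush it, then hand the path to beeline."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".hql") as f:
        f.write(normalize_hql(hql))
        f.flush()  # make sure the full script is on disk before beeline reads it
        subprocess.run(beeline_cmd + ["-f", f.name], check=True)
{code}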
{code}
hive = HiveOperator(
    dag=hive_dag,
    start_date=datetime(2016, 1, 1),
    task_id='asanko_cli_remote_test',
    hql="""
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}';
desc asanko.test_airflow_dual;
""",
    hive_cli_conn_id='asanko_hive_cli_beeline',
    schema='asanko',
    default_args=args,
    run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:01:51,346] {models.py:1041} INFO - Executing 
 on 2016-01-01 00:00:00
[2016-06-29 03:01:51,354] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'2016-01-01';
desc asanko.test_airflow_dual;
[2016-06-29 03:01:51,357] {base_hook.py:53} INFO - Using connection to: 
asanko_hive
[2016-06-29 03:01:51,358] {hive_hooks.py:105} INFO - beeline -f 
/tmp/airflow_hiveop_EDQ7kE/tmpuSq2NR -u 
jdbc:hive2://asanko_hive:1/default;auth=none -n asanko -p pwd
[2016-06-29 03:01:52,119] {hive_hooks.py:116} INFO - scan complete in 3ms
[2016-06-29 03:01:52,120] {hive_hooks.py:116} INFO - Connecting to 
jdbc:hive2://asanko_hive:1/default;auth=none
[2016-06-29 03:01:52,375] {hive_hooks.py:116} INFO - Connected to: Apache Hive 
(version 0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 
0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Transaction isolation: 
TRANSACTION_REPEATABLE_READ
[2016-06-29 03:01:52,385] {hive_hooks.py:116} INFO - Beeline version 
0.12.0-cdh5.1.3 by Apache Hive
[2016-06-29 03:01:52,386] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> USE asanko;
[2016-06-29 03:01:52,428] {hive_hooks.py:116} INFO - No rows affected (0.041 
seconds)
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive>
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> use asanko;
[2016-06-29 03:01:52,451] {hive_hooks.py:116} INFO - No rows affected (0.01 
seconds)
[2016-06-29 03:01:52,452] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual;
[2016-06-29 03:01:52,463] {hive_hooks.py:116} INFO - No rows affected (0.009 
seconds)
[2016-06-29 03:01:52,465] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * 
from asanko.dual where x <> '2016-01-01';
[2016-06-29 03:01:55,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:00,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:05,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:10,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:15,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:20,003] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:25,011] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:30,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:35,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:40,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:45,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:46,575] {hive_hooks.py:116} INFO - No rows affected (54.109 
seconds)
[2016-06-29 03:02:46,578] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;Closing: 
org.apache.hive.jdbc.HiveConnection
{code}

But if we manually add an empty line at the end, the last statement runs successfully:
{code}
hive = HiveOperator(
    dag=hive_dag,
    start_date=datetime(2016, 1, 1),
    task_id='asanko_cli_remote_test',
    hql="""
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}';
desc asanko.test_airflow_dual;

""",
    hive_cli_conn_id='asanko_hive_cli_beeline',
    schema='asanko',
    default_args=args,
    run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:04:01,378] {models.py:1041} INFO - Executing 
 on 2016-01-01 00:00:00
[2016-06-29 03:04:01,386] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'2016-01-01';
desc asanko.test_airflow_dual;

[2016-06-29 03:04:01,388] {base_hook.py:53} INFO - Using connection to: 
asanko_hive
[2016-06-29 03:04:01,390] {hive_hooks.py:105} INFO 

[jira] [Created] (AIRFLOW-295) Beeline called into HiveCliHook.run() read unclosed file and skip last statement

2016-06-30 Thread Alexey Sanko (JIRA)
Alexey Sanko created AIRFLOW-295:


 Summary: Beeline called into HiveCliHook.run() read unclosed file 
and skip last statement
 Key: AIRFLOW-295
 URL: https://issues.apache.org/jira/browse/AIRFLOW-295
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alexey Sanko


If the HQL passed to a HiveOperator that uses a beeline connection contains
multiple statements and does not end with an additional blank line, beeline
skips the last statement.
As I understand it, beeline reads the file while it is still open (it cannot
be closed because NamedTemporaryFile is used) and therefore hits an unexpected EOF.
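The buffering effect can be demonstrated in isolation: text written to a NamedTemporaryFile stays in the Python-side buffer, so a second reader of the same path (standing in for beeline here) sees an incomplete file until flush() is called.
{code}
import os
import tempfile

f = tempfile.NamedTemporaryFile(mode="w", delete=False)
f.write("use asanko;")        # a short write stays in the userspace buffer
before = open(f.name).read()  # a second reader sees nothing yet
f.flush()                     # push the buffer to disk
after = open(f.name).read()   # now the content is visible
f.close()
os.unlink(f.name)
{code}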
{code}
hive = HiveOperator(
    dag=hive_dag,
    start_date=datetime(2016, 1, 1),
    task_id='asanko_cli_remote_test',
    hql="""
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}';
desc asanko.test_airflow_dual;
""",
    hive_cli_conn_id='asanko_hive_cli_beeline',
    schema='asanko',
    default_args=args,
    run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:01:51,346] {models.py:1041} INFO - Executing 
 on 2016-01-01 00:00:00
[2016-06-29 03:01:51,354] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'2016-01-01';
desc asanko.test_airflow_dual;
[2016-06-29 03:01:51,357] {base_hook.py:53} INFO - Using connection to: 
asanko_hive
[2016-06-29 03:01:51,358] {hive_hooks.py:105} INFO - beeline -f 
/tmp/airflow_hiveop_EDQ7kE/tmpuSq2NR -u 
jdbc:hive2://asanko_hive:1/default;auth=none -n asanko -p pwd
[2016-06-29 03:01:52,119] {hive_hooks.py:116} INFO - scan complete in 3ms
[2016-06-29 03:01:52,120] {hive_hooks.py:116} INFO - Connecting to 
jdbc:hive2://asanko_hive:1/default;auth=none
[2016-06-29 03:01:52,375] {hive_hooks.py:116} INFO - Connected to: Apache Hive 
(version 0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Driver: Hive JDBC (version 
0.12.0-cdh5.1.3)
[2016-06-29 03:01:52,376] {hive_hooks.py:116} INFO - Transaction isolation: 
TRANSACTION_REPEATABLE_READ
[2016-06-29 03:01:52,385] {hive_hooks.py:116} INFO - Beeline version 
0.12.0-cdh5.1.3 by Apache Hive
[2016-06-29 03:01:52,386] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> USE asanko;
[2016-06-29 03:01:52,428] {hive_hooks.py:116} INFO - No rows affected (0.041 
seconds)
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive>
[2016-06-29 03:01:52,441] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> use asanko;
[2016-06-29 03:01:52,451] {hive_hooks.py:116} INFO - No rows affected (0.01 
seconds)
[2016-06-29 03:01:52,452] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> drop table if exists test_airflow_dual;
[2016-06-29 03:01:52,463] {hive_hooks.py:116} INFO - No rows affected (0.009 
seconds)
[2016-06-29 03:01:52,465] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> create table asanko.test_airflow_dual as select * 
from asanko.dual where x <> '2016-01-01';
[2016-06-29 03:01:55,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:00,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:05,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:10,006] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:15,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:20,003] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:25,011] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:30,010] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:35,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:40,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:45,007] {jobs.py:142} DEBUG - [heart] Boom.
[2016-06-29 03:02:46,575] {hive_hooks.py:116} INFO - No rows affected (54.109 
seconds)
[2016-06-29 03:02:46,578] {hive_hooks.py:116} INFO - 0: 
jdbc:hive2://asanko_hive> desc asanko.test_airflow_dual;Closing: 
org.apache.hive.jdbc.HiveConnection
{code}

But if we manually add an empty line at the end, the last statement runs successfully:
{code}
hive = HiveOperator(
    dag=hive_dag,
    start_date=datetime(2016, 1, 1),
    task_id='asanko_cli_remote_test',
    hql="""
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> '{{ ds }}';
desc asanko.test_airflow_dual;

""",
    hive_cli_conn_id='asanko_hive_cli_beeline',
    schema='asanko',
    default_args=args,
    run_as_owner=True)
{code}
Log:
{code}
[2016-06-29 03:04:01,378] {models.py:1041} INFO - Executing 
 on 2016-01-01 00:00:00
[2016-06-29 03:04:01,386] {hive_operator.py:63} INFO - Executing: 
use asanko;
drop table if exists test_airflow_dual;
create table asanko.test_airflow_dual as select * from asanko.dual where x <> 
'2016-01-01';