[jira] [Created] (AIRFLOW-2097) UnboundLocalError: local variable 'tz' referenced before assignment

2018-02-09 Thread Bryce Drennnan (JIRA)
Bryce Drennnan created AIRFLOW-2097:
---

 Summary: UnboundLocalError: local variable 'tz' referenced before 
assignment
 Key: AIRFLOW-2097
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2097
 Project: Apache Airflow
  Issue Type: Bug
  Components: utils
Reporter: Bryce Drennnan


The date_range function references the variable tz before it is assigned. I 
noticed this while running doctests.

See this part of the code:

[https://github.com/apache/incubator-airflow/blob/15b8a36b9011166b06f176f684b71703a4aebddd/airflow/utils/dates.py#L73-L84]

I believe this bug was introduced here:

https://github.com/apache/incubator-airflow/commit/518a41acf319af27d49bdc0c84bda64b6b8af0b3#commitcomment-27433613
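The failure mode is the classic Python scoping pitfall: a name assigned only inside one branch becomes a local for the whole function, so any code path that skips the assignment raises UnboundLocalError. A minimal self-contained sketch of the pattern (the function body and names below are illustrative, not the actual dates.py code):

```python
from datetime import datetime


def date_range_sketch(start_date, end_date=None):
    # Sketch of the pitfall: `tz` is bound only inside this branch, yet it is
    # used on a path reachable when the branch is skipped.
    if start_date.tzinfo is not None:
        tz = start_date.tzinfo
        start_date = start_date.replace(tzinfo=None)
    # ... range computation elided ...
    return tz  # UnboundLocalError for a naive start_date


try:
    date_range_sketch(datetime(2018, 1, 1))  # naive datetime, no tzinfo
except UnboundLocalError as exc:
    print("bug reproduced:", exc)
```

The usual fix is to initialize `tz` (e.g. `tz = None`) before the conditional, or restructure so every path assigns it.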



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2096) SparkSubmitOperator should expose yarn_application_id in the templates

2018-02-09 Thread Azhagu Selvan SP (JIRA)
Azhagu Selvan SP created AIRFLOW-2096:
-

 Summary: SparkSubmitOperator should expose yarn_application_id in 
the templates
 Key: AIRFLOW-2096
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2096
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Azhagu Selvan SP


The SparkSubmitHook parses and stores the `application_id` from YARN in the 
field `yarn_application_id`. It would be very useful if this value were exposed 
as a template field in the SparkSubmitOperator, so that we could build a direct 
link to the YARN ApplicationMaster from the Airflow UI. 
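As a rough illustration of what is being asked for (every class and attribute name below is a stand-in, not the committed Airflow API): after the hook submits the job, the operator surfaces the parsed YARN application id so other components, such as a link to the YARN ApplicationMaster UI, can consume it.

```python
# All names below are hypothetical stand-ins, not the Airflow contrib API.

class FakeSparkSubmitHook:
    """Stand-in for SparkSubmitHook, which parses the app id from spark-submit output."""

    def submit(self):
        self.yarn_application_id = "application_1518211200000_0001"


class FakeSparkSubmitOperator:
    def __init__(self):
        self.hook = FakeSparkSubmitHook()
        self.yarn_application_id = None

    def execute(self):
        self.hook.submit()
        # Surface the parsed id on the operator so other components can read
        # it, e.g. to build a link to the YARN ApplicationMaster UI.
        self.yarn_application_id = self.hook.yarn_application_id
        return "http://yarn-rm:8088/cluster/app/" + self.yarn_application_id


print(FakeSparkSubmitOperator().execute())
```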





[jira] [Created] (AIRFLOW-2095) Add operator to create External BigQuery Table

2018-02-09 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2095:
---

 Summary: Add operator to create External BigQuery Table
 Key: AIRFLOW-2095
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2095
 Project: Apache Airflow
  Issue Type: Task
  Components: contrib, gcp
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 2.0.0


We already have a hook to create an external BQ table. This task is to create 
an operator from that hook.
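The hook-to-operator pattern this task describes is thin by design: the hook owns the API call, and the operator's execute() just delegates to it. A self-contained sketch under stand-in names (not the contrib API):

```python
# Stand-in names throughout; this shows the shape of the change, not the contrib API.

class FakeBigQueryHook:
    """The hook already knows how to create the external table."""

    def create_external_table(self, table, source_uris):
        return {"table": table, "sourceUris": source_uris}


class FakeBigQueryCreateExternalTableOperator:
    """The new operator is a thin wrapper whose execute() delegates to the hook."""

    def __init__(self, table, source_uris):
        self.table = table
        self.source_uris = source_uris

    def execute(self):
        return FakeBigQueryHook().create_external_table(self.table, self.source_uris)


op = FakeBigQueryCreateExternalTableOperator(
    table="my_dataset.events_ext",
    source_uris=["gs://my-bucket/events/*.csv"],
)
print(op.execute())
```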





[jira] [Resolved] (AIRFLOW-2094) Jinjafy project_id, region & zone in DataProc{*} Operators

2018-02-09 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2094.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3027
[https://github.com/apache/incubator-airflow/pull/3027]

> Jinjafy project_id, region & zone in DataProc{*} Operators
> --
>
> Key: AIRFLOW-2094
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2094
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Reporter: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.0
>
>
> The project_id, region, and zone in DataProc{*} Operators are not jinjafied. 
> If we jinjafy them, we can use Airflow variables to supply a default 
> project_id, region, and zone.





[jira] [Commented] (AIRFLOW-2094) Jinjafy project_id, region & zone in DataProc{*} Operators

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358878#comment-16358878
 ] 

ASF subversion and git services commented on AIRFLOW-2094:
--

Commit 556c9ec5ba12973cc0335cd18d375227797ec62f in incubator-airflow's branch 
refs/heads/master from [~kaxilnaik]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=556c9ec ]

[AIRFLOW-2094] Jinjafied project_id, region & zone in DataProc{*} Operators

- Minor docstring change
- Jinjafied project_id, region & zone in
DataProc{*} Operators to allow using Airflow
variables

Closes #3027 from kaxil/patch-3


> Jinjafy project_id, region & zone in DataProc{*} Operators
> --
>
> Key: AIRFLOW-2094
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2094
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Reporter: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.0
>
>
> The project_id, region, and zone in DataProc{*} Operators are not jinjafied. 
> If we jinjafy them, we can use Airflow variables to supply a default 
> project_id, region, and zone.





incubator-airflow git commit: [AIRFLOW-2094] Jinjafied project_id, region & zone in DataProc{*} Operators

2018-02-09 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master bf1296fbd -> 556c9ec5b


[AIRFLOW-2094] Jinjafied project_id, region & zone in DataProc{*} Operators

- Minor docstring change
- Jinjafied project_id, region & zone in
DataProc{*} Operators to allow using Airflow
variables

Closes #3027 from kaxil/patch-3


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/556c9ec5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/556c9ec5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/556c9ec5

Branch: refs/heads/master
Commit: 556c9ec5ba12973cc0335cd18d375227797ec62f
Parents: bf1296f
Author: Kaxil Naik 
Authored: Fri Feb 9 20:56:53 2018 +0100
Committer: Bolke de Bruin 
Committed: Fri Feb 9 20:56:53 2018 +0100

--
 airflow/contrib/operators/dataproc_operator.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/556c9ec5/airflow/contrib/operators/dataproc_operator.py
--
diff --git a/airflow/contrib/operators/dataproc_operator.py 
b/airflow/contrib/operators/dataproc_operator.py
index 3444cc6..ebcc402 100644
--- a/airflow/contrib/operators/dataproc_operator.py
+++ b/airflow/contrib/operators/dataproc_operator.py
@@ -64,7 +64,7 @@ class DataprocClusterCreateOperator(BaseOperator):
 :type master_machine_type: string
 :param master_disk_size: Disk size for the master node
 :type master_disk_size: int
-:param worker_machine_type:Compute engine machine type to use for the 
worker nodes
+:param worker_machine_type: Compute engine machine type to use for the 
worker nodes
 :type worker_machine_type: string
 :param worker_disk_size: Disk size for the worker nodes
 :type worker_disk_size: int
@@ -95,7 +95,7 @@ class DataprocClusterCreateOperator(BaseOperator):
 :type service_account_scopes: list[string]
 """
 
-template_fields = ['cluster_name']
+template_fields = ['cluster_name', 'project_id', 'zone', 'region']
 
 @apply_defaults
 def __init__(self,
@@ -339,7 +339,7 @@ class DataprocClusterDeleteOperator(BaseOperator):
 :type delegate_to: string
 """
 
-template_fields = ['cluster_name']
+template_fields = ['cluster_name', 'project_id', 'region']
 
 @apply_defaults
 def __init__(self,



[jira] [Created] (AIRFLOW-2094) Jinjafy project_id, region & zone in DataProc{*} Operators

2018-02-09 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2094:
---

 Summary: Jinjafy project_id, region & zone in DataProc{*} Operators
 Key: AIRFLOW-2094
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2094
 Project: Apache Airflow
  Issue Type: Task
  Components: contrib, gcp
Reporter: Kaxil Naik


The project_id, region, and zone in DataProc{*} Operators are not jinjafied. If 
we jinjafy them, we can use Airflow variables to supply a default project_id, 
region, and zone.
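For context on what adding these names to template_fields buys: Airflow renders every attribute listed in template_fields through Jinja before execute(), so values like {{ var.value.gcp_project }} resolve at runtime. A stand-alone sketch of that mechanism, with plain str.format standing in for Jinja and hypothetical names throughout:

```python
class FakeOperator:
    # Mirrors the committed change: project_id/zone/region join cluster_name.
    template_fields = ("cluster_name", "project_id", "zone", "region")

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


def render(op, context):
    """Render each template field; str.format stands in for Airflow's Jinja pass."""
    for field in op.template_fields:
        value = getattr(op, field, None)
        if isinstance(value, str):
            setattr(op, field, value.format(**context))


op = FakeOperator(cluster_name="etl-{ds}", project_id="{gcp_project}",
                  zone="europe-west1-b", region="europe-west1")
render(op, {"ds": "2018-02-09", "gcp_project": "my-project"})
print(op.cluster_name, op.project_id)  # → etl-2018-02-09 my-project
```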





[jira] [Commented] (AIRFLOW-1667) Remote log handlers don't upload logs on task finish

2018-02-09 Thread Ash Berlin-Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358757#comment-16358757
 ] 

Ash Berlin-Taylor commented on AIRFLOW-1667:


The process that writes to the log files is a sub-process of the celery worker 
itself – that just invokes {{airflow run --local}} - and that means the flush 
should happen as soon as the task instance finishes running.

I do not see this behaviour on Py3/1.9.0 - our task logs appear in S3 when the 
task instance is finished. Are you saying you have to stop the {{airflow worker}} 
process for the logs to appear in S3?

> Remote log handlers don't upload logs on task finish
> 
>
> Key: AIRFLOW-1667
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Arthur Vigil
>Priority: Major
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log 
> handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is 
> left at the default implementation provided by `logging.FileHandler`). A 
> handler will be closed on process exit by `logging.shutdown()`, but depending 
> on the Executor used worker processes may not regularly shutdown, and can 
> very likely persist between tasks. This means during normal execution log 
> files are never uploaded.
> Need to find a way to flush remote log handlers in a timely manner, but 
> without hitting the target resources unnecessarily.
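The behaviour described above can be sketched with the standard logging module: a handler that also ships the log file on flush() rather than only on close(). The `upload` callable below is a hypothetical stand-in for the S3/GCS copy; this is a sketch of the fix direction, not the Airflow handlers themselves.

```python
import logging
import os
import tempfile


class FlushingRemoteHandler(logging.FileHandler):
    """File handler that also copies the log remotely on every flush()."""

    def __init__(self, filename, upload):
        super().__init__(filename)
        self._upload = upload  # callable(local_path); stands in for the S3/GCS copy

    def flush(self):
        super().flush()
        # Ship the current file on each flush instead of waiting for close(),
        # which may never happen while the worker process stays alive.
        self._upload(self.baseFilename)


uploads = []
log_path = os.path.join(tempfile.mkdtemp(), "task.log")
logger = logging.getLogger("flush_demo")
logger.propagate = False
logger.addHandler(FlushingRemoteHandler(log_path, uploads.append))
logger.warning("task finished")  # StreamHandler.emit() flushes after each record
print(len(uploads))
```

A production version would likely throttle the upload or defer it to the end of the task, since shipping on every record would hit the remote store unnecessarily, which is exactly the constraint the issue notes.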





[jira] [Commented] (AIRFLOW-1667) Remote log handlers don't upload logs on task finish

2018-02-09 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358749#comment-16358749
 ] 

Josh Bacon commented on AIRFLOW-1667:
-

+1 We are using CeleryExecutors and notice that our logs never ship unless we 
shut down our workers. Flushing probably needs to happen on some interval or 
via a task event handler.

> Remote log handlers don't upload logs on task finish
> 
>
> Key: AIRFLOW-1667
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Arthur Vigil
>Priority: Major
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log 
> handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is 
> left at the default implementation provided by `logging.FileHandler`). A 
> handler will be closed on process exit by `logging.shutdown()`, but depending 
> on the Executor used worker processes may not regularly shutdown, and can 
> very likely persist between tasks. This means during normal execution log 
> files are never uploaded.
> Need to find a way to flush remote log handlers in a timely manner, but 
> without hitting the target resources unnecessarily.





[jira] [Created] (AIRFLOW-2093) Feature DagBag loading from S3 (and others)

2018-02-09 Thread Bruno Bonagura (JIRA)
Bruno Bonagura created AIRFLOW-2093:
---

 Summary: Feature DagBag loading from S3 (and others)
 Key: AIRFLOW-2093
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2093
 Project: Apache Airflow
  Issue Type: Improvement
  Components: aws, boto3, DAG, scheduler, worker
Reporter: Bruno Bonagura


Deploying DAGs is a pain point for CeleryExecutor-based Airflow infrastructure. 
Allowing Airflow to be configured to read DAGs directly from remote sources (S3, 
FTP, HTTP...), not just the local filesystem, would greatly improve this.
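One workaround along these lines is a small sync step that copies DAG files from the remote store into the local dags folder each worker already scans. The sketch below abstracts the remote behind a `fetch` callable (in practice e.g. a boto3 `get_object` read); all names here are illustrative, and a dict stands in for the remote store so the example is self-contained.

```python
import os
import tempfile


def sync_dags(dags_folder, listing, fetch):
    """Copy each remote DAG file into the local dags folder."""
    os.makedirs(dags_folder, exist_ok=True)
    for name in listing:
        with open(os.path.join(dags_folder, name), "wb") as local_file:
            local_file.write(fetch(name))


# Stand-in for the remote source (S3, FTP, HTTP...).
remote = {"etl.py": b"# dag definition\n"}
folder = tempfile.mkdtemp()
sync_dags(folder, remote.keys(), remote.__getitem__)
print(sorted(os.listdir(folder)))  # → ['etl.py']
```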





[jira] [Commented] (AIRFLOW-2092) Fixed incorrect parameter in docstring for FTPHook

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358343#comment-16358343
 ] 

ASF subversion and git services commented on AIRFLOW-2092:
--

Commit bf1296fbd2024135d4300fe9f0e5ce8f0dac1825 in incubator-airflow's branch 
refs/heads/master from [~kaxilnaik]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=bf1296f ]

[AIRFLOW-2092] Fixed incorrect parameter in docstring for FTPHook

Fixed the datatype of parameter
`local_full_path_or_buffer` in `retrieve_file`
method for FTPHook

Closes #3026 from kaxil/patch-2


> Fixed incorrect parameter in docstring for FTPHook
> --
>
> Key: AIRFLOW-2092
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2092
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, Documentation
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Fixed the datatype of the `local_full_path_or_buffer` parameter in the 
> `retrieve_file` method of FTPHook





[jira] [Resolved] (AIRFLOW-2092) Fixed incorrect parameter in docstring for FTPHook

2018-02-09 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2092.
---
Resolution: Fixed

Issue resolved by pull request #3026
[https://github.com/apache/incubator-airflow/pull/3026]

> Fixed incorrect parameter in docstring for FTPHook
> --
>
> Key: AIRFLOW-2092
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2092
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, Documentation
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Fixed the datatype of the `local_full_path_or_buffer` parameter in the 
> `retrieve_file` method of FTPHook





incubator-airflow git commit: [AIRFLOW-2092] Fixed incorrect parameter in docstring for FTPHook

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master a1e5a075c -> bf1296fbd


[AIRFLOW-2092] Fixed incorrect parameter in docstring for FTPHook

Fixed the datatype of parameter
`local_full_path_or_buffer` in `retrieve_file`
method for FTPHook

Closes #3026 from kaxil/patch-2


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/bf1296fb
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/bf1296fb
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/bf1296fb

Branch: refs/heads/master
Commit: bf1296fbd2024135d4300fe9f0e5ce8f0dac1825
Parents: a1e5a07
Author: Kaxil Naik 
Authored: Fri Feb 9 13:45:05 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 13:45:05 2018 +0100

--
 airflow/contrib/hooks/ftp_hook.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/bf1296fb/airflow/contrib/hooks/ftp_hook.py
--
diff --git a/airflow/contrib/hooks/ftp_hook.py 
b/airflow/contrib/hooks/ftp_hook.py
index b1e224d..bd46ba5 100644
--- a/airflow/contrib/hooks/ftp_hook.py
+++ b/airflow/contrib/hooks/ftp_hook.py
@@ -154,7 +154,7 @@ class FTPHook(BaseHook, LoggingMixin):
 :type remote_full_path: str
 :param local_full_path_or_buffer: full path to the local file or a
 file-like buffer
-:type local_full_path: str or file-like buffer
+:type local_full_path_or_buffer: str or file-like buffer
 """
 conn = self.get_conn()
 



[jira] [Created] (AIRFLOW-2092) Fixed incorrect parameter in docstring for FTPHook

2018-02-09 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2092:
---

 Summary: Fixed incorrect parameter in docstring for FTPHook
 Key: AIRFLOW-2092
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2092
 Project: Apache Airflow
  Issue Type: Task
  Components: contrib, Documentation
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 2.0.0


Fixed the datatype of the `local_full_path_or_buffer` parameter in the 
`retrieve_file` method of FTPHook
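The corrected docstring reflects that retrieve_file accepts either a path string or a writable file-like buffer. The duck-typing pattern behind that, as a self-contained sketch (the helper below is hypothetical, not the FTPHook code itself):

```python
import io


def write_output(local_full_path_or_buffer, data):
    """Hypothetical helper showing the str-or-buffer convention."""
    if isinstance(local_full_path_or_buffer, str):
        # A string is treated as a local file path.
        with open(local_full_path_or_buffer, "wb") as output_file:
            output_file.write(data)
    else:
        # Anything else is treated as a writable file-like buffer.
        local_full_path_or_buffer.write(data)


buf = io.BytesIO()
write_output(buf, b"remote file contents")
print(buf.getvalue())  # → b'remote file contents'
```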





incubator-airflow git commit: [AIRFLOW-XXX] Add SocialCops to Airflow users

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 759d8f83e -> a1e5a075c


[AIRFLOW-XXX] Add SocialCops to Airflow users

Closes #3018 from vinayak-mehta/update_readme


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a1e5a075
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a1e5a075
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a1e5a075

Branch: refs/heads/master
Commit: a1e5a075ccbd4abe48c205181e2faed659ab6898
Parents: 759d8f8
Author: Vinayak Mehta 
Authored: Fri Feb 9 13:25:56 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 13:26:00 2018 +0100

--
 README.md | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a1e5a075/README.md
--
diff --git a/README.md b/README.md
index b22bbfb..6528ecc 100644
--- a/README.md
+++ b/README.md
@@ -189,6 +189,7 @@ Currently **officially** using Airflow:
 1. [Sidecar](https://hello.getsidecar.com/) 
[[@getsidecar](https://github.com/getsidecar)]
 1. [SimilarWeb](https://www.similarweb.com/) 
[[@similarweb](https://github.com/similarweb)]
 1. [SmartNews](https://www.smartnews.com/) [[@takus](https://github.com/takus)]
+1. [SocialCops](https://www.socialcops.com/) 
[[@vinayak-mehta](https://github.com/vinayak-mehta) & 
[@sharky93](https://github.com/sharky93)]
 1. [Spotify](https://github.com/spotify) 
[[@znichols](https://github.com/znichols)]
 1. [Stackspace](https://beta.stackspace.io/)
 1. [Stripe](https://stripe.com) [[@jbalogh](https://github.com/jbalogh)]



[jira] [Resolved] (AIRFLOW-2088) Duplicate keys in MySQL to GCS Operator

2018-02-09 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2088.
---
Resolution: Fixed

Issue resolved by pull request #3022
[https://github.com/apache/incubator-airflow/pull/3022]

> Duplicate keys in MySQL to GCS Operator
> ---
>
> Key: AIRFLOW-2088
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2088
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Affects Versions: 1.9.1
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 2.0.0
>
>
> The helper method `type_map` in `mysql_to_gcs` operator contains duplicate 
> key "FIELD_TYPE.INT24".





[jira] [Commented] (AIRFLOW-2088) Duplicate keys in MySQL to GCS Operator

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358329#comment-16358329
 ] 

ASF subversion and git services commented on AIRFLOW-2088:
--

Commit 759d8f83e79f7d22a5cfca93205ebaf1ce30a5ad in incubator-airflow's branch 
refs/heads/master from [~kaxilnaik]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=759d8f8 ]

[AIRFLOW-2088] Fix duplicate keys in MySQL to GCS Helper function

- Remove the duplicate key in
`MySqlToGoogleCloudStorageOperator` in the
`type_map` helper function that maps from MySQL
fields to BigQuery fields.

Closes #3022 from kaxil/duplicate-keys-fix


> Duplicate keys in MySQL to GCS Operator
> ---
>
> Key: AIRFLOW-2088
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2088
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Affects Versions: 1.9.1
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 2.0.0
>
>
> The helper method `type_map` in `mysql_to_gcs` operator contains duplicate 
> key "FIELD_TYPE.INT24".





[jira] [Commented] (AIRFLOW-2091) Fix incorrect parameter in BigQueryCursor docstring

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358326#comment-16358326
 ] 

ASF subversion and git services commented on AIRFLOW-2091:
--

Commit 0a71370d0541da48eb1d4ffc5aa2f5d3a4be1e59 in incubator-airflow's branch 
refs/heads/master from [~kaxilnaik]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=0a71370 ]

[AIRFLOW-2091] Fix incorrect docstring parameter in BigQuery Hook

- Instead of `seq_of_parameters`, the docstring
contains `parameters`. Fixed this

Closes #3025 from kaxil/fix-parameter


> Fix incorrect parameter in BigQueryCursor docstring
> ---
>
> Key: AIRFLOW-2091
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2091
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, Documentation, gcp
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Fixed the docstring in BigQueryCursor: it incorrectly documented the 
> `seq_of_parameters` parameter as `parameters`





incubator-airflow git commit: [AIRFLOW-2088] Fix duplicate keys in MySQL to GCS Helper function

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 0a71370d0 -> 759d8f83e


[AIRFLOW-2088] Fix duplicate keys in MySQL to GCS Helper function

- Remove the duplicate key in
`MySqlToGoogleCloudStorageOperator` in the
`type_map` helper function that maps from MySQL
fields to BigQuery fields.

Closes #3022 from kaxil/duplicate-keys-fix


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/759d8f83
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/759d8f83
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/759d8f83

Branch: refs/heads/master
Commit: 759d8f83e79f7d22a5cfca93205ebaf1ce30a5ad
Parents: 0a71370
Author: Kaxil Naik 
Authored: Fri Feb 9 13:24:21 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 13:24:21 2018 +0100

--
 airflow/contrib/operators/mysql_to_gcs.py | 1 -
 1 file changed, 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/759d8f83/airflow/contrib/operators/mysql_to_gcs.py
--
diff --git a/airflow/contrib/operators/mysql_to_gcs.py 
b/airflow/contrib/operators/mysql_to_gcs.py
index 784481d..41e23f5 100644
--- a/airflow/contrib/operators/mysql_to_gcs.py
+++ b/airflow/contrib/operators/mysql_to_gcs.py
@@ -214,7 +214,6 @@ class MySqlToGoogleCloudStorageOperator(BaseOperator):
 when a schema_filename is set.
 """
 d = {
-FIELD_TYPE.INT24: 'INTEGER',
 FIELD_TYPE.TINY: 'INTEGER',
 FIELD_TYPE.BIT: 'INTEGER',
 FIELD_TYPE.DATETIME: 'TIMESTAMP',



[jira] [Resolved] (AIRFLOW-2091) Fix incorrect parameter in BigQueryCursor docstring

2018-02-09 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2091.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3025
[https://github.com/apache/incubator-airflow/pull/3025]

> Fix incorrect parameter in BigQueryCursor docstring
> ---
>
> Key: AIRFLOW-2091
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2091
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, Documentation, gcp
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Fixed the docstring in BigQueryCursor: it incorrectly documented the 
> `seq_of_parameters` parameter as `parameters`





incubator-airflow git commit: [AIRFLOW-2091] Fix incorrect docstring parameter in BigQuery Hook

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 4c7ae420a -> 0a71370d0


[AIRFLOW-2091] Fix incorrect docstring parameter in BigQuery Hook

- Instead of `seq_of_parameters`, the docstring
contains `parameters`. Fixed this

Closes #3025 from kaxil/fix-parameter


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0a71370d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0a71370d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0a71370d

Branch: refs/heads/master
Commit: 0a71370d0541da48eb1d4ffc5aa2f5d3a4be1e59
Parents: 4c7ae42
Author: Kaxil Naik 
Authored: Fri Feb 9 13:23:31 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 13:23:31 2018 +0100

--
 airflow/contrib/hooks/bigquery_hook.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0a71370d/airflow/contrib/hooks/bigquery_hook.py
--
diff --git a/airflow/contrib/hooks/bigquery_hook.py 
b/airflow/contrib/hooks/bigquery_hook.py
index 653cb1b..dca4d33 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -1256,9 +1256,9 @@ class BigQueryCursor(BigQueryBaseCursor):
 
 :param operation: The query to execute.
 :type operation: string
-:param parameters: List of dictionary parameters to substitute into the
+:param seq_of_parameters: List of dictionary parameters to substitute 
into the
 query.
-:type parameters: list
+:type seq_of_parameters: list
 """
 for parameters in seq_of_parameters:
 self.execute(operation, parameters)



[jira] [Commented] (AIRFLOW-2086) The tree view page is too slow when displaying a big DAG.

2018-02-09 Thread Andrew Maguire (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358307#comment-16358307
 ] 

Andrew Maguire commented on AIRFLOW-2086:
-

I found this too. I also found that for big DAGs with lots of tasks, many of the 
charts become unusable because the legend gets so big (one could argue a legend 
is not really needed with lots of tasks, as you can just hover over interesting 
items to see what they are). 

So it would be cool if some of these UI defaults could be configured in some 
way. 

> The tree view page is too slow when displaying a big DAG.
> 
>
> Key: AIRFLOW-2086
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2086
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Lintao LI
>Priority: Major
>
> The tree view page is too slow for big (actually not too big) DAGs. 
> The page size increases dramatically, to hundreds of MB.
> Please refer to 
> [here|https://stackoverflow.com/questions/48656221/apache-airflow-webui-tree-view-is-too-slow]
>  for details.
> I think the page contains a lot of redundant data; it is either a bug or a 
> design flaw.





[jira] [Created] (AIRFLOW-2091) Fix incorrect parameter in BigQueryCursor docstring

2018-02-09 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2091:
---

 Summary: Fix incorrect parameter in BigQueryCursor docstring
 Key: AIRFLOW-2091
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2091
 Project: Apache Airflow
  Issue Type: Task
  Components: contrib, Documentation, gcp
Reporter: Kaxil Naik
Assignee: Kaxil Naik


Fixed the docstring in BigQueryCursor: it incorrectly documented the 
`seq_of_parameters` parameter as `parameters`
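For reference, the DB-API contract the fixed docstring describes: executemany() iterates seq_of_parameters and invokes execute() once per parameter set. A minimal stand-in cursor showing that contract (illustrative, not the BigQuery hook code):

```python
class FakeCursor:
    """Minimal DB-API-style cursor demonstrating executemany()'s contract."""

    def __init__(self):
        self.calls = []

    def execute(self, operation, parameters=None):
        # Record each call instead of running a real query.
        self.calls.append((operation, parameters))

    def executemany(self, operation, seq_of_parameters):
        # One execute() per parameter set, exactly as in BigQueryCursor.
        for parameters in seq_of_parameters:
            self.execute(operation, parameters)


cur = FakeCursor()
cur.executemany("INSERT INTO t VALUES (%(x)s)", [{"x": 1}, {"x": 2}])
print(len(cur.calls))  # → 2
```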





[jira] [Commented] (AIRFLOW-2090) Fix typo in DataStore Hook

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358281#comment-16358281
 ] 

ASF subversion and git services commented on AIRFLOW-2090:
--

Commit 4c7ae420a7ff48819d90a278a144d4d91820bc37 in incubator-airflow's branch 
refs/heads/master from [~kaxilnaik]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=4c7ae42 ]

[AIRFLOW-2090] Fix typo in DataStore Hook

Spelling mistake in the word `simultaneously` in
DataStore docs

Closes #3024 from kaxil/patch-1


> Fix typo in DataStore Hook
> --
>
> Key: AIRFLOW-2090
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2090
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, Documentation, gcp
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Spelling mistake in the word `simultaneously` in DataStore docs





[jira] [Resolved] (AIRFLOW-2090) Fix typo in DataStore Hook

2018-02-09 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2090.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3024
[https://github.com/apache/incubator-airflow/pull/3024]

> Fix typo in DataStore Hook
> --
>
> Key: AIRFLOW-2090
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2090
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, Documentation, gcp
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Spelling mistake in the word `simultaneously` in DataStore docs





incubator-airflow git commit: [AIRFLOW-2090] Fix typo in DataStore Hook

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master e6973b159 -> 4c7ae420a


[AIRFLOW-2090] Fix typo in DataStore Hook

Spelling mistake in the word `simultaneously` in
DataStore docs

Closes #3024 from kaxil/patch-1


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/4c7ae420
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/4c7ae420
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/4c7ae420

Branch: refs/heads/master
Commit: 4c7ae420a7ff48819d90a278a144d4d91820bc37
Parents: e6973b1
Author: Kaxil Naik 
Authored: Fri Feb 9 12:26:02 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 12:26:02 2018 +0100

--
 airflow/contrib/hooks/datastore_hook.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c7ae420/airflow/contrib/hooks/datastore_hook.py
--
diff --git a/airflow/contrib/hooks/datastore_hook.py 
b/airflow/contrib/hooks/datastore_hook.py
index ba690e0..a71333e 100644
--- a/airflow/contrib/hooks/datastore_hook.py
+++ b/airflow/contrib/hooks/datastore_hook.py
@@ -25,7 +25,7 @@ class DatastoreHook(GoogleCloudBaseHook):
 connection.
 
 This object is not threads safe. If you want to make multiple requests
-simultaniously, you will need to create a hook per thread.
+simultaneously, you will need to create a hook per thread.
 """
 
 def __init__(self,



[jira] [Created] (AIRFLOW-2090) Fix typo in DataStore Hook

2018-02-09 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2090:
---

 Summary: Fix typo in DataStore Hook
 Key: AIRFLOW-2090
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2090
 Project: Apache Airflow
  Issue Type: Task
  Components: contrib, Documentation, gcp
Reporter: Kaxil Naik
Assignee: Kaxil Naik


Spelling mistake in the word `simultaneously` in DataStore docs





[jira] [Created] (AIRFLOW-2089) Add on kill for SparkSubmit in Standalone Cluster

2018-02-09 Thread Milan van der Meer (JIRA)
Milan van der Meer created AIRFLOW-2089:
---

 Summary: Add on kill for SparkSubmit in Standalone Cluster
 Key: AIRFLOW-2089
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2089
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Milan van der Meer
Assignee: Milan van der Meer








[jira] [Created] (AIRFLOW-2088) Duplicate keys in MySQL to GCS Operator

2018-02-09 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2088:
---

 Summary: Duplicate keys in MySQL to GCS Operator
 Key: AIRFLOW-2088
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2088
 Project: Apache Airflow
  Issue Type: Task
  Components: contrib, gcp
Affects Versions: 1.9.1
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 2.0.0


The helper method `type_map` in the `mysql_to_gcs` operator contains the 
duplicate key `FIELD_TYPE.INT24`.
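For context on why this slips through: Python dict literals do not raise on duplicate keys, so the later entry silently replaces the earlier one. A minimal sketch (the numeric constant below is a stand-in, not the real MySQLdb `FIELD_TYPE.INT24` value):

```python
# Python dict literals do not raise on duplicate keys: the last occurrence
# silently wins, so a duplicated mapping entry is easy to miss in review.
INT24 = 9  # stand-in constant; the real code uses MySQLdb's FIELD_TYPE.INT24

type_map = {
    INT24: 'INTEGER',   # first entry...
    INT24: 'FLOAT',     # ...is silently replaced by this one
}

print(len(type_map))    # -> 1
print(type_map[INT24])  # -> FLOAT
```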



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1157) Assigning a task to a pool that doesn't exist crashes the scheduler

2018-02-09 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-1157.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3002
[https://github.com/apache/incubator-airflow/pull/3002]

> Assigning a task to a pool that doesn't exist crashes the scheduler
> ---
>
> Key: AIRFLOW-1157
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1157
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
>Reporter: John Culver
>Assignee: David Klosowski
>Priority: Critical
> Fix For: 2.0.0
>
>
> If a dag is run that contains a task using a pool that doesn't exist, the 
> scheduler will crash.
> Manually triggering the run of this dag on an environment without a pool 
> named 'a_non_existent_pool' will crash the scheduler:
> {code}
> from datetime import datetime
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> dag = DAG(dag_id='crash_scheduler',
>   start_date=datetime(2017,1,1),
>   schedule_interval=None)
> t1 = DummyOperator(task_id='crash',
>pool='a_non_existent_pool',
>dag=dag)
> {code}
> Here is the relevant log output on the scheduler:
> {noformat}
> [2017-04-27 19:31:24,816] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/test-3.py finished
> [2017-04-27 19:31:24,817] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/test_s3_file_move.py finished
> [2017-04-27 19:31:24,819] {dag_processing.py:627} INFO - Started a process 
> (PID: 124) to generate tasks for /opt/airflow/dags/crash_scheduler.py - 
> logging into /tmp/airflow/scheduler/logs/2017-04-27/crash_scheduler.py.log
> [2017-04-27 19:31:24,822] {dag_processing.py:627} INFO - Started a process 
> (PID: 125) to generate tasks for /opt/airflow/dags/configuration/constants.py 
> - logging into 
> /tmp/airflow/scheduler/logs/2017-04-27/configuration/constants.py.log
> [2017-04-27 19:31:24,847] {jobs.py:1007} INFO - Tasks up for execution:
>  19:31:22.298893 [scheduled]>
> [2017-04-27 19:31:24,849] {jobs.py:1030} INFO - Figuring out tasks to run in 
> Pool(name=None) with 128 open slots and 1 task instances in queue
> [2017-04-27 19:31:24,856] {jobs.py:1078} INFO - DAG move_s3_file_test has 
> 0/16 running tasks
> [2017-04-27 19:31:24,856] {jobs.py:1105} INFO - Sending to executor 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) with priority 1 and queue MVSANDBOX-airflow-DEV-dev
> [2017-04-27 19:31:24,859] {jobs.py:1116} INFO - Setting state of 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) to queued
> [2017-04-27 19:31:24,867] {base_executor.py:50} INFO - Adding to queue: 
> airflow run move_s3_file_test move_files 2017-04-27T19:31:22.298893 --local 
> -sd /opt/airflow/dags/test_s3_file_move.py
> [2017-04-27 19:31:24,867] {jobs.py:1440} INFO - Heartbeating the executor
> [2017-04-27 19:31:24,872] {celery_executor.py:78} INFO - [celery] queuing 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) through celery, queue=MVSANDBOX-airflow-DEV-dev
> [2017-04-27 19:31:25,974] {jobs.py:1404} INFO - Heartbeating the process 
> manager
> [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/crash_scheduler.py finished
> [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/configuration/constants.py finished
> [2017-04-27 19:31:25,977] {dag_processing.py:627} INFO - Started a process 
> (PID: 128) to generate tasks for /opt/airflow/dags/example_s3_sensor.py - 
> logging into /tmp/airflow/scheduler/logs/2017-04-27/example_s3_sensor.py.log
> [2017-04-27 19:31:25,980] {dag_processing.py:627} INFO - Started a process 
> (PID: 129) to generate tasks for /opt/airflow/dags/test-4.py - logging into 
> /tmp/airflow/scheduler/logs/2017-04-27/test-4.py.log
> [2017-04-27 19:31:26,004] {jobs.py:1007} INFO - Tasks up for execution:
>  [scheduled]>
> [2017-04-27 19:31:26,006] {jobs.py:1311} INFO - Exited execute loop
> [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 128
> [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 129
> [2017-04-27 19:31:26,008] {jobs.py:1329} INFO - Waiting up to 5s for 
> processes to exit...
> Traceback (most recent call last):
>   File "/usr/bin/airflow", line 28, in <module>
> args.func(args)
>   File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 839, in 
> scheduler
> job.run()
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 200, in 

[jira] [Commented] (AIRFLOW-1157) Assigning a task to a pool that doesn't exist crashes the scheduler

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358187#comment-16358187
 ] 

ASF subversion and git services commented on AIRFLOW-1157:
--

Commit e6973b1596914e5d62567e065223e7b169d1c26c in incubator-airflow's branch 
refs/heads/master from Ian Suvak
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=e6973b1 ]

[AIRFLOW-1157] Fix missing pools crashing the scheduler

Throw a warning when a pool associated with a Task Instance
doesn't exist instead of crashing the scheduler.
Use the default value of 0 slots for non-existent pools.

Closes #3002 from iansuvak/1157_nonexistent_pool
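The guard described above can be sketched outside Airflow as follows. This is a simplified illustration of the patch's logic, not the actual `SchedulerJob` code; `open_slots_for` and `non_pooled_default` are illustrative names (the latter stands in for the `non_pooled_task_slot_count` setting):

```python
# Simplified sketch of the patch's guard: a task instance pointing at an
# unknown pool gets 0 open slots (plus a warning) instead of the KeyError
# that previously crashed the scheduler loop.
import logging

log = logging.getLogger(__name__)

def open_slots_for(pool, pools, non_pooled_default=128):
    if not pool:
        # No pool set: fall back to the non-pooled slot count.
        return non_pooled_default
    if pool not in pools:
        log.warning(
            "Tasks using non-existent pool '%s' will not be scheduled", pool)
        return 0
    # Simplified: the real code calls pools[pool].open_slots(session=session).
    return pools[pool]

print(open_slots_for('a_non_existent_pool', {'default_pool': 32}))  # -> 0
```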


> Assigning a task to a pool that doesn't exist crashes the scheduler
> ---
>
> Key: AIRFLOW-1157
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1157
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
>Reporter: John Culver
>Assignee: David Klosowski
>Priority: Critical
>
> If a dag is run that contains a task using a pool that doesn't exist, the 
> scheduler will crash.
> Manually triggering the run of this dag on an environment without a pool 
> named 'a_non_existent_pool' will crash the scheduler:
> {code}
> from datetime import datetime
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> dag = DAG(dag_id='crash_scheduler',
>   start_date=datetime(2017,1,1),
>   schedule_interval=None)
> t1 = DummyOperator(task_id='crash',
>pool='a_non_existent_pool',
>dag=dag)
> {code}
> Here is the relevant log output on the scheduler:
> {noformat}
> [2017-04-27 19:31:24,816] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/test-3.py finished
> [2017-04-27 19:31:24,817] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/test_s3_file_move.py finished
> [2017-04-27 19:31:24,819] {dag_processing.py:627} INFO - Started a process 
> (PID: 124) to generate tasks for /opt/airflow/dags/crash_scheduler.py - 
> logging into /tmp/airflow/scheduler/logs/2017-04-27/crash_scheduler.py.log
> [2017-04-27 19:31:24,822] {dag_processing.py:627} INFO - Started a process 
> (PID: 125) to generate tasks for /opt/airflow/dags/configuration/constants.py 
> - logging into 
> /tmp/airflow/scheduler/logs/2017-04-27/configuration/constants.py.log
> [2017-04-27 19:31:24,847] {jobs.py:1007} INFO - Tasks up for execution:
>  19:31:22.298893 [scheduled]>
> [2017-04-27 19:31:24,849] {jobs.py:1030} INFO - Figuring out tasks to run in 
> Pool(name=None) with 128 open slots and 1 task instances in queue
> [2017-04-27 19:31:24,856] {jobs.py:1078} INFO - DAG move_s3_file_test has 
> 0/16 running tasks
> [2017-04-27 19:31:24,856] {jobs.py:1105} INFO - Sending to executor 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) with priority 1 and queue MVSANDBOX-airflow-DEV-dev
> [2017-04-27 19:31:24,859] {jobs.py:1116} INFO - Setting state of 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) to queued
> [2017-04-27 19:31:24,867] {base_executor.py:50} INFO - Adding to queue: 
> airflow run move_s3_file_test move_files 2017-04-27T19:31:22.298893 --local 
> -sd /opt/airflow/dags/test_s3_file_move.py
> [2017-04-27 19:31:24,867] {jobs.py:1440} INFO - Heartbeating the executor
> [2017-04-27 19:31:24,872] {celery_executor.py:78} INFO - [celery] queuing 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) through celery, queue=MVSANDBOX-airflow-DEV-dev
> [2017-04-27 19:31:25,974] {jobs.py:1404} INFO - Heartbeating the process 
> manager
> [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/crash_scheduler.py finished
> [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/configuration/constants.py finished
> [2017-04-27 19:31:25,977] {dag_processing.py:627} INFO - Started a process 
> (PID: 128) to generate tasks for /opt/airflow/dags/example_s3_sensor.py - 
> logging into /tmp/airflow/scheduler/logs/2017-04-27/example_s3_sensor.py.log
> [2017-04-27 19:31:25,980] {dag_processing.py:627} INFO - Started a process 
> (PID: 129) to generate tasks for /opt/airflow/dags/test-4.py - logging into 
> /tmp/airflow/scheduler/logs/2017-04-27/test-4.py.log
> [2017-04-27 19:31:26,004] {jobs.py:1007} INFO - Tasks up for execution:
>  [scheduled]>
> [2017-04-27 19:31:26,006] {jobs.py:1311} INFO - Exited execute loop
> [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 128
> [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 129
> [2017-04-27 

incubator-airflow git commit: [AIRFLOW-1157] Fix missing pools crashing the scheduler

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 44551e249 -> e6973b159


[AIRFLOW-1157] Fix missing pools crashing the scheduler

Throw a warning when a pool associated with a Task Instance
doesn't exist instead of crashing the scheduler.
Use the default value of 0 slots for non-existent pools.

Closes #3002 from iansuvak/1157_nonexistent_pool


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/e6973b15
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/e6973b15
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/e6973b15

Branch: refs/heads/master
Commit: e6973b1596914e5d62567e065223e7b169d1c26c
Parents: 44551e2
Author: Ian Suvak 
Authored: Fri Feb 9 11:04:38 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 11:04:38 2018 +0100

--
 airflow/jobs.py |  9 -
 tests/jobs.py   | 24 
 2 files changed, 32 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e6973b15/airflow/jobs.py
--
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 35a3fb6..00d6b22 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1097,7 +1097,14 @@ class SchedulerJob(BaseJob):
 # non_pooled_task_slot_count per run
 open_slots = conf.getint('core', 'non_pooled_task_slot_count')
 else:
-open_slots = pools[pool].open_slots(session=session)
+if pool not in pools:
+self.log.warning(
+"Tasks using non-existent pool '%s' will not be 
scheduled",
+pool
+)
+open_slots = 0
+else:
+open_slots = pools[pool].open_slots(session=session)
 
 num_queued = len(task_instances)
 self.log.info(

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e6973b15/tests/jobs.py
--
diff --git a/tests/jobs.py b/tests/jobs.py
index 5771bf1..1c87b8f 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -1170,6 +1170,30 @@ class SchedulerJobTest(unittest.TestCase):
 self.assertIn(tis[1].key, res_keys)
 self.assertIn(tis[3].key, res_keys)
 
+def test_nonexistent_pool(self):
+dag_id = 'SchedulerJobTest.test_nonexistent_pool'
+task_id = 'dummy_wrong_pool'
+dag = DAG(dag_id=dag_id, start_date=DEFAULT_DATE, concurrency=16)
+task = DummyOperator(dag=dag, task_id=task_id, 
pool="this_pool_doesnt_exist")
+dagbag = self._make_simple_dag_bag([dag])
+
+scheduler = SchedulerJob(**self.default_scheduler_args)
+session = settings.Session()
+
+dr = scheduler.create_dag_run(dag)
+
+ti = TI(task, dr.execution_date)
+ti.state = State.SCHEDULED
+session.merge(ti)
+session.commit()
+
+res = scheduler._find_executable_task_instances(
+dagbag,
+states=[State.SCHEDULED],
+session=session)
+session.commit()
+self.assertEqual(0, len(res))
+
 def test_find_executable_task_instances_none(self):
 dag_id = 'SchedulerJobTest.test_find_executable_task_instances_none'
 task_id_1 = 'dummy'



[jira] [Commented] (AIRFLOW-713) EmrCreateJobFlowOperator and EmrAddStepsOperator attributes are not jinjafied

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358147#comment-16358147
 ] 

ASF subversion and git services commented on AIRFLOW-713:
-

Commit 44551e249fd338f3c4d24ef95d4b9c021f3b0688 in incubator-airflow's branch 
refs/heads/master from [~Swalloow]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=44551e2 ]

[AIRFLOW-713] Jinjafy {EmrCreateJobFlow,EmrAddSteps}Operator attributes

To dynamically template the fields of the EMR operators, we need
to pass the fields to Jinja.

Closes #3016 from Swalloow/emr-jinjafied


> EmrCreateJobFlowOperator and EmrAddStepsOperator attributes are not 
> jinjafied
> -
>
> Key: AIRFLOW-713
> URL: https://issues.apache.org/jira/browse/AIRFLOW-713
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: Airflow 2.0
>Reporter: anselmo da silva
>Assignee: Junyoung Park
>Priority: Major
>  Labels: easyfix
> Fix For: 2.0.0
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> (Contrib) The EmrCreateJobFlowOperator 'job_flow_overrides' field and the 
> EmrAddStepsOperator 'steps' field are not being jinjafied.
> EMR job definitions that depend on execution context or previous tasks have 
> no way to use macros.
>  
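As background on what "jinjafied" means here: Airflow renders any operator attribute listed in `template_fields` with the run's context before execution. A minimal, Airflow-free sketch of that mechanism (`string.Template` stands in for Jinja; `FakeOperator` and `render_templates` are illustrative names, not Airflow APIs):

```python
# Airflow-free sketch of template_fields: attributes named there are
# rendered against the run's context before execute(). string.Template
# stands in for Jinja2 to keep the example self-contained.
from string import Template

class FakeOperator:
    template_fields = ('steps',)  # which attributes get rendered

    def __init__(self, steps):
        self.steps = steps

def render_templates(op, context):
    # Walk the declared fields and substitute macros into each one.
    for field in op.template_fields:
        rendered = Template(getattr(op, field)).substitute(context)
        setattr(op, field, rendered)

op = FakeOperator(steps='/usr/lib/spark/bin/run-example $ds')
render_templates(op, {'ds': '2018-02-09'})
print(op.steps)  # -> /usr/lib/spark/bin/run-example 2018-02-09
```

An attribute left out of `template_fields` would keep its raw macro string, which is exactly the bug this issue describes.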



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-713) EmrCreateJobFlowOperator and EmrAddStepsOperator attributes are not jinjafied

2018-02-09 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-713.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   2.0.0

Issue resolved by pull request #3016
[https://github.com/apache/incubator-airflow/pull/3016]

> EmrCreateJobFlowOperator and EmrAddStepsOperator attributes are not 
> jinjafied
> -
>
> Key: AIRFLOW-713
> URL: https://issues.apache.org/jira/browse/AIRFLOW-713
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: Airflow 2.0
>Reporter: anselmo da silva
>Assignee: Junyoung Park
>Priority: Major
>  Labels: easyfix
> Fix For: 2.0.0
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> (Contrib) The EmrCreateJobFlowOperator 'job_flow_overrides' field and the 
> EmrAddStepsOperator 'steps' field are not being jinjafied.
> EMR job definitions that depend on execution context or previous tasks have 
> no way to use macros.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-713] Jinjafy {EmrCreateJobFlow, EmrAddSteps}Operator attributes

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master fd6772116 -> 44551e249


[AIRFLOW-713] Jinjafy {EmrCreateJobFlow,EmrAddSteps}Operator attributes

To dynamically template the fields of the EMR operators, we need
to pass the fields to Jinja.

Closes #3016 from Swalloow/emr-jinjafied


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/44551e24
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/44551e24
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/44551e24

Branch: refs/heads/master
Commit: 44551e249fd338f3c4d24ef95d4b9c021f3b0688
Parents: fd67721
Author: Swalloow 
Authored: Fri Feb 9 10:20:02 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 10:20:06 2018 +0100

--
 .../operators/emr_create_job_flow_operator.py   |  2 +-
 .../operators/test_emr_add_steps_operator.py| 77 ++
 .../test_emr_create_job_flow_operator.py| 86 
 3 files changed, 134 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/44551e24/airflow/contrib/operators/emr_create_job_flow_operator.py
--
diff --git a/airflow/contrib/operators/emr_create_job_flow_operator.py 
b/airflow/contrib/operators/emr_create_job_flow_operator.py
index 2544adf..8111800 100644
--- a/airflow/contrib/operators/emr_create_job_flow_operator.py
+++ b/airflow/contrib/operators/emr_create_job_flow_operator.py
@@ -29,7 +29,7 @@ class EmrCreateJobFlowOperator(BaseOperator):
 :param job_flow_overrides: boto3 style arguments to override 
emr_connection extra
 :type steps: dict
 """
-template_fields = []
+template_fields = ['job_flow_overrides']
 template_ext = ()
 ui_color = '#f9c915'
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/44551e24/tests/contrib/operators/test_emr_add_steps_operator.py
--
diff --git a/tests/contrib/operators/test_emr_add_steps_operator.py 
b/tests/contrib/operators/test_emr_add_steps_operator.py
index 141e986..e5ac9fe 100644
--- a/tests/contrib/operators/test_emr_add_steps_operator.py
+++ b/tests/contrib/operators/test_emr_add_steps_operator.py
@@ -13,10 +13,16 @@
 # limitations under the License.
 
 import unittest
+from datetime import timedelta
+
 from mock import MagicMock, patch
 
-from airflow import configuration
+from airflow import DAG, configuration
 from airflow.contrib.operators.emr_add_steps_operator import 
EmrAddStepsOperator
+from airflow.models import TaskInstance
+from airflow.utils import timezone
+
+DEFAULT_DATE = timezone.datetime(2017, 1, 1)
 
 ADD_STEPS_SUCCESS_RETURN = {
 'ResponseMetadata': {
@@ -27,30 +33,71 @@ ADD_STEPS_SUCCESS_RETURN = {
 
 
 class TestEmrAddStepsOperator(unittest.TestCase):
+# When
+_config = [{
+'Name': 'test_step',
+'ActionOnFailure': 'CONTINUE',
+'HadoopJarStep': {
+'Jar': 'command-runner.jar',
+'Args': [
+'/usr/lib/spark/bin/run-example',
+'{{ macros.ds_add(ds, -1) }}',
+'{{ ds }}'
+]
+}
+}]
+
 def setUp(self):
 configuration.load_test_config()
+args = {
+'owner': 'airflow',
+'start_date': DEFAULT_DATE
+}
 
 # Mock out the emr_client (moto has incorrect response)
-mock_emr_client = MagicMock()
-mock_emr_client.add_job_flow_steps.return_value = 
ADD_STEPS_SUCCESS_RETURN
+self.emr_client_mock = MagicMock()
+self.operator = EmrAddStepsOperator(
+task_id='test_task',
+job_flow_id='j-8989898989',
+aws_conn_id='aws_default',
+steps=self._config,
+dag=DAG('test_dag_id', default_args=args)
+)
 
-mock_emr_session = MagicMock()
-mock_emr_session.client.return_value = mock_emr_client
+def test_init(self):
+self.assertEqual(self.operator.job_flow_id, 'j-8989898989')
+self.assertEqual(self.operator.aws_conn_id, 'aws_default')
 
-# Mock out the emr_client creator
-self.boto3_session_mock = MagicMock(return_value=mock_emr_session)
+def test_render_template(self):
+ti = TaskInstance(self.operator, DEFAULT_DATE)
+ti.render_templates()
 
+expected_args = [{
+'Name': 'test_step',
+'ActionOnFailure': 'CONTINUE',
+'HadoopJarStep': {
+'Jar': 'command-runner.jar',
+'Args': [
+'/usr/lib/spark/bin/run-example',
+(DEFAULT_DATE - 

[2/2] incubator-airflow git commit: Merge pull request #3019 from hyw/master

2018-02-09 Thread fokko
Merge pull request #3019 from hyw/master


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/fd677211
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/fd677211
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/fd677211

Branch: refs/heads/master
Commit: fd6772116b1ab4bfd4c3c9f237fad6a7cac654e8
Parents: 15b8a36 822296a
Author: Fokko Driesprong 
Authored: Fri Feb 9 10:15:19 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 10:15:19 2018 +0100

--
 README.md | 1 +
 1 file changed, 1 insertion(+)
--




[1/2] incubator-airflow git commit: [AIRFLOW-XXX] add Karmic to list of companies

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 15b8a36b9 -> fd6772116


[AIRFLOW-XXX] add Karmic to list of companies


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/822296af
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/822296af
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/822296af

Branch: refs/heads/master
Commit: 822296af45282e3fba403e56c0c9c93e0b1a1bc0
Parents: 4751abf
Author: Yang Wang 
Authored: Thu Feb 8 13:38:02 2018 -0800
Committer: Yang Wang 
Committed: Thu Feb 8 13:38:02 2018 -0800

--
 README.md | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/822296af/README.md
--
diff --git a/README.md b/README.md
index 77f6410..b22bbfb 100644
--- a/README.md
+++ b/README.md
@@ -145,6 +145,7 @@ Currently **officially** using Airflow:
 1. [Intercom](http://www.intercom.com/) [[@fox](https://github.com/fox) & 
[@paulvic](https://github.com/paulvic)]
 1. [Jampp](https://github.com/jampp)
 1. [JobTeaser](https://www.jobteaser.com) 
[[@stefani75](https://github.com/stefani75) &  
[@knil-sama](https://github.com/knil-sama)]
+1. [Karmic](https://karmiclabs.com) [[@hyw](https://github.com/hyw)]
 1. [Kiwi.com](https://kiwi.com/) [[@underyx](https://github.com/underyx)]
 1. [Kogan.com](https://github.com/kogan) 
[[@geeknam](https://github.com/geeknam)]
 1. [Lemann Foundation](http://fundacaolemann.org.br) 
[[@fernandosjp](https://github.com/fernandosjp)]



[jira] [Resolved] (AIRFLOW-2083) Incorrect usage of "it's" appears throughout the documentation

2018-02-09 Thread Fokko Driesprong (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2083.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3020
[https://github.com/apache/incubator-airflow/pull/3020]

> Incorrect usage of "it's" appears throughout the documentation
> --
>
> Key: AIRFLOW-2083
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2083
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: William Pursell
>Assignee: William Pursell
>Priority: Trivial
> Fix For: 2.0.0
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> In several places, the word "it's" appears when it ought to be "its"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2083) Incorrect usage of "it's" appears throughout the documentation

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358138#comment-16358138
 ] 

ASF subversion and git services commented on AIRFLOW-2083:
--

Commit 15b8a36b9011166b06f176f684b71703a4aebddd in incubator-airflow's branch 
refs/heads/master from [~wrp]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=15b8a36 ]

[AIRFLOW-2083] Docs: Use "its" instead of "it's" where appropriate

Closes #3020 from wrp/spelling


> Incorrect usage of "it's" appears throughout the documentation
> --
>
> Key: AIRFLOW-2083
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2083
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: William Pursell
>Assignee: William Pursell
>Priority: Trivial
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> In several places, the word "it's" appears when it ought to be "its"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2083] Docs: Use "its" instead of "it's" where appropriate

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 2920d0475 -> 15b8a36b9


[AIRFLOW-2083] Docs: Use "its" instead of "it's" where appropriate

Closes #3020 from wrp/spelling


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/15b8a36b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/15b8a36b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/15b8a36b

Branch: refs/heads/master
Commit: 15b8a36b9011166b06f176f684b71703a4aebddd
Parents: 2920d04
Author: William Pursell 
Authored: Fri Feb 9 10:08:06 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 10:08:06 2018 +0100

--
 airflow/bin/cli.py   |  2 +-
 airflow/contrib/hooks/redshift_hook.py   |  2 +-
 airflow/jobs.py  |  6 +++---
 airflow/ti_deps/deps/trigger_rule_dep.py |  2 +-
 airflow/utils/dates.py   |  2 +-
 airflow/www/views.py | 10 ++
 docs/plugins.rst |  2 +-
 tests/jobs.py|  2 +-
 8 files changed, 15 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15b8a36b/airflow/bin/cli.py
--
diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 6bfcdcc..424fcda 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1572,7 +1572,7 @@ class CLIFactory(object):
 'func': test,
 'help': (
 "Test a task instance. This will run a task without checking 
for "
-"dependencies or recording it's state in the database."),
+"dependencies or recording its state in the database."),
 'args': (
 'dag_id', 'task_id', 'execution_date', 'subdir', 'dry_run',
 'task_params'),

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15b8a36b/airflow/contrib/hooks/redshift_hook.py
--
diff --git a/airflow/contrib/hooks/redshift_hook.py 
b/airflow/contrib/hooks/redshift_hook.py
index 70a4854..baa11e7 100644
--- a/airflow/contrib/hooks/redshift_hook.py
+++ b/airflow/contrib/hooks/redshift_hook.py
@@ -79,7 +79,7 @@ class RedshiftHook(AwsHook):
 
 def restore_from_cluster_snapshot(self, cluster_identifier, 
snapshot_identifier):
 """
-Restores a cluster from it's snapshot
+Restores a cluster from its snapshot
 
 :param cluster_identifier: unique identifier of a cluster
 :type cluster_identifier: str

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15b8a36b/airflow/jobs.py
--
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 172792d..35a3fb6 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -68,7 +68,7 @@ class BaseJob(Base, LoggingMixin):
 """
 Abstract class to be derived for jobs. Jobs are processing items with state
 and duration that aren't task instances. For instance a BackfillJob is
-a collection of task instance runs, but should have it's own state, start
+a collection of task instance runs, but should have its own state, start
 and end time.
 """
 
@@ -1796,8 +1796,8 @@ class SchedulerJob(BaseJob):
 dep_context = DepContext(deps=QUEUE_DEPS, ignore_task_deps=True)
 
 # Only schedule tasks that have their dependencies met, e.g. to 
avoid
-# a task that recently got it's state changed to RUNNING from 
somewhere
-# other than the scheduler from getting it's state overwritten.
+# a task that recently got its state changed to RUNNING from 
somewhere
+# other than the scheduler from getting its state overwritten.
 # TODO(aoen): It's not great that we have to check all the task 
instance
 # dependencies twice; once to get the task scheduled, and again to 
actually
 # run the task. We should try to come up with a way to only check 
them once.

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15b8a36b/airflow/ti_deps/deps/trigger_rule_dep.py
--
diff --git a/airflow/ti_deps/deps/trigger_rule_dep.py 
b/airflow/ti_deps/deps/trigger_rule_dep.py
index 5a80314..30a5a13 100644
--- a/airflow/ti_deps/deps/trigger_rule_dep.py
+++ b/airflow/ti_deps/deps/trigger_rule_dep.py
@@ -127,7 +127,7 @@ class TriggerRuleDep(BaseTIDep):
 "total": upstream, "successes": successes, "skipped": skipped,
 "failed": failed, "upstream_failed": upstream_failed, 

[jira] [Commented] (AIRFLOW-2066) Add an operator to create an Empty BigQuery Table

2018-02-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358133#comment-16358133
 ] 

ASF subversion and git services commented on AIRFLOW-2066:
--

Commit 2920d047541c0c410e7db72c7ae81a6ee85bb08c in incubator-airflow's branch 
refs/heads/master from [~kaxilnaik]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2920d04 ]

[AIRFLOW-2066] Add operator to create empty BQ table

- Add operator that creates a new, empty table in
the specified BigQuery dataset, optionally with
schema.

Closes #3006 from kaxil/bq_empty_table_op


> Add an operator to create an Empty BigQuery Table
> -
>
> Key: AIRFLOW-2066
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2066
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Affects Versions: 2.0.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 2.0.0
>
>
> There are currently no operators to create an empty BigQuery table, 
> with or without a schema.
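As background, the commit referenced above adds a `create_empty_table` hook method that builds a BigQuery `tables.insert` request body. The shape of that body can be sketched without any GCP client (`build_table_resource` is an illustrative helper mirroring the diff, not part of Airflow):

```python
# Illustrative helper mirroring create_empty_table: it only assembles the
# tables.insert request body; no Google Cloud client is involved.
def build_table_resource(table_id, schema_fields=None, time_partitioning=None):
    resource = {'tableReference': {'tableId': table_id}}
    if schema_fields:
        # Optional schema, in BigQuery's REST field-list format.
        resource['schema'] = {'fields': schema_fields}
    if time_partitioning:
        # Optional timePartitioning block, passed through as-is.
        resource['timePartitioning'] = time_partitioning
    return resource

resource = build_table_resource(
    'employees',
    schema_fields=[
        {'name': 'emp_name', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'salary', 'type': 'INTEGER', 'mode': 'NULLABLE'},
    ])
print(sorted(resource))  # -> ['schema', 'tableReference']
```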



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


incubator-airflow git commit: [AIRFLOW-2066] Add operator to create empty BQ table

2018-02-09 Thread fokko
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 4751abf8a -> 2920d0475


[AIRFLOW-2066] Add operator to create empty BQ table

- Add operator that creates a new, empty table in
the specified BigQuery dataset, optionally with
schema.

Closes #3006 from kaxil/bq_empty_table_op


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/2920d047
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/2920d047
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/2920d047

Branch: refs/heads/master
Commit: 2920d047541c0c410e7db72c7ae81a6ee85bb08c
Parents: 4751abf
Author: Kaxil Naik 
Authored: Fri Feb 9 10:04:18 2018 +0100
Committer: Fokko Driesprong 
Committed: Fri Feb 9 10:04:18 2018 +0100

--
 airflow/contrib/hooks/bigquery_hook.py  |  65 +
 airflow/contrib/hooks/gcs_hook.py   |  22 +++
 airflow/contrib/operators/bigquery_operator.py  | 146 +++
 docs/code.rst   |   1 +
 docs/integration.rst|   8 +
 tests/contrib/hooks/test_gcs_hook.py|  46 ++
 .../contrib/operators/test_bigquery_operator.py |  53 +++
 7 files changed, 341 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2920d047/airflow/contrib/hooks/bigquery_hook.py
--
diff --git a/airflow/contrib/hooks/bigquery_hook.py 
b/airflow/contrib/hooks/bigquery_hook.py
index e0dea46..653cb1b 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -207,6 +207,71 @@ class BigQueryBaseCursor(LoggingMixin):
         self.use_legacy_sql = use_legacy_sql
         self.running_job_id = None
 
+    def create_empty_table(self,
+                           project_id,
+                           dataset_id,
+                           table_id,
+                           schema_fields=None,
+                           time_partitioning={}
+                           ):
+        """
+        Creates a new, empty table in the dataset.
+
+        :param project_id: The project to create the table into.
+        :type project_id: str
+        :param dataset_id: The dataset to create the table into.
+        :type dataset_id: str
+        :param table_id: The Name of the table to be created.
+        :type table_id: str
+        :param schema_fields: If set, the schema field list as defined here:
+            https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema
+
+            **Example**: ::
+
+                schema_fields=[{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
+                               {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"}]
+
+        :type schema_fields: list
+        :param time_partitioning: configure optional time partitioning fields i.e.
+            partition by field, type and expiration as per API specifications.
+
+            .. seealso::
+                https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#timePartitioning
+        :type time_partitioning: dict
+
+        :return:
+        """
+        project_id = project_id if project_id is not None else self.project_id
+
+        table_resource = {
+            'tableReference': {
+                'tableId': table_id
+            }
+        }
+
+        if schema_fields:
+            table_resource['schema'] = {'fields': schema_fields}
+
+        if time_partitioning:
+            table_resource['timePartitioning'] = time_partitioning
+
+        self.log.info('Creating Table %s:%s.%s',
+                      project_id, dataset_id, table_id)
+
+        try:
+            self.service.tables().insert(
+                projectId=project_id,
+                datasetId=dataset_id,
+                body=table_resource).execute()
+
+            self.log.info('Table created successfully: %s:%s.%s',
+                          project_id, dataset_id, table_id)
+
+        except HttpError as err:
+            raise AirflowException(
+                'BigQuery job failed. Error was: {}'.format(err.content)
+            )
+
     def create_external_table(self,
                               external_project_dataset_table,
                               schema_fields,

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2920d047/airflow/contrib/hooks/gcs_hook.py
--
diff --git a/airflow/contrib/hooks/gcs_hook.py 
b/airflow/contrib/hooks/gcs_hook.py
index f959f95..5312daa 100644
--- a/airflow/contrib/hooks/gcs_hook.py
+++ b/airflow/contrib/hooks/gcs_hook.py
@@ -17,6 +17,7 @@ from apiclient.http 

[jira] [Created] (AIRFLOW-2087) Scheduler Report shows incorrect "Total task number"

2018-02-09 Thread I don't want an account (JIRA)
I don't want an account created AIRFLOW-2087:


 Summary: Scheduler Report shows incorrect "Total task number"
 Key: AIRFLOW-2087
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2087
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: Airflow 1.8, 1.9.0
Reporter: I don't want an account


[https://github.com/apache/incubator-airflow/blob/4751abf8acad766cb576ecfe3a333d68cc693b8c/airflow/models.py#L479]
This line causes the CLI tool `airflow list_dags -r` to print the same value for
"Total task number" as for "Number of DAGs".

E.g., some output:
---
DagBag loading stats for /pang/service/airflow/dags
---
Number of DAGs: 1143
Total task number: 1143
DagBag parsing time: 24.900074
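The expected behaviour is for the task total to be the sum of task counts across all DAGs, not the DAG count repeated. A minimal sketch with stand-in objects (the `Dag` class and `dagbag_stats` helper below are invented for illustration, not Airflow's real `DAG`/`DagBag`):

```python
class Dag:
    """Stand-in for an Airflow DAG: just holds a list of tasks."""
    def __init__(self, tasks):
        self.tasks = tasks


def dagbag_stats(dags):
    """Report DAG and task counts the way the scheduler report
    should: the task total sums over DAGs rather than repeating
    the DAG count."""
    return {
        'Number of DAGs': len(dags),
        'Total task number': sum(len(dag.tasks) for dag in dags),
    }


stats = dagbag_stats([Dag(['a', 'b']), Dag(['c']), Dag(['d', 'e', 'f'])])
# 3 DAGs containing 2 + 1 + 3 = 6 tasks in total
```

With the reported bug, both lines of the example output above show 1143, which can only be correct in the unlikely case that every DAG contains exactly one task.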



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)