[jira] [Created] (AIRFLOW-2097) UnboundLocalError: local variable 'tz' referenced before assignment
Bryce Drennnan created AIRFLOW-2097: --- Summary: UnboundLocalError: local variable 'tz' referenced before assignment Key: AIRFLOW-2097 URL: https://issues.apache.org/jira/browse/AIRFLOW-2097 Project: Apache Airflow Issue Type: Bug Components: utils Reporter: Bryce Drennnan The date_range function references the variable tz before it is assigned. I noticed this while running doctests. See this part of the code: [https://github.com/apache/incubator-airflow/blob/15b8a36b9011166b06f176f684b71703a4aebddd/airflow/utils/dates.py#L73-L84] I believe this bug was introduced here: https://github.com/apache/incubator-airflow/commit/518a41acf319af27d49bdc0c84bda64b6b8af0b3#commitcomment-27433613 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
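The failure mode is the classic Python pattern where a local is bound only on one branch and read unconditionally afterwards. A minimal sketch of the bug shape and its fix (illustrative only, not the actual `airflow/utils/dates.py` code):

```python
from datetime import datetime, timedelta

def date_range(start, end, delta=timedelta(days=1)):
    """Sketch of the bug class, not the real airflow.utils.dates code.

    Buggy shape:
        if start.tzinfo is not None:
            tz = start.tzinfo          # only bound on this branch
            ...
        return [d.replace(tzinfo=tz)]  # UnboundLocalError for naive datetimes

    Fix: bind tz on every path before it is read.
    """
    tz = start.tzinfo  # None for naive datetimes, so tz is always bound
    cur, dates = start.replace(tzinfo=None), []
    while cur <= end.replace(tzinfo=None):
        dates.append(cur.replace(tzinfo=tz))
        cur += delta
    return dates
```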
[jira] [Created] (AIRFLOW-2096) SparkSubmitOperator should expose yarn_application_id in the templates
Azhagu Selvan SP created AIRFLOW-2096: - Summary: SparkSubmitOperator should expose yarn_application_id in the templates Key: AIRFLOW-2096 URL: https://issues.apache.org/jira/browse/AIRFLOW-2096 Project: Apache Airflow Issue Type: Improvement Reporter: Azhagu Selvan SP The SparkSubmitHook parses and stores the `application_id` from YARN in the field `yarn_application_id`. It would be very useful if this value is exposed as a template field in the SparkSubmitOperator so that we can build a direct link to the YARN Application master from the Airflow UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2095) Add operator to create External BigQuery Table
Kaxil Naik created AIRFLOW-2095: --- Summary: Add operator to create External BigQuery Table Key: AIRFLOW-2095 URL: https://issues.apache.org/jira/browse/AIRFLOW-2095 Project: Apache Airflow Issue Type: Task Components: contrib, gcp Reporter: Kaxil Naik Assignee: Kaxil Naik Fix For: 2.0.0 We already have a hook to create an external BQ table. This task is to create an operator from that hook. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2094) Jinjafy project_id, region & zone in DataProc{*} Operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-2094. - Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3027 [https://github.com/apache/incubator-airflow/pull/3027] > Jinjafy project_id, region & zone in DataProc{*} Operators > -- > > Key: AIRFLOW-2094 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2094 > Project: Apache Airflow > Issue Type: Task > Components: contrib, gcp >Reporter: Kaxil Naik >Priority: Minor > Fix For: 1.10.0 > > > The project_id, region, and zone in DataProc{*} Operators are not jinjafied. > If we can do that, we can use Airflow variables to use a default project_id, > region and zone. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2094) Jinjafy project_id, region & zone in DataProc{*} Operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358878#comment-16358878 ] ASF subversion and git services commented on AIRFLOW-2094: -- Commit 556c9ec5ba12973cc0335cd18d375227797ec62f in incubator-airflow's branch refs/heads/master from [~kaxilnaik] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=556c9ec ] [AIRFLOW-2094] Jinjafied project_id, region & zone in DataProc{*} Operators - Minor docstring change - Jinjafied project_id, region & zone in DataProc{*} Operators to allow using Airflow variables Closes #3027 from kaxil/patch-3 > Jinjafy project_id, region & zone in DataProc{*} Operators > -- > > Key: AIRFLOW-2094 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2094 > Project: Apache Airflow > Issue Type: Task > Components: contrib, gcp >Reporter: Kaxil Naik >Priority: Minor > Fix For: 1.10.0 > > > The project_id, region, and zone in DataProc{*} Operators are not jinjafied. > If we can do that, we can use Airflow variables to use a default project_id, > region and zone. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2094] Jinjafied project_id, region & zone in DataProc{*} Operators
Repository: incubator-airflow Updated Branches: refs/heads/master bf1296fbd -> 556c9ec5b [AIRFLOW-2094] Jinjafied project_id, region & zone in DataProc{*} Operators - Minor docstring change - Jinjafied project_id, region & zone in DataProc{*} Operators to allow using Airflow variables Closes #3027 from kaxil/patch-3 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/556c9ec5 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/556c9ec5 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/556c9ec5 Branch: refs/heads/master Commit: 556c9ec5ba12973cc0335cd18d375227797ec62f Parents: bf1296f Author: Kaxil Naik Authored: Fri Feb 9 20:56:53 2018 +0100 Committer: Bolke de Bruin Committed: Fri Feb 9 20:56:53 2018 +0100 -- airflow/contrib/operators/dataproc_operator.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/556c9ec5/airflow/contrib/operators/dataproc_operator.py -- diff --git a/airflow/contrib/operators/dataproc_operator.py b/airflow/contrib/operators/dataproc_operator.py index 3444cc6..ebcc402 100644 --- a/airflow/contrib/operators/dataproc_operator.py +++ b/airflow/contrib/operators/dataproc_operator.py @@ -64,7 +64,7 @@ class DataprocClusterCreateOperator(BaseOperator): :type master_machine_type: string :param master_disk_size: Disk size for the master node :type master_disk_size: int -:param worker_machine_type:Compute engine machine type to use for the worker nodes +:param worker_machine_type: Compute engine machine type to use for the worker nodes :type worker_machine_type: string :param worker_disk_size: Disk size for the worker nodes :type worker_disk_size: int @@ -95,7 +95,7 @@ class DataprocClusterCreateOperator(BaseOperator): :type service_account_scopes: list[string] """ -template_fields = ['cluster_name'] +template_fields = ['cluster_name', 'project_id', 'zone', 'region'] @apply_defaults def __init__(self, @@ -339,7 +339,7 @@ class DataprocClusterDeleteOperator(BaseOperator): :type delegate_to: string """ -template_fields = ['cluster_name'] +template_fields = ['cluster_name', 'project_id', 'region'] @apply_defaults def __init__(self,
[jira] [Created] (AIRFLOW-2094) Jinjafy project_id, region & zone in DataProc{*} Operators
Kaxil Naik created AIRFLOW-2094: --- Summary: Jinjafy project_id, region & zone in DataProc{*} Operators Key: AIRFLOW-2094 URL: https://issues.apache.org/jira/browse/AIRFLOW-2094 Project: Apache Airflow Issue Type: Task Components: contrib, gcp Reporter: Kaxil Naik The project_id, region, and zone in DataProc{*} Operators are not jinjafied. If we can do that, we can use Airflow variables to use a default project_id, region and zone. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
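As a toy illustration of why listing a field in `template_fields` matters (this is a simplified stand-in, not Airflow's real Jinja rendering engine): only fields declared in `template_fields` get their placeholders rendered, so a `{{ var.value.... }}` default in `project_id` stays as a literal string until the field is made templated.

```python
import re

class ToyOperator:
    """Toy model of Airflow's templating: only the fields named in
    template_fields get their {{ var.value.* }} placeholders rendered."""
    template_fields = ('cluster_name', 'project_id', 'zone', 'region')

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def render(self, variables):
        # Crude stand-in for the Jinja rendering Airflow applies per field.
        for name in self.template_fields:
            value = getattr(self, name, None)
            if isinstance(value, str):
                rendered = re.sub(
                    r"\{\{\s*var\.value\.(\w+)\s*\}\}",
                    lambda m: variables[m.group(1)],
                    value,
                )
                setattr(self, name, rendered)
```

With `project_id="{{ var.value.gcp_project }}"`, calling `render({"gcp_project": "my-default-project"})` substitutes the variable, while non-templated strings pass through unchanged.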
[jira] [Commented] (AIRFLOW-1667) Remote log handlers don't upload logs on task finish
[ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358757#comment-16358757 ] Ash Berlin-Taylor commented on AIRFLOW-1667: The process that writes to the log files is a sub-process of the celery worker itself – that just invokes {{airflow run --local}} - and that means the flush should happen as soon as the task instance finishes running. I do not see this behaviour on Py3/1.9.0 - our task logs appear in S3 when the task instance is finished. Are you saying you have to stop the {{airflow worker}} process for the logs to appear in S3? > Remote log handlers don't upload logs on task finish > > > Key: AIRFLOW-1667 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1667 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.9.0, 1.10.0 >Reporter: Arthur Vigil >Priority: Major > > AIRFLOW-1385 revised logging for configurability, but the provided remote log > handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is > left at the default implementation provided by `logging.FileHandler`). A > handler will be closed on process exit by `logging.shutdown()`, but depending > on the Executor used worker processes may not regularly shutdown, and can > very likely persist between tasks. This means during normal execution log > files are never uploaded. > Need to find a way to flush remote log handlers in a timely manner, but > without hitting the target resources unnecessarily. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1667) Remote log handlers don't upload logs on task finish
[ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358749#comment-16358749 ] Josh Bacon commented on AIRFLOW-1667: - +1 We are using CeleryExecutors and notice that our logs never ship unless we shut down our workers. Flush probably needs to happen on some interval or task event handler. > Remote log handlers don't upload logs on task finish > > > Key: AIRFLOW-1667 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1667 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.9.0, 1.10.0 >Reporter: Arthur Vigil >Priority: Major > > AIRFLOW-1385 revised logging for configurability, but the provided remote log > handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is > left at the default implementation provided by `logging.FileHandler`). A > handler will be closed on process exit by `logging.shutdown()`, but depending > on the Executor used worker processes may not regularly shutdown, and can > very likely persist between tasks. This means during normal execution log > files are never uploaded. > Need to find a way to flush remote log handlers in a timely manner, but > without hitting the target resources unnecessarily. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
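One hedged sketch of the fix direction discussed above: override `flush()` so buffered records are shipped when the task finishes rather than only at `logging.shutdown()`. The `upload` callable below is a stand-in for the S3/GCS write; this is not the real `S3TaskHandler`/`GCSTaskHandler` API.

```python
import logging

class RemoteTaskHandler(logging.Handler):
    """Sketch only: buffer records locally and push them to a remote
    store on every flush(), so logs ship when a task instance finishes
    instead of only when the worker process exits."""

    def __init__(self, upload):
        super().__init__()
        self.buffer = []
        self.upload = upload  # callable standing in for the remote write

    def emit(self, record):
        self.buffer.append(self.format(record))

    def flush(self):
        # logging.FileHandler.flush() only flushes the local file stream;
        # overriding it is one way to make the remote upload timely while
        # still avoiding an upload per record.
        if self.buffer:
            self.upload("\n".join(self.buffer))
            self.buffer = []

    def close(self):
        self.flush()
        super().close()
```

A periodic or on-task-finish `flush()` call would then ship logs without hitting the remote store for every record.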
[jira] [Created] (AIRFLOW-2093) Feature DagBag loading from S3 (and others)
Bruno Bonagura created AIRFLOW-2093: --- Summary: Feature DagBag loading from S3 (and others) Key: AIRFLOW-2093 URL: https://issues.apache.org/jira/browse/AIRFLOW-2093 Project: Apache Airflow Issue Type: Improvement Components: aws, boto3, DAG, scheduler, worker Reporter: Bruno Bonagura Deploying DAGs is a pain point for CeleryExecutor-based Airflow infrastructure. Allowing Airflow to be configured to read DAGs directly from remote sources (S3, FTP, HTTP...), not just the local filesystem, would greatly improve this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2092) Fixed incorrect parameter in docstring for FTPHook
[ https://issues.apache.org/jira/browse/AIRFLOW-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358343#comment-16358343 ] ASF subversion and git services commented on AIRFLOW-2092: -- Commit bf1296fbd2024135d4300fe9f0e5ce8f0dac1825 in incubator-airflow's branch refs/heads/master from [~kaxilnaik] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=bf1296f ] [AIRFLOW-2092] Fixed incorrect parameter in docstring for FTPHook Fixed the datatype of parameter `local_full_path_or_buffer` in `retrieve_file` method for FTPHook Closes #3026 from kaxil/patch-2 > Fixed incorrect parameter in docstring for FTPHook > -- > > Key: AIRFLOW-2092 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2092 > Project: Apache Airflow > Issue Type: Task > Components: contrib, Documentation >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Trivial > Fix For: 2.0.0 > > > Fixed the datatype of parameter `local_full_path_or_buffer` in > `retrieve_file` method for FTPHook -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2092) Fixed incorrect parameter in docstring for FTPHook
[ https://issues.apache.org/jira/browse/AIRFLOW-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2092. --- Resolution: Fixed Issue resolved by pull request #3026 [https://github.com/apache/incubator-airflow/pull/3026] > Fixed incorrect parameter in docstring for FTPHook > -- > > Key: AIRFLOW-2092 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2092 > Project: Apache Airflow > Issue Type: Task > Components: contrib, Documentation >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Trivial > Fix For: 2.0.0 > > > Fixed the datatype of parameter `local_full_path_or_buffer` in > `retrieve_file` method for FTPHook -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2092] Fixed incorrect parameter in docstring for FTPHook
Repository: incubator-airflow Updated Branches: refs/heads/master a1e5a075c -> bf1296fbd [AIRFLOW-2092] Fixed incorrect parameter in docstring for FTPHook Fixed the datatype of parameter `local_full_path_or_buffer` in `retrieve_file` method for FTPHook Closes #3026 from kaxil/patch-2 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/bf1296fb Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/bf1296fb Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/bf1296fb Branch: refs/heads/master Commit: bf1296fbd2024135d4300fe9f0e5ce8f0dac1825 Parents: a1e5a07 Author: Kaxil Naik Authored: Fri Feb 9 13:45:05 2018 +0100 Committer: Fokko Driesprong Committed: Fri Feb 9 13:45:05 2018 +0100 -- airflow/contrib/hooks/ftp_hook.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/bf1296fb/airflow/contrib/hooks/ftp_hook.py -- diff --git a/airflow/contrib/hooks/ftp_hook.py b/airflow/contrib/hooks/ftp_hook.py index b1e224d..bd46ba5 100644 --- a/airflow/contrib/hooks/ftp_hook.py +++ b/airflow/contrib/hooks/ftp_hook.py @@ -154,7 +154,7 @@ class FTPHook(BaseHook, LoggingMixin): :type remote_full_path: str :param local_full_path_or_buffer: full path to the local file or a file-like buffer -:type local_full_path: str or file-like buffer +:type local_full_path_or_buffer: str or file-like buffer """ conn = self.get_conn()
[jira] [Created] (AIRFLOW-2092) Fixed incorrect parameter in docstring for FTPHook
Kaxil Naik created AIRFLOW-2092: --- Summary: Fixed incorrect parameter in docstring for FTPHook Key: AIRFLOW-2092 URL: https://issues.apache.org/jira/browse/AIRFLOW-2092 Project: Apache Airflow Issue Type: Task Components: contrib, Documentation Reporter: Kaxil Naik Assignee: Kaxil Naik Fix For: 2.0.0 Fixed the datatype of parameter `local_full_path_or_buffer` in `retrieve_file` method for FTPHook -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-XXX] Add SocialCops to Airflow users
Repository: incubator-airflow Updated Branches: refs/heads/master 759d8f83e -> a1e5a075c [AIRFLOW-XXX] Add SocialCops to Airflow users Closes #3018 from vinayak-mehta/update_readme Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a1e5a075 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a1e5a075 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a1e5a075 Branch: refs/heads/master Commit: a1e5a075ccbd4abe48c205181e2faed659ab6898 Parents: 759d8f8 Author: Vinayak Mehta Authored: Fri Feb 9 13:25:56 2018 +0100 Committer: Fokko Driesprong Committed: Fri Feb 9 13:26:00 2018 +0100 -- README.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a1e5a075/README.md -- diff --git a/README.md b/README.md index b22bbfb..6528ecc 100644 --- a/README.md +++ b/README.md @@ -189,6 +189,7 @@ Currently **officially** using Airflow: 1. [Sidecar](https://hello.getsidecar.com/) [[@getsidecar](https://github.com/getsidecar)] 1. [SimilarWeb](https://www.similarweb.com/) [[@similarweb](https://github.com/similarweb)] 1. [SmartNews](https://www.smartnews.com/) [[@takus](https://github.com/takus)] +1. [SocialCops](https://www.socialcops.com/) [[@vinayak-mehta](https://github.com/vinayak-mehta) & [@sharky93](https://github.com/sharky93)] 1. [Spotify](https://github.com/spotify) [[@znichols](https://github.com/znichols)] 1. [Stackspace](https://beta.stackspace.io/) 1. [Stripe](https://stripe.com) [[@jbalogh](https://github.com/jbalogh)]
[jira] [Resolved] (AIRFLOW-2088) Duplicate keys in MySQL to GCS Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2088. --- Resolution: Fixed Issue resolved by pull request #3022 [https://github.com/apache/incubator-airflow/pull/3022] > Duplicate keys in MySQL to GCS Operator > --- > > Key: AIRFLOW-2088 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2088 > Project: Apache Airflow > Issue Type: Task > Components: contrib, gcp >Affects Versions: 1.9.1 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 2.0.0 > > > The helper method `type_map` in `mysql_to_gcs` operator contains duplicate > key "FIELD_TYPE.INT24". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2088) Duplicate keys in MySQL to GCS Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358329#comment-16358329 ] ASF subversion and git services commented on AIRFLOW-2088: -- Commit 759d8f83e79f7d22a5cfca93205ebaf1ce30a5ad in incubator-airflow's branch refs/heads/master from [~kaxilnaik] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=759d8f8 ] [AIRFLOW-2088] Fix duplicate keys in MySQL to GCS Helper function - Remove the duplicate key in `MySqlToGoogleCloudStorageOperator` in the `type_map` helper function that maps from MySQL fields to BigQuery fields. Closes #3022 from kaxil/duplicate-keys-fix > Duplicate keys in MySQL to GCS Operator > --- > > Key: AIRFLOW-2088 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2088 > Project: Apache Airflow > Issue Type: Task > Components: contrib, gcp >Affects Versions: 1.9.1 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 2.0.0 > > > The helper method `type_map` in `mysql_to_gcs` operator contains duplicate > key "FIELD_TYPE.INT24". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2091) Fix incorrect parameter in BigQueryCursor docstring
[ https://issues.apache.org/jira/browse/AIRFLOW-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358326#comment-16358326 ] ASF subversion and git services commented on AIRFLOW-2091: -- Commit 0a71370d0541da48eb1d4ffc5aa2f5d3a4be1e59 in incubator-airflow's branch refs/heads/master from [~kaxilnaik] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=0a71370 ] [AIRFLOW-2091] Fix incorrect docstring parameter in BigQuery Hook - Instead of `seq_of_parameters`, the docstring contains `parameters`. Fixed this Closes #3025 from kaxil/fix-parameter > Fix incorrect parameter in BigQueryCursor docstring > --- > > Key: AIRFLOW-2091 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2091 > Project: Apache Airflow > Issue Type: Task > Components: contrib, Documentation, gcp >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Trivial > Fix For: 2.0.0 > > > Fixed docstring in BigQuery Cursor. It contains incorrect docstring of > `seq_of_parameters` as `parameters` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2088] Fix duplicate keys in MySQL to GCS Helper function
Repository: incubator-airflow Updated Branches: refs/heads/master 0a71370d0 -> 759d8f83e [AIRFLOW-2088] Fix duplicate keys in MySQL to GCS Helper function - Remove the duplicate key in `MySqlToGoogleCloudStorageOperator` in the `type_map` helper function that maps from MySQL fields to BigQuery fields. Closes #3022 from kaxil/duplicate-keys-fix Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/759d8f83 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/759d8f83 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/759d8f83 Branch: refs/heads/master Commit: 759d8f83e79f7d22a5cfca93205ebaf1ce30a5ad Parents: 0a71370 Author: Kaxil Naik Authored: Fri Feb 9 13:24:21 2018 +0100 Committer: Fokko Driesprong Committed: Fri Feb 9 13:24:21 2018 +0100 -- airflow/contrib/operators/mysql_to_gcs.py | 1 - 1 file changed, 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/759d8f83/airflow/contrib/operators/mysql_to_gcs.py -- diff --git a/airflow/contrib/operators/mysql_to_gcs.py b/airflow/contrib/operators/mysql_to_gcs.py index 784481d..41e23f5 100644 --- a/airflow/contrib/operators/mysql_to_gcs.py +++ b/airflow/contrib/operators/mysql_to_gcs.py @@ -214,7 +214,6 @@ class MySqlToGoogleCloudStorageOperator(BaseOperator): when a schema_filename is set. """ d = { -FIELD_TYPE.INT24: 'INTEGER', FIELD_TYPE.TINY: 'INTEGER', FIELD_TYPE.BIT: 'INTEGER', FIELD_TYPE.DATETIME: 'TIMESTAMP',
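The underlying Python behaviour that let this slip in: duplicate keys in a dict literal are legal, and the last occurrence silently wins, so the mapping shrinks without any error or warning. Constants and values below are illustrative stand-ins, not the real `type_map` contents:

```python
# Stand-ins for the MySQLdb FIELD_TYPE constants (values are illustrative).
INT24, TINY, BIT = 9, 1, 16

type_map = {
    INT24: 'INTEGER',
    TINY: 'INTEGER',
    BIT: 'INTEGER',
    INT24: 'FLOAT',  # duplicate key: silently replaces the first entry
}
```

Since no error is raised, the dict ends up with three keys and the duplicate's value wins, which is why such bugs are easy to miss without a linter.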
[jira] [Resolved] (AIRFLOW-2091) Fix incorrect parameter in BigQueryCursor docstring
[ https://issues.apache.org/jira/browse/AIRFLOW-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2091. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3025 [https://github.com/apache/incubator-airflow/pull/3025] > Fix incorrect parameter in BigQueryCursor docstring > --- > > Key: AIRFLOW-2091 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2091 > Project: Apache Airflow > Issue Type: Task > Components: contrib, Documentation, gcp >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Trivial > Fix For: 2.0.0 > > > Fixed docstring in BigQuery Cursor. It contains incorrect docstring of > `seq_of_parameters` as `parameters` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2091] Fix incorrect docstring parameter in BigQuery Hook
Repository: incubator-airflow Updated Branches: refs/heads/master 4c7ae420a -> 0a71370d0 [AIRFLOW-2091] Fix incorrect docstring parameter in BigQuery Hook - Instead of `seq_of_parameters`, the docstring contains `parameters`. Fixed this Closes #3025 from kaxil/fix-parameter Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0a71370d Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0a71370d Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0a71370d Branch: refs/heads/master Commit: 0a71370d0541da48eb1d4ffc5aa2f5d3a4be1e59 Parents: 4c7ae42 Author: Kaxil Naik Authored: Fri Feb 9 13:23:31 2018 +0100 Committer: Fokko Driesprong Committed: Fri Feb 9 13:23:31 2018 +0100 -- airflow/contrib/hooks/bigquery_hook.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0a71370d/airflow/contrib/hooks/bigquery_hook.py -- diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py index 653cb1b..dca4d33 100644 --- a/airflow/contrib/hooks/bigquery_hook.py +++ b/airflow/contrib/hooks/bigquery_hook.py @@ -1256,9 +1256,9 @@ class BigQueryCursor(BigQueryBaseCursor): :param operation: The query to execute. :type operation: string -:param parameters: List of dictionary parameters to substitute into the +:param seq_of_parameters: List of dictionary parameters to substitute into the query. -:type parameters: list +:type seq_of_parameters: list """ for parameters in seq_of_parameters: self.execute(operation, parameters)
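The corrected docstring describes the standard DB-API `executemany` pattern visible in the diff: iterate over `seq_of_parameters`, issuing one `execute` per parameter set. A self-contained toy version of that delegation (not the real `BigQueryCursor`):

```python
class MiniCursor:
    """Toy cursor that records calls; it mirrors only the
    executemany-delegates-to-execute pattern shown in the diff."""

    def __init__(self):
        self.calls = []

    def execute(self, operation, parameters=None):
        # The real hook would run the query here; we just record the call.
        self.calls.append((operation, parameters))

    def executemany(self, operation, seq_of_parameters):
        # One execute() per parameter set -- `seq_of_parameters` is a list
        # of dicts to substitute into the query, as the fixed docstring says.
        for parameters in seq_of_parameters:
            self.execute(operation, parameters)
```

Calling `executemany("SELECT @x", [{"x": 1}, {"x": 2}])` therefore produces two recorded `execute` calls, one per parameter dict.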
[jira] [Commented] (AIRFLOW-2086) The tree view page is too slow when display big dag.
[ https://issues.apache.org/jira/browse/AIRFLOW-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358307#comment-16358307 ] Andrew Maguire commented on AIRFLOW-2086: - I found this too. I also found that for big dags with lots of tasks, many of the charts become unusable because the legend gets so big (one could argue a legend is not really needed with lots of tasks, as you can just hover over the interesting parts to see what they are). So it would be cool if some of these UI defaults could be configured in some way. > The tree view page is too slow when display big dag. > > > Key: AIRFLOW-2086 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2086 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Lintao LI >Priority: Major > > The tree view page is too slow for big (actually not too big) dags. > The page size will increase dramatically to hundreds of MB. > Please refer to > [here|https://stackoverflow.com/questions/48656221/apache-airflow-webui-tree-view-is-too-slow] > for details. > I think the page contains a lot of redundant data; it's either a bug or a > design flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2091) Fix incorrect parameter in BigQueryCursor docstring
Kaxil Naik created AIRFLOW-2091: --- Summary: Fix incorrect parameter in BigQueryCursor docstring Key: AIRFLOW-2091 URL: https://issues.apache.org/jira/browse/AIRFLOW-2091 Project: Apache Airflow Issue Type: Task Components: contrib, Documentation, gcp Reporter: Kaxil Naik Assignee: Kaxil Naik Fixed docstring in BigQuery Cursor. It contains incorrect docstring of `seq_of_parameters` as `parameters` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2090) Fix typo in DataStore Hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358281#comment-16358281 ] ASF subversion and git services commented on AIRFLOW-2090: -- Commit 4c7ae420a7ff48819d90a278a144d4d91820bc37 in incubator-airflow's branch refs/heads/master from [~kaxilnaik] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=4c7ae42 ] [AIRFLOW-2090] Fix typo in DataStore Hook Spelling mistake in the word `simultaneously` in DataStore docs Closes #3024 from kaxil/patch-1 > Fix typo in DataStore Hook > -- > > Key: AIRFLOW-2090 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2090 > Project: Apache Airflow > Issue Type: Task > Components: contrib, Documentation, gcp >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Trivial > Fix For: 2.0.0 > > > Spelling mistake in the word `simultaneously` in DataStore docs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2090) Fix typo in DataStore Hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2090. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3024 [https://github.com/apache/incubator-airflow/pull/3024] > Fix typo in DataStore Hook > -- > > Key: AIRFLOW-2090 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2090 > Project: Apache Airflow > Issue Type: Task > Components: contrib, Documentation, gcp >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Trivial > Fix For: 2.0.0 > > > Spelling mistake in the word `simultaneously` in DataStore docs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2090] Fix typo in DataStore Hook
Repository: incubator-airflow Updated Branches: refs/heads/master e6973b159 -> 4c7ae420a [AIRFLOW-2090] Fix typo in DataStore Hook Spelling mistake in the word `simultaneously` in DataStore docs Closes #3024 from kaxil/patch-1 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/4c7ae420 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/4c7ae420 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/4c7ae420 Branch: refs/heads/master Commit: 4c7ae420a7ff48819d90a278a144d4d91820bc37 Parents: e6973b1 Author: Kaxil Naik Authored: Fri Feb 9 12:26:02 2018 +0100 Committer: Fokko Driesprong Committed: Fri Feb 9 12:26:02 2018 +0100 -- airflow/contrib/hooks/datastore_hook.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4c7ae420/airflow/contrib/hooks/datastore_hook.py -- diff --git a/airflow/contrib/hooks/datastore_hook.py b/airflow/contrib/hooks/datastore_hook.py index ba690e0..a71333e 100644 --- a/airflow/contrib/hooks/datastore_hook.py +++ b/airflow/contrib/hooks/datastore_hook.py @@ -25,7 +25,7 @@ class DatastoreHook(GoogleCloudBaseHook): connection. This object is not threads safe. If you want to make multiple requests -simultaniously, you will need to create a hook per thread. +simultaneously, you will need to create a hook per thread. """ def __init__(self,
[jira] [Created] (AIRFLOW-2090) Fix typo in DataStore Hook
Kaxil Naik created AIRFLOW-2090: --- Summary: Fix typo in DataStore Hook Key: AIRFLOW-2090 URL: https://issues.apache.org/jira/browse/AIRFLOW-2090 Project: Apache Airflow Issue Type: Task Components: contrib, Documentation, gcp Reporter: Kaxil Naik Assignee: Kaxil Naik Spelling mistake in the word `simultaneously` in DataStore docs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2089) Add on kill for SparkSubmit in Standalone Cluster
Milan van der Meer created AIRFLOW-2089: --- Summary: Add on kill for SparkSubmit in Standalone Cluster Key: AIRFLOW-2089 URL: https://issues.apache.org/jira/browse/AIRFLOW-2089 Project: Apache Airflow Issue Type: Improvement Reporter: Milan van der Meer Assignee: Milan van der Meer -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2088) Duplicate keys in MySQL to GCS Operator
Kaxil Naik created AIRFLOW-2088: --- Summary: Duplicate keys in MySQL to GCS Operator Key: AIRFLOW-2088 URL: https://issues.apache.org/jira/browse/AIRFLOW-2088 Project: Apache Airflow Issue Type: Task Components: contrib, gcp Affects Versions: 1.9.1 Reporter: Kaxil Naik Assignee: Kaxil Naik Fix For: 2.0.0 The helper method `type_map` in `mysql_to_gcs` operator contains duplicate key "FIELD_TYPE.INT24". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1157) Assigning a task to a pool that doesn't exist crashes the scheduler
[ https://issues.apache.org/jira/browse/AIRFLOW-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-1157. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3002 [https://github.com/apache/incubator-airflow/pull/3002] > Assigning a task to a pool that doesn't exist crashes the scheduler > --- > > Key: AIRFLOW-1157 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1157 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 >Reporter: John Culver >Assignee: David Klosowski >Priority: Critical > Fix For: 2.0.0 > > > If a dag is run that contains a task using a pool that doesn't exist, the > scheduler will crash. > Manually triggering the run of this dag on an environment without a pool > named 'a_non_existent_pool' will crash the scheduler: > {code} > from datetime import datetime > from airflow.models import DAG > from airflow.operators.dummy_operator import DummyOperator > dag = DAG(dag_id='crash_scheduler', > start_date=datetime(2017,1,1), > schedule_interval=None) > t1 = DummyOperator(task_id='crash', >pool='a_non_existent_pool', >dag=dag) > {code} > Here is the relevant log output on the scheduler: > {noformat} > [2017-04-27 19:31:24,816] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/test-3.py finished > [2017-04-27 19:31:24,817] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/test_s3_file_move.py finished > [2017-04-27 19:31:24,819] {dag_processing.py:627} INFO - Started a process > (PID: 124) to generate tasks for /opt/airflow/dags/crash_scheduler.py - > logging into /tmp/airflow/scheduler/logs/2017-04-27/crash_scheduler.py.log > [2017-04-27 19:31:24,822] {dag_processing.py:627} INFO - Started a process > (PID: 125) to generate tasks for /opt/airflow/dags/configuration/constants.py > - logging into > /tmp/airflow/scheduler/logs/2017-04-27/configuration/constants.py.log > [2017-04-27 19:31:24,847] 
{jobs.py:1007} INFO - Tasks up for execution: > 19:31:22.298893 [scheduled]> > [2017-04-27 19:31:24,849] {jobs.py:1030} INFO - Figuring out tasks to run in > Pool(name=None) with 128 open slots and 1 task instances in queue > [2017-04-27 19:31:24,856] {jobs.py:1078} INFO - DAG move_s3_file_test has > 0/16 running tasks > [2017-04-27 19:31:24,856] {jobs.py:1105} INFO - Sending to executor > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) with priority 1 and queue MVSANDBOX-airflow-DEV-dev > [2017-04-27 19:31:24,859] {jobs.py:1116} INFO - Setting state of > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) to queued > [2017-04-27 19:31:24,867] {base_executor.py:50} INFO - Adding to queue: > airflow run move_s3_file_test move_files 2017-04-27T19:31:22.298893 --local > -sd /opt/airflow/dags/test_s3_file_move.py > [2017-04-27 19:31:24,867] {jobs.py:1440} INFO - Heartbeating the executor > [2017-04-27 19:31:24,872] {celery_executor.py:78} INFO - [celery] queuing > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) through celery, queue=MVSANDBOX-airflow-DEV-dev > [2017-04-27 19:31:25,974] {jobs.py:1404} INFO - Heartbeating the process > manager > [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/crash_scheduler.py finished > [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/configuration/constants.py finished > [2017-04-27 19:31:25,977] {dag_processing.py:627} INFO - Started a process > (PID: 128) to generate tasks for /opt/airflow/dags/example_s3_sensor.py - > logging into /tmp/airflow/scheduler/logs/2017-04-27/example_s3_sensor.py.log > [2017-04-27 19:31:25,980] {dag_processing.py:627} INFO - Started a process > (PID: 129) to generate tasks for /opt/airflow/dags/test-4.py - logging into > /tmp/airflow/scheduler/logs/2017-04-27/test-4.py.log > [2017-04-27 
19:31:26,004] {jobs.py:1007} INFO - Tasks up for execution: > [scheduled]> > [2017-04-27 19:31:26,006] {jobs.py:1311} INFO - Exited execute loop > [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 128 > [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 129 > [2017-04-27 19:31:26,008] {jobs.py:1329} INFO - Waiting up to 5s for > processes to exit... > Traceback (most recent call last): > File "/usr/bin/airflow", line 28, in > args.func(args) > File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 839, in > scheduler > job.run() > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 200, in
[jira] [Commented] (AIRFLOW-1157) Assigning a task to a pool that doesn't exist crashes the scheduler
[ https://issues.apache.org/jira/browse/AIRFLOW-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358187#comment-16358187 ] ASF subversion and git services commented on AIRFLOW-1157: -- Commit e6973b1596914e5d62567e065223e7b169d1c26c in incubator-airflow's branch refs/heads/master from Ian Suvak [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=e6973b1 ] [AIRFLOW-1157] Fix missing pools crashing the scheduler Throw a warning when a pool associated with a Task Instance doesn't exist instead of crashing the scheduler. Use the default value of 0 slots for non-existent pools. Closes #3002 from iansuvak/1157_nonexistent_pool > Assigning a task to a pool that doesn't exist crashes the scheduler > --- > > Key: AIRFLOW-1157 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1157 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 >Reporter: John Culver >Assignee: David Klosowski >Priority: Critical > > If a dag is run that contains a task using a pool that doesn't exist, the > scheduler will crash. 
incubator-airflow git commit: [AIRFLOW-1157] Fix missing pools crashing the scheduler
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 44551e249 -> e6973b159

[AIRFLOW-1157] Fix missing pools crashing the scheduler

Throw a warning when a pool associated with a Task Instance doesn't exist instead of crashing the scheduler. Use the default value of 0 slots for non-existent pools.

Closes #3002 from iansuvak/1157_nonexistent_pool

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/e6973b15
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/e6973b15
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/e6973b15

Branch: refs/heads/master
Commit: e6973b1596914e5d62567e065223e7b169d1c26c
Parents: 44551e2
Author: Ian Suvak
Authored: Fri Feb 9 11:04:38 2018 +0100
Committer: Fokko Driesprong
Committed: Fri Feb 9 11:04:38 2018 +0100

--
 airflow/jobs.py |  9 -
 tests/jobs.py   | 24
 2 files changed, 32 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e6973b15/airflow/jobs.py
--
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 35a3fb6..00d6b22 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1097,7 +1097,14 @@ class SchedulerJob(BaseJob):
                 # non_pooled_task_slot_count per run
                 open_slots = conf.getint('core', 'non_pooled_task_slot_count')
             else:
-                open_slots = pools[pool].open_slots(session=session)
+                if pool not in pools:
+                    self.log.warning(
+                        "Tasks using non-existent pool '%s' will not be scheduled",
+                        pool
+                    )
+                    open_slots = 0
+                else:
+                    open_slots = pools[pool].open_slots(session=session)

             num_queued = len(task_instances)
             self.log.info(

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e6973b15/tests/jobs.py
--
diff --git a/tests/jobs.py b/tests/jobs.py
index 5771bf1..1c87b8f 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -1170,6 +1170,30 @@ class SchedulerJobTest(unittest.TestCase):
         self.assertIn(tis[1].key, res_keys)
         self.assertIn(tis[3].key, res_keys)

+    def test_nonexistent_pool(self):
+        dag_id = 'SchedulerJobTest.test_nonexistent_pool'
+        task_id = 'dummy_wrong_pool'
+        dag = DAG(dag_id=dag_id, start_date=DEFAULT_DATE, concurrency=16)
+        task = DummyOperator(dag=dag, task_id=task_id, pool="this_pool_doesnt_exist")
+        dagbag = self._make_simple_dag_bag([dag])
+
+        scheduler = SchedulerJob(**self.default_scheduler_args)
+        session = settings.Session()
+
+        dr = scheduler.create_dag_run(dag)
+
+        ti = TI(task, dr.execution_date)
+        ti.state = State.SCHEDULED
+        session.merge(ti)
+        session.commit()
+
+        res = scheduler._find_executable_task_instances(
+            dagbag,
+            states=[State.SCHEDULED],
+            session=session)
+        session.commit()
+        self.assertEqual(0, len(res))
+
     def test_find_executable_task_instances_none(self):
         dag_id = 'SchedulerJobTest.test_find_executable_task_instances_none'
         task_id_1 = 'dummy'
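The defensive lookup in the patch above can be sketched independently of Airflow. The helper below uses illustrative names (not Airflow's API) and mirrors the commit's logic: an unknown pool yields 0 open slots plus a warning, instead of a KeyError that crashed the scheduler loop.

```python
import logging

def open_slots_for(pool_name, pools, non_pooled_slots=128):
    """Return the number of open slots for a task's pool (sketch of the fix).

    `pools` maps pool name -> open slot count; in Airflow the value would be a
    Pool model and the lookup would be pools[pool].open_slots(session=session).
    """
    if not pool_name:
        # Tasks without a pool share the non_pooled_task_slot_count budget.
        return non_pooled_slots
    if pool_name not in pools:
        # The fix: warn and starve the task instead of raising KeyError.
        logging.warning(
            "Tasks using non-existent pool '%s' will not be scheduled", pool_name)
        return 0
    return pools[pool_name]
```

With 0 open slots the task instance simply never becomes executable, which is exactly what the new `test_nonexistent_pool` test asserts.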
[jira] [Commented] (AIRFLOW-713) EmrCreateJobFlowOperator and EmrAddStepsOperatordoes attributes are not jinjafied
[ https://issues.apache.org/jira/browse/AIRFLOW-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358147#comment-16358147 ] ASF subversion and git services commented on AIRFLOW-713: - Commit 44551e249fd338f3c4d24ef95d4b9c021f3b0688 in incubator-airflow's branch refs/heads/master from [~Swalloow] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=44551e2 ] [AIRFLOW-713] Jinjafy {EmrCreateJobFlow,EmrAddSteps}Operator attributes To dynamically template the fields of the Emr Operators, we need to pass the fields to jinja Closes #3016 from Swalloow/emr-jinjafied > EmrCreateJobFlowOperator and EmrAddStepsOperatordoes attributes are not > jinjafied > - > > Key: AIRFLOW-713 > URL: https://issues.apache.org/jira/browse/AIRFLOW-713 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: Airflow 2.0 >Reporter: anselmo da silva >Assignee: Junyoung Park >Priority: Major > Labels: easyfix > Fix For: 2.0.0 > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > (Contrib) EmrCreateJobFlowOperator ('job_flow_overrides' field) and > EmrAddStepsOperator ('steps' field) are not being jinjafied. > EMR job definitions that depend on execution context or previous tasks have > no way to use macros. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-713) EmrCreateJobFlowOperator and EmrAddStepsOperatordoes attributes are not jinjafied
[ https://issues.apache.org/jira/browse/AIRFLOW-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-713. -- Resolution: Fixed Fix Version/s: (was: Airflow 2.0) 2.0.0 Issue resolved by pull request #3016 [https://github.com/apache/incubator-airflow/pull/3016] > EmrCreateJobFlowOperator and EmrAddStepsOperatordoes attributes are not > jinjafied > - > > Key: AIRFLOW-713 > URL: https://issues.apache.org/jira/browse/AIRFLOW-713 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: Airflow 2.0 >Reporter: anselmo da silva >Assignee: Junyoung Park >Priority: Major > Labels: easyfix > Fix For: 2.0.0 > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > (Contrib) EmrCreateJobFlowOperator ('job_flow_overrides' field) and > EmrAddStepsOperator ('steps' field) are not being jinjafied. > EMR job definitions that depend on execution context or previous tasks have > no way to use macros. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
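The mechanism behind this fix: any attribute named in an operator's `template_fields` is rendered through Jinja before `execute()`, while attributes not listed (as `steps` and `job_flow_overrides` were before this change) pass through verbatim. The toy renderer below illustrates that contract only; it is a simplified stand-in, not Airflow's real implementation, and all names in it are hypothetical.

```python
class ToyOperator:
    """Stand-in for a BaseOperator subclass (illustrative only)."""
    template_fields = ('steps',)  # fields to render; an empty list means never templated

    def __init__(self, steps, region):
        self.steps = steps
        self.region = region  # not in template_fields -> left untouched

def render_templates(op, context):
    """Naively substitute {{ key }} placeholders in each templated field."""
    for field in op.template_fields:
        value = getattr(op, field)
        for key, val in context.items():
            value = value.replace('{{ %s }}' % key, val)
        setattr(op, field, value)

op = ToyOperator(steps='run-example {{ ds }}', region='{{ ds }}')
render_templates(op, {'ds': '2018-02-09'})
```

After rendering, `op.steps` carries the execution date while `op.region`, not being listed in `template_fields`, still contains the literal placeholder; adding a name to `template_fields` is the whole fix.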
incubator-airflow git commit: [AIRFLOW-713] Jinjafy {EmrCreateJobFlow, EmrAddSteps}Operator attributes
Repository: incubator-airflow
Updated Branches:
  refs/heads/master fd6772116 -> 44551e249

[AIRFLOW-713] Jinjafy {EmrCreateJobFlow,EmrAddSteps}Operator attributes

To dynamically template the fields of the Emr Operators, we need to pass the fields to jinja

Closes #3016 from Swalloow/emr-jinjafied

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/44551e24
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/44551e24
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/44551e24

Branch: refs/heads/master
Commit: 44551e249fd338f3c4d24ef95d4b9c021f3b0688
Parents: fd67721
Author: Swalloow
Authored: Fri Feb 9 10:20:02 2018 +0100
Committer: Fokko Driesprong
Committed: Fri Feb 9 10:20:06 2018 +0100

--
 .../operators/emr_create_job_flow_operator.py |  2 +-
 .../operators/test_emr_add_steps_operator.py  | 77 ++
 .../test_emr_create_job_flow_operator.py      | 86
 3 files changed, 134 insertions(+), 31 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/44551e24/airflow/contrib/operators/emr_create_job_flow_operator.py
--
diff --git a/airflow/contrib/operators/emr_create_job_flow_operator.py b/airflow/contrib/operators/emr_create_job_flow_operator.py
index 2544adf..8111800 100644
--- a/airflow/contrib/operators/emr_create_job_flow_operator.py
+++ b/airflow/contrib/operators/emr_create_job_flow_operator.py
@@ -29,7 +29,7 @@ class EmrCreateJobFlowOperator(BaseOperator):
     :param job_flow_overrides: boto3 style arguments to override emr_connection extra
     :type steps: dict
     """
-    template_fields = []
+    template_fields = ['job_flow_overrides']
     template_ext = ()
     ui_color = '#f9c915'

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/44551e24/tests/contrib/operators/test_emr_add_steps_operator.py
--
diff --git a/tests/contrib/operators/test_emr_add_steps_operator.py b/tests/contrib/operators/test_emr_add_steps_operator.py
index 141e986..e5ac9fe 100644
--- a/tests/contrib/operators/test_emr_add_steps_operator.py
+++ b/tests/contrib/operators/test_emr_add_steps_operator.py
@@ -13,10 +13,16 @@
 # limitations under the License.

 import unittest
+from datetime import timedelta
+
 from mock import MagicMock, patch

-from airflow import configuration
+from airflow import DAG, configuration
 from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
+from airflow.models import TaskInstance
+from airflow.utils import timezone
+
+DEFAULT_DATE = timezone.datetime(2017, 1, 1)

 ADD_STEPS_SUCCESS_RETURN = {
     'ResponseMetadata': {
@@ -27,30 +33,71 @@ ADD_STEPS_SUCCESS_RETURN = {

 class TestEmrAddStepsOperator(unittest.TestCase):
+    # When
+    _config = [{
+        'Name': 'test_step',
+        'ActionOnFailure': 'CONTINUE',
+        'HadoopJarStep': {
+            'Jar': 'command-runner.jar',
+            'Args': [
+                '/usr/lib/spark/bin/run-example',
+                '{{ macros.ds_add(ds, -1) }}',
+                '{{ ds }}'
+            ]
+        }
+    }]
+
     def setUp(self):
         configuration.load_test_config()
+        args = {
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        }

         # Mock out the emr_client (moto has incorrect response)
-        mock_emr_client = MagicMock()
-        mock_emr_client.add_job_flow_steps.return_value = ADD_STEPS_SUCCESS_RETURN
+        self.emr_client_mock = MagicMock()
+        self.operator = EmrAddStepsOperator(
+            task_id='test_task',
+            job_flow_id='j-8989898989',
+            aws_conn_id='aws_default',
+            steps=self._config,
+            dag=DAG('test_dag_id', default_args=args)
+        )

-        mock_emr_session = MagicMock()
-        mock_emr_session.client.return_value = mock_emr_client
+    def test_init(self):
+        self.assertEqual(self.operator.job_flow_id, 'j-8989898989')
+        self.assertEqual(self.operator.aws_conn_id, 'aws_default')

-        # Mock out the emr_client creator
-        self.boto3_session_mock = MagicMock(return_value=mock_emr_session)
+    def test_render_template(self):
+        ti = TaskInstance(self.operator, DEFAULT_DATE)
+        ti.render_templates()
+        expected_args = [{
+            'Name': 'test_step',
+            'ActionOnFailure': 'CONTINUE',
+            'HadoopJarStep': {
+                'Jar': 'command-runner.jar',
+                'Args': [
+                    '/usr/lib/spark/bin/run-example',
+                    (DEFAULT_DATE -
[2/2] incubator-airflow git commit: Merge pull request #3019 from hyw/master
Merge pull request #3019 from hyw/master Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/fd677211 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/fd677211 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/fd677211 Branch: refs/heads/master Commit: fd6772116b1ab4bfd4c3c9f237fad6a7cac654e8 Parents: 15b8a36 822296a Author: Fokko DriesprongAuthored: Fri Feb 9 10:15:19 2018 +0100 Committer: Fokko Driesprong Committed: Fri Feb 9 10:15:19 2018 +0100 -- README.md | 1 + 1 file changed, 1 insertion(+) --
[1/2] incubator-airflow git commit: [AIRFLOW-XXX] add Karmic to list of companies
Repository: incubator-airflow Updated Branches: refs/heads/master 15b8a36b9 -> fd6772116 [AIRFLOW-XXX] add Karmic to list of companies Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/822296af Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/822296af Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/822296af Branch: refs/heads/master Commit: 822296af45282e3fba403e56c0c9c93e0b1a1bc0 Parents: 4751abf Author: Yang WangAuthored: Thu Feb 8 13:38:02 2018 -0800 Committer: Yang Wang Committed: Thu Feb 8 13:38:02 2018 -0800 -- README.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/822296af/README.md -- diff --git a/README.md b/README.md index 77f6410..b22bbfb 100644 --- a/README.md +++ b/README.md @@ -145,6 +145,7 @@ Currently **officially** using Airflow: 1. [Intercom](http://www.intercom.com/) [[@fox](https://github.com/fox) & [@paulvic](https://github.com/paulvic)] 1. [Jampp](https://github.com/jampp) 1. [JobTeaser](https://www.jobteaser.com) [[@stefani75](https://github.com/stefani75) & [@knil-sama](https://github.com/knil-sama)] +1. [Karmic](https://karmiclabs.com) [[@hyw](https://github.com/hyw)] 1. [Kiwi.com](https://kiwi.com/) [[@underyx](https://github.com/underyx)] 1. [Kogan.com](https://github.com/kogan) [[@geeknam](https://github.com/geeknam)] 1. [Lemann Foundation](http://fundacaolemann.org.br) [[@fernandosjp](https://github.com/fernandosjp)]
[jira] [Resolved] (AIRFLOW-2083) Incorrect usage of "it's" appears throughout the documentation
[ https://issues.apache.org/jira/browse/AIRFLOW-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2083. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3020 [https://github.com/apache/incubator-airflow/pull/3020] > Incorrect usage of "it's" appears throughout the documentation > -- > > Key: AIRFLOW-2083 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2083 > Project: Apache Airflow > Issue Type: Bug >Reporter: William Pursell >Assignee: William Pursell >Priority: Trivial > Fix For: 2.0.0 > > Original Estimate: 5m > Remaining Estimate: 5m > > In several places, the word "it's" appears when it ought to be "its" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2083) Incorrect usage of "it's" appears throughout the documentation
[ https://issues.apache.org/jira/browse/AIRFLOW-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358138#comment-16358138 ] ASF subversion and git services commented on AIRFLOW-2083: -- Commit 15b8a36b9011166b06f176f684b71703a4aebddd in incubator-airflow's branch refs/heads/master from [~wrp] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=15b8a36 ] [AIRFLOW-2083] Docs: Use "its" instead of "it's" where appropriate Closes #3020 from wrp/spelling > Incorrect usage of "it's" appears throughout the documentation > -- > > Key: AIRFLOW-2083 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2083 > Project: Apache Airflow > Issue Type: Bug >Reporter: William Pursell >Assignee: William Pursell >Priority: Trivial > Original Estimate: 5m > Remaining Estimate: 5m > > In several places, the word "it's" appears when it ought to be "its" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2083] Docs: Use "its" instead of "it's" where appropriate
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 2920d0475 -> 15b8a36b9

[AIRFLOW-2083] Docs: Use "its" instead of "it's" where appropriate

Closes #3020 from wrp/spelling

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/15b8a36b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/15b8a36b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/15b8a36b

Branch: refs/heads/master
Commit: 15b8a36b9011166b06f176f684b71703a4aebddd
Parents: 2920d04
Author: William Pursell
Authored: Fri Feb 9 10:08:06 2018 +0100
Committer: Fokko Driesprong
Committed: Fri Feb 9 10:08:06 2018 +0100

--
 airflow/bin/cli.py                       |  2 +-
 airflow/contrib/hooks/redshift_hook.py   |  2 +-
 airflow/jobs.py                          |  6 +++---
 airflow/ti_deps/deps/trigger_rule_dep.py |  2 +-
 airflow/utils/dates.py                   |  2 +-
 airflow/www/views.py                     | 10 ++
 docs/plugins.rst                         |  2 +-
 tests/jobs.py                            |  2 +-
 8 files changed, 15 insertions(+), 13 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15b8a36b/airflow/bin/cli.py
--
diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 6bfcdcc..424fcda 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1572,7 +1572,7 @@ class CLIFactory(object):
             'func': test,
             'help': (
                 "Test a task instance. This will run a task without checking for "
-                "dependencies or recording it's state in the database."),
+                "dependencies or recording its state in the database."),
             'args': (
                 'dag_id', 'task_id', 'execution_date', 'subdir', 'dry_run',
                 'task_params'),

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15b8a36b/airflow/contrib/hooks/redshift_hook.py
--
diff --git a/airflow/contrib/hooks/redshift_hook.py b/airflow/contrib/hooks/redshift_hook.py
index 70a4854..baa11e7 100644
--- a/airflow/contrib/hooks/redshift_hook.py
+++ b/airflow/contrib/hooks/redshift_hook.py
@@ -79,7 +79,7 @@ class RedshiftHook(AwsHook):

     def restore_from_cluster_snapshot(self, cluster_identifier, snapshot_identifier):
         """
-        Restores a cluster from it's snapshot
+        Restores a cluster from its snapshot

         :param cluster_identifier: unique identifier of a cluster
         :type cluster_identifier: str

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15b8a36b/airflow/jobs.py
--
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 172792d..35a3fb6 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -68,7 +68,7 @@ class BaseJob(Base, LoggingMixin):
     """
     Abstract class to be derived for jobs. Jobs are processing items with state
     and duration that aren't task instances. For instance a BackfillJob is
-    a collection of task instance runs, but should have it's own state, start
+    a collection of task instance runs, but should have its own state, start
     and end time.
     """

@@ -1796,8 +1796,8 @@ class SchedulerJob(BaseJob):
         dep_context = DepContext(deps=QUEUE_DEPS, ignore_task_deps=True)

         # Only schedule tasks that have their dependencies met, e.g. to avoid
-        # a task that recently got it's state changed to RUNNING from somewhere
-        # other than the scheduler from getting it's state overwritten.
+        # a task that recently got its state changed to RUNNING from somewhere
+        # other than the scheduler from getting its state overwritten.
         # TODO(aoen): It's not great that we have to check all the task instance
         # dependencies twice; once to get the task scheduled, and again to actually
         # run the task. We should try to come up with a way to only check them once.

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15b8a36b/airflow/ti_deps/deps/trigger_rule_dep.py
--
diff --git a/airflow/ti_deps/deps/trigger_rule_dep.py b/airflow/ti_deps/deps/trigger_rule_dep.py
index 5a80314..30a5a13 100644
--- a/airflow/ti_deps/deps/trigger_rule_dep.py
+++ b/airflow/ti_deps/deps/trigger_rule_dep.py
@@ -127,7 +127,7 @@ class TriggerRuleDep(BaseTIDep):
             "total": upstream, "successes": successes, "skipped": skipped,
             "failed": failed, "upstream_failed": upstream_failed,
[jira] [Commented] (AIRFLOW-2066) Add an operator to create an Empty BigQuery Table
[ https://issues.apache.org/jira/browse/AIRFLOW-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358133#comment-16358133 ] ASF subversion and git services commented on AIRFLOW-2066: -- Commit 2920d047541c0c410e7db72c7ae81a6ee85bb08c in incubator-airflow's branch refs/heads/master from [~kaxilnaik] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2920d04 ] [AIRFLOW-2066] Add operator to create empty BQ table - Add operator that creates a new, empty table in the specified BigQuery dataset, optionally with schema. Closes #3006 from kaxil/bq_empty_table_op > Add an operator to create an Empty BigQuery Table > - > > Key: AIRFLOW-2066 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2066 > Project: Apache Airflow > Issue Type: Task > Components: contrib, gcp >Affects Versions: 2.0.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 2.0.0 > > > There are currently no operators to create an Empty BigQuery Table > with/without schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
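The request body that the new hook method sends to `tables().insert()` is assembled from optional pieces: only `tableReference` is mandatory, while `schema` and `timePartitioning` are attached when supplied. A standalone sketch of that assembly (the helper name is illustrative, not Airflow's API):

```python
def build_table_resource(table_id, schema_fields=None, time_partitioning=None):
    """Assemble a BigQuery Table resource body, as create_empty_table does.

    schema_fields follows the BigQuery schema field list format, e.g.
    [{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"}].
    """
    resource = {'tableReference': {'tableId': table_id}}
    if schema_fields:
        resource['schema'] = {'fields': schema_fields}
    if time_partitioning:
        resource['timePartitioning'] = time_partitioning
    return resource

# Creating a table with a schema but no partitioning:
body = build_table_resource(
    'employees',
    schema_fields=[{'name': 'emp_name', 'type': 'STRING', 'mode': 'REQUIRED'}])
```

Keeping the optional keys out of the body entirely (rather than sending empty values) lets the BigQuery API apply its defaults.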
incubator-airflow git commit: [AIRFLOW-2066] Add operator to create empty BQ table
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 4751abf8a -> 2920d0475

[AIRFLOW-2066] Add operator to create empty BQ table

- Add operator that creates a new, empty table in the specified BigQuery dataset, optionally with schema.

Closes #3006 from kaxil/bq_empty_table_op

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/2920d047
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/2920d047
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/2920d047

Branch: refs/heads/master
Commit: 2920d047541c0c410e7db72c7ae81a6ee85bb08c
Parents: 4751abf
Author: Kaxil Naik
Authored: Fri Feb 9 10:04:18 2018 +0100
Committer: Fokko Driesprong
Committed: Fri Feb 9 10:04:18 2018 +0100

--
 airflow/contrib/hooks/bigquery_hook.py          |  65 +
 airflow/contrib/hooks/gcs_hook.py               |  22 +++
 airflow/contrib/operators/bigquery_operator.py  | 146 +++
 docs/code.rst                                   |   1 +
 docs/integration.rst                            |   8 +
 tests/contrib/hooks/test_gcs_hook.py            |  46 ++
 .../contrib/operators/test_bigquery_operator.py |  53 +++
 7 files changed, 341 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2920d047/airflow/contrib/hooks/bigquery_hook.py
--
diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index e0dea46..653cb1b 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -207,6 +207,71 @@ class BigQueryBaseCursor(LoggingMixin):
         self.use_legacy_sql = use_legacy_sql
         self.running_job_id = None

+    def create_empty_table(self,
+                           project_id,
+                           dataset_id,
+                           table_id,
+                           schema_fields=None,
+                           time_partitioning={}
+                           ):
+        """
+        Creates a new, empty table in the dataset.
+
+        :param project_id: The project to create the table into.
+        :type project_id: str
+        :param dataset_id: The dataset to create the table into.
+        :type dataset_id: str
+        :param table_id: The Name of the table to be created.
+        :type table_id: str
+        :param schema_fields: If set, the schema field list as defined here:
+            https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema
+
+            **Example**: ::
+
+                schema_fields=[{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
+                               {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"}]
+
+        :type schema_fields: list
+        :param time_partitioning: configure optional time partitioning fields i.e.
+            partition by field, type and expiration as per API specifications.
+
+            .. seealso::
+                https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#timePartitioning
+        :type time_partitioning: dict
+
+        :return:
+        """
+        project_id = project_id if project_id is not None else self.project_id
+
+        table_resource = {
+            'tableReference': {
+                'tableId': table_id
+            }
+        }
+
+        if schema_fields:
+            table_resource['schema'] = {'fields': schema_fields}
+
+        if time_partitioning:
+            table_resource['timePartitioning'] = time_partitioning
+
+        self.log.info('Creating Table %s:%s.%s',
+                      project_id, dataset_id, table_id)
+
+        try:
+            self.service.tables().insert(
+                projectId=project_id,
+                datasetId=dataset_id,
+                body=table_resource).execute()
+
+            self.log.info('Table created successfully: %s:%s.%s',
+                          project_id, dataset_id, table_id)
+
+        except HttpError as err:
+            raise AirflowException(
+                'BigQuery job failed. Error was: {}'.format(err.content)
+            )
+
     def create_external_table(self,
                               external_project_dataset_table,
                               schema_fields,

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2920d047/airflow/contrib/hooks/gcs_hook.py
--
diff --git a/airflow/contrib/hooks/gcs_hook.py b/airflow/contrib/hooks/gcs_hook.py
index f959f95..5312daa 100644
--- a/airflow/contrib/hooks/gcs_hook.py
+++ b/airflow/contrib/hooks/gcs_hook.py
@@ -17,6 +17,7 @@
 from apiclient.http
[jira] [Created] (AIRFLOW-2087) Scheduler Report shows incorrect "Total task number"
I don't want an account created AIRFLOW-2087: Summary: Scheduler Report shows incorrect "Total task number" Key: AIRFLOW-2087 URL: https://issues.apache.org/jira/browse/AIRFLOW-2087 Project: Apache Airflow Issue Type: Bug Affects Versions: Airflow 1.8, 1.9.0 Reporter: I don't want an account

[https://github.com/apache/incubator-airflow/blob/4751abf8acad766cb576ecfe3a333d68cc693b8c/airflow/models.py#L479]

This line prints the same value for "Total task number" as for "Number of DAGs" in the CLI tool `airflow list_dags -r`. E.g. some output:

{{---}}
{{DagBag loading stats for /pang/service/airflow/dags}}
{{---}}
{{Number of DAGs: 1143}}
{{Total task number: 1143}}
{{DagBag parsing time: 24.900074}}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
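For comparison, the intended statistic is a sum of tasks over all DAGs rather than a count of DAGs. A minimal sketch of the correct aggregation (names and data shapes here are hypothetical, not models.py itself):

```python
def dagbag_stats(dags):
    """Compute report stats; `dags` maps dag_id -> list of task ids (toy DagBag)."""
    return {
        'number_of_dags': len(dags),
        # The bug: reusing len(dags) here makes both numbers identical.
        'total_task_number': sum(len(tasks) for tasks in dags.values()),
    }

# Two DAGs with three tasks total should report 2 and 3, not 2 and 2.
stats = dagbag_stats({'etl': ['extract', 'load'], 'report': ['render']})
```

When the two numbers are computed independently like this, they only coincide in the degenerate case where every DAG has exactly one task.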