[GitHub] XD-DENG commented on issue #4130: [AIRFLOW-3193] Pin docker requirement version
XD-DENG commented on issue #4130: [AIRFLOW-3193] Pin docker requirement version URL: https://github.com/apache/incubator-airflow/pull/4130#issuecomment-436474322 Hi @ashb, would you mind adding this commit to 1.10.1? I understand that my PR #4049 (**[AIRFLOW-3203] Fix DockerOperator & some operator test**) is already cherry-picked into branch v1-10-test. The change made in that PR is partly meant to address a breaking change in the Python package `docker==3.0.0`, but I missed pinning the `docker` version in that commit. This may cause issues for users who have `docker < 3.0.0` when they upgrade to 1.10.1. Cheers. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
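A pin like the one discussed above would live in the project's `setup.py`. A minimal sketch follows; note that the exact lower bound (`docker>=3.0.0`) and the extras layout are assumptions for illustration, since the actual pin from PR #4130 is not quoted in this thread:

```python
# Hypothetical sketch of an extras_require entry pinning the docker client.
# docker-py 3.0.0 introduced the breaking change that the DockerOperator fix
# in PR #4049 targets, so older releases should be excluded.
# The bound ">=3.0.0" is an assumption for illustration, not the PR's literal pin.
extras_require = {
    'docker': ['docker>=3.0.0'],
}

# In a real setup.py this dict would be passed to setuptools.setup():
#     from setuptools import setup
#     setup(..., extras_require=extras_require)
```

Users installing with `pip install apache-airflow[docker]` would then get a compatible client version instead of a silently incompatible 2.x release.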
[GitHub] jmcarp commented on issue #4147: [AIRFLOW-3307] Upgrade rbac node deps via `npm audit fix`.
jmcarp commented on issue #4147: [AIRFLOW-3307] Upgrade rbac node deps via `npm audit fix`. URL: https://github.com/apache/incubator-airflow/pull/4147#issuecomment-436445983 @ashb agree that there's no security issue here; I just want npm to stop emitting warnings when I run commands.
[jira] [Commented] (AIRFLOW-3307) Update insecure node dependencies
[ https://issues.apache.org/jira/browse/AIRFLOW-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677400#comment-16677400 ] ASF GitHub Bot commented on AIRFLOW-3307: - jmcarp opened a new pull request #4147: [AIRFLOW-3307] Upgrade rbac node deps via `npm audit fix`. URL: https://github.com/apache/incubator-airflow/pull/4147 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3307 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Update insecure dependencies with `npm audit fix`. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Just updating build dependencies. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8`
> Update insecure node dependencies > - > > Key: AIRFLOW-3307 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3307 > Project: Apache Airflow > Issue Type: Bug >Reporter: Josh Carp >Assignee: Josh Carp >Priority: Trivial > > `npm audit` shows some node dependencies that are out of date and potentially > insecure. We should update them with `npm audit fix`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jmcarp opened a new pull request #4147: [AIRFLOW-3307] Upgrade rbac node deps via `npm audit fix`.
jmcarp opened a new pull request #4147: [AIRFLOW-3307] Upgrade rbac node deps via `npm audit fix`. URL: https://github.com/apache/incubator-airflow/pull/4147 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3307 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Update insecure dependencies with `npm audit fix`. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Just updating build dependencies. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8`
[GitHub] yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint
yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint URL: https://github.com/apache/incubator-airflow/pull/4126#discussion_r231326429

## File path: airflow/contrib/operators/sagemaker_endpoint_config_operator.py ##

@@ -0,0 +1,67 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.operators.sagemaker_base_operator import SageMakerBaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerEndpointConfigOperator(SageMakerBaseOperator):
+
+    """
+    Create a SageMaker endpoint config.
+
+    This operator returns The ARN of the endpoint config created in Amazon SageMaker
+
+    :param config: The configuration necessary to create an endpoint config.
+
+        For details of the configuration parameter, See:
+        https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config
+    :type config: dict
+    :param aws_conn_id: The AWS connection ID to use.
+    :type aws_conn_id: str
+    """  # noqa

Review comment: Updated
[GitHub] yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint
yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint URL: https://github.com/apache/incubator-airflow/pull/4126#discussion_r231326391

## File path: airflow/contrib/operators/sagemaker_endpoint_config_operator.py ##

@@ -0,0 +1,67 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.operators.sagemaker_base_operator import SageMakerBaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerEndpointConfigOperator(SageMakerBaseOperator):
+
+    """
+    Create a SageMaker endpoint config.
+
+    This operator returns The ARN of the endpoint config created in Amazon SageMaker
+
+    :param config: The configuration necessary to create an endpoint config.
+
+        For details of the configuration parameter, See:
+        https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config
+    :type config: dict
+    :param aws_conn_id: The AWS connection ID to use.
+    :type aws_conn_id: str
+    """  # noqa

Review comment: Got it! Thanks a lot!
[GitHub] ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-436433446 @ashb I'm not sure what you're referring to. Are you talking about the name of the `next_execution_date` property in the model, or about the fact that I'm still checking next_run_date in the tests? If it's about the property name, I think it's in line with the CLI name, so I figured I'd stick with it. If it's about the tests, that's expected, as I'd like an ACK on the implementation itself before adjusting all the tests at once.
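For interval-based schedules, the property under discussion reduces to simple date arithmetic. The sketch below is a hedged illustration only, not PR #2460's actual implementation: the helper name and the timedelta-only handling are assumptions (real DAG schedules may also be cron expressions, which would need something like croniter):

```python
from datetime import datetime, timedelta

def next_execution_date(last_execution_date, schedule_interval):
    """Hypothetical helper: project the next execution date for a DAG whose
    schedule_interval is a plain timedelta. Cron schedules are out of scope
    for this sketch."""
    if last_execution_date is None:
        # The DAG has never run, so there is nothing to project from.
        return None
    return last_execution_date + schedule_interval

# A daily DAG that last executed at midnight on 2018-11-06 would next
# execute at midnight on 2018-11-07.
nxt = next_execution_date(datetime(2018, 11, 6), timedelta(days=1))
```

Exposing such a value in the model is what lets both the CLI and the web UI show users when a DAG will run next.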
[GitHub] ashb commented on issue #4143: [AIRFLOW-689] Okta Authentication
ashb commented on issue #4143: [AIRFLOW-689] Okta Authentication URL: https://github.com/apache/incubator-airflow/pull/4143#issuecomment-436431058 https://github.com/apache/incubator-airflow/pull/4142 might obsolete this PR
[GitHub] codecov-io edited a comment on issue #4145: Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (#4005)"
codecov-io edited a comment on issue #4145: Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (#4005)" URL: https://github.com/apache/incubator-airflow/pull/4145#issuecomment-436413754

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4145?src=pr=h1) Report
> Merging [#4145](https://codecov.io/gh/apache/incubator-airflow/pull/4145?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/dc0eb58e97178b050b79584f18d8b9bd2c3dea5f?src=pr=desc) will **increase** coverage by `0.03%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4145/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4145?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4145      +/-   ##
==========================================
+ Coverage   77.46%   77.49%   +0.03%
==========================================
  Files         199      199
  Lines       16272    16246      -26
==========================================
- Hits        12605    12590      -15
+ Misses       3667     3656      -11
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4145?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4145/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `68.8% <ø> (-0.21%)` | :arrow_down: |
| [airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4145/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==) | `72.38% <ø> (+0.5%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4145?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4145?src=pr=footer). Last update [dc0eb58...a4fc042](https://codecov.io/gh/apache/incubator-airflow/pull/4145?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Updated] (AIRFLOW-2866) Missing CSRF Token Error on Web RBAC UI Create/Update Operations
[ https://issues.apache.org/jira/browse/AIRFLOW-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-2866: --- Affects Version/s: 2.0.0 Fix Version/s: (was: 1.10.1) 2.0.0 > Missing CSRF Token Error on Web RBAC UI Create/Update Operations > > > Key: AIRFLOW-2866 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2866 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Affects Versions: 2.0.0 >Reporter: Jasper Kahn >Priority: Major > Fix For: 2.0.0 > > > Attempting to modify or delete many resources (such as Connections or Users) > results in a 400 from the webserver: > {quote}{{Bad Request}} > {{The CSRF session token is missing.}}{quote} > Logs report: > {quote}{{[2018-08-07 18:45:15,771] \{csrf.py:251} INFO - The CSRF session > token is missing.}} > {{192.168.9.1 - - [07/Aug/2018:18:45:15 +] "POST > /admin/connection/delete/ HTTP/1.1" 400 150 > "http://localhost:8081/admin/connection/" "Mozilla/5.0 (X11; Linux x86_64) > AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 > Safari/537.36"}}{quote} > Chrome dev tools show the CSRF token is present in the request payload.
[jira] [Resolved] (AIRFLOW-2216) Cannot specify a profile for AWS Hook to load with s3 config file
[ https://issues.apache.org/jira/browse/AIRFLOW-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-2216. Resolution: Fixed > Cannot specify a profile for AWS Hook to load with s3 config file > - > > Key: AIRFLOW-2216 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2216 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: 1.9.0 > Environment: IDE: PyCharm > Airflow 1.9 > Python 3.4.3 >Reporter: Lorena Mesa >Assignee: Lorena Mesa >Priority: Minor > Fix For: 1.10.1 > > > Currently the source code for AWS Hook doesn't permit the user to provide a > profile when their aws connection object specifies in the extra param's > information on s3_config_file: > {code:java} > def _get_credentials(self, region_name): > aws_access_key_id = None > aws_secret_access_key = None > aws_session_token = None > endpoint_url = None > if self.aws_conn_id: > try: > # Cut for brevity > elif 's3_config_file' in connection_object.extra_dejson: > aws_access_key_id, aws_secret_access_key = \ > _parse_s3_config(connection_object.extra_dejson['s3_config_file'], >connection_object.extra_dejson.get('s3_config_format'), > connection_object.extra_dejson.get('profile')){code} > The _parse_s3_config method has a profile param that defaults to None, so by not > passing it through, you cannot currently specify a profile credential to be > loaded.
[GitHub] ashb commented on issue #3172: [AIRFLOW-2216] Use profile for AWS hook if S3 config file provided in aws_default connection extra parameters
ashb commented on issue #3172: [AIRFLOW-2216] Use profile for AWS hook if S3 config file provided in aws_default connection extra parameters URL: https://github.com/apache/incubator-airflow/pull/3172#issuecomment-436413661 Merged in another PR now.
[GitHub] ashb closed pull request #3172: [AIRFLOW-2216] Use profile for AWS hook if S3 config file provided in aws_default connection extra parameters
ashb closed pull request #3172: [AIRFLOW-2216] Use profile for AWS hook if S3 config file provided in aws_default connection extra parameters URL: https://github.com/apache/incubator-airflow/pull/3172 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/contrib/hooks/aws_hook.py b/airflow/contrib/hooks/aws_hook.py
index 2a8fa5f823..e4020fd35b 100644
--- a/airflow/contrib/hooks/aws_hook.py
+++ b/airflow/contrib/hooks/aws_hook.py
@@ -100,8 +100,11 @@ def _get_credentials(self, region_name):
         elif 's3_config_file' in connection_object.extra_dejson:
             aws_access_key_id, aws_secret_access_key = \
-                _parse_s3_config(connection_object.extra_dejson['s3_config_file'],
-                                 connection_object.extra_dejson.get('s3_config_format'))
+                _parse_s3_config(
+                    connection_object.extra_dejson['s3_config_file'],
+                    connection_object.extra_dejson['s3_config_format'],
+                    connection_object.extra_dejson['profile']
+                )

         if region_name is None:
             region_name = connection_object.extra_dejson.get('region_name')

diff --git a/tests/contrib/hooks/test_aws_hook.py b/tests/contrib/hooks/test_aws_hook.py
index 086e486144..dd1b69e173 100644
--- a/tests/contrib/hooks/test_aws_hook.py
+++ b/tests/contrib/hooks/test_aws_hook.py
@@ -14,6 +14,7 @@
 #
 import unittest
+
 import boto3

 from airflow import configuration
@@ -141,6 +142,26 @@ def test_get_credentials_from_extra(self, mock_get_connection):
         self.assertEqual(credentials_from_hook.secret_key, 'aws_secret_access_key')
         self.assertIsNone(credentials_from_hook.token)

+    @mock.patch('airflow.contrib.hooks.aws_hook._parse_s3_config',
+                return_value=('aws_access_key_id', 'aws_secret_access_key'))
+    @mock.patch.object(AwsHook, 'get_connection')
+    def test_get_credentials_from_extra_with_s3_config_and_profile(
+            self, mock_get_connection, mock_parse_s3_config
+    ):
+        mock_connection = Connection(
+            extra='{"s3_config_format": "aws", '
+                  '"profile": "test", '
+                  '"s3_config_file": "aws-credentials", '
+                  '"region_name": "us-east-1"}')
+        mock_get_connection.return_value = mock_connection
+        hook = AwsHook()
+        hook._get_credentials(region_name=None)
+        mock_parse_s3_config.assert_called_with(
+            'aws-credentials',
+            'aws',
+            'test'
+        )
+
 @unittest.skipIf(mock_sts is None, 'mock_sts package not present')
 @mock.patch.object(AwsHook, 'get_connection')
 @mock_sts
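The connection extras that exercise this code path are the JSON shown in the test fixture above. A minimal standalone sketch of how those extras decode and which values reach `_parse_s3_config` after the fix (the fixture values such as "aws-credentials" and "test" come from the test above and are not real credentials; no Airflow import is needed for the illustration):

```python
import json

# The `extra` field of an aws_default connection, exactly as in the test fixture.
extra = ('{"s3_config_format": "aws", '
         '"profile": "test", '
         '"s3_config_file": "aws-credentials", '
         '"region_name": "us-east-1"}')

# This decoded dict is what connection_object.extra_dejson holds in the hook.
extra_dejson = json.loads(extra)

# After the fix, all three values are forwarded to _parse_s3_config
# in this positional order (config file, config format, profile).
args = (extra_dejson['s3_config_file'],
        extra_dejson['s3_config_format'],
        extra_dejson['profile'])
```

Before the fix, only the first two values were forwarded, which is why the profile in the extras was silently ignored.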
[GitHub] ashb commented on issue #4146: [AIRFLOW-3306] Disable flask-sqlalchemy modification tracking.
ashb commented on issue #4146: [AIRFLOW-3306] Disable flask-sqlalchemy modification tracking. URL: https://github.com/apache/incubator-airflow/pull/4146#issuecomment-436400468 Thanks, I saw the warning but hadn't dug into what the event system was or whether we were using it. (Trying to fix the tests on master, so I might ask you to rebase once we do that.)
[jira] [Commented] (AIRFLOW-3307) Update insecure node dependencies
[ https://issues.apache.org/jira/browse/AIRFLOW-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677279#comment-16677279 ] Ash Berlin-Taylor commented on AIRFLOW-3307: Sure, we should update them, but the security side doesn't concern us, as they are dev-time only and so don't affect our users. > Update insecure node dependencies > - > > Key: AIRFLOW-3307 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3307 > Project: Apache Airflow > Issue Type: Bug >Reporter: Josh Carp >Assignee: Josh Carp >Priority: Trivial > > `npm audit` shows some node dependencies that are out of date and potentially > insecure. We should update them with `npm audit fix`.
[jira] [Created] (AIRFLOW-3307) Update insecure node dependencies
Josh Carp created AIRFLOW-3307: -- Summary: Update insecure node dependencies Key: AIRFLOW-3307 URL: https://issues.apache.org/jira/browse/AIRFLOW-3307 Project: Apache Airflow Issue Type: Bug Reporter: Josh Carp Assignee: Josh Carp `npm audit` shows some node dependencies that are out of date and potentially insecure. We should update them with `npm audit fix`.
[jira] [Resolved] (AIRFLOW-3161) Log Url link does not link to task instance logs in RBAC UI
[ https://issues.apache.org/jira/browse/AIRFLOW-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-3161. Resolution: Fixed Fix Version/s: 1.10.1 2.0.0 > Log Url link does not link to task instance logs in RBAC UI > --- > > Key: AIRFLOW-3161 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3161 > Project: Apache Airflow > Issue Type: Bug >Reporter: Eric Chang >Assignee: Eric Chang >Priority: Minor > Fix For: 2.0.0, 1.10.1 > > Attachments: image-2018-10-04-17-33-33-616.png, > image-2018-10-04-17-34-12-135.png, image-2018-10-04-17-35-14-224.png > > > In the new RBAC UI, the "Log Url" link (0) for a task instance doesn't link to > the log for that task instance (1). Instead, it links to the DAG log list > (2). > (0) > !image-2018-10-04-17-35-14-224.png|width=172,height=172! > (1) > !image-2018-10-04-17-34-12-135.png|width=660,height=376! > (2) > !image-2018-10-04-17-33-33-616.png|width=478,height=238!
[jira] [Reopened] (AIRFLOW-3161) Log Url link does not link to task instance logs in RBAC UI
[ https://issues.apache.org/jira/browse/AIRFLOW-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor reopened AIRFLOW-3161: Reopening to change fix versions > Log Url link does not link to task instance logs in RBAC UI > --- > > Key: AIRFLOW-3161 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3161 > Project: Apache Airflow > Issue Type: Bug >Reporter: Eric Chang >Assignee: Eric Chang >Priority: Minor > Fix For: 2.0.0, 1.10.1 > > Attachments: image-2018-10-04-17-33-33-616.png, > image-2018-10-04-17-34-12-135.png, image-2018-10-04-17-35-14-224.png > > > In the new RBAC UI, the "Log Url" link (0) for a task instance doesn't link to > the log for that task instance (1). Instead, it links to the DAG log list > (2). > (0) > !image-2018-10-04-17-35-14-224.png|width=172,height=172! > (1) > !image-2018-10-04-17-34-12-135.png|width=660,height=376! > (2) > !image-2018-10-04-17-33-33-616.png|width=478,height=238!
[jira] [Commented] (AIRFLOW-3306) Disable unused flask-sqlalchemy modification tracking
[ https://issues.apache.org/jira/browse/AIRFLOW-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677212#comment-16677212 ] ASF GitHub Bot commented on AIRFLOW-3306: - jmcarp opened a new pull request #4146: [AIRFLOW-3306] Disable flask-sqlalchemy modification tracking. URL: https://github.com/apache/incubator-airflow/pull/4146 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3306 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: By default, flask-sqlalchemy tracks model changes for its event system, which adds some overhead. Since I don't think we're using the flask-sqlalchemy event system, we should be able to turn off modification tracking and improve performance. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Just a config change; existing tests should cover it. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. 
- When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Disable unused flask-sqlalchemy modification tracking > - > > Key: AIRFLOW-3306 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3306 > Project: Apache Airflow > Issue Type: Bug >Reporter: Josh Carp >Assignee: Josh Carp >Priority: Trivial > > By default, flask-sqlalchemy tracks model changes for its event system, which > adds some overhead. Since I don't think we're using the flask-sqlalchemy > event system, we should be able to turn off modification tracking and improve > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jmcarp opened a new pull request #4146: [AIRFLOW-3306] Disable flask-sqlalchemy modification tracking.
jmcarp opened a new pull request #4146: [AIRFLOW-3306] Disable flask-sqlalchemy modification tracking. URL: https://github.com/apache/incubator-airflow/pull/4146 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3306 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: By default, flask-sqlalchemy tracks model changes for its event system, which adds some overhead. Since I don't think we're using the flask-sqlalchemy event system, we should be able to turn off modification tracking and improve performance. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Just a config change; existing tests should cover it. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. 
[jira] [Created] (AIRFLOW-3306) Disable unused flask-sqlalchemy modification tracking
Josh Carp created AIRFLOW-3306: -- Summary: Disable unused flask-sqlalchemy modification tracking Key: AIRFLOW-3306 URL: https://issues.apache.org/jira/browse/AIRFLOW-3306 Project: Apache Airflow Issue Type: Bug Reporter: Josh Carp Assignee: Josh Carp By default, flask-sqlalchemy tracks model changes for its event system, which adds some overhead. Since I don't think we're using the flask-sqlalchemy event system, we should be able to turn off modification tracking and improve performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
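The change described in AIRFLOW-3306 amounts to a single Flask configuration flag. A minimal sketch, assuming a standalone Flask app (Airflow wires this up inside its own app factory; the app and database URI below are placeholders):

```python
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///example.db'  # placeholder URI

# Flask-SQLAlchemy's modification tracker records every model change to feed
# its signalling event system; if nothing subscribes to those signals, the
# bookkeeping is pure overhead, so it can be switched off:
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

db = SQLAlchemy(app)
```

`SQLALCHEMY_TRACK_MODIFICATIONS` defaults to tracking (with a warning in newer Flask-SQLAlchemy releases when left unset), which is why disabling it is a safe, behavior-preserving change as long as no code listens for the `models_committed`/`before_models_committed` signals.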
[GitHub] ashb commented on issue #4140: [AIRFLOW-3302] Small CSS fixes
ashb commented on issue #4140: [AIRFLOW-3302] Small CSS fixes URL: https://github.com/apache/incubator-airflow/pull/4140#issuecomment-436343506 Probably okay, but could you include a before and after screenshot?
[GitHub] ashb commented on issue #4133: [AIRFLOW-3270] Allow passwordless-binding for LDAP auth backend
ashb commented on issue #4133: [AIRFLOW-3270] Allow passwordless-binding for LDAP auth backend URL: https://github.com/apache/incubator-airflow/pull/4133#issuecomment-436341469 If we're deprecating the old non-RBAC api this might not matter on master anymore anyway :D
[jira] [Commented] (AIRFLOW-3160) Load latest_dagruns asynchronously
[ https://issues.apache.org/jira/browse/AIRFLOW-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677020#comment-16677020 ] ASF GitHub Bot commented on AIRFLOW-3160: - ashb opened a new pull request #4145: Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (#4005)" URL: https://github.com/apache/incubator-airflow/pull/4145 This reverts commit 0287cceed8137823743497b7e11f19ef35bacd9d. Testing to see if the tests pass with this change reverted. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Load latest_dagruns asynchronously > --- > > Key: AIRFLOW-3160 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3160 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Affects Versions: 1.10.0 >Reporter: Dan Davydov >Assignee: Dan Davydov >Priority: Major > Fix For: 2.0.0 > > > The front page loads very slowly when the DB has latency because one blocking > query is made per DAG against the DB. > > The latest dagruns should be loaded asynchronously and in batch like the > other UI elements that query the database. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] ashb commented on issue #4145: Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (#4005)"
ashb commented on issue #4145: Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (#4005)" URL: https://github.com/apache/incubator-airflow/pull/4145#issuecomment-436331547 Don't merge this before Travis has run the tests
[GitHub] ashb edited a comment on issue #4005: [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time
ashb edited a comment on issue #4005: [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time URL: https://github.com/apache/incubator-airflow/pull/4005#issuecomment-436330620 Examples that (we think) started happening after this PR was merged: https://travis-ci.org/apache/incubator-airflow/jobs/451428747#L4660 in `ERROR: test_success (tests.www_rbac.test_views.TestAirflowBaseViews)` (postgres this time) And same build on Mysql https://travis-ci.org/apache/incubator-airflow/jobs/451428748#L4660 Going to try reverting this PR and see if it fixes things, even though the error doesn't make any sense.
[GitHub] ashb opened a new pull request #4145: Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (#4005)"
ashb opened a new pull request #4145: Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (#4005)" URL: https://github.com/apache/incubator-airflow/pull/4145 This reverts commit 0287cceed8137823743497b7e11f19ef35bacd9d. Testing to see if the tests pass with this change reverted.
[GitHub] ashb commented on issue #4005: [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time
ashb commented on issue #4005: [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time URL: https://github.com/apache/incubator-airflow/pull/4005#issuecomment-436330620 Examples that (we think) started happening after this PR was merged: https://travis-ci.org/apache/incubator-airflow/jobs/451428747#L4660 in `ERROR: test_success (tests.www_rbac.test_views.TestAirflowBaseViews)` (postgres this time) And same build on Mysql https://travis-ci.org/apache/incubator-airflow/jobs/451428748#L4660 Going to try reverting this PR and see if it fixes things.
[jira] [Commented] (AIRFLOW-3285) lazy marking of upstream_failed task state
[ https://issues.apache.org/jira/browse/AIRFLOW-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677006#comment-16677006 ] Ash Berlin-Taylor commented on AIRFLOW-3285: The lazy feature as you have described it isn't something we'd accept as it's quite a behaviour change and a little bit of a workaround, but a combo trigger rule, so that we could say {{trigger_rule=\{'all_done','one_failed',\}}} to mean "trigger on any of these conditions", would be acceptable > lazy marking of upstream_failed task state > -- > > Key: AIRFLOW-3285 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3285 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Kevin McHale >Priority: Minor > > Airflow aggressively applies the {{upstream_failed}} task state: as soon as a > task fails, all of its downstream dependencies get marked. This sometimes > creates problems for us at Etsy. > In particular, we use a pattern for our hadoop Airflow DAGs along these lines: > # the DAG creates a hadoop cluster in GCP/Dataproc > # the DAG executes its tasks on the cluster > # the DAG deletes the cluster once all tasks are done > There are some cases in which the tasks immediately upstream of the > cluster-delete step get marked as {{upstream_failed}}, triggering the > cluster-delete step, even while other tasks continue to execute without > problems on the cluster. The cluster-delete step of course kills all of the > running tasks, requiring all of them to be re-run once the problem with the > failed task is mitigated. > As an example, a DAG that looks like this can exhibit the problem: > {code:java} > Cluster = ClusterCreateOperator(...) > A = Job1Operator(...) > Cluster << A > B = Job2Operator(...) > Cluster << B > C = Job3Operator(...) > A << C > B << C > ClusterDelete = DeleteClusterOperator(trigger_rule="all_done", ...) > D << ClusterDelete{code} > In a DAG like this, suppose task A fails while task B is running.
Task C > will immediately be marked as {{upstream_failed}}, which will cause > ClusterDelete to run while task B is still running, which will cause task B > to also fail. > Our solution to this problem has been to implement something like [this > diff|https://github.com/mchalek/incubator-airflow/commit/585349018656cd9b2e3e3e113db6412345485dde], > which lazily applies the {{upstream_failed}} state only to tasks for which > all upstream tasks have already completed. > The consequence in terms of the example above is that task C will not be > marked {{upstream_failed}} in response to task A failing until task B > completes, ensuring that the cluster is not deleted while any upstream tasks > are running. > We find this not to have any adverse behavior on our airflow instances, so we > run all of them with this lazy-marking feature enabled. However, we > recognize that a change in behavior like this may be something that existing > users will want to opt-in for, so we included a config flag in the diff that > defaults to the original behavior. > We would appreciate your consideration of incorporating this diff, or > something like it, to allow us to configure this behavior in unmodified, > upstream airflow. > Thanks! > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
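The difference between the single {{all_done}} rule and the combined rule Ash proposes can be illustrated with a toy evaluation function. This is deliberately not Airflow's internal code, just a model of the semantics under discussion; the state names follow Airflow's task states:

```python
# Toy model of trigger-rule evaluation -- not Airflow internals, just an
# illustration of single-rule vs. combined-rule ("combo") semantics.
TERMINAL = {'success', 'failed', 'upstream_failed', 'skipped'}

def rule_satisfied(rule, upstream_states):
    if rule == 'all_done':
        # Fires only once every upstream task has reached a terminal state.
        return all(s in TERMINAL for s in upstream_states)
    if rule == 'one_failed':
        # Fires as soon as any upstream task has failed.
        return any(s == 'failed' for s in upstream_states)
    if rule == 'all_success':
        return all(s == 'success' for s in upstream_states)
    raise ValueError('unknown rule: %s' % rule)

def combo_satisfied(rules, upstream_states):
    # The proposed combo rule: fire when ANY member rule is satisfied.
    return any(rule_satisfied(r, upstream_states) for r in rules)

# One upstream task failed while another is still running:
states = ['failed', 'running']
print(rule_satisfied('all_done', states))                    # False -- waits for the runner
print(combo_satisfied({'all_done', 'one_failed'}, states))   # True  -- fires on the failure
```

With only {{all_done}}, the cluster-delete task waits for every upstream task to finish; adding {{one_failed}} to a combo would let it also react immediately to a failure, which is the "trigger on any of these conditions" behaviour described above.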
[jira] [Commented] (AIRFLOW-3300) Frequent crash of scheduler while interacting with Airflow Metadata (Mysql)
[ https://issues.apache.org/jira/browse/AIRFLOW-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676990#comment-16676990 ] Ash Berlin-Taylor commented on AIRFLOW-3300: Might be fixed by AIRFLOW-2703, but a deadlock is possibly a sign of a bigger issue. > Frequent crash of scheduler while interacting with Airflow Metadata (Mysql) > --- > > Key: AIRFLOW-3300 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3300 > Project: Apache Airflow > Issue Type: Bug >Reporter: Tanuj Gupta >Priority: Major > > It's been very frequent when scheduler tries to update the task instance > table and ends up with scheduler crash due to deadlock occurrence. Following > is the stack-trace for the same - > > {noformat} > sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1213, > 'Deadlock found when trying to get lock; try restarting transaction') [SQL: > u'UPDATE task_instance, dag_run SET task_instance.state=%s WHERE > task_instance.dag_id IN (%s, %s, %s, %s, %s) AND task_instance.state IN (%s, > %s) AND dag_run.dag_id = task_instance.dag_id AND dag_run.execution_date = > task_instance.execution_date AND dag_run.state != %s'] [parameters: (None, > 'org_test0802_h9zrva', > '27bd514b5ab9854b0a494110_45aa7868_1799_4046_ad20_f35e3de1a4ec_p8bkfg', > 'org_e2e_trainman1528457521430', 'org_blockerretryissue_6kiafp', > 'org_e2e_compute_v21540610294106_svnfmr', u'queued', u'scheduled', > u'running')] (Background on this error at: > http://sqlalche.me/e/e3q8){noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3305) KubernetesPodOperator has a race condition for log output
James Meickle created AIRFLOW-3305: -- Summary: KubernetesPodOperator has a race condition for log output Key: AIRFLOW-3305 URL: https://issues.apache.org/jira/browse/AIRFLOW-3305 Project: Apache Airflow Issue Type: Bug Components: kubernetes Affects Versions: 1.10.0 Reporter: James Meickle The KubernetesPodOperator follows logs from the container in the pod that it launches: [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/kubernetes/pod_launcher.py#L96] This is set to "follow" mode, which streams logs. However, it is possible (but not guaranteed) for the pod's container to have started before the log stream call reaches the cluster. In this case, re-running the same task may result in very different-looking logs, with no notification that there was any truncation. This is a confusing experience for operators who are not familiar with Kubernetes. My recommendation is to remove "tail_lines" which should have the effect of fetching all previous logs when streaming starts: https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#read_namespaced_pod_log -- This message was sent by Atlassian JIRA (v7.6.3#76005)
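The suggested change amounts to dropping {{tail_lines}} from the log call. A sketch of the keyword arguments involved; the key names mirror the kubernetes Python client's {{CoreV1Api.read_namespaced_pod_log}}, but nothing here contacts a real cluster, and 'base' matches the container name the pod launcher uses:

```python
# Sketch of the suggested fix: build kwargs for the pod-log request WITHOUT
# tail_lines, so that follow-mode streaming starts from the beginning of the
# log instead of a recent tail.
def pod_log_kwargs(pod_name, namespace, container='base'):
    return {
        'name': pod_name,
        'namespace': namespace,
        'container': container,
        'follow': True,            # stream the log instead of a one-shot read
        '_preload_content': False,  # get an iterable response for streaming
        # deliberately no 'tail_lines': with it set, any lines written between
        # container start and this call are silently truncated
    }

kwargs = pod_log_kwargs('my-pod', 'default')
assert 'tail_lines' not in kwargs
```

These kwargs would be splatted into `CoreV1Api().read_namespaced_pod_log(**kwargs)`; without `tail_lines`, follow mode fetches all previous output first, so a re-run produces consistent logs even if the container started before the watch call reached the cluster.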
[jira] [Created] (AIRFLOW-3304) Kubernetes pod operator does not capture init container logs
James Meickle created AIRFLOW-3304: -- Summary: Kubernetes pod operator does not capture init container logs Key: AIRFLOW-3304 URL: https://issues.apache.org/jira/browse/AIRFLOW-3304 Project: Apache Airflow Issue Type: Improvement Components: kubernetes Affects Versions: 1.10.0 Reporter: James Meickle The KubernetesPodOperator attempts to stream logs from the created pod. However, it only gets logs from the 'base' container. If you subclass this operator and modify the pod to also have init containers, their logs are not streamed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-436303989 OK @ashb @XD-DENG , I've updated the PR to reuse the scheduler logic directly so that we are fully in line with how the scheduler does it. No magic, no debate on how it should be done. I've updated the code; as you can see, it's very easy and clean that way. Also, tested with the (fixed) documentation example and it handles catchup correctly! Hope you like it :)
[GitHub] oelesinsc24 commented on a change in pull request #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow
oelesinsc24 commented on a change in pull request #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow URL: https://github.com/apache/incubator-airflow/pull/4068#discussion_r231178088 ## File path: airflow/contrib/hooks/aws_glue_job_hook.py ## @@ -0,0 +1,130 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + + +from airflow.exceptions import AirflowException +from airflow.contrib.hooks.aws_hook import AwsHook +import time + + +class AwsGlueJobHook(AwsHook): +""" +Interact with AWS Glue - create job, trigger, crawler + +:param job_name: unique job name per AWS account +:type str +:param desc: job description +:type str +:param region_name: aws region name (example: us-east-1) +:type region_name: str Review comment: Sure This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] mikemole commented on issue #4112: [AIRFLOW-3212] Add AwsGlueCatalogPartitionSensor
mikemole commented on issue #4112: [AIRFLOW-3212] Add AwsGlueCatalogPartitionSensor URL: https://github.com/apache/incubator-airflow/pull/4112#issuecomment-436302237 @ashb I incorporated your feedback, rebased, and squashed. Please let me know if there is anything else you need.
[GitHub] janhicken commented on issue #4139: [AIRFLOW-2715] Pick up the region setting while launching Dataflow templates
janhicken commented on issue #4139: [AIRFLOW-2715] Pick up the region setting while launching Dataflow templates URL: https://github.com/apache/incubator-airflow/pull/4139#issuecomment-436299237 Do you mean some documentation?
[GitHub] ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-436298904 To be complete: now ofc if I add a DummyOperator task to the example DAG, it no longer fails, since it can find an actual start_date thanks to the logic from jobs.py... which leads me to think that this "find start_date from tasks" logic is important to keep.
[GitHub] ultrabug edited a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ultrabug edited a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-436295821 @XD-DENG bad news, the example DAG in the documentation is breaking the scheduler on master so even the documentation is wrong. Fresh installation, if I run the scheduler using the DAG: ```python """ Code that goes along with the Airflow tutorial located at: https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py """ from airflow import DAG from airflow.operators.bash_operator import BashOperator from datetime import datetime, timedelta default_args = { 'owner': 'airflow', 'depends_on_past': False, 'start_date': datetime(2015, 12, 1), 'email': ['airf...@example.com'], 'email_on_failure': False, 'email_on_retry': False, 'retries': 1, 'retry_delay': timedelta(minutes=5), 'schedule_interval': '@hourly', } dag = DAG('tutorial', catchup=False, default_args=default_args) ``` nothing happens, the scheduler does not pick up anything now if I change the catchup parameter to `True` ```python dag = DAG('tutorial', catchup=True, default_args=default_args) ``` I get the scheduler failing with ``` Process DagFileProcessor1-Process: Traceback (most recent call last): File "/usr/lib64/python2.7/multiprocessing/process.py", line 267, in _bootstrap self.run() File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run self._target(*self._args, **self._kwargs) File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 395, in helper pickle_dags) File "/home/alexys/github/incubator-airflow_numberly/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 1726, in process_file self._process_dags(dagbag, dags, ti_keys_to_schedule) File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 1426, in _process_dags dag_run = self.create_dag_run(dag) 
File "/home/alexys/github/incubator-airflow_numberly/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 872, in create_dag_run if next_run_date > timezone.utcnow(): TypeError: can't compare datetime.datetime to NoneType ``` That None result is annoying even the scheduler :) EDIT: quoting the documentation for expected behavior ``` In the example above, if the DAG is picked up by the scheduler daemon on 2016-01-02 at 6 AM, (or from the command line), a single DAG Run will be created, with an execution_date of 2016-01-01, and the next one will be created just after midnight on the morning of 2016-01-03 with an execution date of 2016-01-02. If the dag.catchup value had been True instead, the scheduler would have created a DAG Run for each completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02, as that interval hasn’t completed) and the scheduler will execute them sequentially. This behavior is great for atomic datasets that can easily be split into periods. Turning catchup off is great if your DAG Runs perform backfill internally. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-436295821 @XD-DENG bad news, the example DAG in the documentation is breaking the scheduler on master so even the documentation is wrong. Fresh installation, if I run the scheduler using the DAG: ```python """ Code that goes along with the Airflow tutorial located at: https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py """ from airflow import DAG from airflow.operators.bash_operator import BashOperator from datetime import datetime, timedelta default_args = { 'owner': 'airflow', 'depends_on_past': False, 'start_date': datetime(2015, 12, 1), 'email': ['airf...@example.com'], 'email_on_failure': False, 'email_on_retry': False, 'retries': 1, 'retry_delay': timedelta(minutes=5), 'schedule_interval': '@hourly', } dag = DAG('tutorial', catchup=False, default_args=default_args) ``` nothing happens, the scheduler does not pick up anything now if I change the catchup parameter to `True` ```python dag = DAG('tutorial', catchup=True, default_args=default_args) ``` I get the scheduler failing with ``` Process DagFileProcessor1-Process: Traceback (most recent call last): File "/usr/lib64/python2.7/multiprocessing/process.py", line 267, in _bootstrap self.run() File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run self._target(*self._args, **self._kwargs) File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 395, in helper pickle_dags) File "/home/alexys/github/incubator-airflow_numberly/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 1726, in process_file self._process_dags(dagbag, dags, ti_keys_to_schedule) File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 1426, in _process_dags dag_run = self.create_dag_run(dag) File 
"/home/alexys/github/incubator-airflow_numberly/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 872, in create_dag_run if next_run_date > timezone.utcnow(): TypeError: can't compare datetime.datetime to NoneType ``` That None result is annoying even the scheduler :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on a change in pull request #4129: [AIRFLOW-3294] Update connections form and integration docs
kaxil commented on a change in pull request #4129: [AIRFLOW-3294] Update connections form and integration docs URL: https://github.com/apache/incubator-airflow/pull/4129#discussion_r231166490 ## File path: docs/integration.rst ## @@ -1011,3 +1011,13 @@ QuboleFileSensor .. autoclass:: airflow.contrib.sensors.qubole_sensor.QuboleFileSensor + +QuboleCheckOperator +''' + +.. autoclass:: airflow.contrib.operators.qubole_check_operator.QuboleCheckOperator + +QuboleValueCheckOperator + + +.. autoclass:: airflow.contrib.operators.qubole_check_operator.QuboleValueCheckOperator Review comment: Ya, I am happy with that given the link points to the same class in code.rst :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on issue #4144: [AIRFLOW-XXX] Use mocking in SimpleHttpOperator tests
ashb commented on issue #4144: [AIRFLOW-XXX] Use mocking in SimpleHttpOperator tests URL: https://github.com/apache/incubator-airflow/pull/4144#issuecomment-436290384 Whoops I flake8'd up.
[GitHub] ashb commented on a change in pull request #4129: [AIRFLOW-3294] Update connections form and integration docs
ashb commented on a change in pull request #4129: [AIRFLOW-3294] Update connections form and integration docs URL: https://github.com/apache/incubator-airflow/pull/4129#discussion_r231161317 ## File path: docs/integration.rst ## @@ -1011,3 +1011,13 @@ QuboleFileSensor .. autoclass:: airflow.contrib.sensors.qubole_sensor.QuboleFileSensor + +QuboleCheckOperator +''' + +.. autoclass:: airflow.contrib.operators.qubole_check_operator.QuboleCheckOperator + +QuboleValueCheckOperator + + +.. autoclass:: airflow.contrib.operators.qubole_check_operator.QuboleValueCheckOperator Review comment: Having looked at how this is rendered, how about a middle ground - we list the classes in this doc, but just as links, rather than including the code docstrings? For example, in this screenshot the list and short description would stay, but the EmrAddStepsOperator wouldn't be in this doc: https://user-images.githubusercontent.com/34150/48073272-50d22380-e1d6-11e8-801a-5a2e5401dbc3.png
[jira] [Commented] (AIRFLOW-3289) BashOperator mangles {{\}} escapes in commands
[ https://issues.apache.org/jira/browse/AIRFLOW-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676871#comment-16676871 ] Nikolay Semyachkin commented on AIRFLOW-3289: - The workaround suggested above didn't work (I still have `N` in the output). What worked is to put {code:java} cat example.csv | sed 's;,,;,\\N,;g' > example_processed.csv{code} in a .sh file and call it from the Airflow BashOperator like {code:java} bash process.sh{code} > BashOperator mangles {{\}} escapes in commands > -- > > Key: AIRFLOW-3289 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3289 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Nikolay Semyachkin >Priority: Major > Attachments: example.csv, issue_proof.py > > > I want to call a sed command on a csv file to replace empty values (,,) with \N. > I can do it with the following bash command > {code:java} > cat example.csv | sed 's;,,;,\\N,;g' > example_processed.csv{code} > But when I try to do the same with airflow BashOperator, it substitutes ,, > with N (instead of \N). > > I attached the code and csv file to reproduce. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
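For clarity on which escaping layer is in play: the shell must receive two literal backslashes before the N so that sed emits `\N`. The snippet below only demonstrates Python-level string handling, not where BashOperator drops the backslash; the filenames come from the report above:

```python
# What the shell must receive for sed to write a literal "\N": the command
# string needs two backslash characters before N. A raw Python string keeps
# them intact; whether BashOperator's Jinja templating then preserves them is
# exactly what the reporter found unreliable, hence the external-script fix.
command = r"cat example.csv | sed 's;,,;,\\N,;g' > example_processed.csv"

# The string really does hold two backslashes before N:
assert command.count('\\') == 2
```

The reporter's working fix keeps this line in `process.sh` and invokes the script from the operator. One related gotcha worth noting: BashOperator treats a `bash_command` ending in `.sh` as a Jinja template file (via `template_ext`), so when calling a script directly the commonly documented escape hatch is to append a trailing space after the script path.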
[jira] [Resolved] (AIRFLOW-1632) MySQL to GCS fails for date/datetime before ~1850
[ https://issues.apache.org/jira/browse/AIRFLOW-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-1632. Resolution: Duplicate > MySQL to GCS fails for date/datetime before ~1850 > - > > Key: AIRFLOW-1632 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1632 > Project: Apache Airflow > Issue Type: Bug > Components: gcp > Environment: Google Cloud Platform >Reporter: Michael Ghen >Assignee: Michael Ghen >Priority: Minor > > For tables in MySQL that use a "date" or "datetime" type, a dag that exports > from MySQL to Google Cloud Storage and then loads from GCS to BigQuery will > fail when the dates are before 1970. > When the table is exported as JSON to a GCS bucket, dates and datetimes are > converted to timestamps using: > {code} > time.mktime(value.timetuple()) > {code} > This creates a problem when you try parse a date that can't be converted to a > UNIX timestamp. For example: > {code} > >>> value = datetime.date(1850,1,1) > >>> time.mktime(value.timetuple()) > Traceback (most recent call last): > File "", line 1, in > ValueError: year out of range > {code} > *Steps to reproduce* > 0. Set up a MySQL connection and GCP connection in Airflow. > 1. Create a MySQL table with a "date" field and put some data into the table. > {code} > CREATE TABLE table_with_date ( > date_field date, > datetime_field datetime > ); > INSERT INTO table_with_date (date_field, datetime_field) VALUES > ('1850-01-01',NOW()); > {code} > 2. Create a DAG that will export the data from the MySQL to GCS and then load > from GCS to BigQuery (use the schema file). 
For example: > {code} > extract = MySqlToGoogleCloudStorageOperator( > task_id="extract_table", > mysql_conn_id='mysql_connection', > google_cloud_storage_conn_id='gcp_connection', > sql="SELECT * FROM table_with_date", > bucket='gcs-bucket', > filename='table_with_date.json', > schema_filename='schemas/table_with_date.json', > dag=dag) > load = GoogleCloudStorageToBigQueryOperator( > task_id="load_table", > bigquery_conn_id='gcp_connection', > google_cloud_storage_conn_id='gcp_connection', > bucket='gcs-bucket', > destination_project_dataset_table="dataset.table_with_date", > source_objects=['table_with_date.json'], > schema_object='schemas/table_with_date.json', > source_format='NEWLINE_DELIMITED_JSON', > create_disposition='CREATE_IF_NEEDED', > write_disposition='WRITE_TRUNCATE', > dag=dag) > load.set_upstream(extract) > {code} > 3. Run the DAG > Expected: The DAG runs successfully. > Actual: The `extract_table` task fails with error: > {code} > ... > ERROR - year out of range > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1374, in run > result = task_copy.execute(context=context) > File > "/usr/lib/python2.7/site-packages/airflow/contrib/operators/mysql_to_gcs.py", > line 91, in execute > files_to_upload = self._write_local_data_files(cursor) > File > "/usr/lib/python2.7/site-packages/airflow/contrib/operators/mysql_to_gcs.py", > line 132, in _write_local_data_files > row = map(self.convert_types, row) > File > "/usr/lib/python2.7/site-packages/airflow/contrib/operators/mysql_to_gcs.py", > line 196, in convert_types > return time.mktime(value.timetuple()) > ValueError: year out of range > ... > {code} > *Comments:* > This is really a problem with Python not being able to handle years before > like 1850. Bigquery timestamp seems to be able to take years all the way to > year 0001. 
From > https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp-type, > the Timestamp range is: > {quote} > 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC. > {quote} > I think the fix is probably to keep converting date/datetime to timestamp but > use `calendar.timegm`.
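The proposed fix can be sketched in isolation; `convert_date` here is a hypothetical stand-in for the relevant branch of the operator's `convert_types`, assuming UTC semantics are acceptable for the export:

```python
import calendar
import datetime

def convert_date(value):
    # calendar.timegm is pure Python and treats the struct_time as UTC, so
    # it returns (negative) timestamps for pre-1970 dates, where the C-level
    # time.mktime can raise "year out of range" on some platforms.
    return calendar.timegm(value.timetuple())

epoch = convert_date(datetime.date(1970, 1, 1))  # 0
old = convert_date(datetime.date(1850, 1, 1))    # negative, no ValueError
```

BigQuery accepts such timestamps because its TIMESTAMP range starts at year 0001, well before the Unix epoch.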
[GitHub] kaxil closed pull request #4144: [AIRFLOW-XXX] Use mocking in SimpleHttpOperator tests
kaxil closed pull request #4144: [AIRFLOW-XXX] Use mocking in SimpleHttpOperator tests URL: https://github.com/apache/incubator-airflow/pull/4144
[GitHub] ultrabug commented on a change in pull request #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ultrabug commented on a change in pull request #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#discussion_r231150858 ## File path: airflow/models.py ## @@ -3055,6 +3055,37 @@ def latest_execution_date(self): session.close() return execution_date +@property +def next_run_date(self): +""" +Returns the next run date for which the dag will be scheduled +""" +next_run_date = None +if not self.latest_execution_date: +# First run +task_start_dates = [t.start_date for t in self.tasks] +if task_start_dates: +next_run_date = self.normalize_schedule(min(task_start_dates)) +else: +next_run_date = self.following_schedule(self.latest_execution_date) +return next_run_date + +@property +def next_execution_date(self): +""" +Returns the next execution date at which the dag will be scheduled by Review comment: This is exactly the logic I've followed and you can see that my proposed implementation is based on the scheduler's jobs.py code indeed I can shrink the next_run_date function into one to make it simple tho indeed, gonna update @XD-DENG as you see the scheduler does otherwise and having None as a result looks strange to me since there's always a **period end** at which the scheduler itself will execute the DAG (if it has a start_date that is). So it's not because no previous execution has happened that there won't be any right. And the scheduler code above shows how it does it. Still, @XD-DENG I think your link has a nice example and I'll make sure to validate the fact that this PR behaves exactly as the documentation says. Sound good to you? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
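The property under review can be sketched as a stand-alone function; the schedule helpers below are hypothetical simplifications of the DAG's `following_schedule`/`normalize_schedule` methods, fixed to a daily interval for illustration:

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=1)  # stand-in for schedule_interval

def following_schedule(dt):
    # next point on the schedule grid after dt
    return dt + INTERVAL

def normalize_schedule(dt):
    # snap a start date onto the schedule grid (midnight for a daily DAG)
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)

def next_run_date(latest_execution_date, task_start_dates):
    # mirrors the logic in the diff: on the first run, use the earliest
    # task start date normalized to the schedule; afterwards, follow on
    # from the latest execution date
    if latest_execution_date is None:
        if not task_start_dates:
            return None
        return normalize_schedule(min(task_start_dates))
    return following_schedule(latest_execution_date)
```

As the reviewer notes, `None` only comes out when the DAG has no task start dates at all; once a start date exists there is always a period end at which the scheduler will run the DAG.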
[GitHub] sbilinski commented on issue #4123: [AIRFLOW-3288] Add SNS integration
sbilinski commented on issue #4123: [AIRFLOW-3288] Add SNS integration URL: https://github.com/apache/incubator-airflow/pull/4123#issuecomment-436275126 @ashb 1. Sorry about that - I'm going to open a PR fixing this shortly. 2. I'd suggest including this info in the PR template, as the following is not clear enough in my opinion: > When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added
[GitHub] ashb commented on issue #4005: [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time
ashb commented on issue #4005: [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time URL: https://github.com/apache/incubator-airflow/pull/4005#issuecomment-436270852 @aoen Speak of the devil :D this seems to be causing test failures after this PR was merged, but only on MySQL, somewhat oddly. Can you take a look, please? (We might revert this PR temporarily.)
[GitHub] ashb opened a new pull request #4144: [AIRFLOW-XXX] Use mocking in SimpleHttpOperator tests
ashb opened a new pull request #4144: [AIRFLOW-XXX] Use mocking in SimpleHttpOperator tests URL: https://github.com/apache/incubator-airflow/pull/4144 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] This changes the tests in [AIRFLOW-3262] (#4135) to use requests_mock rather than making actual HTTP requests, as we have had this test fail on Travis with connection refused. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8`
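The PR switches the tests to the `requests_mock` library; the same idea can be sketched dependency-free with the standard library's `unittest.mock` (the endpoint and `fetch_status` helper below are hypothetical, not Airflow code):

```python
from unittest import mock

def fetch_status(http_get):
    # hypothetical code under test: the HTTP callable is injected,
    # so the test can swap in a fake and never open a real socket
    resp = http_get("http://example.com/api")
    return resp["status"]

# no real request is made, so CI can never fail with "connection refused"
fake_get = mock.Mock(return_value={"status": "ok"})
result = fetch_status(fake_get)
fake_get.assert_called_once_with("http://example.com/api")
```

`requests_mock` does the same thing one level lower, intercepting at the `requests` transport adapter so the production code needs no injection hook.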
[GitHub] ron819 commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented
ron819 commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented URL: https://github.com/apache/incubator-airflow/pull/4075#discussion_r231133812 ## File path: airflow/operators/bash_operator.py ## @@ -49,6 +49,15 @@ class BashOperator(BaseOperator): :type env: dict :param output_encoding: Output encoding of bash command :type output_encoding: str + +On execution of the operator the task will up for retry when exception is raised. +However if a command exists with non-zero value Airflow will not recognize Review comment: @ashb added the requested changes This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on issue #4069: [AIRFLOW-3233] Fix deletion of DAGs in the UI
ashb commented on issue #4069: [AIRFLOW-3233] Fix deletion of DAGs in the UI URL: https://github.com/apache/incubator-airflow/pull/4069#issuecomment-436264005 Should be possible to test this by creating a Dag object in the DB and ensuring the correct delete link appears in the output
[GitHub] ashb commented on a change in pull request #4069: [AIRFLOW-3233] Fix deletion of DAGs in the UI
ashb commented on a change in pull request #4069: [AIRFLOW-3233] Fix deletion of DAGs in the UI URL: https://github.com/apache/incubator-airflow/pull/4069#discussion_r231132887 ## File path: airflow/www/templates/airflow/dags.html ## @@ -191,11 +191,11 @@ DAGs - + Review comment: Minor nit: this includes the comment in the HTML, which we don't need. If you think the comment is useful then:
```suggestion
{# Use dag_id instead of dag.dag_id, because the DAG might not exist in the webserver's DagBag #}
```
[jira] [Updated] (AIRFLOW-2865) Race condition between on_success_callback and LocalTaskJob's cleanup
[ https://issues.apache.org/jira/browse/AIRFLOW-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-2865: --- Fix Version/s: 1.10.1 > Race condition between on_success_callback and LocalTaskJob's cleanup > - > > Key: AIRFLOW-2865 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2865 > Project: Apache Airflow > Issue Type: Bug >Reporter: Marcin Mejran >Priority: Minor > Fix For: 2.0.0, 1.10.1 > > > The TaskInstance's run_raw_task method first records SUCCESS for the task > instance and then runs the on_success_callback function. > The LocalTaskJob's heartbeat_callback checks for any TI's with a SUCCESS > state and terminates their processes. > As such it's possible for the TI process to be terminated before the > on_success_callback function finishes running. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3299) Logs for currently running sensors not visible in the UI
[ https://issues.apache.org/jira/browse/AIRFLOW-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-3299: --- Summary: Logs for currently running sensors not visible in the UI (was: Logs for currently running tasks fail to load)
> Logs for currently running sensors not visible in the UI
>
> Key: AIRFLOW-3299
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3299
> Project: Apache Airflow
> Issue Type: Bug
> Components: ui
> Reporter: Brad Holmes
> Priority: Major
>
> When a task is actively running, the logs are not appearing. I have tracked this down to the {{next_try_number}} logic of task-instances.
> In [the view at line 836|https://github.com/apache/incubator-airflow/blame/master/airflow/www/views.py#L836], we have
> {code:java}
> logs = [''] * (ti.next_try_number - 1 if ti is not None else 0)
> {code}
> The length of the {{logs}} array informs the frontend of the number of {{attempts}} that exist, and thus how many AJAX calls to make to load the logs.
> Here is the current logic I have observed:
> ||Task State||Current length of 'logs'||Needed length of 'logs'||
> |Successfully completed in 1 attempt|1|1|
> |Successfully completed in 2 attempts|2|2|
> |Not yet attempted|0|0|
> |Actively running task, first time|0|1|
> That last case is the bug. Perhaps task-instance needs a method like {{most_recent_try_number}}? I don't see how to make use of {{try_number()}} or {{next_try_number()}} to meet the need here.
> ||Task State||try_number()||next_try_number()||Number of Attempts _Should_ Display||
> |Successfully completed in 1 attempt|2|2|1|
> |Successfully completed in 2 attempts|3|3|2|
> |Not yet attempted|1|1|0|
> |Actively running task, first time|0|1|1|
> [~ashb]: You implemented this portion of task-instance 11 months ago. Any suggestions? Or perhaps the problem is elsewhere?
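The tables in the issue suggest one possible shape for a `most_recent_try_number`-style helper; this is a hypothetical sketch, not Airflow's implementation — the field names mirror the report, and the fix is simply counting the in-flight attempt when the task is running:

```python
from collections import namedtuple

# minimal stand-in for a task instance, mirroring the issue's tables
TI = namedtuple("TI", ["state", "next_try_number"])

def num_log_attempts(ti):
    # number of log panes the UI should render for this task instance
    if ti is None:
        return 0
    if ti.state == "running":
        # an in-flight first attempt already has logs, even though
        # next_try_number - 1 would say zero
        return max(ti.next_try_number - 1, 1)
    return ti.next_try_number - 1

cases = [
    (TI("success", 2), 1),  # completed in 1 attempt
    (TI("success", 3), 2),  # completed in 2 attempts
    (TI("none", 1), 0),     # not yet attempted
    (TI("running", 1), 1),  # actively running, first time (the bug)
]
```

Each row of `cases` reproduces a line of the "Needed length of 'logs'" column.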
[jira] [Resolved] (AIRFLOW-3279) Documentation for Google Logging unclear
[ https://issues.apache.org/jira/browse/AIRFLOW-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-3279. Resolution: Information Provided
> Documentation for Google Logging unclear
>
> Key: AIRFLOW-3279
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3279
> Project: Apache Airflow
> Issue Type: Bug
> Components: configuration, Documentation, gcp, logging
> Reporter: Paul Velthuis
> Priority: Blocker
>
> The documentation of how to install logging to a Google Cloud bucket is unclear.
> I am now following the tutorial on the airflow page: https://airflow.apache.org/howto/write-logs.html
> Here I find it unclear what part of the 'logger' I have to adjust in `airflow/config_templates/airflow_local_settings.py`.
> The adjustment states:
> # Update the airflow.task and airflow.task_runner blocks to be 'gcs.task' instead of 'file.task'.
> However, what I find in the template is:
> {code}
> 'loggers': {
>     'airflow.processor': {
>         'handlers': ['processor'],
>         'level': LOG_LEVEL,
>         'propagate': False,
>     },
>     'airflow.task': {
>         'handlers': ['task'],
>         'level': LOG_LEVEL,
>         'propagate': False,
>     },
>     'flask_appbuilder': {
>         'handler': ['console'],
>         'level': FAB_LOG_LEVEL,
>         'propagate': True,
>     },
> },
> {code}
> Since for me it is very important to do it right the first time, I hope some clarity can be provided on what has to be adjusted in the logger. Is it only the 'airflow.task' or more?
> Furthermore, at step 6 it is a little unclear what remote_log_conn_id means. I would propose to add a little more information to make this more clear.
> > The current error I am facing is: > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 16, in > from airflow import configuration > File "/usr/local/lib/python2.7/site-packages/airflow/__init__.py", line 31, > in > from airflow import settings > File "/usr/local/lib/python2.7/site-packages/airflow/settings.py", line 198, > in > configure_logging() > File "/usr/local/lib/python2.7/site-packages/airflow/logging_config.py", > line 71, in configure_logging > dictConfig(logging_config) > File "/usr/local/lib/python2.7/logging/config.py", line 794, in dictConfig > dictConfigClass(config).configure() > File "/usr/local/lib/python2.7/logging/config.py", line 568, in configure > handler = self.configure_handler(handlers[name]) > File "/usr/local/lib/python2.7/logging/config.py", line 733, in > configure_handler > result = factory(**kwargs) > File > "/usr/local/lib/python2.7/site-packages/airflow/utils/log/gcs_task_handler.py", > line 30, in __init__ > super(GCSTaskHandler, self).__init__(base_log_folder, filename_template) > File > "/usr/local/lib/python2.7/site-packages/airflow/utils/log/file_task_handler.py", > line 46, in __init__ > self.filename_jinja_template = Template(self.filename_template) > File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line > 926, in __new__ > return env.from_string(source, template_class=cls) > File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line > 862, in from_string > return cls.from_code(self, self.compile(source), globals, None) > File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line > 565, in compile > self.handle_exception(exc_info, source_hint=source_hint) > File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line > 754, in handle_exception > reraise(exc_type, exc_value, tb) > File "", line 1, in template > jinja2.exceptions.TemplateSyntaxError: expected token ':', got '}' > Error in atexit._run_exitfuncs: > Traceback (most recent 
call last): > File "/usr/local/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > func(*targs, **kargs) > File "/usr/local/lib/python2.7/logging/__init__.py", line 1676, in shutdown > h.close() > File > "/usr/local/lib/python2.7/site-packages/airflow/utils/log/gcs_task_handler.py", > line 73, in close > if self.closed: > AttributeError: 'GCSTaskHandler' object has no attribute 'closed' > Error in sys.exitfunc: > Traceback (most recent call last): > File "/usr/local/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > func(*targs, **kargs) > File "/usr/local/lib/python2.7/logging/__init__.py", line 1676, in shutdown > h.close() > File > "/usr/local/lib/python2.7/site-packages/airflow/utils/log/gcs_task_handler.py", > line 73, in close > if self.closed: > AttributeError: 'GCSTaskHandler' object has no attribute 'closed' > If I look at the Airflow code I see the following code for the >
[GitHub] ashb commented on issue #4123: [AIRFLOW-3288] Add SNS integration
ashb commented on issue #4123: [AIRFLOW-3288] Add SNS integration URL: https://github.com/apache/incubator-airflow/pull/4123#issuecomment-436260007 These new classes are not linked from the docs - please add them to at least docs/code.rst
[jira] [Commented] (AIRFLOW-3293) Rename TimeDeltaSensor to ScheduleTimeDeltaSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676783#comment-16676783 ] Ash Berlin-Taylor commented on AIRFLOW-3293: AIRFLOW-2747 and AIRFLOW-850 would help with your second point. And as of the current release even if the sensor behaved how you wanted it would still take up an executor slot as that is how sensors work. I am uncertain on if this is a common enough use case to support directly given the two tickets mentioned above. > Rename TimeDeltaSensor to ScheduleTimeDeltaSensor > - > > Key: AIRFLOW-3293 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3293 > Project: Apache Airflow > Issue Type: Wish >Reporter: Darren Weber >Priority: Major > > The TimeDeltaSensor has baked-in lookups for the schedule and > schedule_interval lurking in the class init, it's not a pure time delta. It > would be ideal to have a TimeDelta that is purely relative to the time that > an upstream task triggers it. If there is a way to do this, please note it > here or suggest some implementation alternative that could achieve this > easily. > The implementation below using a PythonOperator works, but it consumes a > worker for 5min needlessly. It would be much better to have a TimeDelta that > accepts the time when an upstream sensor triggers it and then waits for a > timedelta, with options from the base sensor for poke interval (and timeout). > This could be used without consuming a worker as much with the reschedule > option. Something like this can help with adding jitter to downstream tasks > that could otherwise hit an HTTP endpoint too hard all at once. 
> {code:python} > def wait5(*args, **kwargs): > import random > import time as t > minutes = random.randint(3,6) > t.sleep(minutes * 60) > return True > wait5_task = PythonOperator( > task_id="python_op_wait_5min", > python_callable=wait5, > dag=a_dag) > upstream_http_sensor >> wait5_task > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
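A sensor of the kind the reporter describes — waiting a delta relative to when it first starts poking, rather than relative to the schedule — could look roughly like this. The class is hypothetical and only sketches the `poke` contract of Airflow's base sensor, with jitter folded in as the ticket suggests:

```python
import random
from datetime import datetime, timedelta

class RelativeTimeDeltaSensor:
    """Becomes true once `delta` (plus optional random jitter) has elapsed
    since the first poke, independent of schedule_interval."""

    def __init__(self, delta, jitter_max=timedelta(0)):
        jitter = random.uniform(0, jitter_max.total_seconds())
        self.delta = delta + timedelta(seconds=jitter)
        self.started_at = None  # set on the first poke

    def poke(self, now=None):
        now = now or datetime.utcnow()
        if self.started_at is None:
            self.started_at = now
        return now - self.started_at >= self.delta
```

Combined with the reschedule option mentioned above, the pokes would not hold a worker slot between checks, though state like `started_at` would then need to persist somewhere across pokes.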
[GitHub] ashb commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented
ashb commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented URL: https://github.com/apache/incubator-airflow/pull/4075#discussion_r229991618 ## File path: airflow/operators/bash_operator.py ## @@ -49,6 +49,15 @@ class BashOperator(BaseOperator): :type env: dict :param output_encoding: Output encoding of bash command :type output_encoding: str + +On execution of the operator the task will up for retry when exception is raised. +However if a command exists with non-zero value Airflow will not recognize +it as failure unless explicitly specified in the beggining of the script. Review comment: ```suggestion it as failure unless the whole shell exits with a failure. The easiest way of achieving this is to prefix the command with ``set -e;`` ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented
ashb commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented URL: https://github.com/apache/incubator-airflow/pull/4075#discussion_r231123170 ## File path: airflow/operators/bash_operator.py ## @@ -49,6 +49,15 @@ class BashOperator(BaseOperator): :type env: dict :param output_encoding: Output encoding of bash command :type output_encoding: str + +On execution of the operator the task will up for retry when exception is raised. +However if a command exists with non-zero value Airflow will not recognize Review comment: ```suggestion However if a sub-command exists with non-zero value Airflow will not recognize ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented
ashb commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented URL: https://github.com/apache/incubator-airflow/pull/4075#discussion_r231123082 ## File path: airflow/operators/bash_operator.py ## @@ -49,6 +49,15 @@ class BashOperator(BaseOperator): :type env: dict :param output_encoding: Output encoding of bash command :type output_encoding: str + +On execution of the operator the task will up for retry when exception is raised. +However if a command exists with non-zero value Airflow will not recognize +it as failure unless explicitly specified in the beggining of the script. +Example: +bash_command = "python3 script.py '{{ next_execution_date }}'" +when executing command exit(1) the task will be marked as success. +bash_command = "set -e; python3 script.py '{{ next_execution_date }}'" +when executing command exit(1) the task will be marked as up for retry. Review comment: Also this so that exit(1) is rendered as code/mono-spaced ```suggestion when executing command ``exit(1)`` the task will be marked as up for retry. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented
ashb commented on a change in pull request #4075: [AIRFLOW-502] BashOperator success/failure conditions not documented URL: https://github.com/apache/incubator-airflow/pull/4075#discussion_r231095783 ## File path: airflow/operators/bash_operator.py ## @@ -49,6 +49,15 @@ class BashOperator(BaseOperator): :type env: dict :param output_encoding: Output encoding of bash command :type output_encoding: str + +On execution of the operator the task will up for retry when exception is raised. +However if a command exists with non-zero value Airflow will not recognize +it as failure unless explicitly specified in the beggining of the script. +Example: +bash_command = "python3 script.py '{{ next_execution_date }}'" +when executing command exit(1) the task will be marked as success. +bash_command = "set -e; python3 script.py '{{ next_execution_date }}'" +when executing command exit(1) the task will be marked as up for retry. Review comment: I suspect these aren't going to render quite right - can you run `make -C docs html` then check the rendering of this (I think it writes to docs/build/html/index.html or similar)
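The behaviour the suggested wording describes is easy to demonstrate outside Airflow. Without `set -e`, the exit status of a `bash -c` command string is that of its last command, so an earlier failure is swallowed; with `set -e`, the shell aborts at the first non-zero status:

```python
import subprocess

def run(cmd):
    # run a command string roughly the way BashOperator does: bash -c "<cmd>"
    return subprocess.run(["bash", "-c", cmd], capture_output=True, text=True)

# without set -e: `false` fails, but `echo` runs and the shell exits 0
ok = run("false; echo after")           # returncode 0, prints "after"

# with set -e: the shell stops at `false` and exits 1, echo never runs
bad = run("set -e; false; echo after")  # returncode 1, no output
```

Since the operator only sees the shell's overall exit status, prefixing `set -e;` is what turns the sub-command's failure into a task failure.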
[GitHub] zackmeso commented on issue #4114: [AIRFLOW-3259] Fix internal server error when displaying charts
zackmeso commented on issue #4114: [AIRFLOW-3259] Fix internal server error when displaying charts URL: https://github.com/apache/incubator-airflow/pull/4114#issuecomment-436248564 @Fokko You can try to create a chart on airflow - it won't work. As I said earlier, it's hard to test the whole code of the `chart_data` function (check comment above). However, we can apply both `sort` and `sort_values` against some fake dataframe and see that both behave in the same way (using an older pandas version). Except that one is no longer a part of pandas and one still is.
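The equivalence being discussed is just the rename from the long-removed `DataFrame.sort` to `sort_values`; this sketch, with a fake dataframe as the comment suggests, assumes pandas >= 0.20 where only `sort_values` exists:

```python
import pandas as pd

# df.sort(columns="x") in old pandas behaves like df.sort_values(by="x") now
df = pd.DataFrame({"x": [3, 1, 2], "y": ["c", "a", "b"]})
sorted_df = df.sort_values(by="x")
```

So swapping the call in `chart_data` changes no behaviour for users already on an older pandas; it only restores the chart view on versions where `sort` is gone.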
[GitHub] msumit commented on issue #1933: [AIRFLOW-689] Okta Authentication
msumit commented on issue #1933: [AIRFLOW-689] Okta Authentication URL: https://github.com/apache/incubator-airflow/pull/1933#issuecomment-436243611 Folks who were interested in Okta's API-based authentication: I've raised another PR (https://github.com/apache/incubator-airflow/pull/4143). Give it a try and see how it goes for you. @ashb agreed with your thought, but for folks who want simplicity, there is no harm in having one more auth provider in the contrib section.
[jira] [Commented] (AIRFLOW-689) Okta Authentication
[ https://issues.apache.org/jira/browse/AIRFLOW-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676725#comment-16676725 ] ASF GitHub Bot commented on AIRFLOW-689: msumit opened a new pull request #4143: [AIRFLOW-689] Okta Authentication URL: https://github.com/apache/incubator-airflow/pull/4143 Dear Airflow Maintainers, Please accept this PR that addresses the following issues: https://issues.apache.org/jira/browse/AIRFLOW-689 ### Description - [ ] Ability to use Okta's API-based authentication mechanism ### Tests - Install the Okta SDK with pip onto your Airflow instance - Add your Okta API key and organization URL (usually your_org.okta.com) in the config - Replace backend with okta_auth in the config - Log in ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8`
> Okta Authentication > --- > > Key: AIRFLOW-689 > URL: https://issues.apache.org/jira/browse/AIRFLOW-689 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib > Reporter: Brian Yang > Assignee: Brian Yang > Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] msumit opened a new pull request #4143: [AIRFLOW-689] Okta Authentication
msumit opened a new pull request #4143: [AIRFLOW-689] Okta Authentication URL: https://github.com/apache/incubator-airflow/pull/4143 Dear Airflow Maintainers, Please accept this PR that addresses the following issues: https://issues.apache.org/jira/browse/AIRFLOW-689 ### Description - [ ] Ability to use Okta's API-based authentication mechanism ### Tests - Install the Okta SDK with pip onto your Airflow instance - Add your Okta API key and organization URL (usually your_org.okta.com) in the config - Replace backend with okta_auth in the config - Log in ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8`
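For readers trying the test steps above, they map to a config change along these lines. This is a sketch only: the `auth_backend` module path and the `[okta]` key names are assumptions based on the PR description, not confirmed against the patch.

```ini
[webserver]
authenticate = True
; Hypothetical module path for the backend added in PR #4143 -- check the diff
auth_backend = airflow.contrib.auth.backends.okta_auth

[okta]
; Placeholder values; key names are illustrative
api_key = <your Okta API key>
org_url = your_org.okta.com
```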
[GitHub] kaxil commented on issue #4137: [AIRFLOW-XXX] Fix Docstrings in Hooks, Sensors & Operators
kaxil commented on issue #4137: [AIRFLOW-XXX] Fix Docstrings in Hooks, Sensors & Operators URL: https://github.com/apache/incubator-airflow/pull/4137#issuecomment-436231517 @ashb Yes, I am going to give that plugin a good review and play-around and then discuss with you guys on the best approach we can take.
[GitHub] kaxil closed pull request #4137: [AIRFLOW-XXX] Fix Docstrings in Hooks, Sensors & Operators
kaxil closed pull request #4137: [AIRFLOW-XXX] Fix Docstrings in Hooks, Sensors & Operators URL: https://github.com/apache/incubator-airflow/pull/4137 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/spark_submit_hook.py b/airflow/contrib/hooks/spark_submit_hook.py index 65bb6134e6..197b84a7b6 100644 --- a/airflow/contrib/hooks/spark_submit_hook.py +++ b/airflow/contrib/hooks/spark_submit_hook.py @@ -33,14 +33,15 @@ class SparkSubmitHook(BaseHook, LoggingMixin): This hook is a wrapper around the spark-submit binary to kick off a spark-submit job. It requires that the "spark-submit" binary is in the PATH or the spark_home to be supplied. + :param conf: Arbitrary Spark configuration properties :type conf: dict :param conn_id: The connection id as configured in Airflow administration. When an -invalid connection_id is supplied, it will default to yarn. +invalid connection_id is supplied, it will default to yarn. :type conn_id: str :param files: Upload additional files to the executor running the job, separated by a - comma. Files will be placed in the working directory of each executor. - For example, serialized objects. +comma. Files will be placed in the working directory of each executor. +For example, serialized objects. :type files: str :param py_files: Additional python files used by the job, can be .zip, .egg or .py. 
:type py_files: str @@ -51,19 +52,19 @@ class SparkSubmitHook(BaseHook, LoggingMixin): :param java_class: the main class of the Java application :type java_class: str :param packages: Comma-separated list of maven coordinates of jars to include on the -driver and executor classpaths +driver and executor classpaths :type packages: str :param exclude_packages: Comma-separated list of maven coordinates of jars to exclude -while resolving the dependencies provided in 'packages' +while resolving the dependencies provided in 'packages' :type exclude_packages: str :param repositories: Comma-separated list of additional remote repositories to search -for the maven coordinates given with 'packages' +for the maven coordinates given with 'packages' :type repositories: str :param total_executor_cores: (Standalone & Mesos only) Total cores for all executors -(Default: all the available cores on the worker) +(Default: all the available cores on the worker) :type total_executor_cores: int :param executor_cores: (Standalone, YARN and Kubernetes only) Number of cores per -executor (Default: 2) +executor (Default: 2) :type executor_cores: int :param executor_memory: Memory per executor (e.g. 1000M, 2G) (Default: 1G) :type executor_memory: str @@ -80,7 +81,7 @@ class SparkSubmitHook(BaseHook, LoggingMixin): :param application_args: Arguments for the application being submitted :type application_args: list :param env_vars: Environment variables for spark-submit. It - supports yarn and k8s mode too. +supports yarn and k8s mode too. 
:type env_vars: dict :param verbose: Whether to pass the verbose flag to spark-submit process for debugging :type verbose: bool diff --git a/airflow/contrib/hooks/sqoop_hook.py b/airflow/contrib/hooks/sqoop_hook.py index 74cddc2b21..f4bad83144 100644 --- a/airflow/contrib/hooks/sqoop_hook.py +++ b/airflow/contrib/hooks/sqoop_hook.py @@ -36,13 +36,14 @@ class SqoopHook(BaseHook, LoggingMixin): Additional arguments that can be passed via the 'extra' JSON field of the sqoop connection: -* job_tracker: Job tracker local|jobtracker:port. -* namenode: Namenode. -* lib_jars: Comma separated jar files to include in the classpath. -* files: Comma separated files to be copied to the map reduce cluster. -* archives: Comma separated archives to be unarchived on the compute -machines. -* password_file: Path to file containing the password. + +* ``job_tracker``: Job tracker local|jobtracker:port. +* ``namenode``: Namenode. +* ``lib_jars``: Comma separated jar files to include in the classpath. +* ``files``: Comma separated files to be copied to the map reduce cluster. +* ``archives``: Comma separated archives to be unarchived on the compute +machines. +* ``password_file``: Path to file containing the password. :param conn_id: Reference to the sqoop connection. :type conn_id: str @@ -205,6 +206,7 @@ def import_table(self, table, target_dir=None, append=False, file_type="text", """
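The indentation changes in the diff above are about Sphinx field lists: a wrapped `:param:` description must have its continuation lines indented consistently, or Sphinx mis-renders the docstring. A minimal illustration of the target layout (the function itself is made up; only the docstring formatting matters):

```python
def submit(conn_id):
    """Submit a job via spark-submit.

    :param conn_id: The connection id as configured in Airflow administration.
        When an invalid connection_id is supplied, it will default to yarn.
    :type conn_id: str
    """
    return conn_id
```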
[GitHub] kaxil commented on issue #4139: [AIRFLOW-2715] Pick up the region setting while launching Dataflow templates
kaxil commented on issue #4139: [AIRFLOW-2715] Pick up the region setting while launching Dataflow templates URL: https://github.com/apache/incubator-airflow/pull/4139#issuecomment-436224075 Can you add this to `DataflowTemplateOperator`?
[jira] [Updated] (AIRFLOW-3184) AwsHook with a conn_id that doesn't exist doesn't cause an error
[ https://issues.apache.org/jira/browse/AIRFLOW-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-3184: --- Labels: easy-fix (was: ) Looking at the code the fix is probably in _get_credentials inside aws_hook - the try block should only re-raise the error if {{self.aws_conn_id != 'aws_default'}} > AwsHook with a conn_id that doesn't exist doesn't cause an error > > > Key: AIRFLOW-3184 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3184 > Project: Apache Airflow > Issue Type: Bug > Components: aws > Affects Versions: 1.9.0 > Reporter: Ash Berlin-Taylor > Priority: Minor > Labels: easy-fix > > It is possible to create an S3Hook (which is a subclass of the AwsHook) with > an invalid connection ID, and rather than it causing an error of "connection > not found" or similar, it falls back to something, and continues > execution anyway. > Simple repro code: > {code} > h = S3Hook('i-dontexist') > h.list_keys(bucket_name="bucket", prefix="folder/") > {code} > Ideally the first line here should throw an exception of some form or other > (possibly _except_ in the case where the {{conn_id}} is the default value of > "aws_default") rather than its current behaviour, as this made it more > difficult to track down the source of our problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
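Ash's suggested fix can be sketched in plain Python. Names here are simplified stand-ins: `ConnectionNotFound` and `resolve_connection` model the `_get_credentials` logic in `aws_hook.py`, they are not the real Airflow API.

```python
class ConnectionNotFound(Exception):
    """Stand-in for the error Airflow raises when a conn_id does not exist."""

def resolve_connection(conn_id, lookup):
    # Only swallow the lookup failure for the implicit default; an explicitly
    # supplied conn_id that does not exist should surface as an error.
    try:
        return lookup(conn_id)
    except ConnectionNotFound:
        if conn_id != "aws_default":
            raise
        return None  # fall through to boto3's own credential chain

def no_such_connection(conn_id):
    """Simulates a lookup against a metadata DB with no matching row."""
    raise ConnectionNotFound(conn_id)
```

With this shape, `S3Hook('i-dontexist')` would fail loudly at construction time, while `aws_default` keeps its silent fallback behaviour.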
[jira] [Commented] (AIRFLOW-2679) GoogleCloudStorageToBigQueryOperator to support MERGE
[ https://issues.apache.org/jira/browse/AIRFLOW-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676624#comment-16676624 ] Daniel Lamblin commented on AIRFLOW-2679: - The operator uses the Google Cloud Storage Hook to download the schema, and the Big Query Hook to create the table, either as external or by loading. It does this by setting a table insert job with a configuration that includes the write disposition. As you can see from the Google Cloud Big Query API [https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs] configuration.copy.writeDisposition only supports the three modes you listed that Airflow in turn supports. Merge is a query statement. It requires extra clauses to identify how to merge for a match and no match. Using it correctly involves two steps: loading the table, and merging the loaded table with your target table. As, in this scenario, the loaded table is likely just a staging table about to be discarded after the merge statement, it would make sense to load it as an external table, possibly saving time overall. 
> GoogleCloudStorageToBigQueryOperator to support MERGE > - > > Key: AIRFLOW-2679 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2679 > Project: Apache Airflow > Issue Type: Improvement > Reporter: jack > Priority: Major > > Currently the GoogleCloudStorageToBigQueryOperator supports the write_disposition parameter, which can be: WRITE_TRUNCATE, WRITE_APPEND, WRITE_EMPTY > > However Google has another very useful writing method, MERGE: > [https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_examples] > > Support for the MERGE statement would be extremely useful. > The idea behind this request is to do it directly from the Google Storage file rather than load the file into a table and then run another MERGE statement. > > The MERGE statement is really helpful when one wants his records to be updated rather than appended or replaced. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
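The two-step pattern Daniel describes — load (or externally define) a staging table, then MERGE it into the target — can be sketched as SQL composed in Python. The table and column names below are placeholders for illustration, not anything from the Airflow codebase:

```python
def build_merge_sql(target, staging, key):
    """Compose a BigQuery standard-SQL MERGE from a staging table into a target.

    Matched rows are updated in place; unmatched rows are inserted, which is
    the upsert behaviour the ticket asks for instead of append/replace.
    """
    return (
        "MERGE `{t}` AS T USING `{s}` AS S ON T.{k} = S.{k} "
        "WHEN MATCHED THEN UPDATE SET T.payload = S.payload "
        "WHEN NOT MATCHED THEN INSERT ROW"
    ).format(t=target, s=staging, k=key)

sql = build_merge_sql("proj.ds.events", "proj.ds.events_staging", "id")
```

In a DAG, the staging load would be the existing GoogleCloudStorageToBigQueryOperator (possibly as an external table, as suggested above), with the MERGE then run as a separate query task, after which the staging table is discarded.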
[GitHub] kaxil commented on issue #4137: [AIRFLOW-XXX] Fix Docstrings in Hooks, Sensors & Operators
kaxil commented on issue #4137: [AIRFLOW-XXX] Fix Docstrings in Hooks, Sensors & Operators URL: https://github.com/apache/incubator-airflow/pull/4137#issuecomment-436215443 @r39132 There is a huge list of issues, but some of them are checks we don't need, like `D401 First line should be in imperative mood`. ![image](https://user-images.githubusercontent.com/8811558/48060340-a0ebbe80-e1b3-11e8-9593-b493814dc114.png) We decreased from 7641 to 7634.
[GitHub] ashb edited a comment on issue #4006: [AIRFLOW-3164] Verify server certificate when connecting to LDAP
ashb edited a comment on issue #4006: [AIRFLOW-3164] Verify server certificate when connecting to LDAP URL: https://github.com/apache/incubator-airflow/pull/4006#issuecomment-436214512 Fair point, they can create a custom auth backend if they want to. I'll put that back. @bolkedebruin Have confirmed with `tshark` that not specifying a version uses TLSv1.2 by default. (Couldn't think of any way of unit testing this.)
[GitHub] ashb commented on issue #4006: [AIRFLOW-3164] Verify server certificate when connecting to LDAP
ashb commented on issue #4006: [AIRFLOW-3164] Verify server certificate when connecting to LDAP URL: https://github.com/apache/incubator-airflow/pull/4006#issuecomment-436214512 Fair point, they can create a custom auth backend if they want to. I'll put that back. @bolkedebruin Have confirmed with `tshark` that not specifying a version uses TLSv1.2 by default.
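The behaviour Ash verified with `tshark` matches what Python's TLS layer does on its own: when no protocol version is pinned, the handshake negotiates the highest version both peers support (TLSv1.2 on the stacks discussed here), and a default-configured context also verifies the server certificate. A minimal stdlib illustration, independent of whichever LDAP client library the backend uses:

```python
import ssl

# create_default_context() leaves the exact TLS version to negotiation
# rather than pinning one, and turns on certificate + hostname verification,
# which is the behaviour this PR wants for the LDAP connection.
ctx = ssl.create_default_context()

assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname is True
```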
[GitHub] XD-DENG commented on issue #4138: [AIRFLOW-3301] Update DockerOperator unit test for PR #3977 to fix CI failure
XD-DENG commented on issue #4138: [AIRFLOW-3301] Update DockerOperator unit test for PR #3977 to fix CI failure URL: https://github.com/apache/incubator-airflow/pull/4138#issuecomment-436214350 Thank you @kaxil :-)
[GitHub] codecov-io edited a comment on issue #3586: [AIRFLOW-2733] Reconcile psutil and subprocess in webserver cli
codecov-io edited a comment on issue #3586: [AIRFLOW-2733] Reconcile psutil and subprocess in webserver cli URL: https://github.com/apache/incubator-airflow/pull/3586#issuecomment-403631506
# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=h1) Report
> Merging [#3586](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/e703d6beeb379ee88ef5e7df495e8a785666f8af?src=pr=desc) will **increase** coverage by `0.88%`.
> The diff coverage is `50%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3586/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3586      +/-   ##
==========================================
+ Coverage   76.67%   77.56%   +0.88%
==========================================
  Files         199      204       +5
  Lines       16186    15767     -419
==========================================
- Hits        12410    12229     -181
+ Misses       3776     3538     -238
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.43% <50%> (-0.4%)` | :arrow_down: |
| [airflow/operators/slack\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc2xhY2tfb3BlcmF0b3IucHk=) | `0% <0%> (-97.37%)` | :arrow_down: |
| [airflow/sensors/s3\_key\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL3MzX2tleV9zZW5zb3IucHk=) | `31.03% <0%> (-68.97%)` | :arrow_down: |
| [airflow/sensors/s3\_prefix\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL3MzX3ByZWZpeF9zZW5zb3IucHk=) | `41.17% <0%> (-58.83%)` | :arrow_down: |
| [airflow/example\_dags/example\_python\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV9weXRob25fb3BlcmF0b3IucHk=) | `78.94% <0%> (-15.79%)` | :arrow_down: |
| [airflow/utils/helpers.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5) | `71.34% <0%> (-13.04%)` | :arrow_down: |
| [airflow/hooks/mysql\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9teXNxbF9ob29rLnB5) | `78% <0%> (-12%)` | :arrow_down: |
| [airflow/sensors/sql\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL3NxbF9zZW5zb3IucHk=) | `90.47% <0%> (-9.53%)` | :arrow_down: |
| [airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5) | `73.91% <0%> (-7.52%)` | :arrow_down: |
| [airflow/configuration.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9jb25maWd1cmF0aW9uLnB5) | `83.95% <0%> (-5.47%)` | :arrow_down: |
| ... and [94 more](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree-more) | |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=footer). Last update [e703d6b...e7e5a68](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] kaxil commented on issue #4138: [AIRFLOW-3301] Update DockerOperator unit test for PR #3977 to fix CI failure
kaxil commented on issue #4138: [AIRFLOW-3301] Update DockerOperator unit test for PR #3977 to fix CI failure URL: https://github.com/apache/incubator-airflow/pull/4138#issuecomment-436212845 Thanks @XD-DENG
[jira] [Commented] (AIRFLOW-3301) Update CI test for [AIRFLOW-3132] (PR #3977)
[ https://issues.apache.org/jira/browse/AIRFLOW-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676608#comment-16676608 ] ASF GitHub Bot commented on AIRFLOW-3301: - kaxil closed pull request #4138: [AIRFLOW-3301] Update DockerOperator unit test for PR #3977 to fix CI failure URL: https://github.com/apache/incubator-airflow/pull/4138 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/tests/operators/test_docker_operator.py b/tests/operators/test_docker_operator.py
index a7d63e4ebc..7ab27c1aeb 100644
--- a/tests/operators/test_docker_operator.py
+++ b/tests/operators/test_docker_operator.py
@@ -80,6 +80,7 @@ def test_execute(self, client_class_mock, mkdtemp_mock):
             shm_size=1000,
             cpu_shares=1024,
             mem_limit=None,
+            auto_remove=False,
             dns=None,
             dns_search=None)
         client_mock.images.assert_called_with(name='ubuntu:latest')

> Update CI test for [AIRFLOW-3132] (PR #3977) > > Key: AIRFLOW-3301 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3301 > Project: Apache Airflow > Issue Type: Test > Components: tests > Reporter: Xiaodong DENG > Assignee: Xiaodong DENG > Priority: Critical > > In PR [https://github.com/apache/incubator-airflow/pull/3977,] test is not > updated accordingly, and it results in CI failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] kaxil closed pull request #4138: [AIRFLOW-3301] Update DockerOperator unit test for PR #3977 to fix CI failure
kaxil closed pull request #4138: [AIRFLOW-3301] Update DockerOperator unit test for PR #3977 to fix CI failure URL: https://github.com/apache/incubator-airflow/pull/4138 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/tests/operators/test_docker_operator.py b/tests/operators/test_docker_operator.py
index a7d63e4ebc..7ab27c1aeb 100644
--- a/tests/operators/test_docker_operator.py
+++ b/tests/operators/test_docker_operator.py
@@ -80,6 +80,7 @@ def test_execute(self, client_class_mock, mkdtemp_mock):
             shm_size=1000,
             cpu_shares=1024,
             mem_limit=None,
+            auto_remove=False,
             dns=None,
             dns_search=None)
         client_mock.images.assert_called_with(name='ubuntu:latest')
[jira] [Commented] (AIRFLOW-2842) GCS rsync operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676582#comment-16676582 ] Daniel Lamblin commented on AIRFLOW-2842: - Do you think it would not be possible with a simple BashOperator call to the utility? > GCS rsync operator > -- > > Key: AIRFLOW-2842 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2842 > Project: Apache Airflow > Issue Type: Improvement > Reporter: Vikram Oberoi > Priority: Major > > The GoogleCloudStorageToGoogleCloudStorageOperator supports copying objects > from one bucket to another using a wildcard. > As long as you don't delete anything in the source bucket, the destination > bucket will end up synchronized on every run. > However, each object gets copied over even if it exists at the destination, > which makes this operation inefficient, time-consuming, and potentially > costly. > I'd love an operator that behaves like `gsutil rsync` for when I need to > synchronize two buckets, supporting `gsutil rsync -d` behavior as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
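Daniel's BashOperator suggestion can be sketched by composing the `gsutil` command and handing it to a BashOperator as `bash_command`. This assumes `gsutil` is on the worker's PATH; the bucket names are placeholders, and the `-d` flag gives the delete-at-destination behaviour the ticket asks for:

```python
def gsutil_rsync_cmd(src_bucket, dst_bucket, delete=False):
    """Build a `gsutil rsync` command: -m parallelises, -r recurses,
    and -d deletes destination objects that no longer exist at the source."""
    flags = ["-m", "rsync", "-r"]
    if delete:
        flags.insert(2, "-d")
    return "gsutil {} gs://{}/ gs://{}/".format(" ".join(flags), src_bucket, dst_bucket)

# e.g. BashOperator(task_id="sync_buckets",
#                   bash_command=gsutil_rsync_cmd("src", "dst", delete=True))
```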
[jira] [Commented] (AIRFLOW-3303) Deprecate old UI in favor of new FAB RBAC
[ https://issues.apache.org/jira/browse/AIRFLOW-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676513#comment-16676513 ] ASF GitHub Bot commented on AIRFLOW-3303: - verdan opened a new pull request #4142: [AIRFLOW-3303] Deprecate old UI in favor of FAB URL: https://github.com/apache/incubator-airflow/pull/4142 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-3303) issues and references them in the PR title. ### Description - [ ] We are using two different versions of the UI in Apache Airflow. The idea is to deprecate and remove the older version of the UI and use the new Flask App Builder (RBAC) version as the default UI from now on (most probably in release 2.0.x). This PR removes the old UI and renames the references of `www_rbac` to `www`. ### Tests - [ ] Skipped some of the test case classes as these were purely using the older version of the application and configurations. ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8`
> Deprecate old UI in favor of new FAB RBAC > - > > Key: AIRFLOW-3303 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3303 > Project: Apache Airflow > Issue Type: Improvement > Components: ui > Reporter: Verdan Mahmood > Assignee: Verdan Mahmood > Priority: Major > > It's hard to maintain two UIs in parallel. > The idea is to remove the old UI in favor of the new FAB RBAC version. > Make sure to verify all the REST APIs are in place, and working. > All test cases should pass. Skip the tests related to the old UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] verdan opened a new pull request #4142: [AIRFLOW-3303] Deprecate old UI in favor of FAB
verdan opened a new pull request #4142: [AIRFLOW-3303] Deprecate old UI in favor of FAB URL: https://github.com/apache/incubator-airflow/pull/4142 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-3303) issues and references them in the PR title. ### Description - [ ] We are using two different versions of the UI in Apache Airflow. The idea is to deprecate and remove the older version of the UI and use the new Flask App Builder (RBAC) version as the default UI from now on (most probably in release 2.0.x). This PR removes the old UI and renames the references of `www_rbac` to `www`. ### Tests - [ ] Skipped some of the test case classes as these were purely using the older version of the application and configurations. ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8`
[jira] [Created] (AIRFLOW-3303) Deprecate old UI in favor of new FAB RBAC
Verdan Mahmood created AIRFLOW-3303: --- Summary: Deprecate old UI in favor of new FAB RBAC Key: AIRFLOW-3303 URL: https://issues.apache.org/jira/browse/AIRFLOW-3303 Project: Apache Airflow Issue Type: Improvement Components: ui Reporter: Verdan Mahmood Assignee: Verdan Mahmood It's hard to maintain two UIs in parallel. The idea is to remove the old UI in favor of the new FAB RBAC version. Make sure to verify all the REST APIs are in place, and working. All test cases should pass. Skip the tests related to the old UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] phani8996 edited a comment on issue #4111: [AIRFLOW-3266] Add AWS Athena Operator and hook
phani8996 edited a comment on issue #4111: [AIRFLOW-3266] Add AWS Athena Operator and hook URL: https://github.com/apache/incubator-airflow/pull/4111#issuecomment-436176981 @ashb requested changes have been made. Please review and share if anything else needs to be done.
[GitHub] phani8996 commented on issue #4111: [AIRFLOW-3266] Add AWS Athena Operator and hook
phani8996 commented on issue #4111: [AIRFLOW-3266] Add AWS Athena Operator and hook URL: https://github.com/apache/incubator-airflow/pull/4111#issuecomment-436176981 @ashb all requested changes have been made. Please review and share if anything else can be done.
[jira] [Commented] (AIRFLOW-3302) Small CSS fixes
[ https://issues.apache.org/jira/browse/AIRFLOW-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676369#comment-16676369 ] ASF GitHub Bot commented on AIRFLOW-3302: - msumit opened a new pull request #4140: [AIRFLOW-3302] Small CSS fixes URL: https://github.com/apache/incubator-airflow/pull/4140 ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. - https://issues.apache.org/jira/browse/AIRFLOW-3302 ### Description - [ ] 2 small CSS fixes - Don't highlight logout button when viewing *Log* tab of a task run - Align Airflow logo to the center of the login page ### Tests - [ ] Tested manually ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8`
> Small CSS fixes > --- > > Key: AIRFLOW-3302 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3302 > Project: Apache Airflow > Issue Type: Improvement > Reporter: Sumit Maheshwari > Assignee: Sumit Maheshwari > Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jie8357IOII opened a new pull request #4141: [bugfix] minikube enviroment lack of init airflow db step
jie8357IOII opened a new pull request #4141: [bugfix] minikube enviroment lack of init airflow db step URL: https://github.com/apache/incubator-airflow/pull/4141 Make sure you have checked _all_ steps below. ### Jira No jira. ### Description When using kubernetes/kube/deploy to deploy Airflow on minikube, the init container always fails because the 'airflow' database does not exist in PostgreSQL. This PR adds a Python step that creates the 'airflow' database.
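As a hedged sketch of such an init step (the connection handling and names are assumptions, not the PR's actual code): PostgreSQL has no `CREATE DATABASE IF NOT EXISTS`, and `CREATE DATABASE` cannot run inside a transaction block, so an idempotent helper can check `pg_database` first and create only when missing:

```python
# Hypothetical helper mirroring the idea of the PR: make sure the 'airflow'
# database exists before `airflow initdb` runs in the init container.
# `conn` is any DB-API connection to the 'postgres' maintenance database,
# opened in autocommit mode (e.g. via psycopg2).

def airflow_db_statements(dbname="airflow"):
    # CREATE DATABASE cannot run in a transaction and has no IF NOT EXISTS,
    # so pair an existence check with the create statement.
    check_sql = "SELECT 1 FROM pg_database WHERE datname = %s"
    create_sql = 'CREATE DATABASE "{}"'.format(dbname)
    return check_sql, create_sql


def ensure_airflow_db(conn, dbname="airflow"):
    # Run the check; issue CREATE DATABASE only if no row came back.
    check_sql, create_sql = airflow_db_statements(dbname)
    cur = conn.cursor()
    cur.execute(check_sql, (dbname,))
    if cur.fetchone() is None:
        cur.execute(create_sql)
```

With a step like this in place, a re-deploy does not fail on an already-existing database, since the create is skipped on the second run.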
[jira] [Commented] (AIRFLOW-2715) Dataflow template operator dosenot support region parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676330#comment-16676330 ] ASF GitHub Bot commented on AIRFLOW-2715: - janhicken opened a new pull request #4139: [AIRFLOW-2715] Pick up the region setting while launching Dataflow templates URL: https://github.com/apache/incubator-airflow/pull/4139 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-2715) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: To launch an instance of a Dataflow template in the configured region, the API service.projects().locations().templates() instead of service.projects().templates() has to be used. Otherwise, all jobs will always be started in us-central1. In case there is no region configured, the default region `us-central1` will get picked up. To make it even worse, the polling for the job status already honors the region parameter and will search for the job in the wrong region in the current implementation. Because the job's status is not found, the corresponding Airflow task will hang. This PR is a second approach and follow-up of #4125 ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: `tests.contrib.hooks.test_gcp_dataflow_hook.DataFlowTemplateHookTest` has been modified ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue.
In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8` > Dataflow template operator dosenot support region parameter > --- > > Key: AIRFLOW-2715 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2715 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 1.9.0 >Reporter: Mohammed Tameem >Priority: Critical > Fix For: 2.0.0 > > > The DataflowTemplateOperator uses dataflow.projects.templates.launch which > has a region parameter but only supports execution of the dataflow job in the > us-central1 region. Alternatively there is another api, > dataflow.projects.locations.templates.launch which supports execution of the > template in all regional endpoints provided by google cloud. > It would be great if, > # The base REST API of this operator could be changed from > "dataflow.projects.templates.launch" to > "dataflow.projects.locations.templates.launch" > # A templated region parameter was included in the operator to run the > dataflow job in the requested regional endpoint.
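The call-path switch described in this PR can be sketched as follows. This is illustrative only: it assumes a `googleapiclient`-style `service` object, and the request wiring is simplified from what the Dataflow hook actually does.

```python
# Route a template launch through the regional API when a region is set;
# the legacy projects().templates() path always runs jobs in us-central1.
def launch_template(service, project_id, region, gcs_path, body):
    if region:
        request = service.projects().locations().templates().launch(
            projectId=project_id,
            location=region,
            gcsPath=gcs_path,
            body=body,
        )
    else:
        # No region configured: Dataflow falls back to us-central1.
        request = service.projects().templates().launch(
            projectId=project_id,
            gcsPath=gcs_path,
            body=body,
        )
    return request.execute()
```

The matching half of the fix is that status polling must query the same region; otherwise the job is looked up in the wrong location and the task hangs, as the PR description notes.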
[GitHub] yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint
yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint URL: https://github.com/apache/incubator-airflow/pull/4126#discussion_r231030413 ## File path: airflow/contrib/operators/sagemaker_endpoint_operator.py ## @@ -0,0 +1,151 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from airflow.contrib.hooks.aws_hook import AwsHook +from airflow.contrib.operators.sagemaker_base_operator import SageMakerBaseOperator +from airflow.utils.decorators import apply_defaults +from airflow.exceptions import AirflowException + + +class SageMakerEndpointOperator(SageMakerBaseOperator): + +""" +Create a SageMaker endpoint. + +This operator returns The ARN of the endpoint created in Amazon SageMaker + +:param config: +The configuration necessary to create an endpoint. 
+ +If you need to create a SageMaker endpoint based on an existing SageMaker model and an existing SageMaker +endpoint config, + +config = endpoint_configuration; + +If you need to create all of SageMaker model, SageMaker endpoint-config and SageMaker endpoint, + +config = { +'Model': model_configuration, + +'EndpointConfig': endpoint_config_configuration, + +'Endpoint': endpoint_configuration +} + +For details of the configuration parameter of model_configuration, See: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model + +For details of the configuration parameter of endpoint_config_configuration, See: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config + +For details of the configuration parameter of endpoint_configuration, See: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint +:type config: dict +:param aws_conn_id: The AWS connection ID to use. +:type aws_conn_id: str +:param wait_for_completion: Whether the operator should wait until the endpoint creation finishes. +:type wait_for_completion: bool +:param check_interval: If wait is set to True, this is the time interval, in seconds, that this operation waits +before polling the status of the endpoint creation. +:type check_interval: int +:param max_ingestion_time: If wait is set to True, this operation fails if the endpoint creation doesn't finish +within max_ingestion_time seconds. If you set this parameter to None it never times out. +:type max_ingestion_time: int +:param operation: Whether to create an endpoint or update an endpoint. Must be either 'create' or 'update'.
+:type operation: str +""" # noqa + +@apply_defaults +def __init__(self, + config, + wait_for_completion=True, + check_interval=30, + max_ingestion_time=None, + operation='create', + *args, **kwargs): +super(SageMakerEndpointOperator, self).__init__(config=config, +*args, **kwargs) + +self.config = config +self.wait_for_completion = wait_for_completion +self.check_interval = check_interval +self.max_ingestion_time = max_ingestion_time +self.operation = operation.lower() +if self.operation not in ['create', 'update']: +raise AirflowException('Invalid value! Argument operation has to be one of "create" and "update"') Review comment: Updated. Thanks for the explanation of AirflowException.
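To make the documented `config` shapes concrete, here is an illustrative example of the full three-part config from the docstring above; every name, image URI, and ARN is a placeholder, not a working resource:

```python
# Illustrative config for SageMakerEndpointOperator covering the
# model + endpoint-config + endpoint case. All values are placeholders.
model_configuration = {
    'ModelName': 'my-model',
    'PrimaryContainer': {
        'Image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest',
        'ModelDataUrl': 's3://my-bucket/output/model.tar.gz',
    },
    'ExecutionRoleArn': 'arn:aws:iam::123456789012:role/SageMakerRole',
}

endpoint_config_configuration = {
    'EndpointConfigName': 'my-endpoint-config',
    'ProductionVariants': [{
        'VariantName': 'AllTraffic',
        'ModelName': 'my-model',          # must match the model above
        'InitialInstanceCount': 1,
        'InstanceType': 'ml.m4.xlarge',
    }],
}

endpoint_configuration = {
    'EndpointName': 'my-endpoint',
    'EndpointConfigName': 'my-endpoint-config',  # must match the config above
}

# Full form: the operator creates model, endpoint config, and endpoint.
config = {
    'Model': model_configuration,
    'EndpointConfig': endpoint_config_configuration,
    'Endpoint': endpoint_configuration,
}
```

For the endpoint-only case the docstring describes, `config = endpoint_configuration` alone would be passed instead.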
[GitHub] yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint
yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint URL: https://github.com/apache/incubator-airflow/pull/4126#discussion_r231030278 ## File path: tests/contrib/sensors/test_sagemaker_endpoint_sensor.py ## @@ -0,0 +1,110 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +import unittest + +try: +from unittest import mock +except ImportError: +try: +import mock +except ImportError: +mock = None + +from airflow import configuration +from airflow.contrib.sensors.sagemaker_endpoint_sensor \ +import SageMakerEndpointSensor +from airflow.contrib.hooks.sagemaker_hook import SageMakerHook +from airflow.exceptions import AirflowException + +DESCRIBE_ENDPOINT_CREATING_RESPONSE = { +'EndpointStatus': 'Creating', +'ResponseMetadata': { +'HTTPStatusCode': 200, +} +} +DESCRIBE_ENDPOINT_INSERVICE_RESPONSE = { +'EndpointStatus': 'InService', +'ResponseMetadata': { +'HTTPStatusCode': 200, +} +} + +DESCRIBE_ENDPOINT_FAILED_RESPONSE = { +'EndpointStatus': 'Failed', +'ResponseMetadata': { +'HTTPStatusCode': 200, +}, +'FailureReason': 'Unknown' +} + +DESCRIBE_ENDPOINT_UPDATING_RESPONSE = { +'EndpointStatus': 'Updating', +'ResponseMetadata': { +'HTTPStatusCode': 200, +} +} + + +class TestSageMakerEndpointSensor(unittest.TestCase): +def setUp(self): +configuration.load_test_config() + +@mock.patch.object(SageMakerHook, 'get_conn') +@mock.patch.object(SageMakerHook, 'describe_endpoint') +def test_sensor_with_failure(self, mock_describe, mock_client): +mock_describe.side_effect = [DESCRIBE_ENDPOINT_FAILED_RESPONSE] +sensor = SageMakerEndpointSensor( +task_id='test_task', +poke_interval=1, +aws_conn_id='aws_test', +endpoint_name='test_job_name' +) +self.assertRaises(AirflowException, sensor.execute, None) +mock_describe.assert_called_once_with('test_job_name') + +@mock.patch.object(SageMakerHook, 'get_conn') +@mock.patch.object(SageMakerHook, '__init__') +@mock.patch.object(SageMakerHook, 'describe_endpoint') +def test_sensor(self, mock_describe, hook_init, mock_client): +hook_init.return_value = None + +mock_describe.side_effect = [ +DESCRIBE_ENDPOINT_CREATING_RESPONSE, +DESCRIBE_ENDPOINT_UPDATING_RESPONSE, +DESCRIBE_ENDPOINT_INSERVICE_RESPONSE +] +sensor = SageMakerEndpointSensor( +task_id='test_task', +poke_interval=1, +aws_conn_id='aws_test', 
+endpoint_name='test_job_name' +) + +sensor.execute(None) + +# make sure we called 4 times(terminated when its compeleted) Review comment: Nice catch! Updated all sensor tests with this inaccurate comment.
[jira] [Commented] (AIRFLOW-957) the execution_date of dagrun that is created by TriggerDagRunOperator is not euqal the execution_date of TriggerDagRunOperator's task instance
[ https://issues.apache.org/jira/browse/AIRFLOW-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676321#comment-16676321 ] ASF GitHub Bot commented on AIRFLOW-957: anxodio closed pull request #2238: [AIRFLOW-957] Add execution_date parameter to TriggerDagRunOperator URL: https://github.com/apache/incubator-airflow/pull/2238 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/operators/dagrun_operator.py b/airflow/operators/dagrun_operator.py index c3ffa1ada7..7094c50071 100644 --- a/airflow/operators/dagrun_operator.py +++ b/airflow/operators/dagrun_operator.py @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -from datetime import datetime import logging from airflow.models import BaseOperator, DagBag @@ -23,9 +22,27 @@ class DagRunOrder(object): -def __init__(self, run_id=None, payload=None): -self.run_id = run_id +def __init__(self, execution_date, run_id=None, payload=None): +self._run_id = run_id self.payload = payload +self.execution_date = execution_date + +@property +def run_id(self): +return self._run_id or self._auto_run_id + +@run_id.setter +def run_id(self, value): +self._run_id = value + +@property +def execution_date(self): +return self._execution_date + +@execution_date.setter +def execution_date(self, dt): +self._execution_date = dt +self._auto_run_id = 'trig__%s' % dt.isoformat() class TriggerDagRunOperator(BaseOperator): @@ -37,8 +54,11 @@ class TriggerDagRunOperator(BaseOperator): :param python_callable: a reference to a python function that will be called while passing it the ``context`` object and a placeholder object ``obj`` for your callable to fill and return if you want -a DagRun created. 
This ``obj`` object contains a ``run_id`` and -``payload`` attribute that you can modify in your function. +a DagRun created. This ``obj`` object contains an +``execution_date``, a ``run_id`` and ``payload`` attribute that +you can modify in your function. +The ``execution_date`` is by default the current +task's instance ``execution_date``. The ``run_id`` should be a unique identifier for that DAG run, and the payload has to be a picklable object that will be made available to your tasks while executing that DAG run. Your function header @@ -60,7 +80,7 @@ def __init__( self.trigger_dag_id = trigger_dag_id def execute(self, context): -dro = DagRunOrder(run_id='trig__' + datetime.now().isoformat()) +dro = DagRunOrder(context['execution_date']) dro = self.python_callable(context, dro) if dro: session = settings.Session() @@ -70,6 +90,7 @@ def execute(self, context): run_id=dro.run_id, state=State.RUNNING, conf=dro.payload, +execution_date=dro.execution_date, external_trigger=True) logging.info("Creating DagRun {}".format(dr)) session.add(dr) diff --git a/tests/core.py b/tests/core.py index 353b847c6b..3c6e841bb9 100644 --- a/tests/core.py +++ b/tests/core.py @@ -446,6 +446,7 @@ def test_bash_operator_kill(self): def test_trigger_dagrun(self): def trigga(context, obj): +trigga.run_id = obj.run_id if True: return obj @@ -456,6 +457,40 @@ def trigga(context, obj): dag=self.dag) t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True) +session = settings.Session() +new_dag_run = session.query(models.DagRun).filter( +models.DagRun.run_id == trigga.run_id).first() +self.assertEqual(new_dag_run.execution_date, DEFAULT_DATE) + +def test_trigger_dagrun_order_modified(self): +""" +Test TriggerDagRunOperator with changes in DagRunOrder +""" +new_execution_date = datetime(2016, 1, 1) +new_dag_run_id = 'manual_run_id' +payload_key = 'message' +payload = {payload_key: 'Hello World'} + +def trigga(context, obj): +obj.run_id = new_dag_run_id +obj.execution_date = 
new_execution_date +obj.payload = payload +if True: +return obj + +t = TriggerDagRunOperator( +task_id='test_trigger_dagrun', +trigger_dag_id='example_bash_operator', +python_callable=trigga, +dag=self.dag) +
[GitHub] anxodio commented on a change in pull request #2238: [AIRFLOW-957] Add execution_date parameter to TriggerDagRunOperator
anxodio commented on a change in pull request #2238: [AIRFLOW-957] Add execution_date parameter to TriggerDagRunOperator URL: https://github.com/apache/incubator-airflow/pull/2238#discussion_r231029186 ## File path: tests/core.py ## @@ -456,6 +457,40 @@ def trigga(context, obj): dag=self.dag) t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True) +session = settings.Session() +new_dag_run = session.query(models.DagRun).filter( +models.DagRun.run_id == trigga.run_id).first() +self.assertEqual(new_dag_run.execution_date, DEFAULT_DATE) + +def test_trigger_dagrun_order_modified(self): +""" +Test TriggerDagRunOperator with changes in DagRunOrder +""" +new_execution_date = datetime(2016, 1, 1) +new_dag_run_id = 'manual_run_id' +payload_key = 'message' +payload = {payload_key: 'Hello World'} + +def trigga(context, obj): +obj.run_id = new_dag_run_id +obj.execution_date = new_execution_date +obj.payload = payload +if True: Review comment: @ron819 yes, I will close it. I think it is a good idea to implement something like this, but it's a breaking change and I think it must be agreed upon with the community first.
[GitHub] anxodio closed pull request #2238: [AIRFLOW-957] Add execution_date parameter to TriggerDagRunOperator
anxodio closed pull request #2238: [AIRFLOW-957] Add execution_date parameter to TriggerDagRunOperator URL: https://github.com/apache/incubator-airflow/pull/2238 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/operators/dagrun_operator.py b/airflow/operators/dagrun_operator.py index c3ffa1ada7..7094c50071 100644 --- a/airflow/operators/dagrun_operator.py +++ b/airflow/operators/dagrun_operator.py @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -from datetime import datetime import logging from airflow.models import BaseOperator, DagBag @@ -23,9 +22,27 @@ class DagRunOrder(object): -def __init__(self, run_id=None, payload=None): -self.run_id = run_id +def __init__(self, execution_date, run_id=None, payload=None): +self._run_id = run_id self.payload = payload +self.execution_date = execution_date + +@property +def run_id(self): +return self._run_id or self._auto_run_id + +@run_id.setter +def run_id(self, value): +self._run_id = value + +@property +def execution_date(self): +return self._execution_date + +@execution_date.setter +def execution_date(self, dt): +self._execution_date = dt +self._auto_run_id = 'trig__%s' % dt.isoformat() class TriggerDagRunOperator(BaseOperator): @@ -37,8 +54,11 @@ class TriggerDagRunOperator(BaseOperator): :param python_callable: a reference to a python function that will be called while passing it the ``context`` object and a placeholder object ``obj`` for your callable to fill and return if you want -a DagRun created. This ``obj`` object contains a ``run_id`` and -``payload`` attribute that you can modify in your function. +a DagRun created. 
This ``obj`` object contains an +``execution_date``, a ``run_id`` and ``payload`` attribute that +you can modify in your function. +The ``execution_date`` is by default the current +task's instance ``execution_date``. The ``run_id`` should be a unique identifier for that DAG run, and the payload has to be a picklable object that will be made available to your tasks while executing that DAG run. Your function header @@ -60,7 +80,7 @@ def __init__( self.trigger_dag_id = trigger_dag_id def execute(self, context): -dro = DagRunOrder(run_id='trig__' + datetime.now().isoformat()) +dro = DagRunOrder(context['execution_date']) dro = self.python_callable(context, dro) if dro: session = settings.Session() @@ -70,6 +90,7 @@ def execute(self, context): run_id=dro.run_id, state=State.RUNNING, conf=dro.payload, +execution_date=dro.execution_date, external_trigger=True) logging.info("Creating DagRun {}".format(dr)) session.add(dr) diff --git a/tests/core.py b/tests/core.py index 353b847c6b..3c6e841bb9 100644 --- a/tests/core.py +++ b/tests/core.py @@ -446,6 +446,7 @@ def test_bash_operator_kill(self): def test_trigger_dagrun(self): def trigga(context, obj): +trigga.run_id = obj.run_id if True: return obj @@ -456,6 +457,40 @@ def trigga(context, obj): dag=self.dag) t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True) +session = settings.Session() +new_dag_run = session.query(models.DagRun).filter( +models.DagRun.run_id == trigga.run_id).first() +self.assertEqual(new_dag_run.execution_date, DEFAULT_DATE) + +def test_trigger_dagrun_order_modified(self): +""" +Test TriggerDagRunOperator with changes in DagRunOrder +""" +new_execution_date = datetime(2016, 1, 1) +new_dag_run_id = 'manual_run_id' +payload_key = 'message' +payload = {payload_key: 'Hello World'} + +def trigga(context, obj): +obj.run_id = new_dag_run_id +obj.execution_date = new_execution_date +obj.payload = payload +if True: +return obj + +t = TriggerDagRunOperator( +task_id='test_trigger_dagrun', 
+trigger_dag_id='example_bash_operator', +python_callable=trigga, +dag=self.dag) +t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True) + +session = settings.Session() +new_dag_run = session.query(models.DagRun).filter( +models.DagRun.run_id == new_dag_run_id).first() +
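The `DagRunOrder` behaviour introduced by the diff above can be exercised standalone; the class below reproduces the patched class (with indentation restored), while the callable and dates are illustrative:

```python
from datetime import datetime


class DagRunOrder(object):
    # As in the patch: run_id falls back to an auto id derived from
    # execution_date ('trig__<isoformat>') unless set explicitly.
    def __init__(self, execution_date, run_id=None, payload=None):
        self._run_id = run_id
        self.payload = payload
        self.execution_date = execution_date

    @property
    def run_id(self):
        return self._run_id or self._auto_run_id

    @run_id.setter
    def run_id(self, value):
        self._run_id = value

    @property
    def execution_date(self):
        return self._execution_date

    @execution_date.setter
    def execution_date(self, dt):
        self._execution_date = dt
        self._auto_run_id = 'trig__%s' % dt.isoformat()


# An illustrative python_callable: returning the obj triggers the run,
# returning None skips it.
def conditionally_trigger(context, obj):
    obj.payload = {'message': 'Hello World'}
    return obj


dro = DagRunOrder(datetime(2016, 1, 1))
```

By default the execution_date is the triggering task instance's own execution_date, and the derived run_id for the date above is `trig__2016-01-01T00:00:00`; assigning `dro.run_id` overrides it, which is exactly what `test_trigger_dagrun_order_modified` checks.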
[GitHub] yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint
yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint URL: https://github.com/apache/incubator-airflow/pull/4126#discussion_r231028458 ## File path: airflow/contrib/operators/sagemaker_training_operator.py ## @@ -29,23 +29,26 @@ class SageMakerTrainingOperator(SageMakerBaseOperator): This operator returns The ARN of the training job created in Amazon SageMaker. -:param config: The configuration necessary to start a training job (templated) +:param config: The configuration necessary to start a training job (templated). + +For details of the configuration parameter, See: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_training_job :type config: dict :param aws_conn_id: The AWS connection ID to use. :type aws_conn_id: str -:param wait_for_completion: if the operator should block until training job finishes +:param wait_for_completion: If wait is set to True, the time interval, in seconds, +that the operation waits to check the status of the training job. :type wait_for_completion: bool :param print_log: if the operator should print the cloudwatch log during training :type print_log: bool :param check_interval: if wait is set to be true, this is the time interval in seconds which the operator will check the status of the training job :type check_interval: int -:param max_ingestion_time: if wait is set to be true, the operator will fail -if the training job hasn't finish within the max_ingestion_time in seconds -(Caution: be careful to set this parameters because training can take very long) -Setting it to None implies no timeout. +:param max_ingestion_time: If wait is set to True, the operation fails if the training job +doesn't finish within max_ingestion_time seconds. If you set this parameter to None, +the operation does not timeout. 
:type max_ingestion_time: int -""" +""" # noqa Review comment: Just the external link is too long. I am not sure if there's a way to separate a long link to multiple lines.
[GitHub] codecov-io commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
codecov-io commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-436165897 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2460?src=pr=h1) Report > Merging [#2460](https://codecov.io/gh/apache/incubator-airflow/pull/2460?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/80a3d6ac78c5c13abb8826b9dcbe0529f60fed81?src=pr=desc) will **increase** coverage by `0.02%`. > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/2460/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/2460?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #2460      +/-   ##
==========================================
+ Coverage   76.67%   76.69%   +0.02%
==========================================
  Files         199      199
  Lines       16212    16233      +21
==========================================
+ Hits        12430    12450      +20
- Misses       3782     3783       +1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/2460?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/2460/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `92.11% <100%> (+0.02%)` | :arrow_up: |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2460?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2460?src=pr=footer). Last update [80a3d6a...da9b738](https://codecov.io/gh/apache/incubator-airflow/pull/2460?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint
yangaws commented on a change in pull request #4126: [AIRFLOW-2524] More AWS SageMaker operators, sensors for model, endpoint-config and endpoint URL: https://github.com/apache/incubator-airflow/pull/4126#discussion_r231028384

## File path: airflow/contrib/operators/sagemaker_endpoint_config_operator.py

## @@ -0,0 +1,67 @@
```diff
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.operators.sagemaker_base_operator import SageMakerBaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerEndpointConfigOperator(SageMakerBaseOperator):
+
+    """
+    Create a SageMaker endpoint config.
+
+    This operator returns The ARN of the endpoint config created in Amazon SageMaker
+
+    :param config: The configuration necessary to create an endpoint config.
+
+        For details of the configuration parameter, See:
+        https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config
+    :type config: dict
+    :param aws_conn_id: The AWS connection ID to use.
+    :type aws_conn_id: str
+    """  # noqa
```

Review comment: Just the external link is too long. 
I am not sure whether there is a way to split a long link across multiple lines.
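The `config` argument described in the operator's docstring follows the shape that boto3's `SageMaker.Client.create_endpoint_config` call expects. A minimal, hypothetical sketch of such a dict (the resource names `"example-endpoint-config"`, `"example-model"` and the instance settings are invented for illustration):

```python
# Hypothetical example of the ``config`` dict the operator's docstring points to.
# Field names follow boto3's SageMaker.Client.create_endpoint_config; the
# resource names here are made up and not taken from the PR under review.
endpoint_config = {
    "EndpointConfigName": "example-endpoint-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",       # single variant receiving all traffic
            "ModelName": "example-model",       # an existing SageMaker model
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m4.xlarge",
        }
    ],
}

# The operator would then receive this dict via its ``config`` parameter, e.g.:
#   SageMakerEndpointConfigOperator(
#       task_id="create_endpoint_config",
#       config=endpoint_config,
#       aws_conn_id="aws_default",
#   )
```

The `aws_conn_id` value names an Airflow connection holding the AWS credentials; `"aws_default"` is only an assumed placeholder here.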