[jira] [Created] (AIRFLOW-3087) Task stuck in UP_FOR_RETRY and continuously showing Not In Retry Period
Chandu Kavar created AIRFLOW-3087:
-------------------------------------
Summary: Task stuck in UP_FOR_RETRY and continuously showing Not In Retry Period
Key: AIRFLOW-3087
URL: https://issues.apache.org/jira/browse/AIRFLOW-3087
Project: Apache Airflow
Issue Type: Bug
Components: scheduler
Affects Versions: 1.9.0
Reporter: Chandu Kavar
Attachments: Screen Shot 2018-09-19 at 10.27.25 AM.png

Hi,

We are facing issues with the "up_for_retry" state of tasks in a few DAGs. When a task fails and the scheduler marks it "up_for_retry", it gets stuck. In the task instance details we see this message when the retry time arrives:

{{All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless: - The scheduler is down or under heavy load. If this task instance does not start soon please contact your Airflow administrator for assistance.}}

After the retry delay it shows (and keeps showing) this message:

{{Not In Retry Period: Task is not ready for retry yet but will be retried automatically. Current date is 2018-08-29T15:xx: and task will be retried. }}

After an hour the task is able to retry.

Code:
{code:java}
from datetime import *

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'Chandu',
    'depends_on_past': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=1),
    'queue': 'worker_test'
}

dag = DAG('airflow-examples.test_failed_dag_v3',
          description='Failed DAG',
          schedule_interval='*/10 * * * *',
          start_date=datetime(2018, 9, 7),
          default_args=default_args)

b = BashOperator(
    task_id="ls_command",
    bash_command="mdr",
    dag=dag
)
{code}

Tree view of the DAG: !Screen Shot 2018-09-19 at 10.27.25 AM.png!

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
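The gate that produces the "Not In Retry Period" message is, in essence, a check that the retry delay has elapsed since the last failure. A minimal sketch of that check (simplified; this is not the actual Airflow dep code, which derives the next retry time from the task instance's end_date and the task's retry_delay):

```python
from datetime import datetime, timedelta

# Simplified illustration of the "ready for retry" gate described above;
# NOT the actual Airflow implementation.
def ready_for_retry(last_failure, retry_delay, now):
    """Return True once retry_delay has elapsed since the last failure."""
    return now >= last_failure + retry_delay

failed_at = datetime(2018, 8, 29, 15, 0)
delay = timedelta(minutes=1)
```

With the report's retry_delay of one minute, the task should become eligible one minute after the failure, not the hour observed.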
[GitHub] criccomini commented on issue #3916: [AIRFLOW-3085] Bug fix to allow log display in RBAC UI
criccomini commented on issue #3916: [AIRFLOW-3085] Bug fix to allow log display in RBAC UI
URL: https://github.com/apache/incubator-airflow/pull/3916#issuecomment-422627689

LGTM!

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] codecov-io commented on issue #3917: [AIRFLOW-3086] Add extras group for google auth to setup.py.
codecov-io commented on issue #3917: [AIRFLOW-3086] Add extras group for google auth to setup.py.
URL: https://github.com/apache/incubator-airflow/pull/3917#issuecomment-422625841

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3917?src=pr=h1) Report
> Merging [#3917](https://codecov.io/gh/apache/incubator-airflow/pull/3917?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/2f50083c8dfcd79ad569216a78b67f7568347628?src=pr=desc) will **decrease** coverage by `0.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3917/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3917?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3917      +/-   ##
==========================================
- Coverage   77.55%   77.53%   -0.02%
==========================================
  Files         198      198
  Lines       15847    15847
==========================================
- Hits        12290    12287       -3
- Misses       3557     3560       +3
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3917?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3917/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.48% <0%> (-0.27%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3917?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3917?src=pr=footer). Last update [2f50083...c249828](https://codecov.io/gh/apache/incubator-airflow/pull/3917?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Commented] (AIRFLOW-3086) Add extras group in setup.py for google oauth
[ https://issues.apache.org/jira/browse/AIRFLOW-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619966#comment-16619966 ]

ASF GitHub Bot commented on AIRFLOW-3086:
-----------------------------------------

jmcarp opened a new pull request #3917: [AIRFLOW-3086] Add extras group for google auth to setup.py.
URL: https://github.com/apache/incubator-airflow/pull/3917

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-XXX
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  To clarify installation instructions for the google auth backend, add an install group to `setup.py` that installs the google auth dependencies via `pip install apache-airflow[google_auth]`.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  This patch just adds an item to `setup.py`.

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Add extras group in setup.py for google oauth
> ---------------------------------------------
>
> Key: AIRFLOW-3086
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3086
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Josh Carp
> Priority: Major
>
> Since the google auth backend requires Flask-OAuthlib, it would be helpful to add an extras group to setup.py for google auth that installs this dependency.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jmcarp opened a new pull request #3917: [AIRFLOW-3086] Add extras group for google auth to setup.py.
jmcarp opened a new pull request #3917: [AIRFLOW-3086] Add extras group for google auth to setup.py.
URL: https://github.com/apache/incubator-airflow/pull/3917

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-XXX
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  To clarify installation instructions for the google auth backend, add an install group to `setup.py` that installs the google auth dependencies via `pip install apache-airflow[google_auth]`.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  This patch just adds an item to `setup.py`.

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Created] (AIRFLOW-3086) Add extras group in setup.py for google oauth
Josh Carp created AIRFLOW-3086:
----------------------------------
Summary: Add extras group in setup.py for google oauth
Key: AIRFLOW-3086
URL: https://issues.apache.org/jira/browse/AIRFLOW-3086
Project: Apache Airflow
Issue Type: Bug
Reporter: Josh Carp

Since the google auth backend requires Flask-OAuthlib, it would be helpful to add an extras group to setup.py for google auth that installs this dependency.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
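The change the issue asks for boils down to one new entry in the `extras_require` mapping in `setup.py`. A sketch of what that entry could look like; the version pin is an assumption, and the PR itself defines the real entry:

```python
# Hypothetical sketch of the setup.py change: an extras group named
# 'google_auth' pulling in Flask-OAuthlib, the google auth backend's
# dependency. The version pin below is illustrative only.
google_auth = ['Flask-OAuthlib>=0.9']

extras_require = {
    'google_auth': google_auth,
}
```

With such an entry, `pip install apache-airflow[google_auth]` would install Flask-OAuthlib alongside Airflow.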
[GitHub] codecov-io commented on issue #3916: [AIRFLOW-3085] Bug fix to allow log display in RBAC UI
codecov-io commented on issue #3916: [AIRFLOW-3085] Bug fix to allow log display in RBAC UI
URL: https://github.com/apache/incubator-airflow/pull/3916#issuecomment-422598348

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3916?src=pr=h1) Report
> Merging [#3916](https://codecov.io/gh/apache/incubator-airflow/pull/3916?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/2f50083c8dfcd79ad569216a78b67f7568347628?src=pr=desc) will **decrease** coverage by `0.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3916/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3916?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3916      +/-   ##
==========================================
- Coverage   77.55%   77.53%   -0.02%
==========================================
  Files         198      198
  Lines       15847    15847
==========================================
- Hits        12290    12287       -3
- Misses       3557     3560       +3
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3916?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/www\_rbac/security.py](https://codecov.io/gh/apache/incubator-airflow/pull/3916/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9zZWN1cml0eS5weQ==) | `91.27% <ø> (ø)` | :arrow_up: |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3916/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.48% <0%> (-0.27%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3916?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3916?src=pr=footer). Last update [2f50083...16d3c23](https://codecov.io/gh/apache/incubator-airflow/pull/3916?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Commented] (AIRFLOW-3085) Log viewing not possible in default RBAC setting
[ https://issues.apache.org/jira/browse/AIRFLOW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619874#comment-16619874 ]

ASF GitHub Bot commented on AIRFLOW-3085:
-----------------------------------------

jgao54 opened a new pull request #3916: [AIRFLOW-3085] Bug fix to allow log display in RBAC UI
URL: https://github.com/apache/incubator-airflow/pull/3916

Make sure you have checked _all_ steps below.

### Jira
- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-AIRFLOW-3085

### Description
- [ ] Here are some details about my PR, including screenshots of any UI changes:

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  Tested via UI

### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Log viewing not possible in default RBAC setting
> ------------------------------------------------
>
> Key: AIRFLOW-3085
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3085
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Joy Gao
> Priority: Major
>
> Aside from the Admin role, all other roles are not able to view logs right now due to a missing permission in the default setting. The permission should be added to Viewer/User/Op as well.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jgao54 opened a new pull request #3916: [AIRFLOW-3085] Bug fix to allow log display in RBAC UI
jgao54 opened a new pull request #3916: [AIRFLOW-3085] Bug fix to allow log display in RBAC UI
URL: https://github.com/apache/incubator-airflow/pull/3916

Make sure you have checked _all_ steps below.

### Jira
- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-AIRFLOW-3085

### Description
- [ ] Here are some details about my PR, including screenshots of any UI changes:

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  Tested via UI

### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Created] (AIRFLOW-3085) Log viewing not possible in default RBAC setting
Joy Gao created AIRFLOW-3085:
--------------------------------
Summary: Log viewing not possible in default RBAC setting
Key: AIRFLOW-3085
URL: https://issues.apache.org/jira/browse/AIRFLOW-3085
Project: Apache Airflow
Issue Type: Bug
Reporter: Joy Gao

Aside from the Admin role, all other roles are not able to view logs right now due to a missing permission in the default setting. The permission should be added to Viewer/User/Op as well.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
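The fix the issue calls for can be pictured as set arithmetic on the default role permissions. A hypothetical sketch only: the real permission and role definitions live in airflow/www_rbac/security.py, and the names below are illustrative, not taken from that file:

```python
# Illustrative permission sets; names are assumptions, not Airflow's actual
# permission identifiers.
LOG_PERMS = {'can_log'}
VIEWER_PERMS = {'can_index', 'can_dag_details'}     # before the fix: no log perm
ADMIN_PERMS = VIEWER_PERMS | LOG_PERMS | {'can_edit'}

# The reported bug: only Admin carries the log-viewing permission.
# The fix grants it to the other default roles (Viewer/User/Op) too.
VIEWER_PERMS = VIEWER_PERMS | LOG_PERMS
```

After the union, a Viewer-role user carries the log permission and the log view renders instead of being denied.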
[GitHub] codecov-io commented on issue #3915: [AIRFLOW-XXX] Fix SlackWebhookOperator docs
codecov-io commented on issue #3915: [AIRFLOW-XXX] Fix SlackWebhookOperator docs
URL: https://github.com/apache/incubator-airflow/pull/3915#issuecomment-422586261

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3915?src=pr=h1) Report
> Merging [#3915](https://codecov.io/gh/apache/incubator-airflow/pull/3915?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/2f50083c8dfcd79ad569216a78b67f7568347628?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3915/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3915?src=pr=tree)

```diff
@@           Coverage Diff            @@
##           master    #3915   +/-   ##
=======================================
  Coverage   77.55%   77.55%
=======================================
  Files         198      198
  Lines       15847    15847
=======================================
  Hits        12290    12290
  Misses       3557     3557
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3915?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3915?src=pr=footer). Last update [2f50083...d200895](https://codecov.io/gh/apache/incubator-airflow/pull/3915?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] sbilinski opened a new pull request #3915: [AIRFLOW-XXX] Fix SlackWebhookOperator docs
sbilinski opened a new pull request #3915: [AIRFLOW-XXX] Fix SlackWebhookOperator docs
URL: https://github.com/apache/incubator-airflow/pull/3915

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title.
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. -> ✅

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  The docs refer to `conn_id` while the actual argument is `http_conn_id`.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  No tests - documentation fix only.

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
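For reference, a usage sketch with the corrected argument name. This assumes Airflow 1.10's contrib module path and an existing HTTP connection; the connection id `slack_webhook`, the task id, and the message text are all hypothetical, and the snippet is untested here:

```python
from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator

# The operator's parameter is `http_conn_id`, not `conn_id` as the old
# docs stated; `slack_webhook` is a hypothetical connection id.
notify = SlackWebhookOperator(
    task_id='notify_slack',
    http_conn_id='slack_webhook',
    message='Pipeline finished.',
)
```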
[GitHub] codecov-io commented on issue #3914: [AIRFLOW-3069] Log all output of the S3 file transform script
codecov-io commented on issue #3914: [AIRFLOW-3069] Log all output of the S3 file transform script
URL: https://github.com/apache/incubator-airflow/pull/3914#issuecomment-422548293

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3914?src=pr=h1) Report
> Merging [#3914](https://codecov.io/gh/apache/incubator-airflow/pull/3914?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/2f50083c8dfcd79ad569216a78b67f7568347628?src=pr=desc) will **decrease** coverage by `0.01%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3914/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3914?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3914      +/-   ##
==========================================
- Coverage   77.55%   77.54%   -0.02%
==========================================
  Files         198      198
  Lines       15847    15852       +5
==========================================
+ Hits        12290    12292       +2
- Misses       3557     3560       +3
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3914?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/operators/s3\_file\_transform\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3914/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfZmlsZV90cmFuc2Zvcm1fb3BlcmF0b3IucHk=) | `94.44% <100%> (+0.56%)` | :arrow_up: |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3914/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.48% <0%> (-0.27%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3914?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3914?src=pr=footer). Last update [2f50083...48d1d3b](https://codecov.io/gh/apache/incubator-airflow/pull/3914?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Created] (AIRFLOW-3084) Webserver Returns 404 When base_url is set via Environment
Toby Jennings created AIRFLOW-3084:
--------------------------------------
Summary: Webserver Returns 404 When base_url is set via Environment
Key: AIRFLOW-3084
URL: https://issues.apache.org/jira/browse/AIRFLOW-3084
Project: Apache Airflow
Issue Type: Bug
Components: webserver
Affects Versions: 1.10.0
Environment: Docker
Reporter: Toby Jennings

Attempting to mount Airflow at a subpath beneath root (see AIRFLOW-1755). When Airflow is configured via the environment variable "AIRFLOW__WEBSERVER__BASE_URL", the web server returns a 404 for all paths. When "base_url" is set directly in airflow.cfg, the web server works as expected. Documentation suggests that using the environment to configure Airflow should be sufficient.

Steps to reproduce:
1. Install Airflow.
2. Set AIRFLOW__WEBSERVER__BASE_URL to "http://localhost:8080/airflow".
3. Access Airflow at "/", "/airflow" or any other path.
4. Webserver returns 404 ("Apache Airflow is not at this location.")

Workaround:
1. Install Airflow.
2. Set "base_url" to "http://localhost:8080/airflow" in airflow.cfg.
3. Access Airflow at "/airflow".
4. Webserver works as intended.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
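The reproduction relies on Airflow's AIRFLOW__&lt;SECTION&gt;__&lt;KEY&gt; convention, under which an environment variable overrides the matching airflow.cfg option. Expressed in Python (e.g. for a Docker entrypoint, as in the reporter's environment) it amounts to:

```python
import os

# AIRFLOW__WEBSERVER__BASE_URL maps onto the base_url option in the
# [webserver] section of airflow.cfg; per this report, setting it via the
# environment yields a 404 while the airflow.cfg route works.
os.environ['AIRFLOW__WEBSERVER__BASE_URL'] = 'http://localhost:8080/airflow'
```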
[jira] [Created] (AIRFLOW-3083) Trigger Dag Returns Redirect to Incorrect Path when Airflow is mounted under root
Toby Jennings created AIRFLOW-3083:
--------------------------------------
Summary: Trigger Dag Returns Redirect to Incorrect Path when Airflow is mounted under root
Key: AIRFLOW-3083
URL: https://issues.apache.org/jira/browse/AIRFLOW-3083
Project: Apache Airflow
Issue Type: Bug
Components: webserver
Affects Versions: 1.10.0
Reporter: Toby Jennings

Steps to reproduce:
1. Configure Airflow 1.10.0 for operation mounted at a subpath under root by setting "base_url" to, e.g., "http://localhost:8080/airflow".
2. Use the web server UI to trigger a dag run.
3. A 404 error is returned.

This may be caused by several routes in www/views.py, including /trigger/, returning a "redirect(origin)" where "origin" is set to "/admin/" instead of getting the appropriate subpath using url_for().

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
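The suspected bug can be illustrated without Flask: a hardcoded redirect target drops the mount subpath, while building the target from the configured base path (as url_for() would) keeps it. A hypothetical sketch only, not the actual www/views.py code:

```python
# Illustration of the redirect problem described above; function names are
# made up for the example.

def redirect_target_hardcoded():
    # Suspected pattern: `origin` falls back to a fixed '/admin/'.
    return '/admin/'

def redirect_target_with_base(base_path):
    # url_for()-style behaviour: the mount point is prepended automatically.
    return base_path.rstrip('/') + '/admin/'
```

Mounted under '/airflow', the hardcoded variant points at a path the server is not serving, which matches the 404 in the report.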
[GitHub] codecov-io commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
codecov-io commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors URL: https://github.com/apache/incubator-airflow/pull/3596#issuecomment-422536512 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3596?src=pr=h1) Report > Merging [#3596](https://codecov.io/gh/apache/incubator-airflow/pull/3596?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/0e5eee83b14b2a57345370b14e91404d518f0bf4?src=pr=desc) will **increase** coverage by `0.02%`. > The diff coverage is `82.4%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3596/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3596?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3596      +/-   ##
==========================================
+ Coverage   77.52%   77.55%   +0.02%
==========================================
  Files         198      199       +1
  Lines       15842    15963     +121
==========================================
+ Hits        12282    12380      +98
- Misses       3560     3583      +23
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3596?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `89.01% <100%> (+0.2%)` | :arrow_up: |
| [airflow/sensors/base\_sensor\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL2Jhc2Vfc2Vuc29yX29wZXJhdG9yLnB5) | `97.87% <100%> (+1.2%)` | :arrow_up: |
| [airflow/exceptions.py](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree#diff-YWlyZmxvdy9leGNlcHRpb25zLnB5) | `100% <100%> (ø)` | :arrow_up: |
| [airflow/ti\_deps/dep\_context.py](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcF9jb250ZXh0LnB5) | `100% <100%> (ø)` | :arrow_up: |
| [airflow/ti\_deps/deps/ready\_to\_reschedule.py](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcHMvcmVhZHlfdG9fcmVzY2hlZHVsZS5weQ==) | `100% <100%> (ø)` | |
| [airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==) | `72.04% <35.29%> (-0.59%)` | :arrow_down: | | [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `68.85% <35.29%> (-0.46%)` | :arrow_down: | | [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.74% <0%> (ø)` | :arrow_up: | | ... and [3 more](https://codecov.io/gh/apache/incubator-airflow/pull/3596/diff?src=pr=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3596?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3596?src=pr=footer). Last update [0e5eee8...cdd8f89](https://codecov.io/gh/apache/incubator-airflow/pull/3596?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3069) Decode output of S3 file transform operator
[ https://issues.apache.org/jira/browse/AIRFLOW-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619625#comment-16619625 ] ASF GitHub Bot commented on AIRFLOW-3069: - sbilinski opened a new pull request #3914: [AIRFLOW-3069] Log all output of the S3 file transform script URL: https://github.com/apache/incubator-airflow/pull/3914 ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. - [https://jira.apache.org/jira/browse/AIRFLOW-3069](https://jira.apache.org/jira/browse/AIRFLOW-3069) ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: The output of the process spawned by `S3FileTransformOperator` is not properly decoded, which makes reading logs rather difficult. Additionally, the `stderr` stream is only shown when process exit code is not equal to `0`. I would like to propose the following changes to `S3FileTransformOperator`: - Send both output streams (stdout & stderr) to the logger, regardless of the outcome (if any data is present) - Decode the output, so that new lines can be displayed correctly. - Include process exit code in the exception message, if the process fails. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: I've added a separate case for testing `transform_script` with output present. Since logging is essential in this case, the test checks if a valid message was passed to the logging module. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. 
Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Decode output of S3 file transform operator > --- > > Key: AIRFLOW-3069 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3069 > Project: Apache Airflow > Issue Type: Improvement > Components: aws >Affects Versions: 1.10.0 >Reporter: Szymon Bilinski >Assignee: Szymon Bilinski >Priority: Trivial > > h3. Current behaviour > {{S3FileTransformOperator}} logs {{stdout}} of the underlying process as such: > {code} > [2018-09-15 23:17:13,850] {{s3_file_transform_operator.py:122}} INFO - > Transform script stdout b'Copying /tmp/tmpd5rjo8g0 to > /tmp/tmpd3vkhzte\nDone\n' > {code} > While {{stderr}} is omitted entirely, unless exit code is not {{0}} (in this > case it's included in the exception message only). > h3. Proposed behaviour > 1. Both streams are logged, regardless of the underlying process outcome > (i.e. success or failure). > 2. Stream output is decoded before logging (e.g. {{\n}} is replaced with an > actual new line). > 3. If {{transform_script}} fails, the exception message contains return code > of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
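The proposed behaviour can be sketched with a few lines of plain Python. This is illustrative only, not the operator's actual code: `run_transform` is a hypothetical stand-in for the operator's script invocation.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def run_transform(cmd):
    # Capture both streams, decode them, and log whatever is present,
    # regardless of whether the script succeeded.
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    if stdout:
        log.info("Transform script stdout:\n%s", stdout.decode("utf-8"))
    if stderr:
        log.info("Transform script stderr:\n%s", stderr.decode("utf-8"))
    if process.returncode:
        # Surface the exit code in the exception message on failure.
        raise RuntimeError(
            "Transform script failed with return code %d" % process.returncode)
    return stdout.decode("utf-8")

out = run_transform([sys.executable, "-c", "print('Copying /tmp/a to /tmp/b')"])
```

Decoding before logging is what turns `b'Copying ...\nDone\n'` into readable multi-line output.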
[GitHub] seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors URL: https://github.com/apache/incubator-airflow/pull/3596#discussion_r218567791

## File path: airflow/sensors/base_sensor_operator.py ##
```diff
@@ -65,6 +89,11 @@ def poke(self, context):

     def execute(self, context):
         started_at = timezone.utcnow()
+        if self.reschedule:
+            # If reschedule, use first start date of current try
+            task_reschedules = TaskReschedule.find_for_task_instance(context['ti'])
+            if task_reschedules:
+                started_at = task_reschedules[0].start_date
         while not self.poke(context):
             if (timezone.utcnow() - started_at).total_seconds() > self.timeout:
```

Review comment: Normally that should not be the case, because `started_at` is always in the past. Of course clocks are never in sync (except at Google), so it may happen. But even in such a case `datetime.timedelta.total_seconds()` doesn't throw an exception; it returns a negative number, which is still smaller than the configured timeout (assuming the user configured a positive timeout).
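The point about `total_seconds()` can be verified directly (the dates below are arbitrary examples of clock skew):

```python
from datetime import datetime

# A clock-skew scenario: "now" is slightly behind started_at.
started_at = datetime(2018, 9, 18, 12, 0, 5)
now = datetime(2018, 9, 18, 12, 0, 0)

elapsed = (now - started_at).total_seconds()
print(elapsed)           # → -5.0 (no exception is raised)
assert not elapsed > 60  # a positive 60s timeout is never "exceeded"
```

So skew makes the timeout check a no-op for that iteration rather than an error, which is the behaviour the comment describes.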
[GitHub] seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors URL: https://github.com/apache/incubator-airflow/pull/3596#discussion_r218564723

## File path: airflow/models.py ##
```diff
@@ -56,8 +56,8 @@
 from sqlalchemy import (
     Column, Integer, String, DateTime, Text, Boolean, ForeignKey, PickleType,
-    Index, Float, LargeBinary, UniqueConstraint)
-from sqlalchemy import func, or_, and_, true as sqltrue
+    Index, Float, LargeBinary, UniqueConstraint, ForeignKeyConstraint)
+from sqlalchemy import func, or_, and_, true as sqltrue, asc
```

Review comment: I changed it like this, let me know what you think.
[GitHub] seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors URL: https://github.com/apache/incubator-airflow/pull/3596#discussion_r218564370

## File path: airflow/sensors/base_sensor_operator.py ##
```diff
@@ -75,11 +104,24 @@ def execute(self, context):
                 raise AirflowSkipException('Snap. Time is OUT.')
             else:
                 raise AirflowSensorTimeout('Snap. Time is OUT.')
-            sleep(self.poke_interval)
+            if self.reschedule:
+                reschedule_date = timezone.utcnow() + timedelta(
+                    seconds=self.poke_interval)
+                raise AirflowRescheduleException(reschedule_date)
+            else:
+                sleep(self.poke_interval)
         self.log.info("Success criteria met. Exiting.")

     def _do_skip_downstream_tasks(self, context):
         downstream_tasks = context['task'].get_flat_relatives(upstream=False)
         self.log.debug("Downstream task_ids %s", downstream_tasks)
         if downstream_tasks:
             self.skip(context['dag_run'], context['ti'].execution_date, downstream_tasks)
+
+    @property
+    def reschedule(self):
+        return self.mode == 'reschedule'
+
+    @property
+    def deps(self):
```

Review comment: Done
[GitHub] seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors URL: https://github.com/apache/incubator-airflow/pull/3596#discussion_r218564334

## File path: airflow/ti_deps/deps/ready_to_reschedule.py ##
```diff
@@ -0,0 +1,62 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.ti_deps.deps.base_ti_dep import BaseTIDep
+from airflow.utils import timezone
+from airflow.utils.db import provide_session
+from airflow.utils.state import State
+
+
+class ReadyToRescheduleDep(BaseTIDep):
+    NAME = "Ready To Reschedule"
+    IGNOREABLE = True
+    IS_TASK_DEP = True
+
+    @provide_session
+    def _get_dep_statuses(self, ti, session, dep_context):
```

Review comment: Done
[GitHub] seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors URL: https://github.com/apache/incubator-airflow/pull/3596#discussion_r218563478

## File path: airflow/sensors/base_sensor_operator.py ##
```diff
@@ -75,11 +104,24 @@ def execute(self, context):
                 raise AirflowSkipException('Snap. Time is OUT.')
             else:
                 raise AirflowSensorTimeout('Snap. Time is OUT.')
-            sleep(self.poke_interval)
+            if self.reschedule:
+                reschedule_date = timezone.utcnow() + timedelta(
```

Review comment: No. In "reschedule" mode `started_at` is always set to the initial schedule time (when the task instance was scheduled the first time). `started_at` is only used to determine if the timeout is reached.
[jira] [Created] (AIRFLOW-3082) Task Status lags behind actual status in DAG: Tree View
Damon Cool created AIRFLOW-3082: --- Summary: Task Status lags behind actual status in DAG: Tree View Key: AIRFLOW-3082 URL: https://issues.apache.org/jira/browse/AIRFLOW-3082 Project: Apache Airflow Issue Type: Bug Affects Versions: 1.10.0 Reporter: Damon Cool

Since upgrading to 1.10.0 (from 1.9), I have noticed that tasks don't show their current status in the Tree View. I noticed this by checking the logs: tasks that have completed according to the logs still show a clear status (not even the running state).
[jira] [Comment Edited] (AIRFLOW-26) GCP hook naming alignment
[ https://issues.apache.org/jira/browse/AIRFLOW-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618958#comment-16618958 ] Kaxil Naik edited comment on AIRFLOW-26 at 9/18/18 11:33 AM: - Currently, we are also proposing to move the contrib hooks and operators to a separate repository, so I am not sure what the best naming convention is. [~fenglu] Any thoughts? `gcp_service` seems logical to me. was (Author: kaxilnaik): Currently, we are also proposing to move the contrib hooks/and operators in a separate repository. So not sure what is the best naming convention. [~fenglu] Any thoughts? But yes the `gcp_service` seems logical to me. > GCP hook naming alignment > - > > Key: AIRFLOW-26 > URL: https://issues.apache.org/jira/browse/AIRFLOW-26 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Labels: gcp > > Because we have quite a few GCP services, it's better to align the naming to > not confuse new users using Google Cloud Platform: > gcp_storage > renamed from gcs > gcp_bigquery > renamed from bigquery > gcp_datastore > rename from datastore > gcp_dataflow > TBD > gcp_dataproc > TBD > gcp_bigtable > TBD > Note: this could break 'custom' operators if they use the hooks. > Can be assigned to me.
[jira] [Commented] (AIRFLOW-26) GCP hook naming alignment
[ https://issues.apache.org/jira/browse/AIRFLOW-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618965#comment-16618965 ] Ash Berlin-Taylor commented on AIRFLOW-26: -- bq. Note: this could break 'custom' operators if they use the hooks. You can create an import shim (a mostly empty python module) that issues a deprecation warning and makes the old names available. > GCP hook naming alignment > - > > Key: AIRFLOW-26 > URL: https://issues.apache.org/jira/browse/AIRFLOW-26 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Labels: gcp > > Because we have quite a few GCP services, it's better to align the naming to > not confuse new users using Google Cloud Platform: > gcp_storage > renamed from gcs > gcp_bigquery > renamed from bigquery > gcp_datastore > rename from datastore > gcp_dataflow > TBD > gcp_dataproc > TBD > gcp_bigtable > TBD > Note: this could break 'custom' operators if they use the hooks. > Can be assigned to me. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
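The import-shim pattern suggested above can be sketched as follows. The module names are hypothetical examples of the rename (old `gcs` path, new `gcp_storage` path), not actual Airflow paths, and `warn_deprecated_module` is an illustrative helper a shim module would call at import time before re-exporting the real names (e.g. `from <new module> import *`).

```python
import warnings

def warn_deprecated_module(old_name, new_name):
    # Emit the deprecation warning the shim module would raise
    # when imported under its old name.
    warnings.warn(
        "%s is deprecated; import from %s instead." % (old_name, new_name),
        DeprecationWarning, stacklevel=2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated_module("airflow.hooks.gcs_hook", "airflow.hooks.gcp_storage")

print(caught[0].message)
```

This keeps existing "custom" operators working through the old import path while steering users to the new naming.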
[jira] [Comment Edited] (AIRFLOW-26) GCP hook naming alignment
[ https://issues.apache.org/jira/browse/AIRFLOW-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618958#comment-16618958 ] Kaxil Naik edited comment on AIRFLOW-26 at 9/18/18 11:23 AM: - Currently, we are also proposing to move the contrib hooks/and operators in a separate repository. So not sure what is the best naming convention. [~fenglu] Any thoughts? But yes the `gcp_service` seems logical to me. was (Author: kaxilnaik): Currently, we are also proposing to move the contrib hooks/and operators in a separate repository. So not sure what is the best naming convention. [~fenglu] Any thoughts? > GCP hook naming alignment > - > > Key: AIRFLOW-26 > URL: https://issues.apache.org/jira/browse/AIRFLOW-26 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Labels: gcp > > Because we have quite a few GCP services, it's better to align the naming to > not confuse new users using Google Cloud Platform: > gcp_storage > renamed from gcs > gcp_bigquery > renamed from bigquery > gcp_datastore > rename from datastore > gcp_dataflow > TBD > gcp_dataproc > TBD > gcp_bigtable > TBD > Note: this could break 'custom' operators if they use the hooks. > Can be assigned to me. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-26) GCP hook naming alignment
[ https://issues.apache.org/jira/browse/AIRFLOW-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618958#comment-16618958 ] Kaxil Naik commented on AIRFLOW-26: --- Currently, we are also proposing to move the contrib hooks/and operators in a separate repository. So not sure what is the best naming convention. [~fenglu] Any thoughts? > GCP hook naming alignment > - > > Key: AIRFLOW-26 > URL: https://issues.apache.org/jira/browse/AIRFLOW-26 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Labels: gcp > > Because we have quite a few GCP services, it's better to align the naming to > not confuse new users using Google Cloud Platform: > gcp_storage > renamed from gcs > gcp_bigquery > renamed from bigquery > gcp_datastore > rename from datastore > gcp_dataflow > TBD > gcp_dataproc > TBD > gcp_bigtable > TBD > Note: this could break 'custom' operators if they use the hooks. > Can be assigned to me. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3078) Basic operators for Google Compute Engine
[ https://issues.apache.org/jira/browse/AIRFLOW-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618954#comment-16618954 ] Kaxil Naik commented on AIRFLOW-3078: - Hey [~higrys], Not sure if Feng Lu would talk to me about this but I am happy for you to work on this task as you seem to already have a design. I am definitely happy to review it and have requested access on the same. Thanks. > Basic operators for Google Compute Engine > - > > Key: AIRFLOW-3078 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3078 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, gcp >Reporter: Jarek Potiuk >Assignee: Jarek Potiuk >Priority: Trivial > > In order to be able to interact with raw Google Compute Engine, we need an > operator that should be able to: > For managing individual machines: > * Start Instance: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/start]) > * Set Machine Type > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/setMachineType]) > > * Stop Instance: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop]) > Also we should be able to manipulate instance groups: > * Get instance group: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroupManagers/get]) > * Insert Group: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroupManagers/insert]) > * Update Group: > ([https://cloud.google.com/compute/docs/reference/rest/beta/instanceGroupManagers/update]) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3081) Support automated integration tests in Travis CI
[ https://issues.apache.org/jira/browse/AIRFLOW-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618930#comment-16618930 ] Jarek Potiuk commented on AIRFLOW-3081: --- [~ashb]. Sure, I will convert the doc to Proposal :) Re: thought about who pays - I think we could approach it in two steps (that's part of our proposal as well): Typically when you have a fork of Airflow you setup your own Travis CI project (we have one) and for our fork we could use our own GCP project (that is our plan). Then the integration tests for our changes will be run on our project. We could run those tests conditionally (only when credentials are passed via Travis environment variables) - so then in the main project we would not run them, until the second stage - where the community would agree on some kind of sponsorship for the project and implement some security measures. For now I would like to experiment with it on our environment only (it's very useful for us to make sure that the operators actually work) and if we see how it works and see that it works fine we can think about next steps. In our case we base the integration tests on the example dags we provide for the operators, which is nice because then we have all that nicely linked (and proven to work!) -> examples, user documentation, integration test. > Support automated integration tests in Travis CI > > > Key: AIRFLOW-3081 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3081 > Project: Apache Airflow > Issue Type: New Feature > Components: ci >Reporter: Jarek Potiuk >Priority: Minor > > I think it would be great to have a way to run integration tests > automatically for some of the operators. We've started to work on some GCP > operators (Cloud Functions is the first one). We have a proposal on how Cloud > Functions (and later other GCP operators) could have integration tests that > could run on GCP infrastructure. 
Here is the link to the proposal Doc > [https://docs.google.com/document/d/1-763cYrOs37Sj77RzSQP5hy1GSvZ7I7MPOOG2Q86Osc/edit|https://docs.google.com/document/d/1-763cYrOs37Sj77RzSQP5hy1GSvZ7I7MPOOG2Q86Osc/edit?usp=sharing] > Maybe it's a good time to start discussion on that :). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
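The two-step approach described above (run integration tests only when credentials are passed via Travis environment variables, otherwise skip them) can be sketched with a standard `unittest` skip guard. The environment variable and test names are hypothetical examples, not part of the actual proposal code.

```python
import os
import unittest

# Hypothetical variable a fork's Travis project would set; absent on
# the main project, so the integration tests are skipped there.
GCP_KEY_PATH = os.environ.get("GCP_SERVICE_ACCOUNT_KEY_PATH")

@unittest.skipUnless(GCP_KEY_PATH,
                     "No GCP credentials in the environment; "
                     "skipping integration test")
class CloudFunctionsIntegrationTest(unittest.TestCase):
    def test_deploy_example_dag(self):
        # Would exercise the example DAG against real GCP infrastructure.
        self.assertTrue(True)
```

A skipped test still counts as a successful run, so the main project's build stays green without credentials.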
[jira] [Commented] (AIRFLOW-26) GCP hook naming alignment
[ https://issues.apache.org/jira/browse/AIRFLOW-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618921#comment-16618921 ] Jarek Potiuk commented on AIRFLOW-26: - We are working on implementation of Cloud Functions operators (AIRFLOW-2912) as well as some basic Compute Engine (AIRFLOW-3078) very soon and it would be great to align with the "future" consistent way. Currently we have separate python file for separate operators (Delete/Deploy + Invoke in the future). From what I understand from this issue and AIRFLOW-2056, preferred way is to put together related "GC*" operators into single file (in our case they should be named gcp_functions and gcp_compute respectively). Is my understanding correct ? > GCP hook naming alignment > - > > Key: AIRFLOW-26 > URL: https://issues.apache.org/jira/browse/AIRFLOW-26 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Labels: gcp > > Because we have quite a few GCP services, it's better to align the naming to > not confuse new users using Google Cloud Platform: > gcp_storage > renamed from gcs > gcp_bigquery > renamed from bigquery > gcp_datastore > rename from datastore > gcp_dataflow > TBD > gcp_dataproc > TBD > gcp_bigtable > TBD > Note: this could break 'custom' operators if they use the hooks. > Can be assigned to me. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3078) Basic operators for Google Compute Engine
[ https://issues.apache.org/jira/browse/AIRFLOW-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618913#comment-16618913 ] Jarek Potiuk commented on AIRFLOW-3078: --- ([~kaxilnaik] - I hope it's ok that I assigned the issue to myself :). Please let me know if you have any issue with it) > Basic operators for Google Compute Engine > - > > Key: AIRFLOW-3078 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3078 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, gcp >Reporter: Jarek Potiuk >Assignee: Jarek Potiuk >Priority: Trivial > > In order to be able to interact with raw Google Compute Engine, we need an > operator that should be able to: > For managing individual machines: > * Start Instance: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/start]) > * Set Machine Type > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/setMachineType]) > > * Stop Instance: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop]) > Also we should be able to manipulate instance groups: > * Get instance group: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroupManagers/get]) > * Insert Group: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroupManagers/insert]) > * Update Group: > ([https://cloud.google.com/compute/docs/reference/rest/beta/instanceGroupManagers/update]) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3078) Basic operators for Google Compute Engine
[ https://issues.apache.org/jira/browse/AIRFLOW-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk reassigned AIRFLOW-3078: - Assignee: Jarek Potiuk (was: Kaxil Naik) > Basic operators for Google Compute Engine > - > > Key: AIRFLOW-3078 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3078 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, gcp >Reporter: Jarek Potiuk >Assignee: Jarek Potiuk >Priority: Trivial > > In order to be able to interact with raw Google Compute Engine, we need an > operator that should be able to: > For managing individual machines: > * Start Instance: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/start]) > * Set Machine Type > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/setMachineType]) > > * Stop Instance: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop]) > Also we should be able to manipulate instance groups: > * Get instance group: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroupManagers/get]) > * Insert Group: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroupManagers/insert]) > * Update Group: > ([https://cloud.google.com/compute/docs/reference/rest/beta/instanceGroupManagers/update]) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3078) Basic operators for Google Compute Engine
[ https://issues.apache.org/jira/browse/AIRFLOW-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618909#comment-16618909 ] Jarek Potiuk commented on AIRFLOW-3078: --- Hello [~kaxilnaik] - I know [~fenglu] will talk to you about it, but we are currently working on implementation of those basic operators. We even have a draft design doc that explains what we are planning to do in [Airflow "Compute Engine" operators|https://docs.google.com/document/d/17cjZeu4ov_ZrVH3qCa-g8olW4DjRi-YG83Z-WjnbXhk/edit#heading=h.6w8wok2now8f] - so maybe instead of implementing it, we can involve you in reviewing (both doc and implementation). I am happy to collaborate on it (I am starting to work on it this week). > Basic operators for Google Compute Engine > - > > Key: AIRFLOW-3078 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3078 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, gcp >Reporter: Jarek Potiuk >Assignee: Kaxil Naik >Priority: Trivial > > In order to be able to interact with raw Google Compute Engine, we need an > operator that should be able to: > For managing individual machines: > * Start Instance: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/start]) > * Set Machine Type > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/setMachineType]) > > * Stop Instance: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop]) > Also we should be able to manipulate instance groups: > * Get instance group: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroupManagers/get]) > * Insert Group: > ([https://cloud.google.com/compute/docs/reference/rest/v1/instanceGroupManagers/insert]) > * Update Group: > ([https://cloud.google.com/compute/docs/reference/rest/beta/instanceGroupManagers/update]) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2912) Add operators for Google Cloud Functions
[ https://issues.apache.org/jira/browse/AIRFLOW-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618894#comment-16618894 ] Jarek Potiuk commented on AIRFLOW-2912: --- We have a design doc describing the proposed architecture and properties of the implementation. We think the most important operators are: * *GCFFunctionDelete* - deletes an existing function, specified by name. Succeeds when there is no function to delete with the name specified (it is idempotent) * *GCFFunctionDeploy* - creates a new function or updates an existing one. * *GCFFunctionInvoke* - invokes an existing function (Note that it is not being implemented now - it will be available once the API controlling access to Invoke calls is published) The document is here: https://docs.google.com/document/d/1Wj46--jco47Ju-5-OuSxG3RZj6qvfRUscn62VEBXJAc/edit#heading=h.b48kcm9s7ymv > Add operators for Google Cloud Functions > > > Key: AIRFLOW-2912 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2912 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Dariusz Aniszewski >Assignee: Jarek Potiuk >Priority: Major > > It would be nice to be able to create, delete and call Cloud Functions -- This message was sent by Atlassian JIRA (v7.6.3#76005)
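To make the idempotency claim for *GCFFunctionDelete* concrete: "succeeds when there is no function to delete" means a retry after a partial failure is always safe. A minimal Python sketch of that behavior — `NotFoundError`, `FakeGcfClient`, and `delete_function` are hypothetical names for illustration, not the operator API from the design doc:

```python
class NotFoundError(Exception):
    """Stand-in for the HTTP 404 a real Cloud Functions client would raise."""


class FakeGcfClient:
    """Hypothetical client holding a set of deployed function names."""

    def __init__(self, functions):
        self.functions = set(functions)

    def delete(self, name):
        if name not in self.functions:
            raise NotFoundError(name)
        self.functions.remove(name)


def delete_function(client, name):
    """Idempotent delete: the function being absent counts as success."""
    try:
        client.delete(name)
    except NotFoundError:
        pass  # already gone -> nothing to do, still a success


client = FakeGcfClient({"fn-a"})
delete_function(client, "fn-a")  # deletes the function
delete_function(client, "fn-a")  # second call succeeds too (idempotent)
```

The point of swallowing only the not-found error (and nothing else) is that a task retried by the scheduler converges to the same end state without masking real failures.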
[jira] [Commented] (AIRFLOW-3081) Support automated integration tests in Travis CI
[ https://issues.apache.org/jira/browse/AIRFLOW-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618890#comment-16618890 ] Ash Berlin-Taylor commented on AIRFLOW-3081: Could you create an AIP (https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals) for this, rather than in a google doc please, and then email the dev@ mailing list to start a discussion about it. First thought: who will pay for the GCP costs? For many of the AWS operators we mock the AWS calls (using the "moto" python library - "mock boto") which while not perfect doesn't incur any extra cost, and is usually quicker to boot. > Support automated integration tests in Travis CI > > > Key: AIRFLOW-3081 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3081 > Project: Apache Airflow > Issue Type: New Feature > Components: ci >Reporter: Jarek Potiuk >Priority: Minor > > I think it would be great to have a way to run integration tests > automatically for some of the operators. We've started to work on some GCP > operators (Cloud Functions is the first one). We have a proposal on how Cloud > Functions (and later other GCP operators) could have integration tests that > could run on GCP infrastructure. Here is the link to the proposal Doc > [https://docs.google.com/document/d/1-763cYrOs37Sj77RzSQP5hy1GSvZ7I7MPOOG2Q86Osc/edit|https://docs.google.com/document/d/1-763cYrOs37Sj77RzSQP5hy1GSvZ7I7MPOOG2Q86Osc/edit?usp=sharing] > Maybe it's a good time to start discussion on that :). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
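The mocking approach Ash describes boils down to substituting a local fake for the cloud client, so tests incur no cost and make no network calls. moto emulates the real AWS APIs far more faithfully; the same shape can be sketched with only the standard library's `unittest.mock` (the `count_keys` function here is a hypothetical stand-in for code under test, not an Airflow hook):

```python
from unittest import mock


def count_keys(s3_client, bucket):
    """Toy function under test: counts the objects in a bucket."""
    resp = s3_client.list_objects_v2(Bucket=bucket)
    return len(resp.get("Contents", []))


# Stand in for a boto3 S3 client: no credentials, no network, no AWS bill.
fake_s3 = mock.Mock()
fake_s3.list_objects_v2.return_value = {"Contents": [{"Key": "a"}, {"Key": "b"}]}

assert count_keys(fake_s3, "my-bucket") == 2
fake_s3.list_objects_v2.assert_called_once_with(Bucket="my-bucket")
```

This is also why such tests are "usually quicker to boot": everything runs in-process, with no round trips to a real service.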
[jira] [Assigned] (AIRFLOW-2912) Add operators for Google Cloud Functions
[ https://issues.apache.org/jira/browse/AIRFLOW-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk reassigned AIRFLOW-2912: - Assignee: Jarek Potiuk > Add operators for Google Cloud Functions > > > Key: AIRFLOW-2912 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2912 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Dariusz Aniszewski >Assignee: Jarek Potiuk >Priority: Major > > It would be nice to be able to create, delete and call Cloud Functions -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3081) Support automated integration tests in Travis CI
Jarek Potiuk created AIRFLOW-3081: - Summary: Support automated integration tests in Travis CI Key: AIRFLOW-3081 URL: https://issues.apache.org/jira/browse/AIRFLOW-3081 Project: Apache Airflow Issue Type: New Feature Components: ci Reporter: Jarek Potiuk I think it would be great to have a way to run integration tests automatically for some of the operators. We've started to work on some GCP operators (Cloud Functions is the first one). We have a proposal on how Cloud Functions (and later other GCP operators) could have integration tests that could run on GCP infrastructure. Here is the link to the proposal Doc [https://docs.google.com/document/d/1-763cYrOs37Sj77RzSQP5hy1GSvZ7I7MPOOG2Q86Osc/edit|https://docs.google.com/document/d/1-763cYrOs37Sj77RzSQP5hy1GSvZ7I7MPOOG2Q86Osc/edit?usp=sharing] Maybe it's a good time to start discussion on that :). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] XD-DENG commented on issue #3693: [AIRFLOW-2848] Ensure dag_id in metadata "job" for LocalTaskJob
XD-DENG commented on issue #3693: [AIRFLOW-2848] Ensure dag_id in metadata "job" for LocalTaskJob URL: https://github.com/apache/incubator-airflow/pull/3693#issuecomment-422314417 Thanks @ashb . It was me who changed the fix version to 1.10.1 in JIRA, as you suggested in an email on the mailing list. I have changed a few other tickets to 1.10.1 as well. All of them are bug fixes or enhancements. Please check https://issues.apache.org/jira/browse/AIRFLOW-2855?filter=-1=resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%201.10.1%20AND%20assignee%20in%20(XD-DENG)%20order%20by%20updated%20DESC Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on issue #3913: [AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role
ashb commented on issue #3913: [AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role URL: https://github.com/apache/incubator-airflow/pull/3913#issuecomment-422314238 I'm not sure I like this as a default permission. On the one hand needing admin to view logs is wrong, but conversely there could be passwords or other sensitive info in the logs, so maybe just "viewer" shouldn't have access to the logs? Not sure basically. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on issue #3693: [AIRFLOW-2848] Ensure dag_id in metadata "job" for LocalTaskJob
ashb commented on issue #3693: [AIRFLOW-2848] Ensure dag_id in metadata "job" for LocalTaskJob URL: https://github.com/apache/incubator-airflow/pull/3693#issuecomment-422311835 @XD-DENG it's marked for 1.10.1 so will be included, yup :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Resolved] (AIRFLOW-307) Code cleanup. There is no __neq__ python magic method.
[ https://issues.apache.org/jira/browse/AIRFLOW-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-307. --- Resolution: Fixed Fix Version/s: 1.8.0 > Code cleanup. There is no __neq__ python magic method. > -- > > Key: AIRFLOW-307 > URL: https://issues.apache.org/jira/browse/AIRFLOW-307 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: 1.7.1, 1.7.1.3, 1.8.0 >Reporter: Oleksandr Vilchynskyy >Assignee: Oleksandr Vilchynskyy >Priority: Minor > Fix For: 1.8.0 > > > There is a small typo in class BaseOperator(object), which is decorated > with functools.total_ordering: > def __neq__ was used instead of __ne__, which breaks the logic of later > object comparison. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
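For context on why the typo mattered: the `!=` operator dispatches to `__ne__` (which in Python 3 defaults to the negation of `__eq__`), so a method named `__neq__` is ordinary dead code that nothing ever invokes. A small sketch — not the actual `BaseOperator`:

```python
from functools import total_ordering


@total_ordering
class Op:
    def __init__(self, task_id):
        self.task_id = task_id

    def __eq__(self, other):
        return self.task_id == other.task_id

    def __lt__(self, other):
        return self.task_id < other.task_id

    def __neq__(self, other):
        # The typo from AIRFLOW-307: the != operator never looks this up,
        # so this body is unreachable via ordinary comparisons.
        raise RuntimeError("unreachable via the != operator")


a, b = Op("t1"), Op("t1")
# != uses __ne__, which in Python 3 negates __eq__ by default, so the
# misspelled __neq__ above is silently ignored and no RuntimeError is raised:
print(a != b)  # False
```

`functools.total_ordering` only derives the ordering operators (`<=`, `>`, `>=`) from `__eq__` plus one ordering method; it does not notice or warn about a misspelled dunder, which is why the bug sat quietly until cleaned up.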
[jira] [Resolved] (AIRFLOW-407) Sensors all have the same ui color, making them hard to distinguish on the web UI
[ https://issues.apache.org/jira/browse/AIRFLOW-407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-407. --- Resolution: Fixed Fix Version/s: 1.8.0 > Sensors all have the same ui color, making them hard to distinguish on the > web UI > - > > Key: AIRFLOW-407 > URL: https://issues.apache.org/jira/browse/AIRFLOW-407 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Li Xuanji >Assignee: Li Xuanji >Priority: Minor > Fix For: 1.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-298) incubator disclaimer isn't proper on documentation website
[ https://issues.apache.org/jira/browse/AIRFLOW-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-298. --- Resolution: Fixed > incubator disclaimer isn't proper on documentation website > -- > > Key: AIRFLOW-298 > URL: https://issues.apache.org/jira/browse/AIRFLOW-298 > Project: Apache Airflow > Issue Type: Bug >Reporter: Maxime Beauchemin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-443) Code from DAGs with same __name__ show up on each other's code view in the web UI
[ https://issues.apache.org/jira/browse/AIRFLOW-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-443. --- Resolution: Fixed > Code from DAGs with same __name__ show up on each other's code view in the > web UI > - > > Key: AIRFLOW-443 > URL: https://issues.apache.org/jira/browse/AIRFLOW-443 > Project: Apache Airflow > Issue Type: Bug >Reporter: Li Xuanji >Assignee: Bolke de Bruin >Priority: Major > > With a dags folder containing 2 files, `bash_bash_bash/dag.py` and > `bash_bash_bash_2/dag.py`, with the following contents > bash_bash_bash/dag.py > ``` > from airflow import DAG > from airflow.operators.bash_operator import BashOperator > from datetime import datetime, timedelta > default_args = { > 'owner': 'airflow', > 'depends_on_past': False, > 'start_date': datetime(2016, 1, 1, 1, 0), > 'email': ['xua...@gmail.com'], > 'email_on_failure': True, > 'email_on_retry': False, > 'retries': 3, > 'retry_delay': timedelta(minutes=1), > 'concurrency': 1, > } > dag = DAG('bash_bash_bash', default_args=default_args, > schedule_interval=timedelta(seconds=10)) > # t1, t2 and t3 are examples of tasks created by instatiating operators > t1 = BashOperator( > task_id='print_date', > bash_command='date', > dag=dag > ) > t2 = BashOperator( > task_id='sleep', > bash_command='sleep 1', > retries=3, > dag=dag > ) > templated_command = """ > {% for i in range(5) %} > echo "{{ ds }}" > echo "{{ macros.ds_add(ds, 7)}}" > echo "{{ params.my_param }}" > {% endfor %} > """ > t3 = BashOperator( > task_id='templated', > bash_command=templated_command, > params={'my_param': 'Parameter I passed in'}, > dag=dag > ) > t2.set_upstream(t1) > t3.set_upstream(t1) > ``` > bash_bash_bash_2/dag.py > ``` > from airflow import DAG > from airflow.operators.bash_operator import BashOperator > from datetime import datetime, timedelta > default_args = { > 'owner': 'airflow', > 'depends_on_past': False, > 'start_date': datetime(2016, 1, 1, 1, 0), > 
'email': ['xua...@gmail.com'], > 'email_on_failure': True, > 'email_on_retry': False, > 'retries': 3, > 'retry_delay': timedelta(minutes=1), > 'concurrency': 1, > } > dag = DAG('bash_bash_bash_2', default_args=default_args, > schedule_interval=timedelta(seconds=10)) > t1 = BashOperator( > task_id='print_date', > bash_command='date', > dag=dag > ) > ``` > The code view in the web UI shows the contents of bash_bash_bash_2/dag.py > for both DAGs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-306) Spark-sql hook and operator required
[ https://issues.apache.org/jira/browse/AIRFLOW-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-306. --- Resolution: Fixed > Spark-sql hook and operator required > > > Key: AIRFLOW-306 > URL: https://issues.apache.org/jira/browse/AIRFLOW-306 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, hooks, operators >Reporter: Daniel van der Ende >Assignee: Daniel van der Ende >Priority: Minor > > It would be nice to have a Spark-sql hook and operator for Spark-sql which > can execute Spark-sql queries (instead of having to run them via the > Bash-operator). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-233) Detached DagRun error in scheduler loop
[ https://issues.apache.org/jira/browse/AIRFLOW-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-233. --- Resolution: Fixed > Detached DagRun error in scheduler loop > --- > > Key: AIRFLOW-233 > URL: https://issues.apache.org/jira/browse/AIRFLOW-233 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun, scheduler > Environment: Airflow master (git log below), Postgres backend, > LocalExecutor > {code} > b7def7f1f9a97d584e9076cdad48287e652a2d41 [AIRFLOW-142] setup_env.sh doesn't > download hive tarball if hdp is specified as distro > 0bd5515a42f7912b0d4ac8bf33dec2f01539b555 [AIRFLOW-218] Added option to enable > webserver gunicorn access/err logs > 80210b2bd768668e55e498995a3820900d9119ba Merge pull request #1569 from > mistercrunch/docs > {code} >Reporter: Jeremiah Lowin >Assignee: Bolke de Bruin >Priority: Major > > Running Airflow master, every scheduler loop has at least one detached DagRun > error. This is the output: > {code} > [2016-06-10 09:41:54,772] {jobs.py:669} ERROR - Instance 0x10ab80dd8> is not bound to a Session; attribute refresh operation cannot > proceed > Traceback (most recent call last): > File "/Users/jlowin/git/airflow/airflow/jobs.py", line 666, in _do_dags > self.process_dag(dag, tis_out) > File "/Users/jlowin/git/airflow/airflow/jobs.py", line 524, in process_dag > State.UP_FOR_RETRY)) > File "/Users/jlowin/git/airflow/airflow/utils/db.py", line 53, in wrapper > result = func(*args, **kwargs) > File "/Users/jlowin/git/airflow/airflow/models.py", line 3387, in > get_task_instances > TI.dag_id == self.dag_id, > File > "/Users/jlowin/anaconda3/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", > line 237, in __get__ > return self.impl.get(instance_state(instance), dict_) > File > "/Users/jlowin/anaconda3/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", > line 578, in get > value = state._load_expired(state, passive) > File > 
"/Users/jlowin/anaconda3/lib/python3.5/site-packages/sqlalchemy/orm/state.py", > line 474, in _load_expired > self.manager.deferred_scalar_loader(self, toload) > File > "/Users/jlowin/anaconda3/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", > line 610, in load_scalar_attributes > (state_str(state))) > sqlalchemy.orm.exc.DetachedInstanceError: Instance is > not bound to a Session; attribute refresh operation cannot proceed > {code} > This is the test DAG in question: > {code} > from airflow import DAG > from airflow.operators import PythonOperator > from datetime import datetime > import logging > import time > default_args = { > 'owner': 'airflow', > 'depends_on_past': False, > 'start_date': datetime(2016, 4, 24), > } > dag_name = 'dp_test' > dag = DAG( > dag_name, > default_args=default_args, > schedule_interval='*/2 * * * *') > def cb(**kw): > time.sleep(2) > logging.info('Done %s' % kw['ds']) > d = PythonOperator(task_id="delay", provide_context=True, python_callable=cb, > dag=dag) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
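The traceback in this report is the standard SQLAlchemy failure mode: an instance's attributes get expired (for example by a commit), the instance ends up outside any session, and the next attribute access needs a refresh that can no longer happen. A self-contained reproduction against an in-memory SQLite database (plain SQLAlchemy with a toy model, not Airflow's actual models):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()


class DagRun(Base):  # toy model for illustration only
    __tablename__ = "dag_run"
    id = Column(Integer, primary_key=True)
    state = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)  # expire_on_commit=True by default

session = Session()
run = DagRun(state="running")
session.add(run)
session.commit()  # all attributes on `run` are now marked expired
session.close()   # `run` is now detached from any session

try:
    run.state  # expired attribute -> refresh needed -> but no session
except DetachedInstanceError as err:
    print("caught:", type(err).__name__)
```

The usual fixes are to keep the object bound to a live session while it is in use, re-query it in the new session, or disable `expire_on_commit` — which is essentially the class of fix a scheduler loop like the one above needs.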
[jira] [Assigned] (AIRFLOW-249) Refactor the SLA mechanism
[ https://issues.apache.org/jira/browse/AIRFLOW-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Anand reassigned AIRFLOW-249: --- Assignee: (was: dud) > Refactor the SLA mechanism > -- > > Key: AIRFLOW-249 > URL: https://issues.apache.org/jira/browse/AIRFLOW-249 > Project: Apache Airflow > Issue Type: Improvement >Reporter: dud >Priority: Major > > Hello > I've noticed the SLA feature is currently behaving as follows : > - it doesn't work on DAGs scheduled @once or None because they have no > dag.following_schedule property > - it keeps endlessly checking for SLA misses without ever worrying about any > end_date. Worse, I noticed that emails are still being sent for runs that are > never happening because of end_date > - it keeps checking for recent TIs even if an SLA notification has already > been sent for them > - the SLA logic is only fired after following_schedule + sla has > elapsed, in other words one has to wait for the next TI before having a > chance of getting any email. Also the email reports the dag.following_schedule > time (I guess because it is close to TI.start_date), but unfortunately that > doesn't match what the task instance shows nor the log filename > - the SLA logic is based on max(TI.execution_date) for the starting point of > its checks, which means that for a DAG whose SLA is longer than its schedule > period, if half of the TIs are running longer than expected it will go > unnoticed. 
This could be demonstrated with a DAG like this one : > {code} > from airflow import DAG > from airflow.operators import * > from datetime import datetime, timedelta > from time import sleep > default_args = { > 'owner': 'airflow', > 'depends_on_past': False, > 'start_date': datetime(2016, 6, 16, 12, 20), > 'email': my_email > 'sla': timedelta(minutes=2), > } > dag = DAG('unnoticed_sla', default_args=default_args, > schedule_interval=timedelta(minutes=1)) > def alternating_sleep(**kwargs): > minute = kwargs['execution_date'].strftime("%M") > is_odd = int(minute) % 2 > if is_odd: > sleep(300) > else: > sleep(10) > return True > PythonOperator( > task_id='sla_miss', > python_callable=alternating_sleep, > provide_context=True, > dag=dag) > {code} > I've tried to rework the SLA triggering mechanism by addressing the above > points., please [have a look on > it|https://github.com/dud225/incubator-airflow/commit/972260354075683a8d55a1c960d839c37e629e7d] > I made some tests with this patch : > - the fluctuent DAG shown above no longer make Airflow skip any SLA event : > {code} > task_id |dag_id | execution_date| email_sent | > timestamp | description | notification_sent > --+---+-+++-+--- > sla_miss | dag_sla_miss1 | 2016-06-16 15:05:00 | t | 2016-06-16 > 15:08:26.058631 | | t > sla_miss | dag_sla_miss1 | 2016-06-16 15:07:00 | t | 2016-06-16 > 15:10:06.093253 | | t > sla_miss | dag_sla_miss1 | 2016-06-16 15:09:00 | t | 2016-06-16 > 15:12:06.241773 | | t > {code} > - on a normal DAG, the SLA is being triggred more quickly : > {code} > // start_date = 2016-06-16 15:55:00 > // end_date = 2016-06-16 16:00:00 > // schedule_interval = timedelta(minutes=1) > // sla = timedelta(minutes=2) > task_id |dag_id | execution_date| email_sent | > timestamp | description | notification_sent > --+---+-+++-+--- > sla_miss | dag_sla_miss1 | 2016-06-16 15:55:00 | t | 2016-06-16 > 15:58:11.832299 | | t > sla_miss | dag_sla_miss1 | 2016-06-16 15:56:00 | t | 2016-06-16 > 15:59:09.663778 | | t > 
sla_miss | dag_sla_miss1 | 2016-06-16 15:57:00 | t | 2016-06-16 > 16:00:13.651422 | | t > sla_miss | dag_sla_miss1 | 2016-06-16 15:58:00 | t | 2016-06-16 > 16:01:08.576399 | | t > sla_miss | dag_sla_miss1 | 2016-06-16 15:59:00 | t | 2016-06-16 > 16:02:08.523486 | | t > sla_miss | dag_sla_miss1 | 2016-06-16 16:00:00 | t | 2016-06-16 > 16:03:08.538593 | | t > (6 rows) > {code} > than before (current master branch) : > {code} > // start_date = 2016-06-16 15:40:00 > // end_date = 2016-06-16 15:45:00 > // schedule_interval = timedelta(minutes=1) > // sla = timedelta(minutes=2) > task_id |dag_id | execution_date| email_sent | > timestamp | description | notification_sent >
[GitHub] r39132 closed pull request #1869: [AIRFLOW-571] added --forwarded_allow_ips as a command line argument to webserver
r39132 closed pull request #1869: [AIRFLOW-571] added --forwarded_allow_ips as a command line argument to webserver URL: https://github.com/apache/incubator-airflow/pull/1869 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py index 21e1d23878..8fda8f5dc1 100755 --- a/airflow/bin/cli.py +++ b/airflow/bin/cli.py @@ -45,7 +45,7 @@ from airflow import api from airflow import jobs, settings from airflow import configuration as conf -from airflow.exceptions import AirflowException +from airflow.exceptions import AirflowException, AirflowConfigException from airflow.executors import DEFAULT_EXECUTOR from airflow.models import (DagModel, DagBag, TaskInstance, DagPickle, DagRun, Variable, DagStat, @@ -699,6 +699,11 @@ def webserver(args): if ssl_cert and not ssl_key: raise AirflowException( 'An SSL key must also be provided for use with ' + ssl_cert) +try: +forwarded_allow_ips = (args.forwarded_allow_ips or + conf.get('webserver', 'forwarded_allow_ips')) +except AirflowConfigException: +forwarded_allow_ips = None if args.debug: print( @@ -740,6 +745,9 @@ def webserver(args): if ssl_cert: run_args += ['--certfile', ssl_cert, '--keyfile', ssl_key] +if forwarded_allow_ips: +run_args += ['--forwarded-allow-ips', forwarded_allow_ips] + run_args += ["airflow.www.app:cached_app()"] gunicorn_master_proc = subprocess.Popen(run_args) @@ -1294,6 +1302,10 @@ class CLIFactory(object): default=conf.get('webserver', 'ERROR_LOGFILE'), help="The logfile to store the webserver error log. 
Use '-' to print to " "stderr."), +'forwarded_allow_ips': Arg( +("--forwarded_allow_ips", ), +default=None, +help="Pass gunicorn front-end IPs allowed to handle set secure headers."), # resetdb 'yes': Arg( ("-y", "--yes"), @@ -1469,7 +1481,8 @@ class CLIFactory(object): 'help': "Start a Airflow webserver instance", 'args': ('port', 'workers', 'workerclass', 'worker_timeout', 'hostname', 'pid', 'daemon', 'stdout', 'stderr', 'access_logfile', - 'error_logfile', 'log_file', 'ssl_cert', 'ssl_key', 'debug'), + 'error_logfile', 'log_file', 'ssl_cert', 'ssl_key', + 'forwarded_allow_ips', 'debug'), }, { 'func': resetdb, 'help': "Burn down and rebuild the metadata database", diff --git a/airflow/configuration.py b/airflow/configuration.py index 265f7289ea..a86f629493 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -211,6 +211,12 @@ def run_command(command): web_server_ssl_cert = web_server_ssl_key = +# Pass gunicorn front-end IPs allowed to handle set secure headers. +# Multiple IPs should be comma separated. Set to * to disable checking. +# Useful if you are running gunicorn behind a load balancer. +# See http://docs.gunicorn.org/en/stable/settings.html#forwarded-allow-ips +# forwarded_allow_ips = * + # Number of seconds the gunicorn webserver waits before timing out on a worker web_server_worker_timeout = 120 @@ -454,6 +460,7 @@ def run_command(command): dag_orientation = LR log_fetch_timeout_sec = 5 hide_paused_dags_by_default = False +forwarded_allow_ips = * [email] email_backend = airflow.utils.email.send_email_smtp This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-571) allow gunicorn config to be passed to airflow webserver
[ https://issues.apache.org/jira/browse/AIRFLOW-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618620#comment-16618620 ] ASF GitHub Bot commented on AIRFLOW-571: r39132 closed pull request #1869: [AIRFLOW-571] added --forwarded_allow_ips as a command line argument to webserver URL: https://github.com/apache/incubator-airflow/pull/1869 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py index 21e1d23878..8fda8f5dc1 100755 --- a/airflow/bin/cli.py +++ b/airflow/bin/cli.py @@ -45,7 +45,7 @@ from airflow import api from airflow import jobs, settings from airflow import configuration as conf -from airflow.exceptions import AirflowException +from airflow.exceptions import AirflowException, AirflowConfigException from airflow.executors import DEFAULT_EXECUTOR from airflow.models import (DagModel, DagBag, TaskInstance, DagPickle, DagRun, Variable, DagStat, @@ -699,6 +699,11 @@ def webserver(args): if ssl_cert and not ssl_key: raise AirflowException( 'An SSL key must also be provided for use with ' + ssl_cert) +try: +forwarded_allow_ips = (args.forwarded_allow_ips or + conf.get('webserver', 'forwarded_allow_ips')) +except AirflowConfigException: +forwarded_allow_ips = None if args.debug: print( @@ -740,6 +745,9 @@ def webserver(args): if ssl_cert: run_args += ['--certfile', ssl_cert, '--keyfile', ssl_key] +if forwarded_allow_ips: +run_args += ['--forwarded-allow-ips', forwarded_allow_ips] + run_args += ["airflow.www.app:cached_app()"] gunicorn_master_proc = subprocess.Popen(run_args) @@ -1294,6 +1302,10 @@ class CLIFactory(object): default=conf.get('webserver', 'ERROR_LOGFILE'), help="The logfile to store the webserver error log. 
Use '-' to print to " "stderr."), +'forwarded_allow_ips': Arg( +("--forwarded_allow_ips", ), +default=None, +help="Pass gunicorn front-end IPs allowed to handle set secure headers."), # resetdb 'yes': Arg( ("-y", "--yes"), @@ -1469,7 +1481,8 @@ class CLIFactory(object): 'help': "Start a Airflow webserver instance", 'args': ('port', 'workers', 'workerclass', 'worker_timeout', 'hostname', 'pid', 'daemon', 'stdout', 'stderr', 'access_logfile', - 'error_logfile', 'log_file', 'ssl_cert', 'ssl_key', 'debug'), + 'error_logfile', 'log_file', 'ssl_cert', 'ssl_key', + 'forwarded_allow_ips', 'debug'), }, { 'func': resetdb, 'help': "Burn down and rebuild the metadata database", diff --git a/airflow/configuration.py b/airflow/configuration.py index 265f7289ea..a86f629493 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -211,6 +211,12 @@ def run_command(command): web_server_ssl_cert = web_server_ssl_key = +# Pass gunicorn front-end IPs allowed to handle set secure headers. +# Multiple IPs should be comma separated. Set to * to disable checking. +# Useful if you are running gunicorn behind a load balancer. +# See http://docs.gunicorn.org/en/stable/settings.html#forwarded-allow-ips +# forwarded_allow_ips = * + # Number of seconds the gunicorn webserver waits before timing out on a worker web_server_worker_timeout = 120 @@ -454,6 +460,7 @@ def run_command(command): dag_orientation = LR log_fetch_timeout_sec = 5 hide_paused_dags_by_default = False +forwarded_allow_ips = * [email] email_backend = airflow.utils.email.send_email_smtp This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > allow gunicorn config to be passed to airflow webserver > --- > > Key: AIRFLOW-571 > URL: https://issues.apache.org/jira/browse/AIRFLOW-571 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Dennis O'Brien >Priority: Major > > I have run into an issue when running airflow webserver behind a load > balancer where redirects result in https requests forwarded to http. I ran > into a similar issue with Caravel which also uses gunicorn. >
[jira] [Commented] (AIRFLOW-249) Refactor the SLA mechanism
[ https://issues.apache.org/jira/browse/AIRFLOW-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618619#comment-16618619 ] ASF GitHub Bot commented on AIRFLOW-249: r39132 closed pull request #1601: [AIRFLOW-249] Refactor the SLA mechanism URL: https://github.com/apache/incubator-airflow/pull/1601 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/jobs.py b/airflow/jobs.py index 1e583ac41b..d6f32cd52a 100644 --- a/airflow/jobs.py +++ b/airflow/jobs.py @@ -258,39 +258,33 @@ def manage_slas(self, dag, session=None): tasks that should have succeeded in the past hour. """ TI = models.TaskInstance -sq = ( -session -.query( -TI.task_id, -func.max(TI.execution_date).label('max_ti')) -.filter(TI.dag_id == dag.dag_id) -.filter(TI.state == State.SUCCESS) -.filter(TI.task_id.in_(dag.task_ids)) -.group_by(TI.task_id).subquery('sq') +SlaMiss = models.SlaMiss + +sla_missed = ( +session.query(SlaMiss) +.filter(SlaMiss.email_sent == 't') +.subquery('sla_missed') ) -max_tis = session.query(TI).filter( -TI.dag_id == dag.dag_id, -TI.task_id == sq.c.task_id, -TI.execution_date == sq.c.max_ti, -).all() +sq = session.query(TI).outerjoin( +sla_missed, +sla_missed.c.execution_date == TI.execution_date).filter( +sla_missed.c.execution_date == None, +TI.dag_id == dag.dag_id, +TI.state == State.RUNNING, +TI.task_id.in_(dag.task_ids) +).all() ts = datetime.now() -SlaMiss = models.SlaMiss -for ti in max_tis: +for ti in sq: task = dag.get_task(ti.task_id) -dttm = ti.execution_date if task.sla: -dttm = dag.following_schedule(dttm) -while dttm < datetime.now(): -following_schedule = dag.following_schedule(dttm) -if following_schedule + task.sla < datetime.now(): -session.merge(models.SlaMiss( -task_id=ti.task_id, -dag_id=ti.dag_id, 
-execution_date=dttm, -timestamp=ts)) -dttm = dag.following_schedule(dttm) +if ti.start_date + task.sla < ts: +session.merge(models.SlaMiss( +task_id=ti.task_id, +dag_id=ti.dag_id, +execution_date=ti.execution_date, +timestamp=ts)) session.commit() slas = ( This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the SLA mechanism > -- > > Key: AIRFLOW-249 > URL: https://issues.apache.org/jira/browse/AIRFLOW-249 > Project: Apache Airflow > Issue Type: Improvement >Reporter: dud >Assignee: dud >Priority: Major > > Hello > I've noticed the SLA feature is currently behaving as follow : > - it doesn't work on DAG scheduled @once or None because they have no > dag.followwing_schedule property > - it keeps endlessly checking for SLA misses without ever worrying about any > end_date. Worse I noticed that emails are still being sent for runs that are > never happening because of end_date > - it keeps checking for recent TIs even if SLA notification has been already > been sent for them > - the SLA logic is only being fired after following_schedule + sla has > elapsed, in other words one has to wait for the next TI before having a > chance of getting any email. Also the email reports dag.following_schedule > time (I guess because it is close of TI.start_date), but unfortunately that > doesn't match what the task instances shows nor the log filename > - the SLA logic is based on max(TI.execution_date) for the starting point of > its checks, that means that for a DAG whose SLA is longer than its schedule > period if half of the TIs are running longer than expected it will go > unnoticed. This could be demonstrated with a DAG like this one : > {code} > from airflow import DAG > from airflow.operators
[GitHub] r39132 commented on issue #1601: [AIRFLOW-249] Refactor the SLA mechanism
r39132 commented on issue #1601: [AIRFLOW-249] Refactor the SLA mechanism URL: https://github.com/apache/incubator-airflow/pull/1601#issuecomment-422290032 Closing for now. Please reopen once you have updated the PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] r39132 commented on issue #1869: [AIRFLOW-571] added --forwarded_allow_ips as a command line argument to webserver
r39132 commented on issue #1869: [AIRFLOW-571] added --forwarded_allow_ips as a command line argument to webserver URL: https://github.com/apache/incubator-airflow/pull/1869#issuecomment-422290179 @dennisobrien Please reopen when you are ready to proceed!
[GitHub] r39132 closed pull request #1601: [AIRFLOW-249] Refactor the SLA mechanism
r39132 closed pull request #1601: [AIRFLOW-249] Refactor the SLA mechanism
URL: https://github.com/apache/incubator-airflow/pull/1601

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance. As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index 1e583ac41b..d6f32cd52a 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -258,39 +258,33 @@ def manage_slas(self, dag, session=None):
         tasks that should have succeeded in the past hour.
         """
         TI = models.TaskInstance
-        sq = (
-            session
-            .query(
-                TI.task_id,
-                func.max(TI.execution_date).label('max_ti'))
-            .filter(TI.dag_id == dag.dag_id)
-            .filter(TI.state == State.SUCCESS)
-            .filter(TI.task_id.in_(dag.task_ids))
-            .group_by(TI.task_id).subquery('sq')
+        SlaMiss = models.SlaMiss
+
+        sla_missed = (
+            session.query(SlaMiss)
+            .filter(SlaMiss.email_sent == 't')
+            .subquery('sla_missed')
         )
-        max_tis = session.query(TI).filter(
-            TI.dag_id == dag.dag_id,
-            TI.task_id == sq.c.task_id,
-            TI.execution_date == sq.c.max_ti,
-        ).all()
+        sq = session.query(TI).outerjoin(
+            sla_missed,
+            sla_missed.c.execution_date == TI.execution_date).filter(
+            sla_missed.c.execution_date == None,
+            TI.dag_id == dag.dag_id,
+            TI.state == State.RUNNING,
+            TI.task_id.in_(dag.task_ids)
+        ).all()
         ts = datetime.now()
-        SlaMiss = models.SlaMiss
-        for ti in max_tis:
+        for ti in sq:
             task = dag.get_task(ti.task_id)
-            dttm = ti.execution_date
             if task.sla:
-                dttm = dag.following_schedule(dttm)
-                while dttm < datetime.now():
-                    following_schedule = dag.following_schedule(dttm)
-                    if following_schedule + task.sla < datetime.now():
-                        session.merge(models.SlaMiss(
-                            task_id=ti.task_id,
-                            dag_id=ti.dag_id,
-                            execution_date=dttm,
-                            timestamp=ts))
-                    dttm = dag.following_schedule(dttm)
+                if ti.start_date + task.sla < ts:
+                    session.merge(models.SlaMiss(
+                        task_id=ti.task_id,
+                        dag_id=ti.dag_id,
+                        execution_date=ti.execution_date,
+                        timestamp=ts))
         session.commit()
         slas = (
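The outer join in the patch above is an anti-join: keep only the RUNNING task instances that have no SlaMiss row yet, then flag those whose start_date plus SLA has already passed. A minimal in-memory sketch of that selection logic, with made-up rows standing in for the TaskInstance and SlaMiss tables:

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-ins for TaskInstance rows (all RUNNING).
running_tis = [
    {"task_id": "t1", "execution_date": datetime(2018, 9, 7, 0, 0),
     "start_date": datetime(2018, 9, 7, 0, 0)},
    {"task_id": "t2", "execution_date": datetime(2018, 9, 7, 0, 10),
     "start_date": datetime(2018, 9, 7, 0, 10)},
]
# An SlaMiss was already recorded (and emailed) for the first run.
sla_missed = {datetime(2018, 9, 7, 0, 0)}

sla = timedelta(minutes=5)
now = datetime(2018, 9, 7, 1, 0)

# Outer join + "execution_date IS NULL" filter == keep only TIs that
# have no matching SlaMiss row yet.
unnotified = [ti for ti in running_tis
              if ti["execution_date"] not in sla_missed]

# The per-TI check from the patch: start_date + sla < now.
new_misses = [ti["task_id"] for ti in unnotified
              if ti["start_date"] + sla < now]
```

Both t1 and t2 have blown their SLA here, but only t2 is flagged, which is exactly the "keeps checking for recent TIs even if an SLA notification has already been sent" fix from the issue.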
[GitHub] seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
URL: https://github.com/apache/incubator-airflow/pull/3596#discussion_r218308398

## File path: airflow/models.py

@@ -1744,6 +1749,29 @@ def dry_run(self):
         self.render_templates()
         task_copy.dry_run()
+    @provide_session
+    def handle_reschedule(self, reschedule_exception, test_mode=False, context=None,

Review comment: Yes, should be private
[GitHub] seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
URL: https://github.com/apache/incubator-airflow/pull/3596#discussion_r218308245

## File path: airflow/models.py

@@ -1744,6 +1749,29 @@ def dry_run(self):
         self.render_templates()
         task_copy.dry_run()
+    @provide_session
+    def handle_reschedule(self, reschedule_exception, test_mode=False, context=None,
+                          session=None):
+        self.end_date = timezone.utcnow()
+        self.set_duration()
+
+        # Log reschedule request
+        session.add(TaskReschedule(self.task, self.execution_date, self._try_number,
+                                   self.start_date, self.end_date,
+                                   reschedule_exception.reschedule_date))
+
+        # set state
+        self.state = State.NONE
+
+        # Decrement try_number so subsequent runs will use the same try number and write
+        # to same log file.
+        self._try_number -= 1
+
+        if not test_mode:
+            session.merge(self)
+            session.commit()

Review comment: It's the same pattern as in `handle_failure`. I didn't think much about it; I'll think a bit more about it...
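The reschedule pattern under discussion (record the request, reset the state, and decrement the try number so the next attempt writes to the same log file) can be sketched with a stand-in class. The class and attribute names below are illustrative, not Airflow's real API:

```python
# Minimal sketch of the reschedule bookkeeping from the review thread.
# "FakeTaskInstance" is a made-up stand-in, not airflow.models.TaskInstance.
class FakeTaskInstance:
    def __init__(self):
        self._try_number = 1       # current attempt; also names the log file
        self.state = "RUNNING"
        self.reschedules = []      # stand-in for the TaskReschedule table

    def handle_reschedule(self, reschedule_date):
        # Record when the sensor asked to be woken up again.
        self.reschedules.append(reschedule_date)
        # Reset state so the scheduler picks the task up again.
        self.state = None
        # Decrement so the next run reuses the same try number and
        # therefore appends to the same log file.
        self._try_number -= 1


ti = FakeTaskInstance()
ti.handle_reschedule("2018-09-19T10:00:00")
```

The try-number decrement is the subtle part: without it, every reschedule would look like a fresh retry and scatter one sensor's output across many log files.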
[GitHub] seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on a change in pull request #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
URL: https://github.com/apache/incubator-airflow/pull/3596#discussion_r218307927

## File path: airflow/models.py

@@ -56,8 +56,8 @@
 from sqlalchemy import (
     Column, Integer, String, DateTime, Text, Boolean, ForeignKey, PickleType,
-    Index, Float, LargeBinary, UniqueConstraint)
-from sqlalchemy import func, or_, and_, true as sqltrue
+    Index, Float, LargeBinary, UniqueConstraint, ForeignKeyConstraint)
+from sqlalchemy import func, or_, and_, true as sqltrue, asc

Review comment: There are already two `from sqlalchemy import ...` statements: the first imports types, the second imports (SQL) expressions. I can introduce a third one. Or combine all into one like this (lexicographically sorted):
```
from sqlalchemy import (
    Boolean, Column, DateTime, Float, ForeignKey, ForeignKeyConstraint, Index,
    Integer, LargeBinary, PickleType, String, Text, UniqueConstraint, and_,
    asc, func, or_, true as sqltrue
)
```