[jira] [Commented] (AIRFLOW-2761) Parallelize Celery Executor enqueuing
[ https://issues.apache.org/jira/browse/AIRFLOW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617096#comment-16617096 ]

ASF GitHub Bot commented on AIRFLOW-2761:
-----------------------------------------

yrqls21 closed pull request #3910: [WIP][AIRFLOW-2761] Parallelize enqueue in celery executor
URL: https://github.com/apache/incubator-airflow/pull/3910

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index 4fd45b2c0e..124da34690 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -31,6 +31,11 @@
 some bugs. The new `sync_parallelism` config option will control how many processes
 CeleryExecutor will use to fetch celery task state in parallel. Default value is
 max(1, number of cores - 1)
+### New `log_processor_manager_location` config option
+
+The DAG parsing manager log now by default will be log into a file, where its location is
+controlled by the new `log_processor_manager_location` config option in core section.
+
 ## Airflow 1.10

 Installation and upgrading requires setting `SLUGIFY_USES_TEXT_UNIDECODE=yes` in your environment or

diff --git a/airflow/config_templates/airflow_local_settings.py b/airflow/config_templates/airflow_local_settings.py
index 95150ab3bb..634b21747e 100644
--- a/airflow/config_templates/airflow_local_settings.py
+++ b/airflow/config_templates/airflow_local_settings.py
@@ -20,6 +20,7 @@
 import os

 from airflow import configuration as conf
+from airflow.utils.file import mkdirs

 # TODO: Logging format and level should be configured
 # in this file instead of from airflow.cfg. Currently
@@ -38,7 +39,10 @@
 PROCESSOR_LOG_FOLDER = conf.get('scheduler', 'CHILD_PROCESS_LOG_DIRECTORY')

+LOG_PROCESSOR_MANAGER_LOCATION = conf.get('core', 'LOG_PROCESSOR_MANAGER_LOCATION')
+
 FILENAME_TEMPLATE = conf.get('core', 'LOG_FILENAME_TEMPLATE')
+
 PROCESSOR_FILENAME_TEMPLATE = conf.get('core', 'LOG_PROCESSOR_FILENAME_TEMPLATE')

 # Storage bucket url for remote logging
@@ -79,7 +83,7 @@
         'formatter': 'airflow',
         'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
         'filename_template': PROCESSOR_FILENAME_TEMPLATE,
-    },
+    }
 },
 'loggers': {
     'airflow.processor': {
@@ -104,6 +108,26 @@
     }
 }

+DEFAULT_DAG_PARSING_LOGGING_CONFIG = {
+    'handlers': {
+        'processor_manager': {
+            'class': 'logging.handlers.RotatingFileHandler',
+            'formatter': 'airflow',
+            'filename': LOG_PROCESSOR_MANAGER_LOCATION,
+            'mode': 'a',
+            'maxBytes': 104857600,  # 100MB
+            'backupCount': 5
+        }
+    },
+    'loggers': {
+        'airflow.processor_manager': {
+            'handlers': ['processor_manager'],
+            'level': LOG_LEVEL,
+            'propagate': False,
+        }
+    }
+}
+
 REMOTE_HANDLERS = {
     's3': {
         'task': {
@@ -172,6 +196,20 @@
 REMOTE_LOGGING = conf.get('core', 'remote_logging')

+if os.environ.get('CONFIG_PROCESSOR_MANAGER_LOGGER') == 'True':
+    DEFAULT_LOGGING_CONFIG['handlers'] \
+        .update(DEFAULT_DAG_PARSING_LOGGING_CONFIG['handlers'])
+    DEFAULT_LOGGING_CONFIG['loggers'] \
+        .update(DEFAULT_DAG_PARSING_LOGGING_CONFIG['loggers'])
+
+    # Manually create log directory for processor_manager handler as RotatingFileHandler
+    # will only create file but not the directory.
+    processor_manager_handler_config = DEFAULT_DAG_PARSING_LOGGING_CONFIG['handlers'][
+        'processor_manager']
+    directory = os.path.dirname(processor_manager_handler_config['filename'])
+    if not os.path.exists(directory):
+        mkdirs(directory, 0o777)
+
 if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
     DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['s3'])
 elif REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('gs://'):

diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 000dd67a13..367527b1c1 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -70,6 +70,7 @@
 simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s

 # we need to escape the curly braces by adding an additional curly brace
 log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
 log_processor_filename_template = {{ filename }}.log
+log_processor_manager_location = {AIRFLOW_HOME}/logs/dag_processor_manager/dag_processor_manager.log

 # Hostname by providing a path to a callable, which will resolve the hostname
 hostname_callable = socket:getfqdn

diff --git a/airflow/config_templates/default_test.cfg
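The `processor_manager` handler this patch adds is a standard `logging.handlers.RotatingFileHandler` registered through Python's `logging.config.dictConfig`. A minimal, self-contained sketch of the same pattern (the temp-directory path, logger name, and literal `'INFO'` level are illustrative stand-ins, not Airflow's actual values):

```python
import logging
import logging.config
import os
import tempfile

# Illustrative stand-in for LOG_PROCESSOR_MANAGER_LOCATION.
log_file = os.path.join(tempfile.mkdtemp(), "dag_processor_manager",
                        "dag_processor_manager.log")

# RotatingFileHandler creates the log file but not its parent
# directory, so the directory must exist before dictConfig runs.
os.makedirs(os.path.dirname(log_file), exist_ok=True)

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "airflow": {"format": "%(asctime)s %(levelname)s - %(message)s"},
    },
    "handlers": {
        "processor_manager": {
            "class": "logging.handlers.RotatingFileHandler",
            "formatter": "airflow",
            "filename": log_file,
            "mode": "a",
            "maxBytes": 104857600,  # 100MB, as in the patch
            "backupCount": 5,
        }
    },
    "loggers": {
        "airflow.processor_manager": {
            "handlers": ["processor_manager"],
            "level": "INFO",
            "propagate": False,
        }
    },
})

logging.getLogger("airflow.processor_manager").info("manager started")
```

This is also why the patch creates the directory by hand before configuring logging: the handler will happily create the file, but raises if its parent directory is missing.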
[jira] [Commented] (AIRFLOW-2761) Parallelize Celery Executor enqueuing
[ https://issues.apache.org/jira/browse/AIRFLOW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617087#comment-16617087 ]

ASF GitHub Bot commented on AIRFLOW-2761:
-----------------------------------------

yrqls21 opened a new pull request #3910: [WIP][AIRFLOW-2761] Parallelize enqueue in celery executor
URL: https://github.com/apache/incubator-airflow/pull/3910

   Make sure you have checked _all_ steps below.

   ### Jira
   - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
     - https://issues.apache.org/jira/browse/AIRFLOW-XXX
     - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

   ### Description
   - [ ] Here are some details about my PR, including screenshots of any UI changes:

   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

   ### Commits
   - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"

   ### Documentation
   - [ ] In case of new functionality, my PR adds documentation that describes how to use it.
     - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

   ### Code Quality
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Parallelize Celery Executor enqueuing
> -------------------------------------
>
>                 Key: AIRFLOW-2761
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2761
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Kevin Yang
>            Priority: Major
>
> Currently the celery executor enqueues in an async fashion but still does that
> in a single process loop. This can slow down the scheduler loop and create
> scheduling delay if we have a large # of tasks to schedule in a short time,
> e.g. at UTC midnight we need to schedule a large # of sensors in a short period.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
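The parallelization the ticket asks for amounts to fanning the per-task broker round trips out over a worker pool instead of looping in the single scheduler process. A loose sketch of the idea only, not the executor's real code: `send_task_to_broker` is a hypothetical stand-in for Celery's `apply_async`, and since enqueueing is I/O-bound a thread pool suffices to illustrate it.

```python
from multiprocessing.pool import ThreadPool

def send_task_to_broker(command):
    # Hypothetical stand-in for the Celery apply_async() call; in the
    # real executor this is a network round trip to the broker.
    return ("queued", command)

def enqueue_in_parallel(commands, parallelism=4):
    # Fan the per-task broker calls out over a pool so a burst of tasks
    # (e.g. many sensors at UTC midnight) does not serialize the loop.
    with ThreadPool(parallelism) as pool:
        return pool.map(send_task_to_broker, commands)

results = enqueue_in_parallel(["airflow run dag_a task_1 2018-09-16",
                               "airflow run dag_b task_2 2018-09-16"])
```

`pool.map` preserves input order, so results can still be matched back to the task instances that produced the commands.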
[jira] [Created] (AIRFLOW-3074) Missing options in ECS operator
Josh Carp created AIRFLOW-3074:
----------------------------------

             Summary: Missing options in ECS operator
                 Key: AIRFLOW-3074
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3074
             Project: Apache Airflow
          Issue Type: Bug
          Components: operators
    Affects Versions: 1.10.0
            Reporter: Josh Carp

The ECS operator exposes some options passed to the boto3 run_task method, but omits a few other useful options. For example, the networkConfiguration option, which is useful for setting security groups on a Fargate container, isn't exposed. The ECS operator should expose all run_task options that are relevant. I wrote up a quick patch for this issue; I'll amend the pull request to reference this report.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
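For reference, `networkConfiguration` is a nested structure that boto3's `ecs.run_task` accepts for Fargate networking. A sketch of the call shape only: the identifiers below are placeholders, and the API call itself is shown in a comment since it needs real AWS credentials and resources.

```python
# Shape of the extra options the operator could forward to boto3's
# ecs.run_task (subnet/security-group IDs below are placeholders).
run_task_kwargs = {
    "cluster": "my-cluster",
    "taskDefinition": "my-task:1",
    "launchType": "FARGATE",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "securityGroups": ["sg-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
}

# In the operator this would end up as, roughly:
#     boto3.client("ecs").run_task(**run_task_kwargs)
```

Because the options pass straight through to the API, exposing them on the operator is mostly a matter of accepting the extra keyword arguments and merging them into this dict.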
[jira] [Commented] (AIRFLOW-3073) A note is needed in 'Data Profiling' doc page to reminder users it's no longer supported in new webserver UI
[ https://issues.apache.org/jira/browse/AIRFLOW-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616988#comment-16616988 ]

ASF GitHub Bot commented on AIRFLOW-3073:
-----------------------------------------

XD-DENG opened a new pull request #3909: [AIRFLOW-3073] Add note-Profiling feature not supported in new webserver
URL: https://github.com/apache/incubator-airflow/pull/3909

   Make sure you have checked _all_ steps below.

   ### Jira
   - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
     - https://issues.apache.org/jira/browse/AIRFLOW-3073
     - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

   ### Description
   - [x] Here are some details about my PR, including screenshots of any UI changes:
     `Adhoc queries` and `Charts` features are no longer supported in the new FAB-based webserver and UI. But this is not mentioned at all in the doc "Data Profiling" (https://airflow.incubator.apache.org/profiling.html). This may confuse users.
     Ref: https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes
     This commit adds a note to remind users of this. (Later, after `www` is removed, this `profiling.rst` should be removed as well.)

   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Documentation change.

   ### Commits
   - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"

   ### Documentation
   - [ ] In case of new functionality, my PR adds documentation that describes how to use it.
     - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

   ### Code Quality
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> A note is needed in 'Data Profiling' doc page to reminder users it's no
> longer supported in new webserver UI
> -----------------------------------------------------------------------
>
>                 Key: AIRFLOW-3073
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3073
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Critical
>
> In https://airflow.incubator.apache.org/profiling.html, it's not mentioned
> at all that these features are no longer supported in the new webserver (FAB-based)
> due to security concerns
> (https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3073) A note is needed in 'Data Profiling' doc page to reminder users it's no longer supported in new webserver UI
[ https://issues.apache.org/jira/browse/AIRFLOW-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaodong DENG updated AIRFLOW-3073:
-----------------------------------
    Description: In https://airflow.incubator.apache.org/profiling.html, it's not mentioned at all that these features are no longer supported in the new webserver (FAB-based) due to security concerns (https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes).  (was: In https://airflow.incubator.apache.org/profiling.html, it's not mentioned at all that these features are no longer supported in the new webserver (FAB-based) due to security concerns.)

> A note is needed in 'Data Profiling' doc page to reminder users it's no
> longer supported in new webserver UI
> -----------------------------------------------------------------------
>
>                 Key: AIRFLOW-3073
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3073
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Critical
>
> In https://airflow.incubator.apache.org/profiling.html, it's not mentioned
> at all that these features are no longer supported in the new webserver (FAB-based)
> due to security concerns
> (https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (AIRFLOW-3073) A note is needed in 'Data Profiling' doc page to reminder users it's no longer supported in new webserver UI
Xiaodong DENG created AIRFLOW-3073:
--------------------------------------

             Summary: A note is needed in 'Data Profiling' doc page to reminder users it's no longer supported in new webserver UI
                 Key: AIRFLOW-3073
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3073
             Project: Apache Airflow
          Issue Type: Improvement
          Components: Documentation
            Reporter: Xiaodong DENG
            Assignee: Xiaodong DENG

In https://airflow.incubator.apache.org/profiling.html, it's not mentioned at all that these features are no longer supported in the new webserver (FAB-based) due to security concerns.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] jmcarp opened a new pull request #3908: [WIP] Add relevant ECS options to ECS operator.
jmcarp opened a new pull request #3908: [WIP] Add relevant ECS options to ECS operator.
URL: https://github.com/apache/incubator-airflow/pull/3908

   The ECS operator currently supports only a subset of available options for running ECS tasks. This patch adds all ECS options that could be relevant to airflow; options that wouldn't make sense here, like `count`, were skipped.

   Make sure you have checked _all_ steps below.

   ### Jira
   - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
     - https://issues.apache.org/jira/browse/AIRFLOW-XXX
     - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

   ### Description
   - [ ] Here are some details about my PR, including screenshots of any UI changes:

   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

   ### Commits
   - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"

   ### Documentation
   - [ ] In case of new functionality, my PR adds documentation that describes how to use it.
     - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

   ### Code Quality
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Updated] (AIRFLOW-1195) Cleared tasks in SubDagOperator do not trigger Parent dag_runs
[ https://issues.apache.org/jira/browse/AIRFLOW-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-1195:
--------------------------------
    Summary: Cleared tasks in SubDagOperator do not trigger Parent dag_runs  (was: Cleared tasks in SubDagOperator do not trigger upstream dag_runs)

> Cleared tasks in SubDagOperator do not trigger Parent dag_runs
> --------------------------------------------------------------
>
>                 Key: AIRFLOW-1195
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1195
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: subdag
>    Affects Versions: 1.8.1
>            Reporter: Paul Zaczkieiwcz
>            Assignee: Kaxil Naik
>            Priority: Minor
>         Attachments: example_subdag_operator.not-cleared.png, example_subdag_operator.section-2.cleared.png
>
> Let's say that you had a task fail in a SubDag. You fix the underlying issue
> and want Airflow to resume the DagRun where it left off. If this were a flat
> DAG, then all you need to do is clear the failed TaskInstance and its
> downstream dependencies. The GUI will happily clear all of them for you in a
> single PUT request! In order to resume a SubDag, you must clear the
> TaskInstance + downstream dependencies AND you must clear the SubDagOperator
> + downstream dependencies WITHOUT clearing its recursive dependencies. There
> should be an option to recursively clear task instances in upstream SubDags.
> The attached files use the example_subdag_operator DAG to illustrate the
> problem. Before the screenshot, I ran the operator to completion, then
> cleared {{example_subdag_operator.section-2.section-2-task-5}}. Notice that
> {{example_subdag_operator.section-2}} is in the `running` state, but
> {{example_subdag_operator}} is still in the `success` state.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1195) Cleared tasks in SubDagOperator do not trigger upstream dag_runs
[ https://issues.apache.org/jira/browse/AIRFLOW-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik reassigned AIRFLOW-1195:
-----------------------------------
    Assignee: Kaxil Naik

> Cleared tasks in SubDagOperator do not trigger upstream dag_runs
> ----------------------------------------------------------------
>
>                 Key: AIRFLOW-1195
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1195
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: subdag
>    Affects Versions: 1.8.1
>            Reporter: Paul Zaczkieiwcz
>            Assignee: Kaxil Naik
>            Priority: Minor
>         Attachments: example_subdag_operator.not-cleared.png, example_subdag_operator.section-2.cleared.png
>
> Let's say that you had a task fail in a SubDag. You fix the underlying issue
> and want Airflow to resume the DagRun where it left off. If this were a flat
> DAG, then all you need to do is clear the failed TaskInstance and its
> downstream dependencies. The GUI will happily clear all of them for you in a
> single PUT request! In order to resume a SubDag, you must clear the
> TaskInstance + downstream dependencies AND you must clear the SubDagOperator
> + downstream dependencies WITHOUT clearing its recursive dependencies. There
> should be an option to recursively clear task instances in upstream SubDags.
> The attached files use the example_subdag_operator DAG to illustrate the
> problem. Before the screenshot, I ran the operator to completion, then
> cleared {{example_subdag_operator.section-2.section-2-task-5}}. Notice that
> {{example_subdag_operator.section-2}} is in the `running` state, but
> {{example_subdag_operator}} is still in the `success` state.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] feng-tao commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
feng-tao commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
URL: https://github.com/apache/incubator-airflow/pull/3596#issuecomment-421856616

   I will try to spend some time on the pr tomorrow.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] kaxil opened a new pull request #3907: WIP: [AIRFLOW-1195] Add feature to clear tasks in Parent Dag
kaxil opened a new pull request #3907: WIP: [AIRFLOW-1195] Add feature to clear tasks in Parent Dag
URL: https://github.com/apache/incubator-airflow/pull/3907

   Make sure you have checked _all_ steps below.

   ### Jira
   - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
     - https://issues.apache.org/jira/browse/AIRFLOW-1195
     - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

   ### Description
   - [x] Here are some details about my PR, including screenshots of any UI changes:
     Details on Jira:
     Let's say that you had a task fail in a SubDag. You fix the underlying issue and want Airflow to resume the DagRun where it left off. If this were a flat DAG, then all you need to do is clear the failed TaskInstance and its downstream dependencies. The GUI will happily clear all of them for you in a single PUT request! In order to resume a SubDag, you must clear the TaskInstance + downstream dependencies AND you must clear the SubDagOperator + downstream dependencies WITHOUT clearing its recursive dependencies. There should be an option to recursively clear task instances in upstream SubDags.

   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

   ### Commits
   - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"

   ### Documentation
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

   ### Code Quality
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Commented] (AIRFLOW-1195) Cleared tasks in SubDagOperator do not trigger upstream dag_runs
[ https://issues.apache.org/jira/browse/AIRFLOW-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616902#comment-16616902 ] ASF GitHub Bot commented on AIRFLOW-1195: - kaxil opened a new pull request #3907: WIP: [AIRFLOW-1195] Add feature to clear tasks in Parent Dag URL: https://github.com/apache/incubator-airflow/pull/3907 > Cleared tasks in SubDagOperator do not trigger upstream dag_runs > > > Key: AIRFLOW-1195 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1195 > Project: Apache Airflow > Issue Type: Bug > Components: subdag >Affects Versions: 1.8.1 >Reporter: Paul Zaczkieiwcz >Priority: Minor > Attachments: example_subdag_operator.not-cleared.png, > example_subdag_operator.section-2.cleared.png > > > The attached files use the example_subdag_operator DAG to illustrate the > problem.
Before the screenshot, I ran the operator to completion, then > cleared {{example_subdag_operator.section-2.section-2-task-5}}. Notice that > {{example_subdag_operator.section-2}} is in the `running` state, but > {{example_subdag_operator}} is still in the `success` state. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3072) Only admin can view logs in RBAC UI
Stefan Seelmann created AIRFLOW-3072: Summary: Only admin can view logs in RBAC UI Key: AIRFLOW-3072 URL: https://issues.apache.org/jira/browse/AIRFLOW-3072 Project: Apache Airflow Issue Type: Bug Components: ui Affects Versions: 1.10.0 Reporter: Stefan Seelmann Assignee: Stefan Seelmann With RBAC enabled, only users with the admin role can view logs. The cause is that no permission for {{get_logs_with_metadata}} is defined in {{security.py}}. My suggestion is to add the permission and assign it to the viewer role. Or is there a reason why only admin should be able to see logs? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
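The suggested fix can be sketched abstractly. The role names and the permission-lookup shape below are illustrative stand-ins for the FAB-style role/permission mapping, not the actual contents of `security.py`:

```python
# Illustrative role -> view-permission mapping. The reported bug is that
# 'get_logs_with_metadata' is only granted to Admin; the proposed fix is
# to grant it to the Viewer role as well.
ROLE_PERMS = {
    "Admin": {"can_index", "get_logs_with_metadata"},
    "Viewer": {"can_index"},  # bug: the log endpoint permission is missing here
}

def can_view_logs(role):
    """Return True if the role holds the log-viewing permission."""
    return "get_logs_with_metadata" in ROLE_PERMS.get(role, set())

assert can_view_logs("Admin")
assert not can_view_logs("Viewer")  # the behaviour reported in the issue

# proposed fix: grant the permission to viewers too
ROLE_PERMS["Viewer"].add("get_logs_with_metadata")
```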
[GitHub] seelmann opened a new pull request #1: Improve Docker image for local development
seelmann opened a new pull request #1: Improve Docker image for local development URL: https://github.com/apache/incubator-airflow-ci/pull/1 * Expose port 8080 to host for accessing webserver * Install nodejs to be able to build `www_rbac` assets * Install virtualenv to be able to install dependencies * Install additional tools for easier debugging I started to use this docker image for local development, but some pieces were missing. I hope those changes don't have a negative impact on CI tests. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work started] (AIRFLOW-2747) Explicit re-schedule of sensors
[ https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-2747 started by Stefan Seelmann. > Explicit re-schedule of sensors > --- > > Key: AIRFLOW-2747 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2747 > Project: Apache Airflow > Issue Type: Improvement > Components: core, operators >Affects Versions: 1.9.0 >Reporter: Stefan Seelmann >Assignee: Stefan Seelmann >Priority: Major > Fix For: 2.0.0 > > Attachments: Screenshot_2018-07-12_14-10-24.png, > Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png > > > By default sensors block a worker and just sleep between pokes. This is very > inefficient, especially when there are many long-running sensors. > There is a hacky workaround by setting a small timeout value and a high retry > number. But that has drawbacks: > * Errors raised by sensors are hidden and the sensor retries too often > * The sensor is retried in a fixed time interval (with optional exponential > backoff) > * There are many attempts and many log files are generated > I'd like to propose an explicit reschedule mechanism: > * A new "reschedule" flag for sensors, if set to True it will raise an > AirflowRescheduleException that causes a reschedule. > * AirflowRescheduleException contains the (earliest) re-schedule date. > * Reschedule requests are recorded in a new `task_reschedule` table and > visualized in the Gantt view. > * A new TI dependency that checks if a sensor task is ready to be > re-scheduled. > Advantages: > * This change is backward compatible. Existing sensors behave like before. > But it's possible to set the "reschedule" flag. > * The poke_interval, timeout, and soft_fail parameters are still respected > and used to calculate the next schedule time. > * Custom sensor implementations can even define the next sensible schedule > date by raising AirflowRescheduleException themselves.
> * Existing TimeSensor and TimeDeltaSensor can also be changed to be > rescheduled when the time is reached. > * This mechanism can also be used by non-sensor operators (but then the new > ReadyToRescheduleDep has to be added to deps or BaseOperator). > Design decisions and caveats: > * When handling AirflowRescheduleException the `try_number` is decremented. > That means that subsequent runs use the same try number and write to the same > log file. > * Sensor TI dependency check now depends on `task_reschedule` table. However > only the BaseSensorOperator includes the new ReadyToRescheduleDep for now. > Open questions and TODOs: > * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting > the state back to `NONE`? This would require more changes in scheduler code > and especially in the UI, but the state of a task would be more explicit and > more transparent to the user. > * Add example/test for a non-sensor operator > * Document the new feature -- This message was sent by Atlassian JIRA (v7.6.3#76005)
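The proposed mechanism can be sketched in plain Python. The exception and table names mirror the proposal above, but everything here is a simplified stand-in, not the actual Airflow implementation:

```python
from datetime import datetime, timedelta

class AirflowRescheduleException(Exception):
    """Carries the (earliest) date at which the task should run again."""
    def __init__(self, reschedule_date):
        self.reschedule_date = reschedule_date

task_reschedule_table = []  # stand-in for the proposed `task_reschedule` table

def poke():
    return False  # the sensed condition is not met yet

def run_sensor(mode="reschedule", poke_interval=60):
    if poke():
        return "SUCCESS"
    if mode == "reschedule":
        # free the worker slot and ask the scheduler to try again later
        raise AirflowRescheduleException(
            datetime(2018, 9, 16) + timedelta(seconds=poke_interval))
    return "SLEEPING"  # classic behavior: block the worker between pokes

def scheduler_step():
    try:
        return run_sensor()
    except AirflowRescheduleException as e:
        # record the request (visualized in the Gantt view) and reset the
        # state so the ReadyToRescheduleDep can re-queue the task later
        task_reschedule_table.append(e.reschedule_date)
        return "NONE"

state = scheduler_step()
```

Note how the try-number question from the proposal falls out of this shape: because the reschedule is handled before the task is considered a failed attempt, subsequent runs can reuse the same try number and log file.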
[GitHub] seelmann edited a comment on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann edited a comment on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors URL: https://github.com/apache/incubator-airflow/pull/3596#issuecomment-421815236 @Fokko I rebased again and made the following changes: * Instead of using a `reschedule` boolean flag I switched to a `mode` property with possible values `poke` (default) and `reschedule` to allow other modes in the future (e.g. `in_scheduler` as mentioned by Maxime). * Changed the Gantt view based on feedback from Pedro, see screenshots in the Jira issue. * Changed the CSS class in Gantt for the `NONE` state to display as white instead of black, aligned with the graph and tree views. * Added more tests and updated documentation. From my PoV it's ready to be merged. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
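At the API surface, the `mode` property described in the comment could look roughly like this. `DemoSensor` is a hypothetical stand-in, not the real `BaseSensorOperator`:

```python
class DemoSensor:
    """Stand-in for a sensor accepting the proposed `mode` property."""

    # a tuple of modes leaves room for future values such as 'in_scheduler'
    VALID_MODES = ("poke", "reschedule")

    def __init__(self, mode="poke"):
        if mode not in self.VALID_MODES:
            raise ValueError("mode must be one of %s" % (self.VALID_MODES,))
        self.mode = mode

default = DemoSensor()                      # today's behavior: blocks a worker slot
efficient = DemoSensor(mode="reschedule")   # frees the slot between pokes
```

Using an open-ended string property instead of a boolean flag is what keeps the change backward compatible while allowing new modes later.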
[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors
[ https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Seelmann updated AIRFLOW-2747: - Attachment: Screenshot_2018-09-16_20-09-28.png > Explicit re-schedule of sensors > --- > > Key: AIRFLOW-2747 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2747 > Project: Apache Airflow > Issue Type: Improvement > Components: core, operators >Affects Versions: 1.9.0 >Reporter: Stefan Seelmann >Assignee: Stefan Seelmann >Priority: Major > Fix For: 2.0.0 > > Attachments: Screenshot_2018-07-12_14-10-24.png, > Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2747) Explicit re-schedule of sensors
[ https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616842#comment-16616842 ] Stefan Seelmann commented on AIRFLOW-2747: -- I changed the Gantt view to not show each individual reschedule but only a single bar. The color changes between light green (if currently running) and white (if currently inactive); those colors are also shown in other views so it's consistent. However failed attempts are still shown as separate bars (as before). Attached two screenshots for demonstration. !Screenshot_2018-09-16_20-19-23.png!!Screenshot_2018-09-16_20-09-28.png! > Explicit re-schedule of sensors > --- > > Key: AIRFLOW-2747 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2747 > Project: Apache Airflow > Issue Type: Improvement > Components: core, operators >Affects Versions: 1.9.0 >Reporter: Stefan Seelmann >Assignee: Stefan Seelmann >Priority: Major > Fix For: 2.0.0 > > Attachments: Screenshot_2018-07-12_14-10-24.png, > Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors
[ https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Seelmann updated AIRFLOW-2747: - Attachment: Screenshot_2018-09-16_20-19-23.png > Explicit re-schedule of sensors > --- > > Key: AIRFLOW-2747 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2747 > Project: Apache Airflow > Issue Type: Improvement > Components: core, operators >Affects Versions: 1.9.0 >Reporter: Stefan Seelmann >Assignee: Stefan Seelmann >Priority: Major > Fix For: 2.0.0 > > Attachments: Screenshot_2018-07-12_14-10-24.png, > Screenshot_2018-09-16_20-19-23.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] r39132 commented on issue #1910: [AIRFLOW-659] Automatic Refresh on DAG Graph View
r39132 commented on issue #1910: [AIRFLOW-659] Automatic Refresh on DAG Graph View URL: https://github.com/apache/incubator-airflow/pull/1910#issuecomment-421794784 @robin-miller-ow A few questions: 1. Would you be interested in making this work on all views, instead of just on the Graph View? 2. Also, instead of making the update in the configuration file, it would make more sense for each user to pick a refresh interval that made sense to him/her. Hence, it would be better to surface a UI control that allowed folks to pick a refresh interval that could be persisted via a cookie from one session to another. Let me know if you'd like to take this forward. I'd be happy to stick with this to see it merged. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] r39132 commented on issue #1869: [AIRFLOW-571] added --forwarded_allow_ips as a command line argument to webserver
r39132 commented on issue #1869: [AIRFLOW-571] added --forwarded_allow_ips as a command line argument to webserver URL: https://github.com/apache/incubator-airflow/pull/1869#issuecomment-421793313 Hi @dennisobrien How would you like to proceed with this PR? Would you like to take another crack? If so, I'd be happy to take a look at it. If I don't hear back in a few days, I'll close this out. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] r39132 commented on issue #1601: [AIRFLOW-249] Refactor the SLA mechanism
r39132 commented on issue #1601: [AIRFLOW-249] Refactor the SLA mechanism URL: https://github.com/apache/incubator-airflow/pull/1601#issuecomment-421792600 Hi @dud225 I wanted to see if you were still interested in taking a crack at this. If I don't hear back in a few days, I'll close this. You are welcome to reopen it however. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (AIRFLOW-3068) Remove deprecated import mechanism
[ https://issues.apache.org/jira/browse/AIRFLOW-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Anand closed AIRFLOW-3068. Resolution: Fixed > Remove deprecated import mechanism > -- > > Key: AIRFLOW-3068 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3068 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Priority: Major > > In Airflow 2.0 we want to remove the deprecated import mechanism. > Before: > from airflow.operators import BashOperator > changes to > from airflow.operators.bash_operator import BashOperator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3068) Remove deprecated import mechanism
[ https://issues.apache.org/jira/browse/AIRFLOW-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616781#comment-16616781 ] ASF GitHub Bot commented on AIRFLOW-3068: - r39132 closed pull request #3906: [AIRFLOW-3068] Remove deprecated imports URL: https://github.com/apache/incubator-airflow/pull/3906
[GitHub] r39132 closed pull request #3906: [AIRFLOW-3068] Remove deprecated imports
r39132 closed pull request #3906: [AIRFLOW-3068] Remove deprecated imports URL: https://github.com/apache/incubator-airflow/pull/3906 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/__init__.py b/airflow/__init__.py index d010fe4c74..9d071a347c 100644 --- a/airflow/__init__.py +++ b/airflow/__init__.py @@ -79,16 +79,3 @@ class AirflowViewPlugin(BaseView): class AirflowMacroPlugin(object): def __init__(self, namespace): self.namespace = namespace - - -from airflow import operators # noqa: E402 -from airflow import sensors # noqa: E402 -from airflow import hooks # noqa: E402 -from airflow import executors # noqa: E402 -from airflow import macros # noqa: E402 - -operators._integrate_plugins() -sensors._integrate_plugins() # noqa: E402 -hooks._integrate_plugins() -executors._integrate_plugins() -macros._integrate_plugins() diff --git a/airflow/contrib/hooks/__init__.py b/airflow/contrib/hooks/__init__.py index 6c31842578..b7f8352944 100644 --- a/airflow/contrib/hooks/__init__.py +++ b/airflow/contrib/hooks/__init__.py @@ -17,54 +17,3 @@ # specific language governing permissions and limitations # under the License. # - - -# Contrib hooks are not imported by default. They should be accessed -# directly: from airflow.contrib.hooks.hook_module import Hook - - -import sys -import os - -# -# -# #TODO #FIXME Airflow 2.0 -# -# Old import machinary below. -# -# This is deprecated but should be kept until Airflow 2.0 -# for compatibility. 
-# -# -_hooks = { -'docker_hook': ['DockerHook'], -'ftp_hook': ['FTPHook'], -'ftps_hook': ['FTPSHook'], -'vertica_hook': ['VerticaHook'], -'ssh_hook': ['SSHHook'], -'winrm_hook': ['WinRMHook'], -'sftp_hook': ['SFTPHook'], -'bigquery_hook': ['BigQueryHook'], -'qubole_hook': ['QuboleHook'], -'gcs_hook': ['GoogleCloudStorageHook'], -'datastore_hook': ['DatastoreHook'], -'gcp_cloudml_hook': ['CloudMLHook'], -'redshift_hook': ['RedshiftHook'], -'gcp_dataproc_hook': ['DataProcHook'], -'gcp_dataflow_hook': ['DataFlowHook'], -'spark_submit_operator': ['SparkSubmitOperator'], -'cloudant_hook': ['CloudantHook'], -'fs_hook': ['FSHook'], -'wasb_hook': ['WasbHook'], -'gcp_pubsub_hook': ['PubSubHook'], -'jenkins_hook': ['JenkinsHook'], -'aws_dynamodb_hook': ['AwsDynamoDBHook'], -'azure_data_lake_hook': ['AzureDataLakeHook'], -'azure_fileshare_hook': ['AzureFileShareHook'], -} - - -if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): -from airflow.utils.helpers import AirflowImporter - -airflow_importer = AirflowImporter(sys.modules[__name__], _hooks) diff --git a/airflow/contrib/operators/__init__.py b/airflow/contrib/operators/__init__.py index 247ec5941f..b7f8352944 100644 --- a/airflow/contrib/operators/__init__.py +++ b/airflow/contrib/operators/__init__.py @@ -17,37 +17,3 @@ # specific language governing permissions and limitations # under the License. # - - -# Contrib operators are not imported by default. They should be accessed -# directly: from airflow.contrib.operators.operator_module import Operator - - -import sys -import os - -# -# -# #TODO #FIXME Airflow 2.0 -# -# Old import machinary below. -# -# This is deprecated but should be kept until Airflow 2.0 -# for compatibility. 
-# -# -_operators = { -'ssh_operator': ['SSHOperator'], -'winrm_operator': ['WinRMOperator'], -'vertica_operator': ['VerticaOperator'], -'vertica_to_hive': ['VerticaToHiveTransfer'], -'qubole_operator': ['QuboleOperator'], -'spark_submit_operator': ['SparkSubmitOperator'], -'file_to_wasb': ['FileToWasbOperator'], -'fs_operator': ['FileSensor'], -'hive_to_dynamodb': ['HiveToDynamoDBTransferOperator'] -} - -if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): -from airflow.utils.helpers import AirflowImporter -airflow_importer = AirflowImporter(sys.modules[__name__], _operators) diff --git a/airflow/contrib/operators/mlengine_operator.py b/airflow/contrib/operators/mlengine_operator.py index c3d9075618..8091ceefff 100644 --- a/airflow/contrib/operators/mlengine_operator.py +++ b/airflow/contrib/operators/mlengine_operator.py @@ -19,7 +19,7 @@ from airflow.contrib.hooks.gcp_mlengine_hook import MLEngineHook from airflow.exceptions import AirflowException -from airflow.operators import BaseOperator +from airflow.models import BaseOperator from airflow.utils.decorators import apply_defaults from
[GitHub] r39132 commented on issue #3906: [AIRFLOW-3068] Remove deprecated imports
r39132 commented on issue #3906: [AIRFLOW-3068] Remove deprecated imports URL: https://github.com/apache/incubator-airflow/pull/3906#issuecomment-421789103 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on issue #3863: [AIRFLOW-3070] Refine web UI authentication-related docs
XD-DENG commented on issue #3863: [AIRFLOW-3070] Refine web UI authentication-related docs URL: https://github.com/apache/incubator-airflow/pull/3863#issuecomment-421768717 Thanks @kaxil . Appreciate the time you guys contributed as well. Especially now, a Sunday :) A nice afternoon ahead.
[jira] [Commented] (AIRFLOW-3070) Refine web UI authentication-related docs
[ https://issues.apache.org/jira/browse/AIRFLOW-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616705#comment-16616705 ] ASF GitHub Bot commented on AIRFLOW-3070: - kaxil closed pull request #3863: [AIRFLOW-3070] Refine web UI authentication-related docs URL: https://github.com/apache/incubator-airflow/pull/3863 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py index 4ff1ae3679..ce56e79b4b 100644 --- a/airflow/bin/cli.py +++ b/airflow/bin/cli.py @@ -2024,7 +2024,7 @@ class CLIFactory(object): 'conn_id', 'conn_uri', 'conn_extra') + tuple(alternative_conn_specs), }, { 'func': create_user, -'help': "Create an account for the Web UI", +'help': "Create an account for the Web UI (FAB-based)", 'args': ('role', 'username', 'email', 'firstname', 'lastname', 'password', 'use_random_password'), }, { diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg index 18c486cb1e..9a9f3aca61 100644 --- a/airflow/config_templates/default_airflow.cfg +++ b/airflow/config_templates/default_airflow.cfg @@ -265,6 +265,9 @@ access_logfile = - error_logfile = - # Expose the configuration file in the web server +# This is only applicable for the flask-admin based web UI (non FAB-based). +# In the FAB-based web UI with RBAC feature, +# access to configuration is controlled by role permissions. expose_config = False # Set to true to turn on authentication: diff --git a/docs/security.rst b/docs/security.rst index 253587afb1..60fe160404 100644 --- a/docs/security.rst +++ b/docs/security.rst @@ -16,9 +16,14 @@ Web Authentication Password +.. note:: + + This is for flask-admin based web UI only. 
If you are using FAB-based web UI with RBAC feature, + please use command line interface ``create_user`` to create accounts, or do that in the FAB-based UI itself. + One of the simplest mechanisms for authentication is requiring users to specify a password before logging in. Password authentication requires the used of the ``password`` subpackage in your requirements file. Password hashing -uses bcrypt before storing passwords. +uses ``bcrypt`` before storing passwords. .. code-block:: bash This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refine web UI authentication-related docs > -- > > Key: AIRFLOW-3070 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3070 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > > Now in Airflow 1.10 we're already providing older version of web UI and new > FAB-based UI at the same time. But the documentation is not differentiated > very well. For example, > * this doc [https://airflow.apache.org/security.html#password] is only > applicable for old web UI only, but it's not hightlighted. > * command line tool {{create_user}} is only for new FAB-based UI only, it's > not highlighted as well. > This may be confusing to users, especially given not everyone is aware of the > existence of two UIs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-3070) Refine web UI authentication-related docs
[ https://issues.apache.org/jira/browse/AIRFLOW-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik resolved AIRFLOW-3070. - Resolution: Fixed Resolved by https://github.com/apache/incubator-airflow/pull/3863 > Refine web UI authentication-related docs > -- > > Key: AIRFLOW-3070 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3070 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > > Now in Airflow 1.10 we're already providing older version of web UI and new > FAB-based UI at the same time. But the documentation is not differentiated > very well. For example, > * this doc [https://airflow.apache.org/security.html#password] is only > applicable for old web UI only, but it's not hightlighted. > * command line tool {{create_user}} is only for new FAB-based UI only, it's > not highlighted as well. > This may be confusing to users, especially given not everyone is aware of the > existence of two UIs.
[GitHub] kaxil commented on issue #3863: [AIRFLOW-3070] Refine web UI authentication-related docs
kaxil commented on issue #3863: [AIRFLOW-3070] Refine web UI authentication-related docs URL: https://github.com/apache/incubator-airflow/pull/3863#issuecomment-421758473 @XD-DENG Hey, no worries :-) . The idea behind it is that there is no point in creating a Jira when we are just fixing typos or making a single-line document change. Everything other than that should have a Jira :) Really appreciate all of your contributions. Keep contributing, and thanks for this PR as well.
[GitHub] kaxil closed pull request #3863: [AIRFLOW-3070] Refine web UI authentication-related docs
kaxil closed pull request #3863: [AIRFLOW-3070] Refine web UI authentication-related docs URL: https://github.com/apache/incubator-airflow/pull/3863 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py index 4ff1ae3679..ce56e79b4b 100644 --- a/airflow/bin/cli.py +++ b/airflow/bin/cli.py @@ -2024,7 +2024,7 @@ class CLIFactory(object): 'conn_id', 'conn_uri', 'conn_extra') + tuple(alternative_conn_specs), }, { 'func': create_user, -'help': "Create an account for the Web UI", +'help': "Create an account for the Web UI (FAB-based)", 'args': ('role', 'username', 'email', 'firstname', 'lastname', 'password', 'use_random_password'), }, { diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg index 18c486cb1e..9a9f3aca61 100644 --- a/airflow/config_templates/default_airflow.cfg +++ b/airflow/config_templates/default_airflow.cfg @@ -265,6 +265,9 @@ access_logfile = - error_logfile = - # Expose the configuration file in the web server +# This is only applicable for the flask-admin based web UI (non FAB-based). +# In the FAB-based web UI with RBAC feature, +# access to configuration is controlled by role permissions. expose_config = False # Set to true to turn on authentication: diff --git a/docs/security.rst b/docs/security.rst index 253587afb1..60fe160404 100644 --- a/docs/security.rst +++ b/docs/security.rst @@ -16,9 +16,14 @@ Web Authentication Password +.. note:: + + This is for flask-admin based web UI only. If you are using FAB-based web UI with RBAC feature, + please use command line interface ``create_user`` to create accounts, or do that in the FAB-based UI itself. 
+ One of the simplest mechanisms for authentication is requiring users to specify a password before logging in. Password authentication requires the used of the ``password`` subpackage in your requirements file. Password hashing -uses bcrypt before storing passwords. +uses ``bcrypt`` before storing passwords. .. code-block:: bash
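The updated note points FAB/RBAC users at the `create_user` CLI. Based on the argument list shown in the `cli.py` diff above (role, username, email, firstname, lastname, password), an invocation might look like the following; the flag spellings and the values are illustrative, so check `airflow create_user --help` on your install:

```shell
# Create an account for the FAB-based (RBAC) web UI.
# All values below are placeholders; the flags mirror the args listed
# in the create_user entry of CLIFactory in the diff above.
airflow create_user \
    -r Admin \
    -u admin \
    -e admin@example.com \
    -f Ada \
    -l Lovelace \
    -p change-me
```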
[GitHub] kaxil closed pull request #3904: [AIRFLOW-XXX] Fix typo in docs/timezone.rst
kaxil closed pull request #3904: [AIRFLOW-XXX] Fix typo in docs/timezone.rst URL: https://github.com/apache/incubator-airflow/pull/3904 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: diff --git a/docs/timezone.rst b/docs/timezone.rst index fe44ecfbb9..2c6f30fd39 100644 --- a/docs/timezone.rst +++ b/docs/timezone.rst @@ -134,13 +134,13 @@ Cron schedules In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will then ignore day light savings time. Thus, if you have a schedule that says -run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, +run at the end of interval every day at 08:00 GMT+1 it will always run at the end of interval 08:00 GMT+1, regardless if day light savings time is in place. Time deltas ''' For schedules with time deltas Airflow assumes you always will want to run with the specified interval. So if you -specify a timedelta(hours=2) you will always want to run to hours later. In this case day light savings time will +specify a timedelta(hours=2) you will always want to run two hours later. In this case day light savings time will be taken into account.
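The corrected timezone docs draw a real distinction: cron schedules pin the wall-clock time (ignoring DST), while `timedelta` schedules pin the absolute interval (DST-aware). A small sketch of that difference with the standard library (Python 3.9+ `zoneinfo`; the dates are illustrative):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ams = ZoneInfo("Europe/Amsterdam")

# A run just before the 2018 spring-forward transition (02:00 -> 03:00 local).
last_run = datetime(2018, 3, 25, 1, 0, tzinfo=ams)

# Cron-style: the next run keeps the same wall-clock time the following day,
# so the absolute gap shrinks to 23 hours across the DST switch.
cron_next = datetime(2018, 3, 26, 1, 0, tzinfo=ams)
cron_gap = cron_next.astimezone(timezone.utc) - last_run.astimezone(timezone.utc)

# timedelta-style: the next run is a fixed 2 hours of absolute time later,
# which lands on a different wall-clock hour after the switch
# (2 absolute hours after 01:00 CET is 04:00 CEST; 02:00-03:00 never existed).
delta_next = (last_run.astimezone(timezone.utc) + timedelta(hours=2)).astimezone(ams)

print(cron_gap)         # 23:00:00
print(delta_next.hour)  # 4
```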
[jira] [Commented] (AIRFLOW-3059) PostgresToGCS log the number of rows
[ https://issues.apache.org/jira/browse/AIRFLOW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616698#comment-16616698 ] ASF GitHub Bot commented on AIRFLOW-3059: - kaxil closed pull request #3905: [AIRFLOW-3059] Log how many rows are read from Postgres URL: https://github.com/apache/incubator-airflow/pull/3905 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/operators/postgres_to_gcs_operator.py b/airflow/contrib/operators/postgres_to_gcs_operator.py index 850d858f94..78da78ee2f 100644 --- a/airflow/contrib/operators/postgres_to_gcs_operator.py +++ b/airflow/contrib/operators/postgres_to_gcs_operator.py @@ -133,28 +133,38 @@ def _write_local_data_files(self, cursor): contain the data for the GCS objects. """ schema = list(map(lambda schema_tuple: schema_tuple[0], cursor.description)) -file_no = 0 -tmp_file_handle = NamedTemporaryFile(delete=True) -tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} - -for row in cursor: -# Convert datetime objects to utc seconds, and decimals to floats -row = map(self.convert_types, row) -row_dict = dict(zip(schema, row)) - -s = json.dumps(row_dict, sort_keys=True) -if PY3: -s = s.encode('utf-8') -tmp_file_handle.write(s) - -# Append newline to make dumps BigQuery compatible. -tmp_file_handle.write(b'\n') - -# Stop if the file exceeds the file size limit. 
-if tmp_file_handle.tell() >= self.approx_max_file_size_bytes: -file_no += 1 -tmp_file_handle = NamedTemporaryFile(delete=True) -tmp_file_handles[self.filename.format(file_no)] = tmp_file_handle +tmp_file_handles = {} +row_no = 0 + +def _create_new_file(): +handle = NamedTemporaryFile(delete=True) +filename = self.filename.format(len(tmp_file_handles)) +tmp_file_handles[filename] = handle +return handle + +# Don't create a file if there is nothing to write +if cursor.rowcount > 0: +tmp_file_handle = _create_new_file() + +for row in cursor: +# Convert datetime objects to utc seconds, and decimals to floats +row = map(self.convert_types, row) +row_dict = dict(zip(schema, row)) + +s = json.dumps(row_dict, sort_keys=True) +if PY3: +s = s.encode('utf-8') +tmp_file_handle.write(s) + +# Append newline to make dumps BigQuery compatible. +tmp_file_handle.write(b'\n') + +# Stop if the file exceeds the file size limit. +if tmp_file_handle.tell() >= self.approx_max_file_size_bytes: +tmp_file_handle = _create_new_file() +row_no += 1 + +self.log.info('Received %s rows over %s files', row_no, len(tmp_file_handles)) return tmp_file_handles diff --git a/tests/contrib/operators/test_postgres_to_gcs_operator.py b/tests/contrib/operators/test_postgres_to_gcs_operator.py index ca72016974..1b6e731c3b 100644 --- a/tests/contrib/operators/test_postgres_to_gcs_operator.py +++ b/tests/contrib/operators/test_postgres_to_gcs_operator.py @@ -7,9 +7,9 @@ # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. 
You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -25,40 +25,66 @@ import sys import unittest -from airflow.contrib.operators.postgres_to_gcs_operator import PostgresToGoogleCloudStorageOperator +from airflow.hooks.postgres_hook import PostgresHook +from airflow.contrib.operators.postgres_to_gcs_operator import \ +PostgresToGoogleCloudStorageOperator try: -from unittest import mock +from unittest.mock import patch except ImportError: try: -import mock +from mock import patch except ImportError: mock = None -PY3 = sys.version_info[0] == 3 +TABLES = {'postgres_to_gcs_operator', 'postgres_to_gcs_operator_empty'} TASK_ID = 'test-postgres-to-gcs' -POSTGRES_CONN_ID = 'postgres_conn_test' -SQL = 'select 1' +POSTGRES_CONN_ID = 'postgres_default' +SQL = 'SELECT * FROM postgres_to_gcs_operator' BUCKET = 'gs://test' FILENAME =
[jira] [Resolved] (AIRFLOW-3059) PostgresToGCS log the number of rows
[ https://issues.apache.org/jira/browse/AIRFLOW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik resolved AIRFLOW-3059. - Resolution: Fixed Assignee: Fokko Driesprong Fix Version/s: 2.0.0 Resolved by https://github.com/apache/incubator-airflow/pull/3905 > PostgresToGCS log the number of rows > > > Key: AIRFLOW-3059 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3059 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 2.0.0 > > > Log how many rows are being read from Postgres
[GitHub] kaxil closed pull request #3905: [AIRFLOW-3059] Log how many rows are read from Postgres
kaxil closed pull request #3905: [AIRFLOW-3059] Log how many rows are read from Postgres URL: https://github.com/apache/incubator-airflow/pull/3905 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/operators/postgres_to_gcs_operator.py b/airflow/contrib/operators/postgres_to_gcs_operator.py index 850d858f94..78da78ee2f 100644 --- a/airflow/contrib/operators/postgres_to_gcs_operator.py +++ b/airflow/contrib/operators/postgres_to_gcs_operator.py @@ -133,28 +133,38 @@ def _write_local_data_files(self, cursor): contain the data for the GCS objects. """ schema = list(map(lambda schema_tuple: schema_tuple[0], cursor.description)) -file_no = 0 -tmp_file_handle = NamedTemporaryFile(delete=True) -tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} - -for row in cursor: -# Convert datetime objects to utc seconds, and decimals to floats -row = map(self.convert_types, row) -row_dict = dict(zip(schema, row)) - -s = json.dumps(row_dict, sort_keys=True) -if PY3: -s = s.encode('utf-8') -tmp_file_handle.write(s) - -# Append newline to make dumps BigQuery compatible. -tmp_file_handle.write(b'\n') - -# Stop if the file exceeds the file size limit. 
-if tmp_file_handle.tell() >= self.approx_max_file_size_bytes: -file_no += 1 -tmp_file_handle = NamedTemporaryFile(delete=True) -tmp_file_handles[self.filename.format(file_no)] = tmp_file_handle +tmp_file_handles = {} +row_no = 0 + +def _create_new_file(): +handle = NamedTemporaryFile(delete=True) +filename = self.filename.format(len(tmp_file_handles)) +tmp_file_handles[filename] = handle +return handle + +# Don't create a file if there is nothing to write +if cursor.rowcount > 0: +tmp_file_handle = _create_new_file() + +for row in cursor: +# Convert datetime objects to utc seconds, and decimals to floats +row = map(self.convert_types, row) +row_dict = dict(zip(schema, row)) + +s = json.dumps(row_dict, sort_keys=True) +if PY3: +s = s.encode('utf-8') +tmp_file_handle.write(s) + +# Append newline to make dumps BigQuery compatible. +tmp_file_handle.write(b'\n') + +# Stop if the file exceeds the file size limit. +if tmp_file_handle.tell() >= self.approx_max_file_size_bytes: +tmp_file_handle = _create_new_file() +row_no += 1 + +self.log.info('Received %s rows over %s files', row_no, len(tmp_file_handles)) return tmp_file_handles diff --git a/tests/contrib/operators/test_postgres_to_gcs_operator.py b/tests/contrib/operators/test_postgres_to_gcs_operator.py index ca72016974..1b6e731c3b 100644 --- a/tests/contrib/operators/test_postgres_to_gcs_operator.py +++ b/tests/contrib/operators/test_postgres_to_gcs_operator.py @@ -7,9 +7,9 @@ # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. 
You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -25,40 +25,66 @@ import sys import unittest -from airflow.contrib.operators.postgres_to_gcs_operator import PostgresToGoogleCloudStorageOperator +from airflow.hooks.postgres_hook import PostgresHook +from airflow.contrib.operators.postgres_to_gcs_operator import \ +PostgresToGoogleCloudStorageOperator try: -from unittest import mock +from unittest.mock import patch except ImportError: try: -import mock +from mock import patch except ImportError: mock = None -PY3 = sys.version_info[0] == 3 +TABLES = {'postgres_to_gcs_operator', 'postgres_to_gcs_operator_empty'} TASK_ID = 'test-postgres-to-gcs' -POSTGRES_CONN_ID = 'postgres_conn_test' -SQL = 'select 1' +POSTGRES_CONN_ID = 'postgres_default' +SQL = 'SELECT * FROM postgres_to_gcs_operator' BUCKET = 'gs://test' FILENAME = 'test_{}.ndjson' -# we expect the psycopg cursor to return encoded strs in py2 and decoded in py3 -if PY3: -ROWS = [('mock_row_content_1', 42), ('mock_row_content_2', 43), ('mock_row_content_3', 44)] -CURSOR_DESCRIPTION = (('some_str', 0), ('some_num',
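The rewritten `_write_local_data_files` in the diff above lazily opens size-capped NDJSON files and counts the rows it writes, creating no file at all when the cursor is empty. The same pattern can be sketched outside Airflow; `write_ndjson_chunks` and its parameters are illustrative names, not the operator's API:

```python
import json
import tempfile

def write_ndjson_chunks(rows, max_bytes):
    """Write dicts as newline-delimited JSON, rotating to a fresh temp file
    once the current one reaches max_bytes.

    Returns (handles, row_count). No file is created when rows is empty,
    mirroring the fix in the PR above.
    """
    handles = []
    current = None
    row_count = 0
    for row in rows:
        if current is None:
            # Lazily open a file only when there is something to write.
            current = tempfile.NamedTemporaryFile(delete=True)
            handles.append(current)
        current.write(json.dumps(row, sort_keys=True).encode("utf-8"))
        current.write(b"\n")  # newline keeps the dump BigQuery-compatible
        row_count += 1
        if current.tell() >= max_bytes:
            current.flush()
            current = None  # the next row triggers a fresh file
    return handles, row_count

# With a tiny size cap, every row rotates to its own file.
files, n = write_ndjson_chunks([{"a": 1}, {"a": 2}, {"a": 3}], max_bytes=1)
print(n, len(files))  # 3 3
```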
[GitHub] Fokko commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow
Fokko commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow URL: https://github.com/apache/incubator-airflow/pull/2708#issuecomment-421754322 @ndmar The reason it takes so long to get this merged is the poor quality of the PR. Don't get me wrong: I invested some time in helping, but a lot still isn't addressed, for example: - Incomplete docstring. - Extremely minimal tests. - Not following the Git guidelines regarding the formatting of the commits. - Not updating the docs. I'm happy to do another review, but first please get it to a decent level of quality.
[GitHub] Fokko edited a comment on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code
Fokko edited a comment on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code URL: https://github.com/apache/incubator-airflow/pull/2466#issuecomment-421753641 @ron819 I think it is a good idea to set the `schedule_interval`, but the link to the docs is old (refers to Airbnb).
[GitHub] Fokko commented on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code
Fokko commented on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code URL: https://github.com/apache/incubator-airflow/pull/2466#issuecomment-421753641 @ron819 I think it is a good idea to set the `schedule_interval`, but the link to the docs is old (refers to AirBNB).
[GitHub] ron819 commented on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code
ron819 commented on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code URL: https://github.com/apache/incubator-airflow/pull/2466#issuecomment-421750337 @Fokko is this still relevant?
[GitHub] eyaltrabelsi commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow
eyaltrabelsi commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow URL: https://github.com/apache/incubator-airflow/pull/2708#issuecomment-421746408 @Nadbkl can you add @ndmar?
[jira] [Commented] (AIRFLOW-3068) Remove deprecated import mechanism
[ https://issues.apache.org/jira/browse/AIRFLOW-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616669#comment-16616669 ] ASF GitHub Bot commented on AIRFLOW-3068: - Fokko opened a new pull request #3906: [AIRFLOW-3068] Remove deprecated imports URL: https://github.com/apache/incubator-airflow/pull/3906 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-3068\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3068 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-3068\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove deprecated import mechanism > -- > > Key: AIRFLOW-3068 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3068 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Priority: Major > > In Airflow 2.0 we want to remove the deprecate import mechanism. > Before: > from airflow.operators import BashOperator > changes to > from airflow.operators.bash_operator import BashOperator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
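The deprecated mechanism this PR removes resolved names like `from airflow.operators import BashOperator` lazily through `AirflowImporter` and the `_hooks`/`_operators` mappings shown in the diff. A self-contained sketch of the same idea, using an in-memory fake package and a PEP 562-style module `__getattr__` (all names below are illustrative, not Airflow's implementation):

```python
import sys
import types

# Build a tiny fake package in memory so the sketch needs no files on disk.
pkg = types.ModuleType("fakepkg")
sub = types.ModuleType("fakepkg.bash_operator")

class BashOperator:  # stand-in for the real operator class
    pass

sub.BashOperator = BashOperator
sys.modules["fakepkg"] = pkg
sys.modules["fakepkg.bash_operator"] = sub

# Attribute name -> submodule that really defines it, mimicking the
# dictionaries deleted in this PR (contents illustrative).
_lazy_attrs = {"BashOperator": "fakepkg.bash_operator"}

def _pkg_getattr(name):
    # Called when normal attribute lookup on the package fails; forwards
    # the lookup to the submodule, so the deprecated flat import works.
    try:
        submodule = sys.modules[_lazy_attrs[name]]
    except KeyError:
        raise AttributeError(name) from None
    return getattr(submodule, name)

pkg.__getattr__ = _pkg_getattr  # PEP 562 module-level __getattr__ hook

from fakepkg import BashOperator as Lazy  # deprecated-style import resolves lazily
print(Lazy is BashOperator)  # True
```

Airflow 2.0 drops this indirection, so the explicit form `from airflow.operators.bash_operator import BashOperator` becomes the only supported spelling.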
[GitHub] Fokko opened a new pull request #3906: [AIRFLOW-3068] Remove deprecated imports
Fokko opened a new pull request #3906: [AIRFLOW-3068] Remove deprecated imports URL: https://github.com/apache/incubator-airflow/pull/3906 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-3068\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3068 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-3068\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on issue #3905: [AIRFLOW-3059] Log how many rows are read from Postgres
Fokko commented on issue #3905: [AIRFLOW-3059] Log how many rows are read from Postgres URL: https://github.com/apache/incubator-airflow/pull/3905#issuecomment-421744788 @kaxil PTAL :)
[jira] [Commented] (AIRFLOW-3059) PostgresToGCS log the number of rows
[ https://issues.apache.org/jira/browse/AIRFLOW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616637#comment-16616637 ] ASF GitHub Bot commented on AIRFLOW-3059: - Fokko opened a new pull request #3905: [AIRFLOW-3059] Log how many rows are read from Postgres URL: https://github.com/apache/incubator-airflow/pull/3905 To know how many data is being read from Postgres, it is nice to log this to the Airflow log. Previously when there was no data, it would still create a single file. This is not something that we want, and therefore we've changed this behaviour. Refactored the tests to make use of Postgres itself since we have it running. This makes the tests more realistic, instead of mocking everything. Reopen version of #3898 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-3059\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3059 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-3059\], code changes always need a Jira issue. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. 
Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` > PostgresToGCS log the number of rows > > > Key: AIRFLOW-3059 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3059 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Priority: Major > > Log how many rows are being read from Postgres -- This message was sent by Atlassian JIRA (v7.6.3#76005)
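The behaviour change described in the PR — log how many rows were read and skip creating a file entirely when the query returns nothing — can be sketched outside Airflow roughly as follows. This is a minimal illustration, not the actual operator code; the function name and the in-memory CSV staging are assumptions:

```python
import csv
import io
import logging

log = logging.getLogger(__name__)

def serialize_rows(rows, header):
    """Serialize query results to CSV, logging the row count.

    Returns None when there are no rows, so the caller can skip
    creating (and uploading) an empty file entirely.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    row_count = 0
    for row in rows:
        writer.writerow(row)
        row_count += 1
    # The log line is the core of the change: operators can now see
    # how much data was actually read from Postgres.
    log.info("Read %d rows from Postgres", row_count)
    if row_count == 0:
        return None  # no data: do not create a file at all
    return buf.getvalue()
```

With this shape, an empty result set produces a log line reporting zero rows and no output file, while a non-empty one yields the CSV payload to upload.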
[jira] [Created] (AIRFLOW-3071) Unable to clear Val of Variable from the UI
jack created AIRFLOW-3071: - Summary: Unable to clear Val of Variable from the UI Key: AIRFLOW-3071 URL: https://issues.apache.org/jira/browse/AIRFLOW-3071 Project: Apache Airflow Issue Type: Bug Affects Versions: 1.10.0 Reporter: jack This is a quite annoying bug. Steps to reproduce: 1. Create a Variable. 2. Give the Variable a Val and save it. 3. Click edit Variable. You will see the Key marked with a red asterisk, and the value that you entered. 4. Remove the Val (leave the field blank) and click save. 5. No error appears; however, if you reopen the Variable you will see that the blank value was not saved. Please allow clearing the Val. This should also be the intended behaviour, because the Val field has no red asterisk next to it. The current workaround is to delete the Variable and re-create it.
[jira] [Commented] (AIRFLOW-2902) Add S3ToBigQuery operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616627#comment-16616627 ] jack commented on AIRFLOW-2902: --- [~XD-DENG] is this something you can pick up? > Add S3ToBigQuery operator > - > > Key: AIRFLOW-2902 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2902 > Project: Apache Airflow > Issue Type: Wish > Components: operators >Affects Versions: 1.10.0 >Reporter: jack >Priority: Major > > Please add operators that allow moving data between Amazon and Google > services. > I saw there is an > [S3ToHiveTransfer|https://airflow.apache.org/integration.html#s3tohivetransfer] > operator. It is described as: > "Moves data from S3 to Hive. The operator downloads a file from S3, stores > the file locally before loading it into a Hive table." > > What I'm asking for here is very similar.
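The S3ToHiveTransfer pattern the reporter cites — download the object to a local file, then load it into the destination — would apply equally to a BigQuery target. A hook-agnostic sketch of that two-phase transfer; `fetch_object` and `load_file` are hypothetical injected callables standing in for S3 and BigQuery hook methods, which this snippet does not actually use:

```python
import logging
import tempfile

log = logging.getLogger(__name__)

def s3_to_bigquery(fetch_object, load_file, s3_key, dest_table):
    """Stage an S3 object in a local temp file, then load it.

    fetch_object(s3_key) -> bytes and load_file(path, table) are
    injected stand-ins for the real S3/BigQuery hook calls.
    """
    with tempfile.NamedTemporaryFile(suffix=".csv") as tmp:
        # Phase 1: download locally, as S3ToHiveTransfer does.
        tmp.write(fetch_object(s3_key))
        tmp.flush()
        log.info("Staged %s locally, loading into %s", s3_key, dest_table)
        # Phase 2: load the staged file into the destination table.
        return load_file(tmp.name, dest_table)
```

Injecting the two callables keeps the transfer logic testable without AWS or GCP credentials; a real operator would wire in the concrete hooks instead.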
[GitHub] dlebech commented on issue #3890: [AIRFLOW-3049] Add extra operations for Mongo hook
dlebech commented on issue #3890: [AIRFLOW-3049] Add extra operations for Mongo hook URL: https://github.com/apache/incubator-airflow/pull/3890#issuecomment-421715800 @Fokko thanks for the feedback. I've updated the parameter name and added tweaks to the documentation.
[GitHub] XD-DENG opened a new pull request #3904: [AIRFLOW-XXX] Fix typo in docs/timezone.rst
XD-DENG opened a new pull request #3904: [AIRFLOW-XXX] Fix typo in docs/timezone.rst URL: https://github.com/apache/incubator-airflow/pull/3904 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: To fix a typo. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Only minor doc change. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[jira] [Commented] (AIRFLOW-3058) Airflow log & multi-threading
[ https://issues.apache.org/jira/browse/AIRFLOW-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616607#comment-16616607 ] jack commented on AIRFLOW-3058: --- [~ashb] I'm using local executor... > Airflow log & multi-threading > - > > Key: AIRFLOW-3058 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3058 > Project: Apache Airflow > Issue Type: Task >Reporter: jack >Priority: Major > Attachments: 456.PNG, Sni.PNG > > > The airflow log does not show messages in real time when executing scripts > with Multi-threading. > > for example: > > The left is the Airflow log time. the right is the actual time of the print > in my code. If I would execute the script without airflow the console will > show the times on the right. > !Sni.PNG! > {code:java} > 2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 > 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 14:14:55.230044 > Thread: Thread-1 Generate page: #0 run #0 with URL: > http://...=2=0=1000 > [2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 > 14:14:55.231635 Thread: Thread-2 Generate page: #1 run #0 with URL: > http://...=2=1000=1000 > [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 > 14:14:55.233226 Thread: Thread-3 Generate page: #2 run #0 with URL: > http://...=2=2000=1000 > [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 > 14:14:55.234020 Thread: Thread-4 Generate page: #3 run #0 with URL: > http://...=2=3000=1000 > [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:15:43.100122 Thread: Thread-1 page 0 finished. 
Length is 1000 > [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:15:43.100877 Thread: Thread-1 Generate page: #4 run #0 with URL: > http://...=2=4000=1000 > [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:15:46.254536 Thread: Thread-3 page 2 finished. Length is 1000 > [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:15:46.255508 Thread: Thread-3 Generate page: #5 run #0 with URL: > http://...=2=5000=1000 > [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:15:51.096360 Thread: Thread-2 page 1 finished. Length is 1000 > [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:15:51.097269 Thread: Thread-2 Generate page: #6 run #0 with URL: > http://...=2=6000=1000 > [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:15:53.112621 Thread: Thread-4 page 3 finished. 
Length is 1000 > [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:15:53.113455 Thread: Thread-4 Generate page: #7 run #0 with URL: > http://...=2=7000=1000 > [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:16:37.345343 Thread: Thread-3 Generate page: #8 run #0 with URL: > http://...=2=8000=1000 > [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 > 14:16:37.701201 Thread: Thread-2 Generate page: #9 run #0 with URL: > http://...=2=9000=1000 > [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 > 14:16:47.283796 Thread: Thread-1 page 4 finished. Length is 1000 > [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: > [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 > 14:17:27.169359 Thread: Thread-2 page 9 finished. Length is 1000 > > {code} > This never happens when executing regular code; it happens only with > multi-threading. I have some other scripts where the airflow print appears > after more than 30 minutes. > > Check this one: > hours of delay, and then everything is printed together. These are not real > time: the prints in the log have no correlation to the actual time the command > was executed. > > !456.PNG!