[jira] [Commented] (AIRFLOW-2761) Parallelize Celery Executor enqueuing

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617096#comment-16617096
 ] 

ASF GitHub Bot commented on AIRFLOW-2761:
-

yrqls21 closed pull request #3910: [WIP][AIRFLOW-2761] Parallelize enqueue in 
celery executor
URL: https://github.com/apache/incubator-airflow/pull/3910
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index 4fd45b2c0e..124da34690 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -31,6 +31,11 @@ some bugs.
 The new `sync_parallelism` config option will control how many processes CeleryExecutor will use to
 fetch celery task state in parallel. Default value is max(1, number of cores - 1)
 
+### New `log_processor_manager_location` config option
+
+The DAG parsing manager log is now written to a file by default, and its location is
+controlled by the new `log_processor_manager_location` config option in the `core` section.
+
 ## Airflow 1.10
 
 Installation and upgrading requires setting `SLUGIFY_USES_TEXT_UNIDECODE=yes` in your environment or
diff --git a/airflow/config_templates/airflow_local_settings.py b/airflow/config_templates/airflow_local_settings.py
index 95150ab3bb..634b21747e 100644
--- a/airflow/config_templates/airflow_local_settings.py
+++ b/airflow/config_templates/airflow_local_settings.py
@@ -20,6 +20,7 @@
 import os
 
 from airflow import configuration as conf
+from airflow.utils.file import mkdirs
 
 # TODO: Logging format and level should be configured
 # in this file instead of from airflow.cfg. Currently
@@ -38,7 +39,10 @@
 
 PROCESSOR_LOG_FOLDER = conf.get('scheduler', 'CHILD_PROCESS_LOG_DIRECTORY')
 
+LOG_PROCESSOR_MANAGER_LOCATION = conf.get('core', 'LOG_PROCESSOR_MANAGER_LOCATION')
+
 FILENAME_TEMPLATE = conf.get('core', 'LOG_FILENAME_TEMPLATE')
+
 PROCESSOR_FILENAME_TEMPLATE = conf.get('core', 'LOG_PROCESSOR_FILENAME_TEMPLATE')
 
 # Storage bucket url for remote logging
@@ -79,7 +83,7 @@
             'formatter': 'airflow',
             'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
             'filename_template': PROCESSOR_FILENAME_TEMPLATE,
-        },
+        }
     },
     'loggers': {
         'airflow.processor': {
@@ -104,6 +108,26 @@
 }
 }
 
+DEFAULT_DAG_PARSING_LOGGING_CONFIG = {
+    'handlers': {
+        'processor_manager': {
+            'class': 'logging.handlers.RotatingFileHandler',
+            'formatter': 'airflow',
+            'filename': LOG_PROCESSOR_MANAGER_LOCATION,
+            'mode': 'a',
+            'maxBytes': 104857600,  # 100MB
+            'backupCount': 5
+        }
+    },
+    'loggers': {
+        'airflow.processor_manager': {
+            'handlers': ['processor_manager'],
+            'level': LOG_LEVEL,
+            'propagate': False,
+        }
+    }
+}
+
 REMOTE_HANDLERS = {
     's3': {
         'task': {
@@ -172,6 +196,20 @@
 
 REMOTE_LOGGING = conf.get('core', 'remote_logging')
 
+if os.environ.get('CONFIG_PROCESSOR_MANAGER_LOGGER') == 'True':
+    DEFAULT_LOGGING_CONFIG['handlers'] \
+        .update(DEFAULT_DAG_PARSING_LOGGING_CONFIG['handlers'])
+    DEFAULT_LOGGING_CONFIG['loggers'] \
+        .update(DEFAULT_DAG_PARSING_LOGGING_CONFIG['loggers'])
+
+    # Manually create log directory for processor_manager handler as RotatingFileHandler
+    # will only create file but not the directory.
+    processor_manager_handler_config = DEFAULT_DAG_PARSING_LOGGING_CONFIG['handlers'][
+        'processor_manager']
+    directory = os.path.dirname(processor_manager_handler_config['filename'])
+    if not os.path.exists(directory):
+        mkdirs(directory, 0o777)
+
 if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
     DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['s3'])
 elif REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('gs://'):
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 000dd67a13..367527b1c1 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -70,6 +70,7 @@ simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s
 # we need to escape the curly braces by adding an additional curly brace
 log_filename_template = {{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log
 log_processor_filename_template = {{{{ filename }}}}.log
+log_processor_manager_location = {AIRFLOW_HOME}/logs/dag_processor_manager/dag_processor_manager.log
 
 # Hostname by providing a path to a callable, which will resolve the hostname
 hostname_callable = socket:getfqdn
diff --git a/airflow/config_templates/default_test.cfg 
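The gist of the `airflow_local_settings.py` change in the diff above — merge a parsing-manager handler into the base logging dictConfig, then pre-create the log directory because `RotatingFileHandler` only creates the file — can be sketched standalone. All names here are illustrative stand-ins: the path comes from a temp directory rather than `airflow.cfg`, and stdlib `os.makedirs` replaces Airflow's `mkdirs` helper.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical path standing in for log_processor_manager_location.
LOG_PATH = os.path.join(tempfile.mkdtemp(), "dag_processor_manager",
                        "dag_processor_manager.log")

BASE_CONFIG = {
    "version": 1,
    "formatters": {"airflow": {"format": "%(asctime)s %(levelname)s - %(message)s"}},
    "handlers": {},
    "loggers": {},
}

PARSING_CONFIG = {
    "handlers": {
        "processor_manager": {
            "class": "logging.handlers.RotatingFileHandler",
            "formatter": "airflow",
            "filename": LOG_PATH,
            "mode": "a",
            "maxBytes": 104857600,  # roll the file over at 100 MB
            "backupCount": 5,       # keep five rotated files
        }
    },
    "loggers": {
        "airflow.processor_manager": {
            "handlers": ["processor_manager"],
            "level": "INFO",
            "propagate": False,
        }
    },
}

def enable_parsing_logs(base, extra, log_path):
    """Merge the parsing handlers/loggers into the base dictConfig and
    create the log directory, since RotatingFileHandler creates only the
    file, not its parent directory."""
    base["handlers"].update(extra["handlers"])
    base["loggers"].update(extra["loggers"])
    os.makedirs(os.path.dirname(log_path), exist_ok=True)
    return base

merged = enable_parsing_logs(BASE_CONFIG, PARSING_CONFIG, LOG_PATH)
logging.config.dictConfig(merged)
logging.getLogger("airflow.processor_manager").info("manager started")
```

In the actual patch this merge is gated on the `CONFIG_PROCESSOR_MANAGER_LOGGER` environment variable being `'True'`, so other Airflow processes keep the default configuration.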

[GitHub] yrqls21 closed pull request #3910: [WIP][AIRFLOW-2761] Parallelize enqueue in celery executor

2018-09-16 Thread GitBox
yrqls21 closed pull request #3910: [WIP][AIRFLOW-2761] Parallelize enqueue in 
celery executor
URL: https://github.com/apache/incubator-airflow/pull/3910
 
 
   


[jira] [Commented] (AIRFLOW-2761) Parallelize Celery Executor enqueuing

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617087#comment-16617087
 ] 

ASF GitHub Bot commented on AIRFLOW-2761:
-

yrqls21 opened a new pull request #3910: [WIP][AIRFLOW-2761] Parallelize 
enqueue in celery executor
URL: https://github.com/apache/incubator-airflow/pull/3910
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Parallelize Celery Executor enqueuing
> -
>
> Key: AIRFLOW-2761
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2761
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Major
>
> Currently the celery executor enqueues in an async fashion, but still does so 
> in a single-process loop. This can slow down the scheduler loop and create 
> scheduling delay if we have a large # of tasks to schedule in a short time, e.g. 
> at UTC midnight we need to schedule a large # of sensors in a short period.
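The idea in the issue — fan the broker round-trips out over a pool instead of a single-process loop — can be sketched roughly as below. This is not the PR's actual code: `send_task` is a hypothetical stand-in for the Celery `apply_async` call, and a thread pool is used here simply because enqueueing is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor

def send_task(item):
    """Stand-in for enqueuing one task to the Celery broker.
    In Airflow this would be an apply_async call; here it just returns the key."""
    key, command = item
    return key

def enqueue_parallel(queued_tasks, max_workers=8):
    """Enqueue tasks concurrently so a burst (e.g. many sensors due at
    UTC midnight) does not serialize the scheduler loop.
    pool.map preserves input order in its results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_task, queued_tasks))

keys = enqueue_parallel([("dag_a.task_%d" % i, "run-task-command") for i in range(20)])
```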



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] yrqls21 opened a new pull request #3910: [WIP][AIRFLOW-2761] Parallelize enqueue in celery executor

2018-09-16 Thread GitBox
yrqls21 opened a new pull request #3910: [WIP][AIRFLOW-2761] Parallelize 
enqueue in celery executor
URL: https://github.com/apache/incubator-airflow/pull/3910
 
 


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3074) Missing options in ECS operator

2018-09-16 Thread Josh Carp (JIRA)
Josh Carp created AIRFLOW-3074:
--

 Summary: Missing options in ECS operator
 Key: AIRFLOW-3074
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3074
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Affects Versions: 1.10.0
Reporter: Josh Carp


The ECS operator exposes some options passed to the boto3 run_task method, but 
omits a few other useful options. For example, the networkConfiguration option, 
which is useful for setting security groups on a Fargate container, isn't 
exposed. The ECS operator should expose all run_task options that are relevant.

I wrote up a quick patch for this issue; I'll amend the pull request to 
reference this report.
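One common way to expose extra `run_task` options without passing explicit `None`s to boto3 is to assemble the keyword arguments conditionally. The sketch below is hypothetical (function and variable names are not from the patch); `networkConfiguration`, `launchType`, and `platformVersion` are real `run_task` parameters.

```python
def build_run_task_kwargs(task_definition, cluster, overrides,
                          launch_type="FARGATE", network_configuration=None,
                          platform_version=None, group=None):
    """Assemble keyword arguments for boto3 ECS `run_task`, dropping the
    optional ones that were not supplied."""
    kwargs = {
        "taskDefinition": task_definition,
        "cluster": cluster,
        "overrides": overrides,
        "launchType": launch_type,
    }
    optional = {
        "networkConfiguration": network_configuration,
        "platformVersion": platform_version,
        "group": group,
    }
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs

kwargs = build_run_task_kwargs(
    "my-task:1", "my-cluster", {"containerOverrides": []},
    network_configuration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-123"],
            "securityGroups": ["sg-456"],
            "assignPublicIp": "ENABLED",
        }
    },
)
# With real AWS credentials this would be:
# boto3.client("ecs").run_task(**kwargs)
```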





[jira] [Commented] (AIRFLOW-3073) A note is needed in 'Data Profiling' doc page to reminder users it's no longer supported in new webserver UI

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616988#comment-16616988
 ] 

ASF GitHub Bot commented on AIRFLOW-3073:
-

XD-DENG opened a new pull request #3909: [AIRFLOW-3073] Add note-Profiling 
feature not supported in new webserver
URL: https://github.com/apache/incubator-airflow/pull/3909
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3073
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   `Adhoc queries` and `Charts` features are no longer supported in new 
FAB-based webserver and UI. But this is not mentioned at all in the doc "Data 
Profiling" (https://airflow.incubator.apache.org/profiling.html). This may 
confuse users.
   
   Ref: 
https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes
   
   This commit adds a note to remind users of this.
   
   (Later after the `www` is removed, this `profiling.rst` should be removed)
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Documentation change.
   
   
   ### Commits
   
 - [ ] My commits all reference Jira issues in their subject lines, and I have 
squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> A note is needed in 'Data Profiling' doc page to reminder users it's no 
> longer supported in new webserver UI
> 
>
> Key: AIRFLOW-3073
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3073
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> In [https://airflow.incubator.apache.org/profiling.html,] it's not mentioned 
> at all that these features are no longer supported in the new webserver (FAB-based) 
> due to security concerns 
> (https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes).





[GitHub] XD-DENG opened a new pull request #3909: [AIRFLOW-3073] Add note-Profiling feature not supported in new webserver

2018-09-16 Thread GitBox
XD-DENG opened a new pull request #3909: [AIRFLOW-3073] Add note-Profiling 
feature not supported in new webserver
URL: https://github.com/apache/incubator-airflow/pull/3909
 
 


[jira] [Updated] (AIRFLOW-3073) A note is needed in 'Data Profiling' doc page to reminder users it's no longer supported in new webserver UI

2018-09-16 Thread Xiaodong DENG (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG updated AIRFLOW-3073:
---
Description: In [https://airflow.incubator.apache.org/profiling.html,] it's 
not mentioned at all that these features are no longer supported in the new webserver 
(FAB-based) due to security concerns 
(https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes).
  (was: In [https://airflow.incubator.apache.org/profiling.html,] it's not 
mentioned at all that these features are no longer supported in the new webserver 
(FAB-based) due to security concerns.)

> A note is needed in 'Data Profiling' doc page to reminder users it's no 
> longer supported in new webserver UI
> 
>
> Key: AIRFLOW-3073
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3073
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> In [https://airflow.incubator.apache.org/profiling.html,] it's not mentioned 
> at all that these features are no longer supported in the new webserver (FAB-based) 
> due to security concerns 
> (https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes).





[jira] [Created] (AIRFLOW-3073) A note is needed in 'Data Profiling' doc page to reminder users it's no longer supported in new webserver UI

2018-09-16 Thread Xiaodong DENG (JIRA)
Xiaodong DENG created AIRFLOW-3073:
--

 Summary: A note is needed in 'Data Profiling' doc page to reminder 
users it's no longer supported in new webserver UI
 Key: AIRFLOW-3073
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3073
 Project: Apache Airflow
  Issue Type: Improvement
  Components: Documentation
Reporter: Xiaodong DENG
Assignee: Xiaodong DENG


In [https://airflow.incubator.apache.org/profiling.html,] it's not mentioned at 
all that these features are no longer supported in the new webserver (FAB-based) due 
to security concerns.





[GitHub] jmcarp opened a new pull request #3908: [WIP] Add relevant ECS options to ECS operator.

2018-09-16 Thread GitBox
jmcarp opened a new pull request #3908: [WIP] Add relevant ECS options to ECS 
operator.
URL: https://github.com/apache/incubator-airflow/pull/3908
 
 
   The ECS operator currently supports only a subset of available options
   for running ECS tasks. This patch adds all ECS options that could be
   relevant to airflow; options that wouldn't make sense here, like
   `count`, were skipped.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Updated] (AIRFLOW-1195) Cleared tasks in SubDagOperator do not trigger Parent dag_runs

2018-09-16 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-1195:

Summary: Cleared tasks in SubDagOperator do not trigger Parent dag_runs  
(was: Cleared tasks in SubDagOperator do not trigger upstream dag_runs)

> Cleared tasks in SubDagOperator do not trigger Parent dag_runs
> --
>
> Key: AIRFLOW-1195
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1195
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: subdag
>Affects Versions: 1.8.1
>Reporter: Paul Zaczkieiwcz
>Assignee: Kaxil Naik
>Priority: Minor
> Attachments: example_subdag_operator.not-cleared.png, 
> example_subdag_operator.section-2.cleared.png
>
>
> Let's say that you had a task fail in a SubDag.  You fix the underlying issue 
> and want Airflow to resume the DagRun where it left off.  If this were a flat 
> DAG, then all you need to do is clear the failed TaskInstance and its 
> downstream dependencies. The GUI will happily clear all of them for you in a 
> single PUT request!  In order to resume a SubDag, you must clear the 
> TaskInstance + downstream dependencies AND you must clear the SubDagOperator 
> + downstream dependencies WITHOUT clearing its recursive dependencies. There 
> should be an option to recursively clear task instances in upstream SubDags.
> The attached files use the example_subdag_operator DAG to illustrate the 
> problem.  Before the screenshot, I ran the operator to completion, then 
> cleared {{example_subdag_operator.section-2.section-2-task-5}}. Notice that 
> {{example_subdag_operator.section-2}} is in the `running` state, but 
> {{example_subdag_operator}} is still in the `success` state.
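A toy sketch of the "recursive clear" the report asks for, over a hypothetical task graph (this is not Airflow's actual API — `downstream` and `parent_of` are illustrative dicts):

```python
def clear_with_parents(task_id, downstream, parent_of, cleared=None):
    """Clear a failed task plus its downstream dependencies, then hop up
    and clear the SubDagOperator in the parent DAG (and *its* downstream
    tasks) without recursing back down into sibling subdag tasks."""
    cleared = set() if cleared is None else cleared
    stack = [task_id]
    while stack:  # clear the task + downstream within its own DAG
        t = stack.pop()
        if t in cleared:
            continue
        cleared.add(t)
        stack.extend(downstream.get(t, []))
    parent = parent_of.get(task_id)  # the enclosing SubDagOperator, if any
    if parent is not None:
        clear_with_parents(parent, downstream, parent_of, cleared)
    return cleared

downstream = {
    "section-2.task-5": [],  # failed task inside the subdag
    "section-2": ["end"],    # SubDagOperator's edges in the parent DAG
    "end": [],
}
parent_of = {"section-2.task-5": "section-2"}

cleared = clear_with_parents("section-2.task-5", downstream, parent_of)
```

Note that clearing `section-2` in the parent DAG follows only the parent DAG's edges, so sibling tasks inside the subdag stay untouched.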





[jira] [Assigned] (AIRFLOW-1195) Cleared tasks in SubDagOperator do not trigger upstream dag_runs

2018-09-16 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik reassigned AIRFLOW-1195:
---

Assignee: Kaxil Naik

> Cleared tasks in SubDagOperator do not trigger upstream dag_runs
> 
>
> Key: AIRFLOW-1195
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1195
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: subdag
>Affects Versions: 1.8.1
>Reporter: Paul Zaczkieiwcz
>Assignee: Kaxil Naik
>Priority: Minor
> Attachments: example_subdag_operator.not-cleared.png, 
> example_subdag_operator.section-2.cleared.png
>
>
> Let's say that you had a task fail in a SubDag.  You fix the underlying issue 
> and want Airflow to resume the DagRun where it left off.  If this were a flat 
> DAG, then all you need to do is clear the failed TaskInstance and its 
> downstream dependencies. The GUI will happily clear all of them for you in a 
> single PUT request!  In order to resume a SubDag, you must clear the 
> TaskInstance + downstream dependencies AND you must clear the SubDagOperator 
> + downstream dependencies WITHOUT clearing its recursive dependencies. There 
> should be an option to recursively clear task instances in upstream SubDags.
> The attached files use the example_subdag_operator DAG to illustrate the 
> problem.  Before the screenshot, I ran the operator to completion, then 
> cleared {{example_subdag_operator.section-2.section-2-task-5}}. Notice that 
> {{example_subdag_operator.section-2}} is in the `running` state, but 
> {{example_subdag_operator}} is still in the `success` state.





[GitHub] feng-tao commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors

2018-09-16 Thread GitBox
feng-tao commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of 
sensors
URL: 
https://github.com/apache/incubator-airflow/pull/3596#issuecomment-421856616
 
 
   I will try to spend some time on the pr tomorrow.




[GitHub] kaxil opened a new pull request #3907: WIP: [AIRFLOW-1195] Add feature to clear tasks in Parent Dag

2018-09-16 Thread GitBox
kaxil opened a new pull request #3907: WIP: [AIRFLOW-1195] Add feature to clear 
tasks in Parent Dag
URL: https://github.com/apache/incubator-airflow/pull/3907
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-1195
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Details on Jira
   
   Let's say that you had a task fail in a SubDag. You fix the underlying issue 
and want Airflow to resume the DagRun where it left off. If this were a flat 
DAG, then all you need to do is clear the failed TaskInstance and its 
downstream dependencies. The GUI will happily clear all of them for you in a 
single PUT request! In order to resume a SubDag, you must clear the 
TaskInstance + downstream dependencies AND you must clear the SubDagOperator + 
downstream dependencies WITHOUT clearing its recursive dependencies. There should 
be an option to recursively clear task instances in upstream SubDags.
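   The clearing rule described above can be sketched in a few lines. This is a hypothetical, self-contained model (plain dicts stand in for task graphs; `downstream_closure` and `clear_for_resume` are illustrative names, not Airflow's API), where "clearing" just means collecting the task IDs to reset:

   ```python
   # Hypothetical sketch: resuming a DagRun after a SubDag task failure.
   # Task graphs are plain dicts mapping task_id -> list of downstream task_ids.

   def downstream_closure(graph, start):
       """All tasks reachable downstream of `start`, including `start` itself."""
       seen, stack = set(), [start]
       while stack:
           task = stack.pop()
           if task not in seen:
               seen.add(task)
               stack.extend(graph.get(task, []))
       return seen

   def clear_for_resume(subdag_graph, parent_graph, failed_task, subdag_operator):
       """Clear the failed task + downstream deps inside the SubDag, AND the
       SubDagOperator + its downstream deps in the parent DAG, without
       recursing back into the SubDag's own tasks."""
       to_clear = downstream_closure(subdag_graph, failed_task)
       to_clear |= downstream_closure(parent_graph, subdag_operator)
       return to_clear

   # Illustrative task names modeled on example_subdag_operator.
   subdag = {"section-2-task-5": ["section-2-task-6"], "section-2-task-6": []}
   parent = {"section-2": ["some-end"], "some-end": []}
   cleared = clear_for_resume(subdag, parent, "section-2-task-5", "section-2")
   print(sorted(cleared))
   ```

   Note that the SubDagOperator's closure is computed only over the parent graph, so the SubDag's internal tasks are not re-cleared recursively.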
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-1195) Cleared tasks in SubDagOperator do not trigger upstream dag_runs

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616902#comment-16616902
 ] 

ASF GitHub Bot commented on AIRFLOW-1195:
-

kaxil opened a new pull request #3907: WIP: [AIRFLOW-1195] Add feature to clear 
tasks in Parent Dag
URL: https://github.com/apache/incubator-airflow/pull/3907
 
 




> Cleared tasks in SubDagOperator do not trigger upstream dag_runs
> 
>
> Key: AIRFLOW-1195
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1195
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: subdag
>Affects Versions: 1.8.1
>Reporter: Paul Zaczkieiwcz
>Priority: Minor
> Attachments: example_subdag_operator.not-cleared.png, 
> example_subdag_operator.section-2.cleared.png
>
>
> Let's say that you had a task fail in a SubDag.  You fix the underlying issue 
> and want Airflow to resume the DagRun where it left off.  If this were a flat 
> DAG, then all you need to do is clear the failed TaskInstance and its 
> downstream dependencies. The GUI will happily clear all of them for you in a 
> single PUT request!  In order to resume a SubDag, you must clear the 
> TaskInstance + downstream dependencies AND you must clear the SubDagOperator 
> + downstream dependencies WITHOUT clearing its recursive dependencies. There 
> should be an option to recursively clear task instances in upstream SubDags.
> The attached files use the example_subdag_operator DAG to illustrate the 
> problem.  Before the screenshot, I ran the operator to completion, then 
> cleared {{example_subdag_operator.section-2.section-2-task-5}}. Notice that 
> {{example_subdag_operator.section-2}} is in the `running` state, but 
> {{example_subdag_operator}} is still in the `success` state.





[jira] [Created] (AIRFLOW-3072) Only admin can view logs in RBAC UI

2018-09-16 Thread Stefan Seelmann (JIRA)
Stefan Seelmann created AIRFLOW-3072:


 Summary: Only admin can view logs in RBAC UI
 Key: AIRFLOW-3072
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3072
 Project: Apache Airflow
  Issue Type: Bug
  Components: ui
Affects Versions: 1.10.0
Reporter: Stefan Seelmann
Assignee: Stefan Seelmann


With RBAC enabled, only users with the admin role can view logs.

The cause is that no permission for {{get_logs_with_metadata}} is defined in 
{{security.py}}.

My suggestion is to add the permission and assign it to the viewer role. Or is 
there a reason why only admins should be able to see logs?
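
A hedged sketch of the suggested fix, using a plain dict of sets as a stand-in 
for the role/permission tables built in {{security.py}} (the permission name 
follows this issue; the surrounding layout is illustrative, not the actual 
module):

```python
# Simplified stand-in for the role -> permission-name mapping in
# airflow/www_rbac/security.py.
ROLE_PERMS = {
    'Admin': {'all_dags', 'can_log', 'can_get_logs_with_metadata'},
    'Viewer': {'can_index', 'can_dag_stats', 'can_task_stats', 'can_log'},
}

# Proposed fix: grant the log-fetch endpoint's permission to Viewer as well,
# so non-admin users can load logs in the RBAC UI.
ROLE_PERMS['Viewer'].add('can_get_logs_with_metadata')

print('can_get_logs_with_metadata' in ROLE_PERMS['Viewer'])
```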





[GitHub] seelmann opened a new pull request #1: Improve Docker image for local development

2018-09-16 Thread GitBox
seelmann opened a new pull request #1: Improve Docker image for local 
development
URL: https://github.com/apache/incubator-airflow-ci/pull/1
 
 
   * Expose port 8080 to host for accessing webserver
   * Install nodejs to be able to build `www_rbac` assets
   * Install virtualenv to be able to install dependencies
   * Install additional tools for easier debugging
   
   I started to use this docker image for local development, but some pieces 
were missing. I hope those changes don't have a negative impact on CI tests.




[jira] [Work started] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-16 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2747 started by Stefan Seelmann.

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaround by setting a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors raised by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors, if set to True it will raise an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave like before. 
> But it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on `task_reschedule` table. However 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature
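
A minimal sketch of the proposed mechanism. The classes below are simplified 
stand-ins (the real implementation lives in PR #3596); `DemoSensor`, its 
`execute(now)` signature, and the field names are illustrative only:

```python
from datetime import datetime, timedelta

class AirflowRescheduleException(Exception):
    """Stand-in: carries the (earliest) date the task should be retried."""
    def __init__(self, reschedule_date):
        super().__init__(reschedule_date)
        self.reschedule_date = reschedule_date

class DemoSensor:
    """Stand-in sensor; `mode` mirrors the proposed poke/reschedule switch."""
    def __init__(self, mode="poke", poke_interval=60):
        self.mode = mode
        self.poke_interval = poke_interval

    def poke(self):
        # Condition not met yet (e.g. an awaited file has not appeared).
        return False

    def execute(self, now):
        if not self.poke():
            if self.mode == "reschedule":
                # Free the worker slot: raise instead of sleeping in-process.
                # The scheduler would record this in `task_reschedule` and a
                # TI dependency would make the task runnable again later.
                raise AirflowRescheduleException(
                    now + timedelta(seconds=self.poke_interval))
            # mode == "poke": the worker sleeps and retries, blocking a slot.

sensor = DemoSensor(mode="reschedule", poke_interval=60)
try:
    sensor.execute(now=datetime(2018, 9, 16))
except AirflowRescheduleException as exc:
    caught = exc.reschedule_date

print(caught)  # 2018-09-16 00:01:00
```

Because `poke_interval` still drives the next schedule date, existing sensor 
parameters keep their meaning in both modes, as the issue notes.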





[GitHub] seelmann edited a comment on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors

2018-09-16 Thread GitBox
seelmann edited a comment on issue #3596: [AIRFLOW-2747] Explicit re-schedule 
of sensors
URL: 
https://github.com/apache/incubator-airflow/pull/3596#issuecomment-421815236
 
 
   @Fokko I rebased again and made the following changes:
   
   * Instead of using a `reschedule` boolean flag I switched to `mode` property 
with possible values `poke` (default) and `reschedule` to allow other modes in 
future (e.g. `in_scheduler` as mentioned by Maxime).
   * Changed the Gantt view based on feedback from Pedro, see screenshots in 
the Jira issue.
   * Changed the CSS class in Gantt for the `NONE` state to display as white 
instead of black, consistent with the graph and tree views.
   * Added more tests and updated documentation
   
   From my PoV it's ready to be merged.




[GitHub] seelmann commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors

2018-09-16 Thread GitBox
seelmann commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of 
sensors
URL: 
https://github.com/apache/incubator-airflow/pull/3596#issuecomment-421815236
 
 
   @Fokko I rebased again and made the following changes:
   
   * Instead of using a `reschedule` boolean flag I switched to `mode` property 
with possible values `poke` (default) and `reschedule` to allow other modes in 
future (e.g. `in_scheduler` as mentioned by Maxime).
   * Changed the Gantt view based on feedback from Pedro, see screenshots in 
the Jira issue.
   * Changed the CSS class in Gantt for the `NONE` state to display as white 
instead of black, consistent with the graph and tree views.
   * Added more tests and updated documentation
   
   From my PoV it's ready to be merged.




[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-16 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2747:
-
Attachment: Screenshot_2018-09-16_20-09-28.png






[jira] [Commented] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-16 Thread Stefan Seelmann (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616842#comment-16616842
 ] 

Stefan Seelmann commented on AIRFLOW-2747:
--

I changed the Gantt view to not show each individual reschedule but only a 
single bar. The color changes between light green (if currently running) and 
white (if currently inactive), those colors are also shown in other views so 
it's consistent. However, failed attempts are still shown as separate bars (as 
before). Attached two screenshots for demonstration.

!Screenshot_2018-09-16_20-19-23.png!!Screenshot_2018-09-16_20-09-28.png!

 






[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-09-16 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated AIRFLOW-2747:
-
Attachment: Screenshot_2018-09-16_20-19-23.png






[GitHub] r39132 commented on issue #1910: [AIRFLOW-659] Automatic Refresh on DAG Graph View

2018-09-16 Thread GitBox
r39132 commented on issue #1910: [AIRFLOW-659] Automatic Refresh on DAG Graph 
View
URL: 
https://github.com/apache/incubator-airflow/pull/1910#issuecomment-421794784
 
 
   @robin-miller-ow A few questions:
   1. Would you be interested in making this work on all views, instead of just 
on the Graph View? 
   2. Also, instead of making the update in the configuration file, it would 
make more sense for each user to pick a refresh interval that suits them. Hence, 
it would be better to surface a UI control that lets users pick a refresh 
interval that could be persisted via a cookie from one session to another.
   
   Let me know if you'd like to take this forward. I'd be happy to stick with 
this to see it merged.




[GitHub] r39132 commented on issue #1869: [AIRFLOW-571] added --forwarded_allow_ips as a command line argument to webserver

2018-09-16 Thread GitBox
r39132 commented on issue #1869: [AIRFLOW-571] added --forwarded_allow_ips as a 
command line argument to webserver
URL: 
https://github.com/apache/incubator-airflow/pull/1869#issuecomment-421793313
 
 
   Hi @dennisobrien 
   How would you like to proceed with this PR? Would you like to take another 
crack? If so, I'd be happy to take a look at it. If I don't hear back in a few 
days, I'll close this out.




[GitHub] r39132 commented on issue #1601: [AIRFLOW-249] Refactor the SLA mechanism

2018-09-16 Thread GitBox
r39132 commented on issue #1601: [AIRFLOW-249] Refactor the SLA mechanism
URL: 
https://github.com/apache/incubator-airflow/pull/1601#issuecomment-421792600
 
 
   Hi @dud225 
   I wanted to see if you were still interested in taking a crack at this. If I 
don't hear back in a few days, I'll close this. You are welcome to reopen it 
however.




[jira] [Closed] (AIRFLOW-3068) Remove deprecated import mechanism

2018-09-16 Thread Siddharth Anand (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Anand closed AIRFLOW-3068.

Resolution: Fixed

> Remove deprecated import mechanism
> --
>
> Key: AIRFLOW-3068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3068
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
>
> In Airflow 2.0 we want to remove the deprecated import mechanism.
> Before:
> from airflow.operators import BashOperator 
> changes to
> from airflow.operators.bash_operator import BashOperator





[jira] [Commented] (AIRFLOW-3068) Remove deprecated import mechanism

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616781#comment-16616781
 ] 

ASF GitHub Bot commented on AIRFLOW-3068:
-

r39132 closed pull request #3906: [AIRFLOW-3068] Remove deprecated imports
URL: https://github.com/apache/incubator-airflow/pull/3906
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/__init__.py b/airflow/__init__.py
index d010fe4c74..9d071a347c 100644
--- a/airflow/__init__.py
+++ b/airflow/__init__.py
@@ -79,16 +79,3 @@ class AirflowViewPlugin(BaseView):
 class AirflowMacroPlugin(object):
 def __init__(self, namespace):
 self.namespace = namespace
-
-
-from airflow import operators  # noqa: E402
-from airflow import sensors  # noqa: E402
-from airflow import hooks  # noqa: E402
-from airflow import executors  # noqa: E402
-from airflow import macros  # noqa: E402
-
-operators._integrate_plugins()
-sensors._integrate_plugins()  # noqa: E402
-hooks._integrate_plugins()
-executors._integrate_plugins()
-macros._integrate_plugins()
diff --git a/airflow/contrib/hooks/__init__.py 
b/airflow/contrib/hooks/__init__.py
index 6c31842578..b7f8352944 100644
--- a/airflow/contrib/hooks/__init__.py
+++ b/airflow/contrib/hooks/__init__.py
@@ -17,54 +17,3 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-
-
-# Contrib hooks are not imported by default. They should be accessed
-# directly: from airflow.contrib.hooks.hook_module import Hook
-
-
-import sys
-import os
-
-# 
-#
-# #TODO #FIXME Airflow 2.0
-#
-# Old import machinary below.
-#
-# This is deprecated but should be kept until Airflow 2.0
-# for compatibility.
-#
-# 
-_hooks = {
-'docker_hook': ['DockerHook'],
-'ftp_hook': ['FTPHook'],
-'ftps_hook': ['FTPSHook'],
-'vertica_hook': ['VerticaHook'],
-'ssh_hook': ['SSHHook'],
-'winrm_hook': ['WinRMHook'],
-'sftp_hook': ['SFTPHook'],
-'bigquery_hook': ['BigQueryHook'],
-'qubole_hook': ['QuboleHook'],
-'gcs_hook': ['GoogleCloudStorageHook'],
-'datastore_hook': ['DatastoreHook'],
-'gcp_cloudml_hook': ['CloudMLHook'],
-'redshift_hook': ['RedshiftHook'],
-'gcp_dataproc_hook': ['DataProcHook'],
-'gcp_dataflow_hook': ['DataFlowHook'],
-'spark_submit_operator': ['SparkSubmitOperator'],
-'cloudant_hook': ['CloudantHook'],
-'fs_hook': ['FSHook'],
-'wasb_hook': ['WasbHook'],
-'gcp_pubsub_hook': ['PubSubHook'],
-'jenkins_hook': ['JenkinsHook'],
-'aws_dynamodb_hook': ['AwsDynamoDBHook'],
-'azure_data_lake_hook': ['AzureDataLakeHook'],
-'azure_fileshare_hook': ['AzureFileShareHook'],
-}
-
-
-if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
-from airflow.utils.helpers import AirflowImporter
-
-airflow_importer = AirflowImporter(sys.modules[__name__], _hooks)
diff --git a/airflow/contrib/operators/__init__.py 
b/airflow/contrib/operators/__init__.py
index 247ec5941f..b7f8352944 100644
--- a/airflow/contrib/operators/__init__.py
+++ b/airflow/contrib/operators/__init__.py
@@ -17,37 +17,3 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-
-
-# Contrib operators are not imported by default. They should be accessed
-# directly: from airflow.contrib.operators.operator_module import Operator
-
-
-import sys
-import os
-
-# 
-#
-# #TODO #FIXME Airflow 2.0
-#
-# Old import machinary below.
-#
-# This is deprecated but should be kept until Airflow 2.0
-# for compatibility.
-#
-# 
-_operators = {
-'ssh_operator': ['SSHOperator'],
-'winrm_operator': ['WinRMOperator'],
-'vertica_operator': ['VerticaOperator'],
-'vertica_to_hive': ['VerticaToHiveTransfer'],
-'qubole_operator': ['QuboleOperator'],
-'spark_submit_operator': ['SparkSubmitOperator'],
-'file_to_wasb': ['FileToWasbOperator'],
-'fs_operator': ['FileSensor'],
-'hive_to_dynamodb': ['HiveToDynamoDBTransferOperator']
-}
-
-if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
-from airflow.utils.helpers import AirflowImporter
-airflow_importer = AirflowImporter(sys.modules[__name__], _operators)
diff --git a/airflow/contrib/operators/mlengine_operator.py 
b/airflow/contrib/operators/mlengine_operator.py
index c3d9075618..8091ceefff 100644
--- a/airflow/contrib/operators/mlengine_operator.py
+++ b/airflow/contrib/operators/mlengine_operator.py
@@ -19,7 +19,7 @@
 
 from 

[GitHub] r39132 closed pull request #3906: [AIRFLOW-3068] Remove deprecated imports

2018-09-16 Thread GitBox
r39132 closed pull request #3906: [AIRFLOW-3068] Remove deprecated imports
URL: https://github.com/apache/incubator-airflow/pull/3906
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/__init__.py b/airflow/__init__.py
index d010fe4c74..9d071a347c 100644
--- a/airflow/__init__.py
+++ b/airflow/__init__.py
@@ -79,16 +79,3 @@ class AirflowViewPlugin(BaseView):
 class AirflowMacroPlugin(object):
 def __init__(self, namespace):
 self.namespace = namespace
-
-
-from airflow import operators  # noqa: E402
-from airflow import sensors  # noqa: E402
-from airflow import hooks  # noqa: E402
-from airflow import executors  # noqa: E402
-from airflow import macros  # noqa: E402
-
-operators._integrate_plugins()
-sensors._integrate_plugins()  # noqa: E402
-hooks._integrate_plugins()
-executors._integrate_plugins()
-macros._integrate_plugins()
diff --git a/airflow/contrib/hooks/__init__.py 
b/airflow/contrib/hooks/__init__.py
index 6c31842578..b7f8352944 100644
--- a/airflow/contrib/hooks/__init__.py
+++ b/airflow/contrib/hooks/__init__.py
@@ -17,54 +17,3 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-
-
-# Contrib hooks are not imported by default. They should be accessed
-# directly: from airflow.contrib.hooks.hook_module import Hook
-
-
-import sys
-import os
-
-# 
-#
-# #TODO #FIXME Airflow 2.0
-#
-# Old import machinary below.
-#
-# This is deprecated but should be kept until Airflow 2.0
-# for compatibility.
-#
-# 
-_hooks = {
-    'docker_hook': ['DockerHook'],
-    'ftp_hook': ['FTPHook'],
-    'ftps_hook': ['FTPSHook'],
-    'vertica_hook': ['VerticaHook'],
-    'ssh_hook': ['SSHHook'],
-    'winrm_hook': ['WinRMHook'],
-    'sftp_hook': ['SFTPHook'],
-    'bigquery_hook': ['BigQueryHook'],
-    'qubole_hook': ['QuboleHook'],
-    'gcs_hook': ['GoogleCloudStorageHook'],
-    'datastore_hook': ['DatastoreHook'],
-    'gcp_cloudml_hook': ['CloudMLHook'],
-    'redshift_hook': ['RedshiftHook'],
-    'gcp_dataproc_hook': ['DataProcHook'],
-    'gcp_dataflow_hook': ['DataFlowHook'],
-    'spark_submit_operator': ['SparkSubmitOperator'],
-    'cloudant_hook': ['CloudantHook'],
-    'fs_hook': ['FSHook'],
-    'wasb_hook': ['WasbHook'],
-    'gcp_pubsub_hook': ['PubSubHook'],
-    'jenkins_hook': ['JenkinsHook'],
-    'aws_dynamodb_hook': ['AwsDynamoDBHook'],
-    'azure_data_lake_hook': ['AzureDataLakeHook'],
-    'azure_fileshare_hook': ['AzureFileShareHook'],
-}
-
-
-if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
-    from airflow.utils.helpers import AirflowImporter
-
-    airflow_importer = AirflowImporter(sys.modules[__name__], _hooks)
diff --git a/airflow/contrib/operators/__init__.py 
b/airflow/contrib/operators/__init__.py
index 247ec5941f..b7f8352944 100644
--- a/airflow/contrib/operators/__init__.py
+++ b/airflow/contrib/operators/__init__.py
@@ -17,37 +17,3 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-
-
-# Contrib operators are not imported by default. They should be accessed
-# directly: from airflow.contrib.operators.operator_module import Operator
-
-
-import sys
-import os
-
-# 
-#
-# #TODO #FIXME Airflow 2.0
-#
-# Old import machinary below.
-#
-# This is deprecated but should be kept until Airflow 2.0
-# for compatibility.
-#
-# 
-_operators = {
-    'ssh_operator': ['SSHOperator'],
-    'winrm_operator': ['WinRMOperator'],
-    'vertica_operator': ['VerticaOperator'],
-    'vertica_to_hive': ['VerticaToHiveTransfer'],
-    'qubole_operator': ['QuboleOperator'],
-    'spark_submit_operator': ['SparkSubmitOperator'],
-    'file_to_wasb': ['FileToWasbOperator'],
-    'fs_operator': ['FileSensor'],
-    'hive_to_dynamodb': ['HiveToDynamoDBTransferOperator']
-}
-
-if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
-    from airflow.utils.helpers import AirflowImporter
-    airflow_importer = AirflowImporter(sys.modules[__name__], _operators)
diff --git a/airflow/contrib/operators/mlengine_operator.py 
b/airflow/contrib/operators/mlengine_operator.py
index c3d9075618..8091ceefff 100644
--- a/airflow/contrib/operators/mlengine_operator.py
+++ b/airflow/contrib/operators/mlengine_operator.py
@@ -19,7 +19,7 @@
 
 from airflow.contrib.hooks.gcp_mlengine_hook import MLEngineHook
 from airflow.exceptions import AirflowException
-from airflow.operators import BaseOperator
+from airflow.models import BaseOperator
 from airflow.utils.decorators import apply_defaults
 from 

[GitHub] r39132 commented on issue #3906: [AIRFLOW-3068] Remove deprecated imports

2018-09-16 Thread GitBox
r39132 commented on issue #3906: [AIRFLOW-3068] Remove deprecated imports
URL: 
https://github.com/apache/incubator-airflow/pull/3906#issuecomment-421789103
 
 
   +1
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on issue #3863: [AIRFLOW-3070] Refine web UI authentication-related docs

2018-09-16 Thread GitBox
XD-DENG commented on issue #3863: [AIRFLOW-3070] Refine web UI 
authentication-related docs
URL: 
https://github.com/apache/incubator-airflow/pull/3863#issuecomment-421768717
 
 
   Thanks @kaxil. Appreciate the time you guys contributed as well, especially 
now, on a Sunday :) 
   
   A nice afternoon ahead.
   




[jira] [Commented] (AIRFLOW-3070) Refine web UI authentication-related docs

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616705#comment-16616705
 ] 

ASF GitHub Bot commented on AIRFLOW-3070:
-

kaxil closed pull request #3863: [AIRFLOW-3070] Refine web UI 
authentication-related docs
URL: https://github.com/apache/incubator-airflow/pull/3863
 
 
   


diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 4ff1ae3679..ce56e79b4b 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -2024,7 +2024,7 @@ class CLIFactory(object):
  'conn_id', 'conn_uri', 'conn_extra') + 
tuple(alternative_conn_specs),
 }, {
 'func': create_user,
-'help': "Create an account for the Web UI",
+'help': "Create an account for the Web UI (FAB-based)",
 'args': ('role', 'username', 'email', 'firstname', 'lastname',
  'password', 'use_random_password'),
 }, {
diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index 18c486cb1e..9a9f3aca61 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -265,6 +265,9 @@ access_logfile = -
 error_logfile = -
 
 # Expose the configuration file in the web server
+# This is only applicable for the flask-admin based web UI (non FAB-based).
+# In the FAB-based web UI with RBAC feature,
+# access to configuration is controlled by role permissions.
 expose_config = False
 
 # Set to true to turn on authentication:
diff --git a/docs/security.rst b/docs/security.rst
index 253587afb1..60fe160404 100644
--- a/docs/security.rst
+++ b/docs/security.rst
@@ -16,9 +16,14 @@ Web Authentication
 Password
 ''''''''
 
+.. note::
+
+   This is for flask-admin based web UI only. If you are using FAB-based web 
UI with RBAC feature,
+   please use command line interface ``create_user`` to create accounts, or do 
that in the FAB-based UI itself.
+
 One of the simplest mechanisms for authentication is requiring users to 
specify a password before logging in.
 Password authentication requires the used of the ``password`` subpackage in 
your requirements file. Password hashing
-uses bcrypt before storing passwords.
+uses ``bcrypt`` before storing passwords.
 
 .. code-block:: bash
 


 




> Refine web UI authentication-related docs 
> --
>
> Key: AIRFLOW-3070
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3070
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> Now in Airflow 1.10 we're already providing the older version of the web UI and 
> the new FAB-based UI at the same time. But the documentation does not 
> differentiate them very well. For example,
>  * this doc [https://airflow.apache.org/security.html#password] is applicable 
> to the old web UI only, but that is not highlighted.
>  * the command line tool {{create_user}} is for the new FAB-based UI only, and 
> that is not highlighted either.
> This may be confusing to users, especially given that not everyone is aware of 
> the existence of the two UIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3070) Refine web UI authentication-related docs

2018-09-16 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-3070.
-
Resolution: Fixed

Resolved by https://github.com/apache/incubator-airflow/pull/3863

> Refine web UI authentication-related docs 
> --
>
> Key: AIRFLOW-3070
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3070
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> Now in Airflow 1.10 we're already providing the older version of the web UI and 
> the new FAB-based UI at the same time. But the documentation does not 
> differentiate them very well. For example,
>  * this doc [https://airflow.apache.org/security.html#password] is applicable 
> to the old web UI only, but that is not highlighted.
>  * the command line tool {{create_user}} is for the new FAB-based UI only, and 
> that is not highlighted either.
> This may be confusing to users, especially given that not everyone is aware of 
> the existence of the two UIs.





[GitHub] kaxil commented on issue #3863: [AIRFLOW-3070] Refine web UI authentication-related docs

2018-09-16 Thread GitBox
kaxil commented on issue #3863: [AIRFLOW-3070] Refine web UI 
authentication-related docs
URL: 
https://github.com/apache/incubator-airflow/pull/3863#issuecomment-421758473
 
 
   @XD-DENG Hey, no worries :-) . The idea is that there is no point in creating 
a Jira when we are just fixing typos or a single-line document change. 
Everything other than that should have a Jira :) 
   
   Really appreciate all of your contributions. Keep contributing, and thanks 
for this PR as well.




[GitHub] kaxil closed pull request #3863: [AIRFLOW-3070] Refine web UI authentication-related docs

2018-09-16 Thread GitBox
kaxil closed pull request #3863: [AIRFLOW-3070] Refine web UI 
authentication-related docs
URL: https://github.com/apache/incubator-airflow/pull/3863
 
 
   



[GitHub] kaxil closed pull request #3904: [AIRFLOW-XXX] Fix typo in docs/timezone.rst

2018-09-16 Thread GitBox
kaxil closed pull request #3904: [AIRFLOW-XXX] Fix typo in docs/timezone.rst
URL: https://github.com/apache/incubator-airflow/pull/3904
 
 
   


diff --git a/docs/timezone.rst b/docs/timezone.rst
index fe44ecfbb9..2c6f30fd39 100644
--- a/docs/timezone.rst
+++ b/docs/timezone.rst
@@ -134,13 +134,13 @@ Cron schedules
 
 In case you set a cron schedule, Airflow assumes you will always want to run 
at the exact same time. It will
 then ignore day light savings time. Thus, if you have a schedule that says
-run at end of interval every day at 08:00 GMT+1 it will always run end of 
interval 08:00 GMT+1,
+run at the end of interval every day at 08:00 GMT+1 it will always run at the 
end of interval 08:00 GMT+1,
 regardless if day light savings time is in place.
 
 
 Time deltas
 '''''''''''
 For schedules with time deltas Airflow assumes you always will want to run 
with the specified interval. So if you
-specify a timedelta(hours=2) you will always want to run to hours later. In 
this case day light savings time will
+specify a timedelta(hours=2) you will always want to run two hours later. In 
this case day light savings time will
 be taken into account.
 


 


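The timezone.rst fix above rests on the distinction the doc draws: cron schedules run at the same wall-clock time every day (daylight saving time ignored), while timedelta schedules keep the same absolute interval (daylight saving time taken into account). A minimal stdlib sketch of that distinction, assuming Python 3.9+ for `zoneinfo`; the date and timezone are arbitrary illustrative choices, not anything from the PR:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

AMS = ZoneInfo("Europe/Amsterdam")
UTC = ZoneInfo("UTC")

# Last run before the spring-forward transition (2018-03-25 02:00 local).
last_run = datetime(2018, 3, 24, 8, 0, tzinfo=AMS)  # 07:00 UTC (UTC+1)

# Cron-style semantics: the same wall-clock time every day, DST ignored,
# so the absolute gap shrinks to 23 hours across the transition.
# (Adding a timedelta to an aware datetime is wall-clock arithmetic.)
next_cron = last_run + timedelta(days=1)

# Timedelta semantics: the same absolute 24-hour interval, so the local
# wall-clock time drifts by one hour instead.
next_delta = (last_run.astimezone(UTC) + timedelta(hours=24)).astimezone(AMS)
```

With these semantics, `next_cron` still fires at 08:00 local but only 23 absolute hours after `last_run`, while `next_delta` fires exactly 24 hours later, at 09:00 local.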


[jira] [Commented] (AIRFLOW-3059) PostgresToGCS log the number of rows

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616698#comment-16616698
 ] 

ASF GitHub Bot commented on AIRFLOW-3059:
-

kaxil closed pull request #3905: [AIRFLOW-3059] Log how many rows are read from 
Postgres
URL: https://github.com/apache/incubator-airflow/pull/3905
 
 
   


[jira] [Resolved] (AIRFLOW-3059) PostgresToGCS log the number of rows

2018-09-16 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-3059.
-
   Resolution: Fixed
 Assignee: Fokko Driesprong
Fix Version/s: 2.0.0

Resolved by https://github.com/apache/incubator-airflow/pull/3905

> PostgresToGCS log the number of rows
> 
>
> Key: AIRFLOW-3059
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3059
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>
> Log how many rows are being read from Postgres





[GitHub] kaxil closed pull request #3905: [AIRFLOW-3059] Log how many rows are read from Postgres

2018-09-16 Thread GitBox
kaxil closed pull request #3905: [AIRFLOW-3059] Log how many rows are read from 
Postgres
URL: https://github.com/apache/incubator-airflow/pull/3905
 
 
   


diff --git a/airflow/contrib/operators/postgres_to_gcs_operator.py 
b/airflow/contrib/operators/postgres_to_gcs_operator.py
index 850d858f94..78da78ee2f 100644
--- a/airflow/contrib/operators/postgres_to_gcs_operator.py
+++ b/airflow/contrib/operators/postgres_to_gcs_operator.py
@@ -133,28 +133,38 @@ def _write_local_data_files(self, cursor):
             contain the data for the GCS objects.
         """
         schema = list(map(lambda schema_tuple: schema_tuple[0],
                           cursor.description))
-        file_no = 0
-        tmp_file_handle = NamedTemporaryFile(delete=True)
-        tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
-
-        for row in cursor:
-            # Convert datetime objects to utc seconds, and decimals to floats
-            row = map(self.convert_types, row)
-            row_dict = dict(zip(schema, row))
-
-            s = json.dumps(row_dict, sort_keys=True)
-            if PY3:
-                s = s.encode('utf-8')
-            tmp_file_handle.write(s)
-
-            # Append newline to make dumps BigQuery compatible.
-            tmp_file_handle.write(b'\n')
-
-            # Stop if the file exceeds the file size limit.
-            if tmp_file_handle.tell() >= self.approx_max_file_size_bytes:
-                file_no += 1
-                tmp_file_handle = NamedTemporaryFile(delete=True)
-                tmp_file_handles[self.filename.format(file_no)] = \
-                    tmp_file_handle
+        tmp_file_handles = {}
+        row_no = 0
+
+        def _create_new_file():
+            handle = NamedTemporaryFile(delete=True)
+            filename = self.filename.format(len(tmp_file_handles))
+            tmp_file_handles[filename] = handle
+            return handle
+
+        # Don't create a file if there is nothing to write
+        if cursor.rowcount > 0:
+            tmp_file_handle = _create_new_file()
+
+            for row in cursor:
+                # Convert datetime objects to utc seconds, and decimals to floats
+                row = map(self.convert_types, row)
+                row_dict = dict(zip(schema, row))
+
+                s = json.dumps(row_dict, sort_keys=True)
+                if PY3:
+                    s = s.encode('utf-8')
+                tmp_file_handle.write(s)
+
+                # Append newline to make dumps BigQuery compatible.
+                tmp_file_handle.write(b'\n')
+
+                # Stop if the file exceeds the file size limit.
+                if tmp_file_handle.tell() >= self.approx_max_file_size_bytes:
+                    tmp_file_handle = _create_new_file()
+                row_no += 1
+
+        self.log.info('Received %s rows over %s files', row_no,
+                      len(tmp_file_handles))
 
         return tmp_file_handles
 
diff --git a/tests/contrib/operators/test_postgres_to_gcs_operator.py b/tests/contrib/operators/test_postgres_to_gcs_operator.py
index ca72016974..1b6e731c3b 100644
--- a/tests/contrib/operators/test_postgres_to_gcs_operator.py
+++ b/tests/contrib/operators/test_postgres_to_gcs_operator.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -25,40 +25,66 @@
 import sys
 import unittest
 
-from airflow.contrib.operators.postgres_to_gcs_operator import PostgresToGoogleCloudStorageOperator
+from airflow.hooks.postgres_hook import PostgresHook
+from airflow.contrib.operators.postgres_to_gcs_operator import \
+    PostgresToGoogleCloudStorageOperator
 
 try:
-    from unittest import mock
+    from unittest.mock import patch
 except ImportError:
     try:
-        import mock
+        from mock import patch
     except ImportError:
         mock = None
 
-PY3 = sys.version_info[0] == 3
+TABLES = {'postgres_to_gcs_operator', 'postgres_to_gcs_operator_empty'}
 
 TASK_ID = 'test-postgres-to-gcs'
-POSTGRES_CONN_ID = 'postgres_conn_test'
-SQL = 'select 1'
+POSTGRES_CONN_ID = 'postgres_default'
+SQL = 'SELECT * FROM postgres_to_gcs_operator'
 BUCKET = 'gs://test'
 FILENAME = 'test_{}.ndjson'
-# we expect the psycopg cursor to return encoded strs in py2 and decoded in py3
-if PY3:
-    ROWS = [('mock_row_content_1', 42), ('mock_row_content_2', 43), ('mock_row_content_3', 44)]
-    CURSOR_DESCRIPTION = (('some_str', 0), ('some_num', 
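The AIRFLOW-3059 change above refactors `_write_local_data_files` around a rolling-file pattern: rows are dumped as newline-delimited JSON, a new temporary file is started once the current one crosses a size threshold, no file is created for an empty cursor, and the row count is reported at the end. A standalone sketch of that pattern; the function and parameter names here are hypothetical illustrations, not the operator's actual API:

```python
import json
from tempfile import NamedTemporaryFile


def write_ndjson_chunks(rows, filename_pattern, max_bytes):
    """Write rows as newline-delimited JSON, rolling over to a new temp
    file whenever the current one reaches max_bytes. Returns the mapping
    of generated filenames to open file handles, plus the row count."""
    handles = {}

    def _new_file():
        handle = NamedTemporaryFile(delete=True)
        handles[filename_pattern.format(len(handles))] = handle
        return handle

    row_no = 0
    handle = None
    for row in rows:
        if handle is None:  # don't create a file when there is nothing to write
            handle = _new_file()
        handle.write(json.dumps(row, sort_keys=True).encode('utf-8'))
        handle.write(b'\n')  # newline-delimited, as BigQuery expects
        if handle.tell() >= max_bytes:  # roll over to the next file
            handle = _new_file()
        row_no += 1
    return handles, row_no
```

With a small threshold the writer spreads rows across several files, and an empty input produces no files at all, mirroring the "don't create a file if there is nothing to write" fix in the diff.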

[GitHub] Fokko commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow

2018-09-16 Thread GitBox
Fokko commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger 
job from Airflow
URL: 
https://github.com/apache/incubator-airflow/pull/2708#issuecomment-421754322
 
 
   @ndmar The reason it takes so long to get this merged is the poor quality 
of the PR. Don't get me wrong: I invested some time in helping, but a lot 
still isn't addressed, for example:
   
   - Incomplete docstring.
   - Extremely minimal tests.
   - Not following the Git guidelines regarding the formatting of the commits. 
   - Not updating the docs.
   
   I'm happy to do another review, but first please get it to a decent level of 
quality.




[GitHub] Fokko edited a comment on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code

2018-09-16 Thread GitBox
Fokko edited a comment on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial 
code
URL: 
https://github.com/apache/incubator-airflow/pull/2466#issuecomment-421753641
 
 
   @ron819 I think it is a good idea to set the `schedule_interval`, but the 
link to the docs is old (refers to Airbnb).




[GitHub] Fokko commented on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code

2018-09-16 Thread GitBox
Fokko commented on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code
URL: 
https://github.com/apache/incubator-airflow/pull/2466#issuecomment-421753641
 
 
   @ron819 I think it is a good idea to set the `schedule_interval`, but the 
link to the docs is old (refers to AirBNB).




[GitHub] ron819 commented on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code

2018-09-16 Thread GitBox
ron819 commented on issue #2466: [AIRFLOW-1441] Fix inconsistent tutorial code
URL: 
https://github.com/apache/incubator-airflow/pull/2466#issuecomment-421750337
 
 
   @Fokko  is this still relevant?




[GitHub] eyaltrabelsi commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow

2018-09-16 Thread GitBox
eyaltrabelsi commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to 
trigger job from Airflow
URL: 
https://github.com/apache/incubator-airflow/pull/2708#issuecomment-421746408
 
 
   @Nadbkl can you add @ndmar?




[jira] [Commented] (AIRFLOW-3068) Remove deprecated import mechanism

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616669#comment-16616669
 ] 

ASF GitHub Bot commented on AIRFLOW-3068:
-

Fokko opened a new pull request #3906: [AIRFLOW-3068] Remove deprecated imports
URL: https://github.com/apache/incubator-airflow/pull/3906
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-3068\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3068
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-3068\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Remove deprecated import mechanism
> --
>
> Key: AIRFLOW-3068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3068
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
>
> In Airflow 2.0 we want to remove the deprecated import mechanism.
> Before:
> from airflow.operators import BashOperator 
> changes to
> from airflow.operators.bash_operator import BashOperator



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko opened a new pull request #3906: [AIRFLOW-3068] Remove deprecated imports

2018-09-16 Thread GitBox
Fokko opened a new pull request #3906: [AIRFLOW-3068] Remove deprecated imports
URL: https://github.com/apache/incubator-airflow/pull/3906
 
 


[GitHub] Fokko commented on issue #3905: [AIRFLOW-3059] Log how many rows are read from Postgres

2018-09-16 Thread GitBox
Fokko commented on issue #3905: [AIRFLOW-3059] Log how many rows are read from 
Postgres
URL: 
https://github.com/apache/incubator-airflow/pull/3905#issuecomment-421744788
 
 
   @kaxil PTAL :)




[jira] [Commented] (AIRFLOW-3059) PostgresToGCS log the number of rows

2018-09-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616637#comment-16616637
 ] 

ASF GitHub Bot commented on AIRFLOW-3059:
-

Fokko opened a new pull request #3905: [AIRFLOW-3059] Log how many rows are 
read from Postgres
URL: https://github.com/apache/incubator-airflow/pull/3905
 
 
   To know how much data is being read from Postgres, it is nice to log
   this to the Airflow log.
   
   Previously when there was no data, it would still create a single file.
   This is not something that we want, and therefore we've changed this
   behaviour.
   
   Refactored the tests to make use of Postgres itself since we have it
   running. This makes the tests more realistic, instead of mocking
   everything.
   
   Reopen version of #3898 
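   A minimal sketch of the behaviour described above (an assumed shape, not the actual PostgresToGCS hook code): count the rows streamed from the cursor, log the total, and skip writing a file when there are none.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def write_rows(cursor, write_file):
    """Count rows from a cursor, log the total, and only write a file
    when at least one row was read (previously an empty file was still
    created). `cursor` and `write_file` are stand-ins for a psycopg2
    cursor and an upload callable."""
    rows = list(cursor)
    log.info("Received %s rows from Postgres.", len(rows))
    if not rows:
        return None  # no rows: do not create an empty file
    return write_file(rows)

result = write_rows(iter([(1, "a"), (2, "b")]),
                    lambda rows: f"{len(rows)} rows written")
```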
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-3059\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3059
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-3059\], code changes always need a Jira issue.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> PostgresToGCS log the number of rows
> 
>
> Key: AIRFLOW-3059
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3059
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
>
> Log how many rows are being read from Postgres





[GitHub] Fokko opened a new pull request #3905: [AIRFLOW-3059] Log how many rows are read from Postgres

2018-09-16 Thread GitBox
Fokko opened a new pull request #3905: [AIRFLOW-3059] Log how many rows are 
read from Postgres
URL: https://github.com/apache/incubator-airflow/pull/3905
 
 


[jira] [Created] (AIRFLOW-3071) Unable to clear Val of Variable from the UI

2018-09-16 Thread jack (JIRA)
jack created AIRFLOW-3071:
-

 Summary: Unable to clear Val of Variable from the UI
 Key: AIRFLOW-3071
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3071
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: jack


This is quite an annoying bug.

 

Reproduce steps:
 # Create a Variable.
 # Give the Variable a Val & save it.
 # Click edit Variable. You will see the Key with a red asterisk (*) and the 
value that you entered.
 # Remove the Val (leave the field blank) and click save.
 # No errors will appear. However if you will re-enter to the Variable you will 
see that the Blank value was not saved.

 

Please allow removing the Val. This is also the intended behavior, because the 
Val has no red asterisk (*) near it.

The current workaround is to delete the Variable and re-create it.





[jira] [Commented] (AIRFLOW-2902) Add S3ToBigQuery operator

2018-09-16 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616627#comment-16616627
 ] 

jack commented on AIRFLOW-2902:
---

[~XD-DENG] is this something you can pick up?

> Add S3ToBigQuery operator
> -
>
> Key: AIRFLOW-2902
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2902
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: operators
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>
> Please add operators that allow moving data between Amazon and Google 
> services.
> I saw there is 
> [S3ToHiveTransfer|https://airflow.apache.org/integration.html#s3tohivetransfer]
>  operator... It is described as:
> "Moves data from S3 to Hive. The operator downloads a file from S3, stores 
> the file locally before loading it into a Hive table."
>  
> So what I'm asking here is very similar.





[GitHub] dlebech commented on issue #3890: [AIRFLOW-3049] Add extra operations for Mongo hook

2018-09-16 Thread GitBox
dlebech commented on issue #3890: [AIRFLOW-3049] Add extra operations for Mongo 
hook
URL: 
https://github.com/apache/incubator-airflow/pull/3890#issuecomment-421715800
 
 
   @Fokko thanks for the feedback. I've updated the parameter name and added 
tweaks to the documentation.




[GitHub] XD-DENG opened a new pull request #3904: [AIRFLOW-XXX] Fix typo in docs/timezone.rst

2018-09-16 Thread GitBox
XD-DENG opened a new pull request #3904: [AIRFLOW-XXX] Fix typo in 
docs/timezone.rst
URL: https://github.com/apache/incubator-airflow/pull/3904
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   To fix a typo.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Only minor doc change.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Commented] (AIRFLOW-3058) Airflow log & multi-threading

2018-09-16 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616607#comment-16616607
 ] 

jack commented on AIRFLOW-3058:
---

[~ashb] I'm using local executor...

> Airflow log & multi-threading
> -
>
> Key: AIRFLOW-3058
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3058
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: jack
>Priority: Major
> Attachments: 456.PNG, Sni.PNG
>
>
> The airflow log does not show messages in real time when executing scripts 
> with multi-threading.
>  
> for example:
>  
> The left is the Airflow log time. the right is the actual time of the print 
> in my code. If I would execute the script without airflow the console will 
> show the times on the right.
> !Sni.PNG!
> {code:java}
> 2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 
> 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 14:14:55.230044 
> Thread: Thread-1 Generate page: #0 run #0 with URL: 
> http://...=2=0=1000
> [2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.231635 Thread: Thread-2 Generate page: #1 run #0 with URL: 
> http://...=2=1000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.233226 Thread: Thread-3 Generate page: #2 run #0 with URL: 
> http://...=2=2000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.234020 Thread: Thread-4 Generate page: #3 run #0 with URL: 
> http://...=2=3000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:43.100122 Thread: Thread-1 page 0 finished. Length is 1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:43.100877 Thread: Thread-1 Generate page: #4 run #0 with URL: 
> http://...=2=4000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:46.254536 Thread: Thread-3 page 2 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:46.255508 Thread: Thread-3 Generate page: #5 run #0 with URL: 
> http://...=2=5000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:51.096360 Thread: Thread-2 page 1 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:51.097269 Thread: Thread-2 Generate page: #6 run #0 with URL: 
> http://...=2=6000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:53.112621 Thread: Thread-4 page 3 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:53.113455 Thread: Thread-4 Generate page: #7 run #0 with URL: 
> http://...=2=7000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:37.345343 Thread: Thread-3 Generate page: #8 run #0 with URL: 
> http://...=2=8000=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:37.701201 Thread: Thread-2 Generate page: #9 run #0 with URL: 
> http://...=2=9000=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:47.283796 Thread: Thread-1 page 4 finished. Length is 1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 
> 14:17:27.169359 Thread: Thread-2 page 9 finished. Length is 1000
>  
> {code}
> This never happens when executing regular code. It happens only with 
> multi-threading. I have some other scripts where the airflow print appears 
> after more than 30 minutes.
>  
>  Check this one:
> hours of delay and then everything printed together. These are not real 
> time; the prints in the log have no correlation to the actual time the 
> command was executed.
>  
> !456.PNG!
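For reference, the reporter's pattern can be reduced to a small reproduction (hypothetical thread names and timings, no real URLs). Printing with `flush=True` is a common first step to rule out Python-side stdout buffering, one possible cause of log lines surfacing long after the work was done; any remaining delay would then come from how the task runner tails the subprocess output:

```python
import threading
import time
from datetime import datetime

def fetch_pages(thread_name, pages):
    """Print a timestamped line per page, mimicking the reporter's
    multi-threaded page-generation loop. flush=True pushes each line
    out of the Python-side buffer immediately."""
    for page in pages:
        print(f"{datetime.now()} Thread: {thread_name} Generate page: #{page}",
              flush=True)
        time.sleep(0.01)  # stand-in for the HTTP request

threads = [threading.Thread(target=fetch_pages, args=(f"Thread-{i + 1}", [i]))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```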


