[jira] [Closed] (AIRFLOW-2808) Plugin duplication checking is not working

2018-08-14 Thread Xiaodong DENG (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG closed AIRFLOW-2808.
--
Resolution: Invalid

The Kubernetes tests can't pass properly (even though this seems unrelated to this 
commit). Will need to check.

> Plugin duplication checking is not working
> --
>
> Key: AIRFLOW-2808
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2808
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: plugins
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Major
>
> h2. *Background*
> A plugin duplication check was implemented in *plugins_manager.py* 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/plugins_manager.py#L93].
> The corresponding commit was 
> [https://github.com/apache/incubator-airflow/commit/3f38dec9bf44717a275412d1fe155e8252e45ee5].
> However, it turns out that this check does not really work (reason: the 
> plugin object's name is formed from the plugin file path + plugin file name 
> + plugin class name, so it can never be duplicated, given that there cannot 
> be two files with the same name in the same directory).
> h2. *Issue*
> In my production environment, there are two plugin files whose 
> _AirflowPlugin_ subclasses define the same plugin name and operator names. 
> However, they passed the check without any warning or exception.
> For example, I have a plugin *file_sensor_1.py* as below:
> {code:python}
> from airflow.plugins_manager import AirflowPlugin
> from airflow.operators.sensors import BaseSensorOperator
> from airflow.utils.decorators import apply_defaults
> import os
>
> class local_file_sensor(BaseSensorOperator):
>     @apply_defaults
>     def __init__(self, file_path, *args, **kwargs):
>         super(local_file_sensor, self).__init__(*args, **kwargs)
>         self.file_path = file_path
>
>     def poke(self, context):
>         self.log.info('A-Poking: %s', self.file_path)
>         return os.path.exists(self.file_path)
>
> class AirflowLocalFileSensorPlugin(AirflowPlugin):
>     name = "local_file_sensor_plugin"
>     operators = [local_file_sensor]
> {code}
>  
> I copy & paste it into another plugin file *file_sensor_2.py*, with only one 
> change: the log message "_A-Poking_" becomes "_B-Poking_" (to help me check 
> which one is picked).
> Only one plugin would be loaded eventually (because the one loaded earlier 
> is overwritten by the one loaded later, see 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/plugins_manager.py#L101] ). 
> But which one? We don't know; it's indeterminate. So far the file name seems 
> to be the only factor affecting which one is picked by Airflow.
> h2. *My proposal*
Give a WARNING to users when they launch Airflow. (Or should we raise an error 
message and fail the launch?)
>  
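
The proposed warning can be sketched in a few lines (a minimal, standalone sketch, not Airflow's actual code; `find_duplicate_plugin_names` is a hypothetical helper):

```python
from collections import Counter

def find_duplicate_plugin_names(plugin_names):
    """Return the plugin names that were registered more than once."""
    counts = Counter(plugin_names)
    return sorted(name for name, count in counts.items() if count > 1)

# Two plugin files both registering the name "local_file_sensor_plugin":
names = ["local_file_sensor_plugin", "local_file_sensor_plugin", "other_plugin"]
duplicates = find_duplicate_plugin_names(names)
if duplicates:
    print("WARNING: duplicated plugin name(s) %s; only one will be loaded." % duplicates)
```

Whether this should merely warn or fail the launch outright is exactly the open question above.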



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2808) Plugin duplication checking is not working

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580738#comment-16580738
 ] 

ASF GitHub Bot commented on AIRFLOW-2808:
-

XD-DENG closed pull request #3649: [AIRFLOW-2808] Fix Plugin Duplication 
Checking
URL: https://github.com/apache/incubator-airflow/pull/3649
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/plugins_manager.py b/airflow/plugins_manager.py
index 735f2de1e8..00200f66f1 100644
--- a/airflow/plugins_manager.py
+++ b/airflow/plugins_manager.py
@@ -28,6 +28,7 @@
 import os
 import re
 import sys
+from collections import Counter
 
 from airflow import configuration
 from airflow.utils.log.logging_mixin import LoggingMixin
@@ -90,8 +91,7 @@ def validate(cls):
                     issubclass(obj, AirflowPlugin) and
                     obj is not AirflowPlugin):
                 obj.validate()
-                if obj not in plugins:
-                    plugins.append(obj)
+                plugins.append(obj)
 
     except Exception as e:
         log.exception(e)
@@ -119,7 +119,12 @@ def make_module(name, objects):
 flask_blueprints = []
 menu_links = []
 
+uniq_plugin_modules = []
+
 for p in plugins:
+
+    uniq_plugin_modules.append(p.name)
+
     operators_modules.append(
         make_module('airflow.operators.' + p.name, p.operators + p.sensors))
     sensors_modules.append(
@@ -133,3 +138,9 @@ def make_module(name, objects):
     admin_views.extend(p.admin_views)
     flask_blueprints.extend(p.flask_blueprints)
     menu_links.extend(p.menu_links)
+
+plugins_counter = Counter(uniq_plugin_modules)
+if max(plugins_counter.values()) > 1:
+    log.warn("There are duplicated plugin files for method(s) %s.",
+             [p[0] for p in plugins_counter.items() if p[1] > 1])
+    log.warn("Among duplicated plugins of each method, only one to be loaded.")


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] XD-DENG commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3

2018-08-14 Thread GitBox
XD-DENG commented on a change in pull request #3560: [AIRFLOW-2697] Drop 
snakebite in favour of hdfs3
URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r210173489
 
 

 ##
 File path: airflow/sensors/hdfs_sensor.py
 ##
 @@ -17,103 +17,231 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import re
-import sys
-from builtins import str
+import posixpath
 
 from airflow import settings
-from airflow.hooks.hdfs_hook import HDFSHook
+from airflow.hooks.hdfs_hook import HdfsHook
 from airflow.sensors.base_sensor_operator import BaseSensorOperator
 from airflow.utils.decorators import apply_defaults
-from airflow.utils.log.logging_mixin import LoggingMixin
 
 
-class HdfsSensor(BaseSensorOperator):
-    """
-    Waits for a file or folder to land in HDFS
+class HdfsFileSensor(BaseSensorOperator):
+    """Sensor that waits for files matching a specific (glob) pattern to land in HDFS.
+
+    :param str file_pattern: Glob pattern to match.
+    :param str conn_id: Connection to use.
+    :param Iterable[FilePathFilter] filters: Optional list of filters that can be
+        used to apply further filtering to any file paths matching the glob pattern.
+        Any files that fail a filter are dropped from consideration.
+    :param int min_size: Minimum size (in MB) for files to be considered. Can be used
+        to filter any intermediate files that are below the expected file size.
+    :param Set[str] ignore_exts: File extensions to ignore. By default, files with
+        a '_COPYING_' extension are ignored, as these represent temporary files.
 Review comment:
  Hi @jrderuiter @Fokko , I think it would be good to explicitly tell users 
what `ignore_exts` should look like in the comment (which will become the 
documentation later).
   
   For example, both `{'.py', '.exe'}` and `{'py', 'exe'}` seem valid, but only 
`{'py', 'exe'}` would work here.
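
One way to see why only the dot-less form works is to reproduce the regex construction used by the extension filter (a sketch mirroring the `'$|'.join` pattern in the related sensor code; `build_ignore_regex` is an illustrative name):

```python
import re

def build_ignore_regex(ignored_ext):
    # Mirrors the sensor's regex_builder: join extensions with '$|' and
    # require a literal dot right before the extension group.
    return re.compile(r"^.*\.(%s$)$" % '$|'.join(ignored_ext))

# 'py' (no leading dot) matches x.py as intended:
print(bool(build_ignore_regex(['py']).match('x.py')))   # True
# '.py' does not: its leading dot is a regex wildcard, so the pattern
# demands an extra character between the literal dot and 'py'.
print(bool(build_ignore_regex(['.py']).match('x.py')))  # False
```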






[jira] [Closed] (AIRFLOW-2872) Implement "Ad Hoc Query" in /www_rbac, and refine existing QueryView()

2018-08-14 Thread Xiaodong DENG (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG closed AIRFLOW-2872.
--
Resolution: Won't Fix

> Implement "Ad Hoc Query" in /www_rbac, and refine existing QueryView()
> --
>
> Key: AIRFLOW-2872
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2872
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> To implement "Ad Hoc Query" for RBAC in /www_rbac, based on the existing 
> implementation in /www.
> In addition, refine the existing QueryView():
>  # The ".csv" button in the *Ad Hoc Query* view responds with a plain-text 
> file rather than a CSV file (even though users can manually change the 
> extension).
>  # The argument 'has_data' passed to the template is not used by the template 
> 'airflow/query.html'.
>  # Sometimes we get the error 'UnboundLocalError: local variable 'df' 
> referenced before assignment'.
>  # 'result = df.to_html()' should only be invoked when the user does NOT 
> choose '.csv'. Otherwise it's a waste of resources to invoke 'df.to_html()', 
> since the result it returns will not be used if the user asks for a CSV 
> download instead of an HTML page.
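
Points 1 and 4 can be illustrated with a small sketch (plain Python, no pandas or Flask; `render_query_result` and the data are hypothetical): build only the representation the user actually requested, and label CSV downloads with a CSV content type rather than plain text.

```python
def render_query_result(rows, want_csv):
    """Render query rows as CSV or as an HTML table, never both.

    Building only the requested representation avoids the wasted
    df.to_html() call described in point 4.
    """
    if want_csv:
        header = ",".join(rows[0].keys())
        lines = [",".join(str(v) for v in row.values()) for row in rows]
        # 'text/csv' (not 'text/plain') makes browsers treat it as a CSV download
        return "text/csv", "\n".join([header] + lines)
    body = "".join(
        "<tr>%s</tr>" % "".join("<td>%s</td>" % v for v in row.values())
        for row in rows
    )
    return "text/html", "<table>%s</table>" % body

mimetype, payload = render_query_result([{"a": 1, "b": 2}], want_csv=True)
print(mimetype)  # text/csv
```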





[jira] [Closed] (AIRFLOW-2886) Secure Flask SECRET_KEY

2018-08-14 Thread Xiaodong DENG (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG closed AIRFLOW-2886.
--
Resolution: Fixed

Fixed with commit 
https://github.com/apache/incubator-airflow/commit/f7602f8266559e55bc602a9639e3e1ab640f30e8

> Secure Flask SECRET_KEY
> ---
>
> Key: AIRFLOW-2886
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2886
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> In my earlier PRs, [https://github.com/apache/incubator-airflow/pull/3651] 
> and [https://github.com/apache/incubator-airflow/pull/3729], I proposed 
> generating a random SECRET_KEY for the Flask app.
> If we have multiple workers for the Flask webserver, we may encounter the 
> CSRF error {{The CSRF session token is missing}}.
> On the other hand, it's still very important to have as random a SECRET_KEY 
> as possible for security reasons. We can deal with it the way we dealt with 
> FERNET_KEY (i.e. generate a random value when the airflow.cfg file is 
> initialized).
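
The FERNET_KEY-style approach can be sketched as follows (an illustrative sketch, not Airflow's actual implementation; `generate_secret_key` is a hypothetical helper):

```python
import base64
import os

def generate_secret_key():
    """Generate a random, URL-safe SECRET_KEY once, at config-initialization
    time, analogous to how FERNET_KEY is generated for airflow.cfg."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("utf-8")

# Written into airflow.cfg once; afterwards every webserver worker reads the
# same value, so CSRF session tokens validate across workers.
print(generate_secret_key())
```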





[jira] [Closed] (AIRFLOW-2896) Improve HdfsSensor()

2018-08-14 Thread Xiaodong DENG (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG closed AIRFLOW-2896.
--
Resolution: Invalid

HDFS support is being refactored in 
[#3560|https://github.com/apache/incubator-airflow/pull/3560], so this is 
already outdated.

> Improve HdfsSensor()
> 
>
> Key: AIRFLOW-2896
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2896
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> # Make the documentation clearer (the format for `ignored_ext` should be 
> extensions like 'py' rather than '.py').
>  # Ensure upper/lower case does not affect the usage of the `ignored_ext` 
> feature.
>  # Add tests for the methods filter_for_ignored_ext() and filter_for_filesize().





[jira] [Commented] (AIRFLOW-2888) Do not use Shell=True and bash to launch tasks

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580727#comment-16580727
 ] 

ASF GitHub Bot commented on AIRFLOW-2888:
-

bolkedebruin closed pull request #3740: [AIRFLOW-2888] Remove shell=True and 
bash from task launch
URL: https://github.com/apache/incubator-airflow/pull/3740
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index 4fda57663f..af10729085 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -5,6 +5,13 @@ assists users migrating to a new version.
 
 ## Airflow Master
 
+### Rename of BashTaskRunner to StandardTaskRunner
+
+BashTaskRunner has been renamed to StandardTaskRunner. It is the default task runner
+so you might need to update your config.
+
+`task_runner = StandardTaskRunner`
+
 ## Airflow 1.10
 
 Installation and upgrading requires setting `SLUGIFY_USES_TEXT_UNIDECODE=yes` in your environment or
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 7a86e1f069..76c66c90f6 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -140,7 +140,7 @@ donot_pickle = False
 dagbag_import_timeout = 30
 
 # The class to use for running task instances in a subprocess
-task_runner = BashTaskRunner
+task_runner = StandardTaskRunner
 
 # If set, tasks without a `run_as_user` argument will be run with this user
 # Can be used to de-elevate a sudo user running Airflow when executing tasks
diff --git a/airflow/contrib/executors/mesos_executor.py b/airflow/contrib/executors/mesos_executor.py
index ff974ffc3c..0609d71cf2 100644
--- a/airflow/contrib/executors/mesos_executor.py
+++ b/airflow/contrib/executors/mesos_executor.py
@@ -162,7 +162,7 @@ def resourceOffers(self, driver, offers):
 
             command = mesos_pb2.CommandInfo()
             command.shell = True
-            command.value = cmd
+            command.value = " ".join(cmd)
             task.command.MergeFrom(command)
 
             # If docker image for airflow is specified in config then pull that
diff --git a/airflow/contrib/kubernetes/worker_configuration.py b/airflow/contrib/kubernetes/worker_configuration.py
index 88a5cf0a40..482a823809 100644
--- a/airflow/contrib/kubernetes/worker_configuration.py
+++ b/airflow/contrib/kubernetes/worker_configuration.py
@@ -203,8 +203,7 @@ def make_pod(self, namespace, worker_uuid, pod_id, dag_id, task_id, execution_da
             image=kube_executor_config.image or self.kube_config.kube_image,
             image_pull_policy=(kube_executor_config.image_pull_policy or
                                self.kube_config.kube_image_pull_policy),
-            cmds=['bash', '-cx', '--'],
-            args=[airflow_command],
+            cmds=airflow_command,
             labels={
                 'airflow-worker': worker_uuid,
                 'dag_id': dag_id,
diff --git a/airflow/contrib/task_runner/cgroup_task_runner.py b/airflow/contrib/task_runner/cgroup_task_runner.py
index faa2407f09..78a240f2db 100644
--- a/airflow/contrib/task_runner/cgroup_task_runner.py
+++ b/airflow/contrib/task_runner/cgroup_task_runner.py
@@ -117,7 +117,7 @@ def start(self):
                 "creating another one",
                 cgroups.get("cpu"), cgroups.get("memory")
             )
-            self.process = self.run_command(['bash', '-c'], join_args=True)
+            self.process = self.run_command()
             return
 
         # Create a unique cgroup name
diff --git a/airflow/executors/base_executor.py b/airflow/executors/base_executor.py
index 701ac66f8b..8baed1a250 100644
--- a/airflow/executors/base_executor.py
+++ b/airflow/executors/base_executor.py
@@ -75,7 +75,7 @@ def queue_task_instance(
         # cfg_path is needed to propagate the config values if using impersonation
         # (run_as_user), given that there are different code paths running tasks.
         # For a long term solution we need to address AIRFLOW-1986
-        command = task_instance.command(
+        command = task_instance.command_as_list(
             local=True,
             mark_success=mark_success,
             ignore_all_deps=ignore_all_deps,
diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py
index 481daa5826..03a4b3b792 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -56,7 +56,7 @@ def execute_command(command):
     log.info("Executing command in Celery: %s", command)
     env = os.environ.copy()
     try:
-        subprocess.check_call(command, shell=True, stderr=subprocess.STDOUT,
+        subprocess.check_call(command, stderr=subprocess.STDOUT,
                               close_fds=True, env=env)
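
The essence of the `celery_executor.py` change above, executing the task command as an argument list rather than through a shell, can be sketched as follows (illustrative command, not Airflow's actual task command):

```python
import subprocess

# shell=True hands a single string to /bin/sh, which invites quoting bugs
# and injection. Passing an argument list and omitting shell=True executes
# the program directly, with each argument delivered verbatim.
cmd = ["echo", "task argument with spaces"]
output = subprocess.check_output(cmd)
print(output.decode().strip())  # task argument with spaces
```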

[jira] [Commented] (AIRFLOW-2896) Improve HdfsSensor()

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580728#comment-16580728
 ] 

ASF GitHub Bot commented on AIRFLOW-2896:
-

XD-DENG closed pull request #3746: [AIRFLOW-2896] Improve HdfsSensor()
URL: https://github.com/apache/incubator-airflow/pull/3746
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/sensors/hdfs_sensor.py b/airflow/sensors/hdfs_sensor.py
index 4d95556f47..4175f4cf07 100644
--- a/airflow/sensors/hdfs_sensor.py
+++ b/airflow/sensors/hdfs_sensor.py
@@ -81,19 +81,20 @@ def filter_for_ignored_ext(result, ignored_ext, ignore_copying):
         Will filter if instructed to do so the result to remove matching criteria
 
         :param result: (list) of dicts returned by Snakebite ls
-        :param ignored_ext: (list) of ignored extensions
+        :param ignored_ext: (list) of ignored extensions, like ``['exe', 'py']``
         :param ignore_copying: (bool) shall we ignore ?
         :return: (list) of dicts which were not removed
         """
         if ignore_copying:
             log = LoggingMixin().log
-            regex_builder = "^.*\.(%s$)$" % '$|'.join(ignored_ext)
+            regex_builder = "^.*\.(%s$)$" % '$|'.join([e.lower() for e in ignored_ext])
             ignored_extensions_regex = re.compile(regex_builder)
             log.debug(
                 'Filtering result for ignored extensions: %s in files %s',
                 ignored_extensions_regex.pattern, map(lambda x: x['path'], result)
             )
-            result = [x for x in result if not ignored_extensions_regex.match(x['path'])]
+            result = [x for x in result
+                      if not ignored_extensions_regex.match(x['path'].lower())]
             log.debug('HdfsSensor.poke: after ext filter result is %s', result)
             return result
 
diff --git a/tests/sensors/test_hdfs_sensor.py b/tests/sensors/test_hdfs_sensor.py
index b94065d842..13b1f3449c 100644
--- a/tests/sensors/test_hdfs_sensor.py
+++ b/tests/sensors/test_hdfs_sensor.py
@@ -89,3 +89,38 @@ def test_legacy_file_does_not_exists(self):
         # Then
         with self.assertRaises(AirflowSensorTimeout):
             task.execute(None)
+
+    def test_filter_for_ignored_ext(self):
+        """
+        Test the method HdfsSensor.filter_for_ignored_ext
+        :return:
+        """
+        sample_files = [{'path': 'x.py'}, {'path': 'x.txt'}, {'path': 'x.exe'}]
+
+        check_1 = HdfsSensor.filter_for_ignored_ext(result=sample_files,
+                                                    ignored_ext=['exe', 'py'],
+                                                    ignore_copying=True)
+        self.assertTrue(len(check_1) == 1)
+        self.assertEqual(check_1[0]['path'].rsplit(".")[-1], "txt")
+
+        check_2 = HdfsSensor.filter_for_ignored_ext(result=sample_files,
+                                                    ignored_ext=['EXE', 'PY'],
+                                                    ignore_copying=True)
+        self.assertTrue(len(check_2) == 1)
+        self.assertEqual(check_2[0]['path'].rsplit(".")[-1], "txt")
+
+    def test_filter_for_filesize(self):
+        """
+        Test the method HdfsSensor.filter_for_filesize
+        :return:
+        """
+        # unit of 'length' here is "byte"
+        sample_files = [{'path': 'small_file_1.txt', 'length': 1024},
+                        {'path': 'small_file_2.txt', 'length': 2048},
+                        {'path': 'big_file.txt', 'length': 1024 ** 2 + 1}]
+
+        # unit of argument 'size' inside HdfsSensor.filter_for_filesize is "MB"
+        check = HdfsSensor.filter_for_filesize(result=sample_files,
+                                               size=1)
+        self.assertTrue(len(check) == 1)
+        self.assertEqual(check[0]['path'], 'big_file.txt')


 





[GitHub] bolkedebruin closed pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
bolkedebruin closed pull request #3740: [AIRFLOW-2888] Remove shell=True and 
bash from task launch
URL: https://github.com/apache/incubator-airflow/pull/3740
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index 4fda57663f..af10729085 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -5,6 +5,13 @@ assists users migrating to a new version.
 
 ## Airflow Master
 
+### Rename of BashTaskRunner to StandardTaskRunner
+
+BashTaskRunner has been renamed to StandardTaskRunner. It is the default task runner
+so you might need to update your config.
+
+`task_runner = StandardTaskRunner`
+
 ## Airflow 1.10
 
 Installation and upgrading requires setting `SLUGIFY_USES_TEXT_UNIDECODE=yes` in your environment or
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 7a86e1f069..76c66c90f6 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -140,7 +140,7 @@ donot_pickle = False
 dagbag_import_timeout = 30
 
 # The class to use for running task instances in a subprocess
-task_runner = BashTaskRunner
+task_runner = StandardTaskRunner
 
 # If set, tasks without a `run_as_user` argument will be run with this user
 # Can be used to de-elevate a sudo user running Airflow when executing tasks
diff --git a/airflow/contrib/executors/mesos_executor.py b/airflow/contrib/executors/mesos_executor.py
index ff974ffc3c..0609d71cf2 100644
--- a/airflow/contrib/executors/mesos_executor.py
+++ b/airflow/contrib/executors/mesos_executor.py
@@ -162,7 +162,7 @@ def resourceOffers(self, driver, offers):
 
             command = mesos_pb2.CommandInfo()
             command.shell = True
-            command.value = cmd
+            command.value = " ".join(cmd)
             task.command.MergeFrom(command)
 
             # If docker image for airflow is specified in config then pull that
diff --git a/airflow/contrib/kubernetes/worker_configuration.py b/airflow/contrib/kubernetes/worker_configuration.py
index 88a5cf0a40..482a823809 100644
--- a/airflow/contrib/kubernetes/worker_configuration.py
+++ b/airflow/contrib/kubernetes/worker_configuration.py
@@ -203,8 +203,7 @@ def make_pod(self, namespace, worker_uuid, pod_id, dag_id, task_id, execution_da
             image=kube_executor_config.image or self.kube_config.kube_image,
             image_pull_policy=(kube_executor_config.image_pull_policy or
                                self.kube_config.kube_image_pull_policy),
-            cmds=['bash', '-cx', '--'],
-            args=[airflow_command],
+            cmds=airflow_command,
             labels={
                 'airflow-worker': worker_uuid,
                 'dag_id': dag_id,
diff --git a/airflow/contrib/task_runner/cgroup_task_runner.py b/airflow/contrib/task_runner/cgroup_task_runner.py
index faa2407f09..78a240f2db 100644
--- a/airflow/contrib/task_runner/cgroup_task_runner.py
+++ b/airflow/contrib/task_runner/cgroup_task_runner.py
@@ -117,7 +117,7 @@ def start(self):
                 "creating another one",
                 cgroups.get("cpu"), cgroups.get("memory")
             )
-            self.process = self.run_command(['bash', '-c'], join_args=True)
+            self.process = self.run_command()
             return
 
         # Create a unique cgroup name
diff --git a/airflow/executors/base_executor.py b/airflow/executors/base_executor.py
index 701ac66f8b..8baed1a250 100644
--- a/airflow/executors/base_executor.py
+++ b/airflow/executors/base_executor.py
@@ -75,7 +75,7 @@ def queue_task_instance(
         # cfg_path is needed to propagate the config values if using impersonation
         # (run_as_user), given that there are different code paths running tasks.
         # For a long term solution we need to address AIRFLOW-1986
-        command = task_instance.command(
+        command = task_instance.command_as_list(
             local=True,
             mark_success=mark_success,
             ignore_all_deps=ignore_all_deps,
diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py
index 481daa5826..03a4b3b792 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -56,7 +56,7 @@ def execute_command(command):
     log.info("Executing command in Celery: %s", command)
     env = os.environ.copy()
     try:
-        subprocess.check_call(command, shell=True, stderr=subprocess.STDOUT,
+        subprocess.check_call(command, stderr=subprocess.STDOUT,
                               close_fds=True, env=env)
     except subprocess.CalledProcessError as e:
         log.exception('execute_command encountered a CalledProcessError')
@@ -84,7 +84,7 @@ def 

[GitHub] XD-DENG commented on issue #3746: [AIRFLOW-2896] Improve HdfsSensor()

2018-08-14 Thread GitBox
XD-DENG commented on issue #3746: [AIRFLOW-2896] Improve HdfsSensor()
URL: 
https://github.com/apache/incubator-airflow/pull/3746#issuecomment-413096164
 
 
  Thanks @bolkedebruin . Didn't know about the refactoring going on. Will close 
this PR.




[GitHub] XD-DENG closed pull request #3746: [AIRFLOW-2896] Improve HdfsSensor()

2018-08-14 Thread GitBox
XD-DENG closed pull request #3746: [AIRFLOW-2896] Improve HdfsSensor()
URL: https://github.com/apache/incubator-airflow/pull/3746
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/sensors/hdfs_sensor.py b/airflow/sensors/hdfs_sensor.py
index 4d95556f47..4175f4cf07 100644
--- a/airflow/sensors/hdfs_sensor.py
+++ b/airflow/sensors/hdfs_sensor.py
@@ -81,19 +81,20 @@ def filter_for_ignored_ext(result, ignored_ext, 
ignore_copying):
 Will filter if instructed to do so the result to remove matching 
criteria
 
 :param result: (list) of dicts returned by Snakebite ls
-:param ignored_ext: (list) of ignored extensions
+:param ignored_ext: (list) of ignored extensions, like ``['exe', 
'py']``
 :param ignore_copying: (bool) shall we ignore ?
 :return: (list) of dicts which were not removed
 """
 if ignore_copying:
 log = LoggingMixin().log
-regex_builder = "^.*\.(%s$)$" % '$|'.join(ignored_ext)
+regex_builder = "^.*\.(%s$)$" % '$|'.join([e.lower() for e in 
ignored_ext])
 ignored_extensions_regex = re.compile(regex_builder)
 log.debug(
 'Filtering result for ignored extensions: %s in files %s',
 ignored_extensions_regex.pattern, map(lambda x: x['path'], 
result)
 )
-result = [x for x in result if not 
ignored_extensions_regex.match(x['path'])]
+result = [x for x in result
+  if not ignored_extensions_regex.match(x['path'].lower())]
 log.debug('HdfsSensor.poke: after ext filter result is %s', result)
 return result
 
diff --git a/tests/sensors/test_hdfs_sensor.py 
b/tests/sensors/test_hdfs_sensor.py
index b94065d842..13b1f3449c 100644
--- a/tests/sensors/test_hdfs_sensor.py
+++ b/tests/sensors/test_hdfs_sensor.py
@@ -89,3 +89,38 @@ def test_legacy_file_does_not_exists(self):
 # Then
 with self.assertRaises(AirflowSensorTimeout):
 task.execute(None)
+
+def test_filter_for_ignored_ext(self):
+"""
+Test the method HdfsSensor.filter_for_ignored_ext
+:return:
+"""
+sample_files = [{'path': 'x.py'}, {'path': 'x.txt'}, {'path': 'x.exe'}]
+
+check_1 = HdfsSensor.filter_for_ignored_ext(result=sample_files,
+ignored_ext=['exe', 'py'],
+ignore_copying=True)
+self.assertTrue(len(check_1) == 1)
+self.assertEqual(check_1[0]['path'].rsplit(".")[-1], "txt")
+
+check_2 = HdfsSensor.filter_for_ignored_ext(result=sample_files,
+ignored_ext=['EXE', 'PY'],
+ignore_copying=True)
+self.assertTrue(len(check_2) == 1)
+self.assertEqual(check_2[0]['path'].rsplit(".")[-1], "txt")
+
+def test_filter_for_filesize(self):
+"""
+Test the method HdfsSensor.filter_for_filesize
+:return:
+"""
+# unit of 'length' here is "byte"
+sample_files = [{'path': 'small_file_1.txt', 'length': 1024},
+{'path': 'small_file_2.txt', 'length': 2048},
+{'path': 'big_file.txt', 'length': 1024 ** 2 + 1}]
+
+# unit of argument 'size' inside HdfsSensor.filter_for_filesize is "MB"
+check = HdfsSensor.filter_for_filesize(result=sample_files,
+   size=1)
+self.assertTrue(len(check) == 1)
+self.assertEqual(check[0]['path'], 'big_file.txt')
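For reference, the case-insensitive filtering that the diff above introduces can be sketched standalone (simplified; the real method lives on HdfsSensor and also logs the regex and filtered paths):

```python
import re


def filter_for_ignored_ext(result, ignored_ext, ignore_copying):
    """Drop entries whose file extension matches ignored_ext.

    Both the extension list and the paths are lowercased, so the match
    is case-insensitive, as in the diff above.
    """
    if ignore_copying:
        # e.g. ignored_ext=['exe', 'py'] builds "^.*\.(exe$|py$)$"
        regex = re.compile(r"^.*\.(%s$)$" % "$|".join(e.lower() for e in ignored_ext))
        result = [x for x in result if not regex.match(x["path"].lower())]
    return result


sample = [{"path": "x.py"}, {"path": "x.TXT"}, {"path": "x.EXE"}]
print(filter_for_ignored_ext(sample, ["exe", "py"], True))
# → [{'path': 'x.TXT'}]
```

With the pre-patch behaviour, `x.EXE` would have slipped through a lowercase `['exe', 'py']` filter; lowercasing both sides closes that gap.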


 




[GitHub] bolkedebruin commented on a change in pull request #3747: [AIRFLOW-2895] Prevent scheduler from spamming heartbeats/logs

2018-08-14 Thread GitBox
bolkedebruin commented on a change in pull request #3747: [AIRFLOW-2895] 
Prevent scheduler from spamming heartbeats/logs
URL: https://github.com/apache/incubator-airflow/pull/3747#discussion_r210172561
 
 

 ##
 File path: UPDATING.md
 ##
 @@ -421,7 +421,7 @@ indefinitely. This is only available on the command line.
 After how much time should an updated DAG be picked up from the filesystem.
 
  min_file_parsing_loop_time
-
+CURRENTLY DISABLED DUE TO A BUG
 
 Review comment:
   Please add a note to the master section; we are not changing the past. 
People will read it as “I am at this version, what do I need to do to get to 
the next?”




[GitHub] bolkedebruin closed pull request #3750: [AIRFLOW-XXX] Clean up installation extra packages table

2018-08-14 Thread GitBox
bolkedebruin closed pull request #3750: [AIRFLOW-XXX] Clean up installation 
extra packages table
URL: https://github.com/apache/incubator-airflow/pull/3750
 
 
   


diff --git a/docs/installation.rst b/docs/installation.rst
index e012b288a8..4beef47d3a 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -41,68 +41,67 @@ Here's the list of the subpackages and what they enable:
 
+---+--+-+
 | subpackage| install command  | enables   
  |
 
+===+==+=+
-|  all  | ``pip install apache-airflow[all]``  | All Airflow 
features known to man   |
+| all   | ``pip install apache-airflow[all]``  | All Airflow 
features known to man   |
 
+---+--+-+
-|  all_dbs  | ``pip install apache-airflow[all_dbs]``  | All databases 
integrations  |
+| all_dbs   | ``pip install apache-airflow[all_dbs]``  | All databases 
integrations  |
 
+---+--+-+
-|  async| ``pip install apache-airflow[async]``| Async worker 
classes for gunicorn   |
+| async | ``pip install apache-airflow[async]``| Async worker 
classes for Gunicorn   |
 
+---+--+-+
-|  devel| ``pip install apache-airflow[devel]``| Minimum dev 
tools requirements  |
+| celery| ``pip install apache-airflow[celery]``   | 
CeleryExecutor  |
 
+---+--+-+
-|  devel_hadoop | ``pip install apache-airflow[devel_hadoop]`` | Airflow + 
dependencies on the Hadoop stack  |
+| cloudant  | ``pip install apache-airflow[cloudant]`` | Cloudant hook 
  |
 
+---+--+-+
-|  celery   | ``pip install apache-airflow[celery]``   | 
CeleryExecutor  |
+| crypto| ``pip install apache-airflow[crypto]``   | Encrypt 
connection passwords in metadata db |
 
+---+--+-+
-|  crypto   | ``pip install apache-airflow[crypto]``   | Encrypt 
connection passwords in metadata db |
+| devel | ``pip install apache-airflow[devel]``| Minimum dev 
tools requirements  |
 
+---+--+-+
-|  druid| ``pip install apache-airflow[druid]``| Druid.io 
related operators & hooks  |
+| devel_hadoop  | ``pip install apache-airflow[devel_hadoop]`` | Airflow + 
dependencies on the Hadoop stack  |
 
+---+--+-+
-|  gcp_api  | ``pip install apache-airflow[gcp_api]``  | Google Cloud 
Platform hooks and operators   |
+| druid | ``pip install apache-airflow[druid]``| Druid related 
operators & hooks |
++---+--+-+
+| gcp_api   | ``pip install apache-airflow[gcp_api]``  | Google Cloud 
Platform hooks and operators   |
 |   |  | (using 
``google-api-python-client``)|
 
+---+--+-+
-|  jdbc | ``pip install apache-airflow[jdbc]`` | JDBC hooks 
and operators|
+| hdfs  | ``pip install apache-airflow[hdfs]`` | HDFS hooks 
and operators|
 
+---+--+-+
-|  hdfs | ``pip install apache-airflow[hdfs]`` | HDFS hooks 
and operators|
+| hive  

[GitHub] bolkedebruin commented on issue #3746: [AIRFLOW-2896] Improve HdfsSensor()

2018-08-14 Thread GitBox
bolkedebruin commented on issue #3746: [AIRFLOW-2896] Improve HdfsSensor()
URL: 
https://github.com/apache/incubator-airflow/pull/3746#issuecomment-413095650
 
 
   HDFS is being refactored in #3560, so I think this is already outdated.




[GitHub] codecov-io edited a comment on issue #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
codecov-io edited a comment on issue #3744: [AIRFLOW-2893] fix stuck dataflow 
job due to name mismatch
URL: 
https://github.com/apache/incubator-airflow/pull/3744#issuecomment-412739331
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=h1)
 Report
   > Merging 
[#3744](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/9d516c7134eb22a3c2fc63cf96626ef6e8b247f2?src=pr=desc)
 will **increase** coverage by `59.98%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3744/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=tree)
   
   ```diff
   @@ Coverage Diff @@
   ##   master#3744   +/-   ##
   ===
   + Coverage   17.68%   77.67%   +59.98% 
   ===
 Files 204  204   
 Lines   1584615846   
   ===
   + Hits 280312309 +9506 
   + Misses  13043 3537 -9506
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/utils/operator\_resources.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9vcGVyYXRvcl9yZXNvdXJjZXMucHk=)
 | `86.95% <0%> (+4.34%)` | :arrow_up: |
   | 
[airflow/executors/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvX19pbml0X18ucHk=)
 | `63.46% <0%> (+5.76%)` | :arrow_up: |
   | 
[airflow/utils/decorators.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kZWNvcmF0b3JzLnB5)
 | `91.66% <0%> (+14.58%)` | :arrow_up: |
   | 
[airflow/settings.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9zZXR0aW5ncy5weQ==)
 | `81.15% <0%> (+15.21%)` | :arrow_up: |
   | 
[airflow/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9fX2luaXRfXy5weQ==)
 | `80.43% <0%> (+15.21%)` | :arrow_up: |
   | 
[airflow/hooks/oracle\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9vcmFjbGVfaG9vay5weQ==)
 | `15.47% <0%> (+15.47%)` | :arrow_up: |
   | 
[airflow/task/task\_runner/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy90YXNrL3Rhc2tfcnVubmVyL19faW5pdF9fLnB5)
 | `63.63% <0%> (+18.18%)` | :arrow_up: |
   | 
[airflow/utils/db.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYi5weQ==)
 | `33.33% <0%> (+18.25%)` | :arrow_up: |
   | 
[airflow/macros/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9tYWNyb3MvX19pbml0X18ucHk=)
 | `81.48% <0%> (+18.51%)` | :arrow_up: |
   | 
[airflow/ti\_deps/deps/not\_running\_dep.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcHMvbm90X3J1bm5pbmdfZGVwLnB5)
 | `100% <0%> (+22.22%)` | :arrow_up: |
   | ... and [161 
more](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=footer).
 Last update 
[9d516c7...86d2dd1](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] codecov-io edited a comment on issue #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
codecov-io edited a comment on issue #3744: [AIRFLOW-2893] fix stuck dataflow 
job due to name mismatch
URL: 
https://github.com/apache/incubator-airflow/pull/3744#issuecomment-412739331
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=h1)
 Report
   > Merging 
[#3744](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/9d516c7134eb22a3c2fc63cf96626ef6e8b247f2?src=pr=desc)
 will **increase** coverage by `59.98%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3744/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=tree)
   
   ```diff
   @@ Coverage Diff @@
   ##   master#3744   +/-   ##
   ===
   + Coverage   17.68%   77.67%   +59.98% 
   ===
 Files 204  204   
 Lines   1584615846   
   ===
   + Hits 280312309 +9506 
   + Misses  13043 3537 -9506
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/utils/operator\_resources.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9vcGVyYXRvcl9yZXNvdXJjZXMucHk=)
 | `86.95% <0%> (+4.34%)` | :arrow_up: |
   | 
[airflow/executors/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvX19pbml0X18ucHk=)
 | `63.46% <0%> (+5.76%)` | :arrow_up: |
   | 
[airflow/utils/decorators.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kZWNvcmF0b3JzLnB5)
 | `91.66% <0%> (+14.58%)` | :arrow_up: |
   | 
[airflow/settings.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9zZXR0aW5ncy5weQ==)
 | `81.15% <0%> (+15.21%)` | :arrow_up: |
   | 
[airflow/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9fX2luaXRfXy5weQ==)
 | `80.43% <0%> (+15.21%)` | :arrow_up: |
   | 
[airflow/hooks/oracle\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9vcmFjbGVfaG9vay5weQ==)
 | `15.47% <0%> (+15.47%)` | :arrow_up: |
   | 
[airflow/task/task\_runner/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy90YXNrL3Rhc2tfcnVubmVyL19faW5pdF9fLnB5)
 | `63.63% <0%> (+18.18%)` | :arrow_up: |
   | 
[airflow/utils/db.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYi5weQ==)
 | `33.33% <0%> (+18.25%)` | :arrow_up: |
   | 
[airflow/macros/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy9tYWNyb3MvX19pbml0X18ucHk=)
 | `81.48% <0%> (+18.51%)` | :arrow_up: |
   | 
[airflow/ti\_deps/deps/not\_skipped\_dep.py](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcHMvbm90X3NraXBwZWRfZGVwLnB5)
 | `100% <0%> (+22.22%)` | :arrow_up: |
   | ... and [161 
more](https://codecov.io/gh/apache/incubator-airflow/pull/3744/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=footer).
 Last update 
[9d516c7...86d2dd1](https://codecov.io/gh/apache/incubator-airflow/pull/3744?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck 
dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r210165168
 
 

 ##
 File path: tests/contrib/hooks/test_gcp_dataflow_hook.py
 ##
 @@ -24,6 +24,7 @@
 
 from airflow.contrib.hooks.gcp_dataflow_hook import DataFlowHook
 from airflow.contrib.hooks.gcp_dataflow_hook import _Dataflow
+from airflow.contrib.hooks.gcp_dataflow_hook import _DataflowJob
 
 Review comment:
   Done.




[GitHub] XD-DENG commented on issue #3746: [AIRFLOW-2896] Improve HdfsSensor()

2018-08-14 Thread GitBox
XD-DENG commented on issue #3746: [AIRFLOW-2896] Improve HdfsSensor()
URL: 
https://github.com/apache/incubator-airflow/pull/3746#issuecomment-413070866
 
 
   1 out of 9 tests failed due to
   
   `ERROR: InvocationError for command 
'/home/travis/build/apache/incubator-airflow/scripts/ci/setup_env.sh' (exited 
with code 1)`
   
   ` /home/travis/.travis_cache//cdh/hadoop.tar.gz: Cannot open: Permission 
denied`
   
   Not sure whether it's a Travis-CI issue.




[GitHub] r39132 commented on issue #3691: [AIRFLOW-2846] Add missing python test dependency to setup.py

2018-08-14 Thread GitBox
r39132 commented on issue #3691: [AIRFLOW-2846] Add missing python test 
dependency to setup.py
URL: 
https://github.com/apache/incubator-airflow/pull/3691#issuecomment-413058803
 
 
   @holdenk can you look at @Fokko's suggestion, which I interpret as removing 
tox from travis.yml but leaving it in setup.py so the tests pass.




[GitHub] r39132 commented on issue #3728: [AIRFLOW-2883] Not search dag owner if owners are missing

2018-08-14 Thread GitBox
r39132 commented on issue #3728: [AIRFLOW-2883] Not search dag owner if owners 
are missing
URL: 
https://github.com/apache/incubator-airflow/pull/3728#issuecomment-413058498
 
 
   Hi Feng!
   So, a couple of things. I have been trying really hard to reproduce this 
bug. I understand your fix, but I am also trying to understand how you were 
able to reproduce it.
   
   Here's why. 
   If I specify a DAG that is missing an owner, the default_owner in 
airflow.cfg seems to get used. If I remove that configuration and retry, 
somehow `Airflow` is still specified as the owner. If I specify `None`, I get 
the exception I showed earlier and the DAG will not even be imported. If I 
specify `''` (an empty string), the UI works as expected. 
   
   Ignoring the fact that I can't reproduce this, I do wonder about your fix: 
it won't search for a dag_id match if the owner field is missing. That does 
not feel like the right fix. Why not look for a match against both fields, 
only ignoring the owner field when it is missing?





[GitHub] feng-tao closed pull request #3752: [AIRFLOW-XXX] Make pip install commands consistent

2018-08-14 Thread GitBox
feng-tao closed pull request #3752: [AIRFLOW-XXX] Make pip install commands 
consistent
URL: https://github.com/apache/incubator-airflow/pull/3752
 
 
   


diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 060c5dd84d..e6df3d4751 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -93,7 +93,7 @@ docker run -t -i -v `pwd`:/airflow/ -w /airflow/ -e 
SLUGIFY_USES_TEXT_UNIDECODE=
 
 # Install Airflow with all the required dependencies,
 # including the devel which will provide the development tools
-pip install -e ".[hdfs,hive,druid,devel]"
+pip install -e .[devel,druid,hdfs,hive]
 
 # Init the database
 airflow initdb
diff --git a/airflow/contrib/utils/sendgrid.py 
b/airflow/contrib/utils/sendgrid.py
index 9055c97879..6d8b5f5ddb 100644
--- a/airflow/contrib/utils/sendgrid.py
+++ b/airflow/contrib/utils/sendgrid.py
@@ -42,7 +42,7 @@ def send_email(to, subject, html_content, files=None,
 
 To use this plugin:
 0. include sendgrid subpackage as part of your Airflow installation, e.g.,
-pip install airflow[sendgrid]
+pip install apache-airflow[sendgrid]
 1. update [email] backend in airflow.cfg, i.e.,
 [email]
 email_backend = airflow.contrib.utils.sendgrid.send_email
diff --git a/airflow/example_dags/example_kubernetes_operator.py 
b/airflow/example_dags/example_kubernetes_operator.py
index 92d73c5d33..e8d35c4c5b 100644
--- a/airflow/example_dags/example_kubernetes_operator.py
+++ b/airflow/example_dags/example_kubernetes_operator.py
@@ -25,7 +25,7 @@
 
 try:
 # Kubernetes is optional, so not available in vanilla Airflow
-# pip install airflow[kubernetes]
+# pip install apache-airflow[kubernetes]
 from airflow.contrib.operators.kubernetes_pod_operator import 
KubernetesPodOperator
 
 args = {
@@ -53,4 +53,4 @@
 except ImportError as e:
 log.warn("Could not import KubernetesPodOperator: " + str(e))
 log.warn("Install kubernetes dependencies with: "
- "pip install airflow['kubernetes']")
+ "pip install apache-airflow[kubernetes]")
diff --git a/docs/installation.rst b/docs/installation.rst
index e012b288a8..d522d0b2da 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -14,7 +14,7 @@ You can also install Airflow with support for extra features 
like ``s3`` or ``po
 
 .. code-block:: bash
 
-pip install "apache-airflow[s3, postgres]"
+pip install apache-airflow[postgres,s3]
 
 .. note:: GPL dependency
 


 




[GitHub] feng-tao commented on issue #3752: [AIRFLOW-XXX] Make pip install commands consistent

2018-08-14 Thread GitBox
feng-tao commented on issue #3752: [AIRFLOW-XXX] Make pip install commands 
consistent
URL: 
https://github.com/apache/incubator-airflow/pull/3752#issuecomment-413053581
 
 
   lgtm, thanks @tedmiston 




[jira] [Created] (AIRFLOW-2901) WebHdfsSensor doesn't support HDFS HA

2018-08-14 Thread Manu Zhang (JIRA)
Manu Zhang created AIRFLOW-2901:
---

 Summary: WebHdfsSensor doesn't support HDFS HA
 Key: AIRFLOW-2901
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2901
 Project: Apache Airflow
  Issue Type: Improvement
  Components: hooks
Reporter: Manu Zhang


If HDFS is configured with HA, we cannot use WebHdfsSensor to check for file 
existence, since WebHDFS cannot resolve the name service ID. Consider using 
[pyarrow.hdfs|https://arrow.apache.org/docs/python/filesystems.html] as a 
replacement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] codecov-io commented on issue #3752: [AIRFLOW-XXX] Make pip install commands consistent

2018-08-14 Thread GitBox
codecov-io commented on issue #3752: [AIRFLOW-XXX] Make pip install commands 
consistent
URL: 
https://github.com/apache/incubator-airflow/pull/3752#issuecomment-413052441
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3752?src=pr=h1)
 Report
   > Merging 
[#3752](https://codecov.io/gh/apache/incubator-airflow/pull/3752?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/f7602f8266559e55bc602a9639e3e1ab640f30e8?src=pr=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3752/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3752?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3752  +/-   ##
   ==
   - Coverage   77.67%   77.65%   -0.03% 
   ==
 Files 204  204  
 Lines   1584615846  
   ==
   - Hits1230912305   -4 
   - Misses   3537 3541   +4
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3752?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[...irflow/example\_dags/example\_kubernetes\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3752/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV9rdWJlcm5ldGVzX29wZXJhdG9yLnB5)
 | `75% <ø> (ø)` | :arrow_up: |
   | 
[airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3752/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5)
 | `82.49% <0%> (-0.27%)` | :arrow_down: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3752/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.78% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3752?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3752?src=pr=footer).
 Last update 
[f7602f8...7cbf615](https://codecov.io/gh/apache/incubator-airflow/pull/3752?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] jakebiesinger commented on issue #3749: [AIRFLOW-2900] Show code for packaged DAGs

2018-08-14 Thread GitBox
jakebiesinger commented on issue #3749: [AIRFLOW-2900] Show code for packaged 
DAGs
URL: 
https://github.com/apache/incubator-airflow/pull/3749#issuecomment-413050166
 
 
   Travis passes with the exception of flake8, which fails due to 
monkey-patching that pep8 doesn't like: 
https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/utils.py#L21
   
   I'm not sure what you want me to do there... if you monkey-patch that way, 
pep8 will complain that the subsequent imports aren't at the top of the file.
   ![screenshot 2018-08-14 at 4 41 16 
pm](https://user-images.githubusercontent.com/463861/44124218-e35e814a-9fe0-11e8-9d1b-6ada37dfa7aa.png)
   




[GitHub] codecov-io edited a comment on issue #3751: [AIRFLOW-2524] Add Amazon SageMaker Tuning

2018-08-14 Thread GitBox
codecov-io edited a comment on issue #3751: [AIRFLOW-2524] Add Amazon SageMaker 
Tuning
URL: 
https://github.com/apache/incubator-airflow/pull/3751#issuecomment-413049884
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=h1)
 Report
   > Merging 
[#3751](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/f7602f8266559e55bc602a9639e3e1ab640f30e8?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3751/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3751   +/-   ##
   ===
 Coverage   77.67%   77.67%   
   ===
 Files 204  204   
 Lines   1584615846   
   ===
 Hits1230912309   
 Misses   3537 3537
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=footer).
 Last update 
[f7602f8...5167865](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] codecov-io commented on issue #3751: [AIRFLOW-2524] Add Amazon SageMaker Tuning

2018-08-14 Thread GitBox
codecov-io commented on issue #3751: [AIRFLOW-2524] Add Amazon SageMaker Tuning
URL: 
https://github.com/apache/incubator-airflow/pull/3751#issuecomment-413049884
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=h1)
 Report
   > Merging 
[#3751](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/f7602f8266559e55bc602a9639e3e1ab640f30e8?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3751/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #3751   +/-   ##
   =======================================
    Coverage   77.67%   77.67%           
   =======================================
    Files         204      204           
    Lines       15846    15846           
   =======================================
    Hits        12309    12309           
    Misses       3537     3537           
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr&el=footer).
 Last update 
[f7602f8...5167865](https://codecov.io/gh/apache/incubator-airflow/pull/3751?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] fenglu-g commented on issue #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
fenglu-g commented on issue #3744: [AIRFLOW-2893] fix stuck dataflow job due to 
name mismatch
URL: 
https://github.com/apache/incubator-airflow/pull/3744#issuecomment-413048477
 
 
   @kaxil could you merge the PR if it looks good to you as well? Thanks!




[GitHub] TrevorEdwards commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
TrevorEdwards commented on a change in pull request #3744: [AIRFLOW-2893] fix 
stuck dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r210135260
 
 

 ##
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##
 @@ -124,36 +127,48 @@ def __init__(self, cmd):
 
 def _line(self, fd):
 if fd == self._proc.stderr.fileno():
-lines = self._proc.stderr.readlines()
-for line in lines:
-self.log.warning(line[:-1])
-if lines:
-return lines[-1]
+line = ''.join(self._proc.stderr.readlines())
+self.log.warning(line[:-1])
+return line
 if fd == self._proc.stdout.fileno():
-line = self._proc.stdout.readline()
+line = ''.join(self._proc.stdout.readlines())
+self.log.info(line[:-1])
 return line
 
 @staticmethod
 def _extract_job(line):
-if line is not None:
-if line.startswith("Submitted job: "):
-return line[15:-1]
+# Job id info: https://goo.gl/SE29y9.
+job_id_pattern = re.compile(
+
b'.*console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
 
 Review comment:
   Actually this should be fine. Even though there is a location in the 
hierarchy, the operator is only launching one job, so job id collision is not a 
concern.
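The review thread above is about extracting the Dataflow job id from a Cloud Console URL in the worker's log output. As a side note on the pattern under review, inside a character class the `|` characters are literal pipes, not alternation, so `[a-zA-Z0-9_-]+` expresses the same intent more cleanly. A minimal sketch of the extraction (the sample URL below is illustrative, not taken from the PR):

```python
import re

# Sketch of the job-id extraction discussed in the review. The character
# class below is the cleaned-up equivalent of the PR's [a-z|0-9|A-Z|\-|\_]+,
# in which the "|" characters are literal and therefore redundant.
JOB_ID_PATTERN = re.compile(
    r'.*console\.cloud\.google\.com/dataflow.*/jobs/([a-zA-Z0-9_-]+).*')

def extract_job_id(line):
    """Return the Dataflow job id embedded in a console URL, or None."""
    match = JOB_ID_PATTERN.match(line or '')
    return match.group(1) if match else None

# Hypothetical log line of the shape the pattern targets.
line = ('https://console.cloud.google.com/dataflow/jobsDetail'
        '/locations/us-central1/jobs/2018-08-14_12_00_00-1234567890')
print(extract_job_id(line))  # 2018-08-14_12_00_00-1234567890
```

Because `match()` is anchored at the start, the leading `.*` is what lets the URL appear anywhere in the line.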




[GitHub] codecov-io edited a comment on issue #3750: [AIRFLOW-XXX] Clean up installation extra packages table

2018-08-14 Thread GitBox
codecov-io edited a comment on issue #3750: [AIRFLOW-XXX] Clean up installation 
extra packages table
URL: 
https://github.com/apache/incubator-airflow/pull/3750#issuecomment-413046863
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=h1)
 Report
   > Merging 
[#3750](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/f7602f8266559e55bc602a9639e3e1ab640f30e8?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3750/graphs/tree.svg?height=150&width=650&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #3750   +/-   ##
   =======================================
    Coverage   77.67%   77.67%           
   =======================================
    Files         204      204           
    Lines       15846    15846           
   =======================================
    Hits        12309    12309           
    Misses       3537     3537           
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=footer).
 Last update 
[f7602f8...6065591](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] codecov-io commented on issue #3750: [AIRFLOW-XXX] Clean up installation extra packages table

2018-08-14 Thread GitBox
codecov-io commented on issue #3750: [AIRFLOW-XXX] Clean up installation extra 
packages table
URL: 
https://github.com/apache/incubator-airflow/pull/3750#issuecomment-413046863
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=h1)
 Report
   > Merging 
[#3750](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/f7602f8266559e55bc602a9639e3e1ab640f30e8?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3750/graphs/tree.svg?token=WdLKlKHOAU&width=650&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #3750   +/-   ##
   =======================================
    Coverage   77.67%   77.67%           
   =======================================
    Files         204      204           
    Lines       15846    15846           
   =======================================
    Hits        12309    12309           
    Misses       3537     3537           
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=footer).
 Last update 
[f7602f8...6065591](https://codecov.io/gh/apache/incubator-airflow/pull/3750?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] XD-DENG commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY

2018-08-14 Thread GitBox
XD-DENG commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: 
https://github.com/apache/incubator-airflow/pull/3738#issuecomment-413044361
 
 
   Thank you @feng-tao @ashb 




[GitHub] fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck 
dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r210130372
 
 

 ##
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##
 @@ -124,36 +127,48 @@ def __init__(self, cmd):
 
 def _line(self, fd):
 if fd == self._proc.stderr.fileno():
-lines = self._proc.stderr.readlines()
-for line in lines:
-self.log.warning(line[:-1])
-if lines:
-return lines[-1]
+line = ''.join(self._proc.stderr.readlines())
+self.log.warning(line[:-1])
+return line
 if fd == self._proc.stdout.fileno():
-line = self._proc.stdout.readline()
+line = ''.join(self._proc.stdout.readlines())
+self.log.info(line[:-1])
 return line
 
 @staticmethod
 def _extract_job(line):
-if line is not None:
-if line.startswith("Submitted job: "):
-return line[15:-1]
+# Job id info: https://goo.gl/SE29y9.
+job_id_pattern = re.compile(
+
b'.*console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
 
 Review comment:
   Note that this is the URL of the Cloud Console job monitoring page where 
job-id is a flat thing (i.e., no location hierarchy). 




[GitHub] troychen728 opened a new pull request #3751: [AIRFLOW-2524] Add Amazon SageMaker Tuning

2018-08-14 Thread GitBox
troychen728 opened a new pull request #3751: [AIRFLOW-2524] Add Amazon 
SageMaker Tuning
URL: https://github.com/apache/incubator-airflow/pull/3751
 
 
   Make sure you have checked _all_ steps below.
   
   ### JIRA
   - [X] My PR addresses the following [Airflow 
JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. 
   - https://issues.apache.org/jira/browse/AIRFLOW-2524
   
   ### Description
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
   - This PR allows users to start an Amazon SageMaker hyperparameter tuning 
job using the SageMakerCreateHyperParameterTuningJobOperator
   - Users can also check the progress (state) of the tuning job through the 
SageMakerTuningSensor
   
   
   ### Tests
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
- tests/contrib/hooks/test_sagemaker_hook.py
- tests/contrib/operators/test_sagemaker_create_tuning_job_operator.py
- tests/contrib/sensors/test_sagemaker_tuning_sensor.py
 
   
   
   
   ### Commits
   - [X] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
   ### Documentation
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
   - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   
   ### Code Quality
   - [X] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[GitHub] TrevorEdwards commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
TrevorEdwards commented on a change in pull request #3744: [AIRFLOW-2893] fix 
stuck dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r210130649
 
 

 ##
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##
 @@ -124,36 +127,48 @@ def __init__(self, cmd):
 
 def _line(self, fd):
 if fd == self._proc.stderr.fileno():
-lines = self._proc.stderr.readlines()
-for line in lines:
-self.log.warning(line[:-1])
-if lines:
-return lines[-1]
+line = ''.join(self._proc.stderr.readlines())
+self.log.warning(line[:-1])
+return line
 if fd == self._proc.stdout.fileno():
-line = self._proc.stdout.readline()
+line = ''.join(self._proc.stdout.readlines())
+self.log.info(line[:-1])
 return line
 
 @staticmethod
 def _extract_job(line):
-if line is not None:
-if line.startswith("Submitted job: "):
-return line[15:-1]
+# Job id info: https://goo.gl/SE29y9.
+job_id_pattern = re.compile(
+
b'.*console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
 
 Review comment:
   The linked line includes location; I think location is in the hierarchy from 
looking at it as well.




[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580527#comment-16580527
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 opened a new pull request #3751: [AIRFLOW-2524] Add Amazon 
SageMaker Tuning
URL: https://github.com/apache/incubator-airflow/pull/3751
 
 
   Make sure you have checked _all_ steps below.
   
   ### JIRA
   - [X] My PR addresses the following [Airflow 
JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. 
   - https://issues.apache.org/jira/browse/AIRFLOW-2524
   
   ### Description
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
   - This PR allows users to start an Amazon SageMaker hyperparameter tuning 
job using the SageMakerCreateHyperParameterTuningJobOperator
   - Users can also check the progress (state) of the tuning job through the 
SageMakerTuningSensor
   
   
   ### Tests
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
- tests/contrib/hooks/test_sagemaker_hook.py
- tests/contrib/operators/test_sagemaker_create_tuning_job_operator.py
- tests/contrib/sensors/test_sagemaker_tuning_sensor.py
 
   
   
   
   ### Commits
   - [X] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
   ### Documentation
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
   - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   
   ### Code Quality
   - [X] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck 
dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r21012
 
 

 ##
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##
 @@ -124,36 +127,38 @@ def __init__(self, cmd):
 
 def _line(self, fd):
 if fd == self._proc.stderr.fileno():
-lines = self._proc.stderr.readlines()
-for line in lines:
-self.log.warning(line[:-1])
-if lines:
-return lines[-1]
+return self._proc.stderr.readline()
 if fd == self._proc.stdout.fileno():
-line = self._proc.stdout.readline()
-return line
+return self._proc.stdout.readline()
 
 @staticmethod
 def _extract_job(line):
-if line is not None:
-if line.startswith("Submitted job: "):
-return line[15:-1]
+job_id_pattern = re.compile(
+
'.*https://console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
+matched_job = job_id_pattern.match(line or '')
+if matched_job:
+return matched_job.group(1)
 
 def wait_for_done(self):
 reads = [self._proc.stderr.fileno(), self._proc.stdout.fileno()]
 self.log.info("Start waiting for DataFlow process to complete.")
-while self._proc.poll() is None:
+job_id = None
+while True:
 ret = select.select(reads, [], [], 5)
 if ret is not None:
 for fd in ret[0]:
 line = self._line(fd)
 if line:
-self.log.debug(line[:-1])
+self.log.info(line[:-1])
+job_id = job_id or self._extract_job(line)
 else:
 self.log.info("Waiting for DataFlow process to complete.")
+if self._proc.poll() is not None:
 
 Review comment:
   Thanks for the suggestion, note that we are not trying to read/write to 
STDIN/STDOUT. 
   Revised to do one more round of stderr/stdout reading even after subprocess 
ends. 
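The revision described here, doing one more round of stderr/stdout reading even after the subprocess ends, can be sketched as a standalone loop. This is an assumption-laden simplification of the hook's `wait_for_done`: it only collects lines rather than logging them and extracting a job id, and it is POSIX-only since `select()` works on pipe file descriptors there.

```python
import select
import subprocess
import sys

def wait_for_done(cmd):
    """Sketch of the revised wait loop: poll stdout/stderr with select(),
    and only stop after poll() reports exit *and* a final drain pass has
    run, so lines emitted just before the process exits are not lost."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    reads = [proc.stderr.fileno(), proc.stdout.fileno()]
    captured = []
    while True:
        ready, _, _ = select.select(reads, [], [], 5)
        for fd in ready:
            stream = proc.stderr if fd == proc.stderr.fileno() else proc.stdout
            line = stream.readline()
            if line:
                captured.append(line.rstrip('\n'))
        if proc.poll() is not None:
            # One more round of reading even after the subprocess ends.
            for stream in (proc.stdout, proc.stderr):
                captured.extend(l.rstrip('\n')
                                for l in stream.readlines() if l.strip())
            break
    return proc.returncode, captured

rc, lines = wait_for_done([sys.executable, '-c',
                           'print("Submitted job: abc")'])
```

Checking `poll()` only after a read pass (instead of in the `while` condition) is what closes the race the review is about: output written between the last read and process exit still gets drained.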




[GitHub] fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
fenglu-g commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck 
dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r210127353
 
 

 ##
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##
 @@ -124,36 +127,38 @@ def __init__(self, cmd):
 
 def _line(self, fd):
 if fd == self._proc.stderr.fileno():
-lines = self._proc.stderr.readlines()
-for line in lines:
-self.log.warning(line[:-1])
-if lines:
-return lines[-1]
+return self._proc.stderr.readline()
 if fd == self._proc.stdout.fileno():
-line = self._proc.stdout.readline()
-return line
+return self._proc.stdout.readline()
 
 @staticmethod
 def _extract_job(line):
-if line is not None:
-if line.startswith("Submitted job: "):
-return line[15:-1]
+job_id_pattern = re.compile(
+
'.*https://console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
+matched_job = job_id_pattern.match(line or '')
+if matched_job:
+return matched_job.group(1)
 
 def wait_for_done(self):
 reads = [self._proc.stderr.fileno(), self._proc.stdout.fileno()]
 self.log.info("Start waiting for DataFlow process to complete.")
-while self._proc.poll() is None:
+job_id = None
+while True:
 ret = select.select(reads, [], [], 5)
 if ret is not None:
 for fd in ret[0]:
 line = self._line(fd)
 if line:
-self.log.debug(line[:-1])
+self.log.info(line[:-1])
 
 Review comment:
   Good point, done. 




[GitHub] tedmiston opened a new pull request #3750: [AIRFLOW-XXX] Cleanup installation extra packages table

2018-08-14 Thread GitBox
tedmiston opened a new pull request #3750: [AIRFLOW-XXX] Cleanup installation 
extra packages table
URL: https://github.com/apache/incubator-airflow/pull/3750
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Clean up installation extra packages table
   
   - Sort the extra packages table
   - Use official product names
   - Improve capitalization
   - Make table whitespace consistent.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   n/a - docs
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[GitHub] r39132 commented on a change in pull request #3730: [AIRFLOW-2882] Add import and export for pool cli using JSON

2018-08-14 Thread GitBox
r39132 commented on a change in pull request #3730: [AIRFLOW-2882] Add import 
and export for pool cli using JSON
URL: https://github.com/apache/incubator-airflow/pull/3730#discussion_r210125544
 
 

 ##
 File path: tests/cli/test_cli.py
 ##
 @@ -165,3 +166,38 @@ def test_local_run(self):
 ti.refresh_from_db()
 state = ti.current_state()
 self.assertEqual(state, State.SUCCESS)
+
+def test_cli_pool_import_export(self):
+pool_config_input = {
+"s3_pool": {
+"description": "This is my test s3_pool",
+"slots": 5
+},
+"s3_pool2": {
+"description": "This is my test s3_pool",
+"slots": 8
+}
+}
+with open('pool_import.json', mode='w', encoding='utf-8') as f:
+json.dump(pool_config_input, f)
+process_import = psutil.Popen(["airflow", "pool", "-i", 
"pool_import.json"])
+sleep(3)  # wait for webserver to start
 
 Review comment:
   @Fokko This looks good to me.. .do you want to take a quick look? The test 
failures are unrelated to the code changes.
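The pool import file in the test under review is plain JSON mapping each pool name to a description and slot count. A minimal sketch of producing and round-tripping such a file (the schema here is inferred from the test above, not from any documented `airflow pool -i` contract):

```python
import json

# Pool definitions in the shape the test above writes: name -> settings.
pool_config = {
    "s3_pool": {"description": "This is my test s3_pool", "slots": 5},
    "s3_pool2": {"description": "This is my test s3_pool", "slots": 8},
}

# Write the import file the CLI would consume (e.g. `airflow pool -i ...`).
with open('pool_import.json', mode='w', encoding='utf-8') as f:
    json.dump(pool_config, f, indent=2)

# Round-trip check: the file parses back to the same mapping.
with open('pool_import.json', encoding='utf-8') as f:
    assert json.load(f) == pool_config
```

Keeping the file as a name-keyed mapping (rather than a list) makes duplicate pool names impossible by construction, which suits an import/export round trip.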




[GitHub] jbacon commented on issue #3475: [AIRFLOW-2315] Improve S3Hook

2018-08-14 Thread GitBox
jbacon commented on issue #3475: [AIRFLOW-2315] Improve S3Hook
URL: 
https://github.com/apache/incubator-airflow/pull/3475#issuecomment-413036051
 
 
   Commits will be squashed, linting will be performed, and I'm working on 
getting these tests to pass.
   
   Regarding Test Failures:
   It seems improper `AwsHook` instantiation is causing most of my headache 
here. Some code uses `aws_conn_id=None` by default, which breaks my usage of 
`BaseHook.get_connection(conn_id)` (in order to fetch the json extras). I'm 
attempting to refactor all instances of `aws_conn_id=None`, changing them to 
`aws_conn_id="aws_default"`. I don't think this should cause backward 
compatibility issues. I appreciate any code reviews. Thanks.




[GitHub] jakebiesinger commented on issue #3749: [AIRFLOW-2900] Show code for packaged DAGs

2018-08-14 Thread GitBox
jakebiesinger commented on issue #3749: [AIRFLOW-2900] Show code for packaged 
DAGs
URL: 
https://github.com/apache/incubator-airflow/pull/3749#issuecomment-413025529
 
 
   Done. Travis whined about python3 so I updated to use `io.open` and tested 
again manually.
   
   Is there a way to force the www_rbac code path in a local `airflow 
webserver`? I'm a n00b here and had no idea that path even existed.




[GitHub] kaxil commented on a change in pull request #3725: [AIRFLOW-2877] Make docs site URL consistent everywhere

2018-08-14 Thread GitBox
kaxil commented on a change in pull request #3725: [AIRFLOW-2877] Make docs 
site URL consistent everywhere
URL: https://github.com/apache/incubator-airflow/pull/3725#discussion_r210112637
 
 

 ##
 File path: CONTRIBUTING.md
 ##
 @@ -61,8 +61,16 @@ If you are proposing a feature:
 
 ## Documentation
 
-The latest API documentation is usually available
-[here](https://airflow.incubator.apache.org/). To generate a local version,
+The Airflow documentation is located at:
+
+-  (points to
 
 Review comment:
   I think for now the plan is to stick with Apache for stable docs, and not to 
add anything new or redirect it to Read the Docs (at least for the time being).




[GitHub] kaxil commented on issue #3748: [AIRFLOW-2899] Hide sensitive data when Exporting Variables

2018-08-14 Thread GitBox
kaxil commented on issue #3748: [AIRFLOW-2899] Hide sensitive data when 
Exporting Variables
URL: 
https://github.com/apache/incubator-airflow/pull/3748#issuecomment-413024537
 
 
   @feng-tao Would you or someone with good Flask experience (e.g. @jgao54 , 
@Fokko ) be able to help with adding a test for this. 




[GitHub] ChengzhiZhao commented on a change in pull request #3730: [AIRFLOW-2882] Add import and export for pool cli using JSON

2018-08-14 Thread GitBox
ChengzhiZhao commented on a change in pull request #3730: [AIRFLOW-2882] Add 
import and export for pool cli using JSON
URL: https://github.com/apache/incubator-airflow/pull/3730#discussion_r210111838
 
 

 ##
 File path: tests/cli/test_cli.py
 ##
 @@ -165,3 +166,38 @@ def test_local_run(self):
 ti.refresh_from_db()
 state = ti.current_state()
 self.assertEqual(state, State.SUCCESS)
+
+def test_cli_pool_import_export(self):
+pool_config_input = {
+"s3_pool": {
+"description": "This is my test s3_pool",
+"slots": 5
+},
+"s3_pool2": {
+"description": "This is my test s3_pool",
+"slots": 8
+}
+}
+with open('pool_import.json', mode='w', encoding='utf-8') as f:
+json.dump(pool_config_input, f)
+process_import = psutil.Popen(["airflow", "pool", "-i", 
"pool_import.json"])
+sleep(3)  # wait for webserver to start
 
 Review comment:
   @Fokko I moved the test to core.py. Please review. thanks!




[GitHub] tedmiston commented on a change in pull request #3725: [AIRFLOW-2877] Make docs site URL consistent everywhere

2018-08-14 Thread GitBox
tedmiston commented on a change in pull request #3725: [AIRFLOW-2877] Make docs 
site URL consistent everywhere
URL: https://github.com/apache/incubator-airflow/pull/3725#discussion_r210111382
 
 

 ##
 File path: CONTRIBUTING.md
 ##
 @@ -61,8 +61,16 @@ If you are proposing a feature:
 
 ## Documentation
 
-The latest API documentation is usually available
-[here](https://airflow.incubator.apache.org/). To generate a local version,
+The Airflow documentation is located at:
+
+-  (points to
 
 Review comment:
   @kaxil Thank you for clarifying.  I was operating under the assumption that 
`airflow.apache.org` was set as a `CNAME` to `airflow.readthedocs.io` using the 
domain setting on Read the Docs 
(https://readthedocs.org/dashboard/airflow/domains/).  Is this something that's 
been considered?
   
   Is the longterm plan to keep the separate Apache site?  For instance, 
suppose we have 2 LTS releases like 1.10 and 2.0, would the Apache site then 
serve both versions with the version dropdown?
   
   The reference example I'm thinking of is how Django hosts all versions on 
one site - https://docs.djangoproject.com/ (try bottom right version button).




[GitHub] bolkedebruin commented on issue #3749: [AIRFLOW-2900] Show code for packaged DAGs

2018-08-14 Thread GitBox
bolkedebruin commented on issue #3749: [AIRFLOW-2900] Show code for packaged 
DAGs
URL: 
https://github.com/apache/incubator-airflow/pull/3749#issuecomment-413014026
 
 
   Great, please add the rbac view changed as well. `www` will disappear soon.




[jira] [Commented] (AIRFLOW-2900) Code not visible for Packaged DAGs

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580420#comment-16580420
 ] 

ASF GitHub Bot commented on AIRFLOW-2900:
-

jakebiesinger opened a new pull request #3749: [AIRFLOW-2900] Show code for 
packaged DAGs
URL: https://github.com/apache/incubator-airflow/pull/3749
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Packaged DAGs currently fail on the `Code` page with the error `[Errno 20] 
Not a directory: ...`
   
   ![screenshot 2018-08-14 at 11 22 46 
am](https://user-images.githubusercontent.com/463861/44110455-94264706-9fb4-11e8-90bb-e0b4efd0eac8.png)
   
   
   This PR fixes the screen:
   
   ![screenshot 2018-08-14 at 11 21 22 
am](https://user-images.githubusercontent.com/463861/44110464-9e98ce52-9fb4-11e8-8c44-1c6d3925b74c.png)
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
 - Tests added for open_maybe_zip
 - Manual testing of the Code screen
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Code not visible for Packaged DAGs
> --
>
> Key: AIRFLOW-2900
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2900
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp, webserver
>Affects Versions: Airflow 1.9.0
>Reporter: Jacob Biesinger
>Assignee: Jacob Biesinger
>Priority: Minor
>
> Packaged DAGs are present on the server as ZIP files. The [rendering 
> code|https://github.com/apache/incubator-airflow/blob/a29fe350164937b28f525b46f7aecbc309665e5a/airflow/www/views.py#L668]
>  is not aware of zip files and fails to show code for packaged apps.
>  
> Easy fix: If .zip appears as a suffix in the path components, attempt to open 
> the file using ZipFile.
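The "easy fix" described above (and the `open_maybe_zip` helper the PR's tests mention) could be sketched roughly as follows. This is an illustrative reconstruction of the approach, not the PR's actual code; the helper name matches the PR, but the implementation details are assumptions:

```python
import os
import zipfile
from contextlib import contextmanager


@contextmanager
def open_maybe_zip(path, mode="r"):
    """Open `path`, transparently reaching inside a .zip archive.

    If a component of the path ends with ".zip" (a packaged DAG),
    the rest of the path is read from inside that archive;
    otherwise the file is opened normally.
    """
    parts = path.split(os.sep)
    for i, part in enumerate(parts):
        if part.endswith(".zip"):
            archive = os.sep.join(parts[: i + 1])
            inner = "/".join(parts[i + 1:])  # zip entries always use "/"
            with zipfile.ZipFile(archive) as zf:
                with zf.open(inner) as f:
                    yield f
            return
    # No zip component: plain file on disk.
    with open(path, mode) as f:
        yield f
```

The webserver's Code view could then call this helper instead of `open()`, so paths like `dags/my_dags.zip/my_dag.py` no longer raise `[Errno 20] Not a directory`.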



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] jakebiesinger opened a new pull request #3749: [AIRFLOW-2900] Show code for packaged DAGs

2018-08-14 Thread GitBox
jakebiesinger opened a new pull request #3749: [AIRFLOW-2900] Show code for 
packaged DAGs
URL: https://github.com/apache/incubator-airflow/pull/3749
 
 


[GitHub] kaxil commented on a change in pull request #3725: [AIRFLOW-2877] Make docs site URL consistent everywhere

2018-08-14 Thread GitBox
kaxil commented on a change in pull request #3725: [AIRFLOW-2877] Make docs 
site URL consistent everywhere
URL: https://github.com/apache/incubator-airflow/pull/3725#discussion_r210093902
 
 

 ##
 File path: CONTRIBUTING.md
 ##
 @@ -61,8 +61,16 @@ If you are proposing a feature:
 
 ## Documentation
 
-The latest API documentation is usually available
-[here](https://airflow.incubator.apache.org/). To generate a local version,
+The Airflow documentation is located at:
+
+-  (points to
 
 Review comment:
   This is not true, as I said in the comments: https://airflow.apache.org hosts 
the docs for the stable version on the Apache website. RTD stores versioned docs, 
including stable, but those are on RTD servers, not Apache's.




[GitHub] kaxil commented on issue #3725: [AIRFLOW-2877] Make docs site URL consistent everywhere

2018-08-14 Thread GitBox
kaxil commented on issue #3725: [AIRFLOW-2877] Make docs site URL consistent 
everywhere
URL: 
https://github.com/apache/incubator-airflow/pull/3725#issuecomment-413005844
 
 
   @tedmiston To answer your question: https://airflow.apache.org is where we 
host our static documentation for the stable release. It is not hosted on Read 
the Docs, hence it doesn't have a version dropdown. We would have versioned docs 
at RTD, but https://airflow.apache.org will always carry docs for the stable 
release only.




[GitHub] bnutt commented on issue #3683: [AIRFLOW-2770] kubernetes: add support for dag folder in the docker i…

2018-08-14 Thread GitBox
bnutt commented on issue #3683: [AIRFLOW-2770] kubernetes: add support for dag 
folder in the docker i…
URL: 
https://github.com/apache/incubator-airflow/pull/3683#issuecomment-413002376
 
 
   LGTM, I've needed a change like this so I can put all my different DAG 
repositories into the docker image we're using for deployments :)




[GitHub] bolkedebruin commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
bolkedebruin commented on issue #3740: [AIRFLOW-2888] Remove shell=True and 
bash from task launch
URL: 
https://github.com/apache/incubator-airflow/pull/3740#issuecomment-412990175
 
 
   @dimberman rebased. 
   @ashb fixed UPDATING.md




[GitHub] bolkedebruin commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
bolkedebruin commented on a change in pull request #3740: [AIRFLOW-2888] Remove 
shell=True and bash from task launch
URL: https://github.com/apache/incubator-airflow/pull/3740#discussion_r210071170
 
 

 ##
 File path: airflow/config_templates/default_airflow.cfg
 ##
 @@ -140,7 +140,7 @@ donot_pickle = False
 dagbag_import_timeout = 30
 
 # The class to use for running task instances in a subprocess
-task_runner = BashTaskRunner
+task_runner = StandardTaskRunner
 
 Review comment:
   Ah yes, that behavior grrrmbl




[GitHub] feng-tao commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
feng-tao commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash 
from task launch
URL: 
https://github.com/apache/incubator-airflow/pull/3740#issuecomment-412983022
 
 
   https://github.com/apache/incubator-airflow/pull/3738 has been merged to 
master. Just rebase this PR on master, which should solve the issue.




[GitHub] feng-tao closed pull request #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY

2018-08-14 Thread GitBox
feng-tao closed pull request #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: https://github.com/apache/incubator-airflow/pull/3738
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index b957d41355..7a86e1f069 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -250,9 +250,8 @@ worker_refresh_batch_size = 1
 worker_refresh_interval = 30
 
 # Secret key used to run your flask app
-# If default value is given ("temporary_key"), a random secret_key will be 
generated
-# when you launch your webserver for security reason
-secret_key = temporary_key
+# It should be as random as possible
+secret_key = {SECRET_KEY}
 
 # Number of workers to run the Gunicorn web server
 workers = 4
diff --git a/airflow/configuration.py b/airflow/configuration.py
index ed8943ac77..9e80648c74 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -22,6 +22,7 @@
 from __future__ import print_function
 from __future__ import unicode_literals
 
+from base64 import b64encode
 from builtins import str
 from collections import OrderedDict
 import copy
@@ -478,6 +479,8 @@ def parameterized_config(template):
 else:
 FERNET_KEY = ''
 
+SECRET_KEY = b64encode(os.urandom(16)).decode('utf-8')
+
 TEMPLATE_START = (
 '# --- TEMPLATE BEGINS HERE ---')
 if not os.path.isfile(TEST_CONFIG_FILE):
diff --git a/airflow/www/app.py b/airflow/www/app.py
index 319fe11ada..f7976b0dd5 100644
--- a/airflow/www/app.py
+++ b/airflow/www/app.py
@@ -49,13 +49,7 @@ def create_app(config=None, testing=False):
 
 app = Flask(__name__)
 app.wsgi_app = ProxyFix(app.wsgi_app)
-
-if configuration.conf.get('webserver', 'SECRET_KEY') == "temporary_key":
-log.info("SECRET_KEY for Flask App is not specified. Using a random 
one.")
-app.secret_key = os.urandom(16)
-else:
-app.secret_key = configuration.conf.get('webserver', 'SECRET_KEY')
-
+app.secret_key = configuration.conf.get('webserver', 'SECRET_KEY')
 app.config['LOGIN_DISABLED'] = not configuration.conf.getboolean(
 'webserver', 'AUTHENTICATE')
 
diff --git a/airflow/www_rbac/app.py b/airflow/www_rbac/app.py
index 8d3400a668..b319426aa9 100644
--- a/airflow/www_rbac/app.py
+++ b/airflow/www_rbac/app.py
@@ -43,10 +43,7 @@ def create_app(config=None, session=None, testing=False, 
app_name="Airflow"):
 global app, appbuilder
 app = Flask(__name__)
 app.wsgi_app = ProxyFix(app.wsgi_app)
-if conf.get('webserver', 'SECRET_KEY') == "temporary_key":
-app.secret_key = os.urandom(16)
-else:
-app.secret_key = conf.get('webserver', 'SECRET_KEY')
+app.secret_key = conf.get('webserver', 'SECRET_KEY')
 
 airflow_home_path = conf.get('core', 'AIRFLOW_HOME')
 webserver_config_path = airflow_home_path + '/webserver_config.py'


 




[jira] [Commented] (AIRFLOW-2886) Secure Flask SECRET_KEY

2018-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580293#comment-16580293
 ] 

ASF subversion and git services commented on AIRFLOW-2886:
--

Commit f7602f8266559e55bc602a9639e3e1ab640f30e8 in incubator-airflow's branch 
refs/heads/master from Xiaodong
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=f7602f8 ]

[AIRFLOW-2886] Secure Flask SECRET_KEY (#3738)

The Flask SECRET_KEY should be as random as possible.

However, we cannot generate a random value when
we launch the webserver (the secret_key would be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.
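The generation step the commit describes amounts to something like the sketch below. This is a minimal illustration of the approach (random bytes from `os.urandom`, base64-encoded, written into airflow.cfg at creation time), not the exact code from the PR:

```python
import os
from base64 import b64encode


def generate_secret_key(num_bytes=16):
    """Create a random Flask SECRET_KEY once, at airflow.cfg creation time.

    Baking the value into the config file (as is done for FERNET_KEY)
    keeps it identical across all Gunicorn workers; generating it at
    webserver startup instead would make CSRF session tokens signed by
    one worker fail validation on another.
    """
    return b64encode(os.urandom(num_bytes)).decode("utf-8")
```

The resulting string would then be substituted for the `{SECRET_KEY}` placeholder in the config template, exactly as `FERNET_KEY` is handled.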

> Secure Flask SECRET_KEY
> ---
>
> Key: AIRFLOW-2886
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2886
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> In my earlier PRs, [https://github.com/apache/incubator-airflow/pull/3651] 
> and [https://github.com/apache/incubator-airflow/pull/3729], I proposed 
> generating a random SECRET_KEY for the Flask app.
> If we have multiple workers for the Flask webserver, we may encounter the CSRF 
> error {{The CSRF session token is missing}}.
> On the other hand, it's still very important to have as random a SECRET_KEY 
> as possible for security reasons. We can deal with it the same way we dealt 
> with FERNET_KEY (i.e. generate a random value when the airflow.cfg file is 
> initialized).





[jira] [Commented] (AIRFLOW-2886) Secure Flask SECRET_KEY

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580292#comment-16580292
 ] 

ASF GitHub Bot commented on AIRFLOW-2886:
-

feng-tao closed pull request #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: https://github.com/apache/incubator-airflow/pull/3738
 
 
   



> Secure Flask SECRET_KEY
> ---
>
> Key: AIRFLOW-2886
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2886
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> In my earlier PRs, [https://github.com/apache/incubator-airflow/pull/3651] 
> and [https://github.com/apache/incubator-airflow/pull/3729], I proposed 
> generating a random SECRET_KEY for the Flask app.
> If we have multiple workers for the Flask webserver, we may encounter the CSRF 
> error {{The CSRF session token is missing}}.
> On the other hand, it's still very important to have as random a SECRET_KEY 
> as possible for security reasons. We can deal with it the same way we dealt 
> with FERNET_KEY (i.e. generate a random value when the airflow.cfg file is 
> initialized).





[GitHub] feng-tao commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY

2018-08-14 Thread GitBox
feng-tao commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: 
https://github.com/apache/incubator-airflow/pull/3738#issuecomment-412982701
 
 
   Sounds good @ashb. Let's go ahead and merge this PR to unblock master, and 
document the functionality you mentioned later.




[GitHub] feng-tao commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY

2018-08-14 Thread GitBox
feng-tao commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: 
https://github.com/apache/incubator-airflow/pull/3738#issuecomment-412982734
 
 
   thanks @XD-DENG  for fixing the issue.




[GitHub] ashb commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY

2018-08-14 Thread GitBox
ashb commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: 
https://github.com/apache/incubator-airflow/pull/3738#issuecomment-412976333
 
 
   To use an external secret store, this could be extended to use the (already 
existing) `_cmd` functionality that exists in Airflow for certain config options 
(e.g. the SQLAlchemy connection string) - we'd just need to add this option to 
the list of options that are read from `_cmd`s.
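In airflow.cfg terms, the extension ashb describes might look like the fragment below. This is a hypothetical sketch: a `secret_key_cmd` option would only work if `secret_key` were added to Airflow's `_cmd`-enabled option list, which is exactly what the comment is proposing:

```ini
[webserver]
# Hypothetical: assumes secret_key joins the _cmd-enabled options.
# The command's stdout would become the secret key at config-read time,
# letting every webserver machine pull the same value from one store.
secret_key_cmd = cat /run/secrets/airflow_secret_key
```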




[GitHub] ashb commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY

2018-08-14 Thread GitBox
ashb commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: 
https://github.com/apache/incubator-airflow/pull/3738#issuecomment-412975884
 
 
   My vote is to go with this approach (secure by default and easy for the 
common case), plus a bit of documentation saying that you will want to make sure 
this value is the same across machines if you run more than one webserver 
behind a load balancer, etc.




[GitHub] feng-tao edited a comment on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
feng-tao edited a comment on issue #3740: [AIRFLOW-2888] Remove shell=True and 
bash from task launch
URL: 
https://github.com/apache/incubator-airflow/pull/3740#issuecomment-412974281
 
 
   @dimberman, @bolkedebruin, this is related to 
https://github.com/apache/incubator-airflow/pull/3651 and 
https://github.com/apache/incubator-airflow/pull/3729, which use a random value 
for the secret key. The original PR owner has proposed a 
fix (https://github.com/apache/incubator-airflow/pull/3738) which only works if 
the webserver is deployed on a single machine (not for a cluster of webserver 
machines).
   
   There are two solutions:
   1. If we don't have a use case for a cluster of webservers, we could go 
ahead and merge his PR.
   2. If we do, I think we should revert the original two PRs and instead 
update the descriptions to indicate that users need to supply the secret 
key themselves (e.g. read from a key management service).
   
   I am okay with either solution, but would like to hear the community's 
opinion.






[GitHub] dimberman edited a comment on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
dimberman edited a comment on issue #3740: [AIRFLOW-2888] Remove shell=True and 
bash from task launch
URL: 
https://github.com/apache/incubator-airflow/pull/3740#issuecomment-412973048
 
 
   @bolkedebruin This doesn't have to do with this ticket, but I am unable to 
log into airflow from either your branch or the current master branch. Login 
works from the 1-10-stable branch.
   
   When I put in credentials for the login page I get the following 400 error 
   ```
   Bad Request
   
   The CSRF session token is missing.
   ```






[GitHub] ashb commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
ashb commented on a change in pull request #3740: [AIRFLOW-2888] Remove 
shell=True and bash from task launch
URL: https://github.com/apache/incubator-airflow/pull/3740#discussion_r210055138
 
 

 ##
 File path: airflow/config_templates/default_airflow.cfg
 ##
 @@ -140,7 +140,7 @@ donot_pickle = False
 dagbag_import_timeout = 30
 
 # The class to use for running task instances in a subprocess
-task_runner = BashTaskRunner
+task_runner = StandardTaskRunner
 
 Review comment:
   So, because of our approach of writing out a copy of default_airflow.cfg 
as airflow.cfg on first run, anyone upgrading will have their task_runner 
configured to BashTaskRunner, so we'll at least need to mention this in the 
UPDATING instructions for people upgrading.
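For upgraders, the manual step being flagged would be a one-line edit in an existing airflow.cfg (sketch; the option sits in the `[core]` section per default_airflow.cfg):

```ini
[core]
# Existing configs keep the old BashTaskRunner value after upgrade;
# change it by hand to pick up the new default:
task_runner = StandardTaskRunner
```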




[GitHub] dimberman commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
dimberman commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash 
from task launch
URL: 
https://github.com/apache/incubator-airflow/pull/3740#issuecomment-412965891
 
 
   @bolkedebruin Sorry was traveling yesterday. Will check out and test now.




[GitHub] bolkedebruin commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-14 Thread GitBox
bolkedebruin commented on issue #3740: [AIRFLOW-2888] Remove shell=True and 
bash from task launch
URL: 
https://github.com/apache/incubator-airflow/pull/3740#issuecomment-412963977
 
 
   @dimberman ping?




[GitHub] feng-tao commented on issue #3138: [AIRFLOW-2221] Create DagFetcher abstraction

2018-08-14 Thread GitBox
feng-tao commented on issue #3138: [AIRFLOW-2221] Create DagFetcher abstraction
URL: 
https://github.com/apache/incubator-airflow/pull/3138#issuecomment-412948390
 
 
   Andrew @astahlman , FYI, Airflow has started using AIPs (Airflow Improvement 
Proposals). For this kind of change, I think we need a wiki page to document the 
initial 
design (https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals).




[GitHub] codecov-io edited a comment on issue #3725: [AIRFLOW-2877] Make docs site URL consistent everywhere

2018-08-14 Thread GitBox
codecov-io edited a comment on issue #3725: [AIRFLOW-2877] Make docs site URL 
consistent everywhere
URL: 
https://github.com/apache/incubator-airflow/pull/3725#issuecomment-412940914
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=h1)
 Report
   > Merging 
[#3725](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/9d68fa337586a6a64b6a9f19fc8f2b079376a4db?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3725/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3725      +/-   ##
   ==========================================
   - Coverage   77.67%   77.66%   -0.01%     
   ==========================================
     Files         204      204              
     Lines       15849    15849              
   ==========================================
   - Hits        12310    12309       -1     
   - Misses       3539     3540       +1     
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www\_rbac/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3725/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9hcHAucHk=)
 | `96.77% <ø> (ø)` | :arrow_up: |
   | 
[airflow/example\_dags/tutorial.py](https://codecov.io/gh/apache/incubator-airflow/pull/3725/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvdHV0b3JpYWwucHk=)
 | `100% <ø> (ø)` | :arrow_up: |
   | 
[airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3725/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5)
 | `99.01% <ø> (ø)` | :arrow_up: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3725/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.78% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=footer).
 Last update 
[9d68fa3...e524245](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   






[GitHub] codecov-io commented on issue #3725: [AIRFLOW-2877] Make docs site URL consistent everywhere

2018-08-14 Thread GitBox
codecov-io commented on issue #3725: [AIRFLOW-2877] Make docs site URL 
consistent everywhere
URL: 
https://github.com/apache/incubator-airflow/pull/3725#issuecomment-412940914
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=h1)
 Report
   > Merging 
[#3725](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/9d68fa337586a6a64b6a9f19fc8f2b079376a4db?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3725/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #3725   +/-   ##
   =======================================
     Coverage   77.67%   77.67%           
   =======================================
     Files         204      204           
     Lines       15849    15849           
   =======================================
     Hits        12310    12310           
     Misses       3539     3539           
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/example\_dags/tutorial.py](https://codecov.io/gh/apache/incubator-airflow/pull/3725/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvdHV0b3JpYWwucHk=)
 | `100% <ø> (ø)` | :arrow_up: |
   | 
[airflow/www\_rbac/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3725/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9hcHAucHk=)
 | `96.77% <ø> (ø)` | :arrow_up: |
   | 
[airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3725/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5)
 | `99.01% <ø> (ø)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=footer).
 Last update 
[9d68fa3...e524245](https://codecov.io/gh/apache/incubator-airflow/pull/3725?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] tedmiston commented on issue #3725: [AIRFLOW-2877] Make docs site URL consistent everywhere

2018-08-14 Thread GitBox
tedmiston commented on issue #3725: [AIRFLOW-2877] Make docs site URL 
consistent everywhere
URL: 
https://github.com/apache/incubator-airflow/pull/3725#issuecomment-412934302
 
 
   I believe this is ready for final review.
   
   Changes
   
   - @ashb I've updated the CONTRIBUTING.md link to clarify "latest stable".
   - @r39132 I've updated [Building and deploying the 
docs](https://cwiki.apache.org/confluence/display/AIRFLOW/Building+and+deploying+the+docs)
 in the wiki to point to the new latest stable, latest master, and versioned 
links; removed the no longer relevant build info; added some relevant links, 
etc.  And added/updated the links in the repo in README.md and CONTRIBUTING.md 
as well.  Does that all look good to you?
   - I removed the trailing slash on https://airflow.apache.org/ everywhere, 
since it redirects to the no-slash URL.
   - Squashed & rebased
   
   Questions
   
   - It looks like https://airflow.apache.org might _not_ currently be the same 
as https://airflow.readthedocs.io/ (latest stable).  More specifically, the 
former is at least lacking the bottom left corner version dropdown.  Anyone 
know if it's otherwise the same, or if we could enable the version dropdown 
there too?  This is not necessarily a blocker for merging but would be nice to 
have.  (@kaxil or @ashb perhaps?)
   




[GitHub] diogoalexandrefranco commented on issue #3138: [AIRFLOW-2221] Create DagFetcher abstraction

2018-08-14 Thread GitBox
diogoalexandrefranco commented on issue #3138: [AIRFLOW-2221] Create DagFetcher 
abstraction
URL: 
https://github.com/apache/incubator-airflow/pull/3138#issuecomment-412931504
 
 
   Hi guys,
   
   Sorry I never got around to the design doc; my free time completely shrank.
   
   Feel free to own this of course, I hope to be able to help again soon
   enough.
   
   There is an open PR which, at most, may be a decent starting point for some
   of the changes that are required: it is a working implementation of the
   abstraction, with the current file-system DAG fetcher and the plug-in system
   for DAG fetchers.
   
   Cheers,
   Diogo
   




[GitHub] feng-tao commented on issue #3728: [AIRFLOW-2883] Not search dag owner if owners are missing

2018-08-14 Thread GitBox
feng-tao commented on issue #3728: [AIRFLOW-2883] Not search dag owner if 
owners are missing
URL: 
https://github.com/apache/incubator-airflow/pull/3728#issuecomment-412928274
 
 
   PTAL @mistercrunch , @r39132 




[GitHub] feng-tao commented on issue #3138: [AIRFLOW-2221] Create DagFetcher abstraction

2018-08-14 Thread GitBox
feng-tao commented on issue #3138: [AIRFLOW-2221] Create DagFetcher abstraction
URL: 
https://github.com/apache/incubator-airflow/pull/3138#issuecomment-412926687
 
 
   @astahlman , let me know how I can help. This should be very interesting.




[GitHub] ashb commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
ashb commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck 
dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r210006800
 
 

 ##
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##
 @@ -124,36 +127,38 @@ def __init__(self, cmd):
 
     def _line(self, fd):
         if fd == self._proc.stderr.fileno():
-            lines = self._proc.stderr.readlines()
-            for line in lines:
-                self.log.warning(line[:-1])
-            if lines:
-                return lines[-1]
+            return self._proc.stderr.readline()
         if fd == self._proc.stdout.fileno():
-            line = self._proc.stdout.readline()
-            return line
+            return self._proc.stdout.readline()
 
     @staticmethod
     def _extract_job(line):
-        if line is not None:
-            if line.startswith("Submitted job: "):
-                return line[15:-1]
+        job_id_pattern = re.compile(
+            '.*https://console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
+        matched_job = job_id_pattern.match(line or '')
+        if matched_job:
+            return matched_job.group(1)
 
     def wait_for_done(self):
         reads = [self._proc.stderr.fileno(), self._proc.stdout.fileno()]
         self.log.info("Start waiting for DataFlow process to complete.")
-        while self._proc.poll() is None:
+        job_id = None
+        while True:
             ret = select.select(reads, [], [], 5)
             if ret is not None:
                 for fd in ret[0]:
                     line = self._line(fd)
                     if line:
-                        self.log.debug(line[:-1])
+                        self.log.info(line[:-1])
+                        job_id = job_id or self._extract_job(line)
             else:
                 self.log.info("Waiting for DataFlow process to complete.")
+            if self._proc.poll() is not None:
 Review comment:
   Reading each of stdout and stderr independently is hard to do without 
deadlocking on one or the other too: you can't just read one to the end and 
then read the other. 
https://stackoverflow.com/questions/33886406/how-to-avoid-the-deadlock-in-a-subprocess-without-using-communicate
   
   (Sorry if this is out of context - I haven't looked at the PR, just seen the 
comments)
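
A minimal sketch of the non-deadlocking pattern, using Python's selectors 
module to watch both pipes at once (illustrative only, not the hook's actual 
code; assumes a POSIX platform, where pipe file objects are selectable):

```python
import selectors
import subprocess

def run_and_stream(cmd):
    """Read stdout and stderr of a subprocess concurrently via selectors,
    avoiding the deadlock that sequential to-EOF reads can cause when one
    pipe buffer fills while the other is being drained."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ, "stdout")
    sel.register(proc.stderr, selectors.EVENT_READ, "stderr")
    lines = []
    while sel.get_map():  # until both streams hit EOF
        for key, _ in sel.select():
            line = key.fileobj.readline()
            if line:  # a line was available on this stream
                lines.append((key.data, line.rstrip("\n")))
            else:     # EOF: stop watching this stream
                sel.unregister(key.fileobj)
    proc.wait()
    return lines
```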




[GitHub] feng-tao commented on issue #3748: [AIRFLOW-2899] Hide sensitive data when Exporting Variables

2018-08-14 Thread GitBox
feng-tao commented on issue #3748: [AIRFLOW-2899] Hide sensitive data when 
Exporting Variables
URL: 
https://github.com/apache/incubator-airflow/pull/3748#issuecomment-412921884
 
 
   @kaxil, it would be good if you could add a test covering the sensitive data 
export case. 




[GitHub] mistercrunch commented on issue #3138: [AIRFLOW-2221] Create DagFetcher abstraction

2018-08-14 Thread GitBox
mistercrunch commented on issue #3138: [AIRFLOW-2221] Create DagFetcher 
abstraction
URL: 
https://github.com/apache/incubator-airflow/pull/3138#issuecomment-412921214
 
 
   @astahlman looks like it's up for grabs, happy to help




[jira] [Created] (AIRFLOW-2900) Code not visible for Packaged DAGs

2018-08-14 Thread Jacob Biesinger (JIRA)
Jacob Biesinger created AIRFLOW-2900:


 Summary: Code not visible for Packaged DAGs
 Key: AIRFLOW-2900
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2900
 Project: Apache Airflow
  Issue Type: Bug
  Components: webapp, webserver
Affects Versions: Airflow 1.9.0
Reporter: Jacob Biesinger
Assignee: Jacob Biesinger


Packaged DAGs are present on the server as ZIP files. The [rendering 
code|https://github.com/apache/incubator-airflow/blob/a29fe350164937b28f525b46f7aecbc309665e5a/airflow/www/views.py#L668]
 is not aware of zip files and fails to show code for packaged DAGs.

 

Easy fix: If .zip appears as a suffix in the path components, attempt to open 
the file using ZipFile.
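
A sketch of that fix, assuming the view receives a path like 
/dags/my_dags.zip/my_dag.py (function and variable names here are 
illustrative, not Airflow's actual view code):

```python
import os
import zipfile

def read_dag_source(path):
    """Read DAG source code, looking inside a zip archive when a '.zip'
    component appears in the path; falls back to a plain file read."""
    parts = path.split(os.sep)
    for i, part in enumerate(parts):
        if part.endswith(".zip"):
            archive = os.sep.join(parts[: i + 1])
            member = "/".join(parts[i + 1:])  # zip member names always use '/'
            with zipfile.ZipFile(archive) as zf:
                return zf.read(member).decode("utf-8")
    with open(path) as f:
        return f.read()
```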



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] TrevorEdwards commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
TrevorEdwards commented on a change in pull request #3744: [AIRFLOW-2893] fix 
stuck dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r210004105
 
 

 ##
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##
 @@ -124,36 +127,38 @@ def __init__(self, cmd):
 
     def _line(self, fd):
         if fd == self._proc.stderr.fileno():
-            lines = self._proc.stderr.readlines()
-            for line in lines:
-                self.log.warning(line[:-1])
-            if lines:
-                return lines[-1]
+            return self._proc.stderr.readline()
         if fd == self._proc.stdout.fileno():
-            line = self._proc.stdout.readline()
-            return line
+            return self._proc.stdout.readline()
 
     @staticmethod
     def _extract_job(line):
-        if line is not None:
-            if line.startswith("Submitted job: "):
-                return line[15:-1]
+        job_id_pattern = re.compile(
+            '.*https://console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
+        matched_job = job_id_pattern.match(line or '')
+        if matched_job:
+            return matched_job.group(1)
 
     def wait_for_done(self):
         reads = [self._proc.stderr.fileno(), self._proc.stdout.fileno()]
         self.log.info("Start waiting for DataFlow process to complete.")
-        while self._proc.poll() is None:
+        job_id = None
+        while True:
             ret = select.select(reads, [], [], 5)
             if ret is not None:
                 for fd in ret[0]:
                     line = self._line(fd)
                     if line:
-                        self.log.debug(line[:-1])
+                        self.log.info(line[:-1])
+                        job_id = job_id or self._extract_job(line)
             else:
                 self.log.info("Waiting for DataFlow process to complete.")
+            if self._proc.poll() is not None:
 
 Review comment:
   I think this may not always ensure that each STDERR/STDOUT line is processed.
   
   For example, say a process completes ~instantly and logs 100 lines to STDERR 
and STDOUT.  The code as it is now seems like it would only process 1 line of 
each stream, then terminate. You'd need to read until the streams are empty 
(e.g. readlines). We should probably also ensure the streams are empty after 
the process terminates.
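
A sketch of the drain step this suggests (illustrative only, not the PR's 
code; sequential to-EOF reads are safe once the child has exited, or while it 
runs only if each stream's remaining output fits the pipe buffer — see the 
deadlock caveat raised elsewhere in this thread):

```python
import subprocess

def drain_after_exit(proc):
    """Read both pipes to EOF so no buffered lines are lost, then reap
    the process. Reading before wait() keeps the child from blocking on
    a full pipe buffer when output is large."""
    out = proc.stdout.read().splitlines()  # stdout to EOF
    err = proc.stderr.read().splitlines()  # then stderr to EOF
    proc.wait()
    return out, err
```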




[GitHub] cjgu commented on issue #2986: [AIRFLOW-2027] Only trigger sleep in scheduler after all files have parsed

2018-08-14 Thread GitBox
cjgu commented on issue #2986: [AIRFLOW-2027] Only trigger sleep in scheduler 
after all files have parsed
URL: 
https://github.com/apache/incubator-airflow/pull/2986#issuecomment-412917112
 
 
   I hit this in production during testing of 1.10 but managed to avoid it by 
tweaking the sleep configurations.
   
   It was triggered by having only `min_file_process_interval = 60` set.
   
   Solved by setting 
   
   ```
   min_file_process_interval = 0
   min_file_parsing_loop_time = 60
   ```




[GitHub] TrevorEdwards commented on a change in pull request #3744: [AIRFLOW-2893] fix stuck dataflow job due to name mismatch

2018-08-14 Thread GitBox
TrevorEdwards commented on a change in pull request #3744: [AIRFLOW-2893] fix 
stuck dataflow job due to name mismatch
URL: https://github.com/apache/incubator-airflow/pull/3744#discussion_r209998578
 
 

 ##
 File path: airflow/contrib/hooks/gcp_dataflow_hook.py
 ##
 @@ -124,36 +127,38 @@ def __init__(self, cmd):
 
     def _line(self, fd):
         if fd == self._proc.stderr.fileno():
-            lines = self._proc.stderr.readlines()
-            for line in lines:
-                self.log.warning(line[:-1])
-            if lines:
-                return lines[-1]
+            return self._proc.stderr.readline()
         if fd == self._proc.stdout.fileno():
-            line = self._proc.stdout.readline()
-            return line
+            return self._proc.stdout.readline()
 
     @staticmethod
     def _extract_job(line):
-        if line is not None:
-            if line.startswith("Submitted job: "):
-                return line[15:-1]
+        job_id_pattern = re.compile(
+            '.*https://console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
+        matched_job = job_id_pattern.match(line or '')
+        if matched_job:
+            return matched_job.group(1)
 
     def wait_for_done(self):
         reads = [self._proc.stderr.fileno(), self._proc.stdout.fileno()]
         self.log.info("Start waiting for DataFlow process to complete.")
-        while self._proc.poll() is None:
+        job_id = None
+        while True:
             ret = select.select(reads, [], [], 5)
             if ret is not None:
                 for fd in ret[0]:
                     line = self._line(fd)
                     if line:
-                        self.log.debug(line[:-1])
+                        self.log.info(line[:-1])
 
 Review comment:
   We should still log STDERR to warning in case users filter out info-level 
logs and for easier reading.




[GitHub] codecov-io commented on issue #3748: [AIRFLOW-2899] Hide sensitive data when Exporting Variables

2018-08-14 Thread GitBox
codecov-io commented on issue #3748: [AIRFLOW-2899] Hide sensitive data when 
Exporting Variables
URL: 
https://github.com/apache/incubator-airflow/pull/3748#issuecomment-412912402
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3748?src=pr=h1)
 Report
   > Merging 
[#3748](https://codecov.io/gh/apache/incubator-airflow/pull/3748?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/9d68fa337586a6a64b6a9f19fc8f2b079376a4db?src=pr=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3748/graphs/tree.svg?height=150=pr=WdLKlKHOAU=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3748?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3748      +/-   ##
   ==========================================
   - Coverage   77.67%   77.64%   -0.03%     
   ==========================================
     Files         204      204              
     Lines       15849    15853       +4     
   ==========================================
   - Hits        12310    12309       -1     
   - Misses       3539     3544       +5     
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3748?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3748/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `68.95% <0%> (-0.09%)` | :arrow_down: |
   | 
[airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3748/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==)
 | `72.61% <0%> (-0.11%)` | :arrow_down: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3748/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.78% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3748?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3748?src=pr=footer).
 Last update 
[9d68fa3...8a73aa1](https://codecov.io/gh/apache/incubator-airflow/pull/3748?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil edited a comment on issue #3748: [AIRFLOW-2899] Hide sensitive data when Exporting Variables

2018-08-14 Thread GitBox
kaxil edited a comment on issue #3748: [AIRFLOW-2899] Hide sensitive data when 
Exporting Variables
URL: 
https://github.com/apache/incubator-airflow/pull/3748#issuecomment-412897241
 
 
   cc @Fokko @feng-tao @bolkedebruin 
   
   https://github.com/apache/incubator-airflow/pull/1530 covered hiding it from 
the UI but didn't consider variable export.




[GitHub] kaxil commented on issue #3748: [AIRFLOW-2899] Hide sensitive data when Exporting Variables

2018-08-14 Thread GitBox
kaxil commented on issue #3748: [AIRFLOW-2899] Hide sensitive data when 
Exporting Variables
URL: 
https://github.com/apache/incubator-airflow/pull/3748#issuecomment-412897241
 
 
   cc @Fokko @feng-tao @bolkedebruin 




[jira] [Commented] (AIRFLOW-2899) Sensitive data exposed when Exporting Variables

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579895#comment-16579895
 ] 

ASF GitHub Bot commented on AIRFLOW-2899:
-

kaxil opened a new pull request #3748: [AIRFLOW-2899] Hide sensitive data when 
Exporting Variables
URL: https://github.com/apache/incubator-airflow/pull/3748
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2899
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Currently, sensitive variables are hidden in the Web UI. However, if the UI 
is compromised, someone can export the variables, and all the sensitive values 
are exported in plain text.
   
   
![image](https://user-images.githubusercontent.com/8811558/44098679-bfbf885e-9fd8-11e8-9486-864a93f2ef61.png)
   
   This will still allow an admin to export all the variables from the CLI. The 
main intention is that, in case of an exposed Web UI, the sensitive data 
should remain inaccessible.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Sensitive data exposed when Exporting Variables
> ---
>
> Key: AIRFLOW-2899
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2899
> Project: Apache Airflow
>  Issue Type: Task
>  Components: security
>Affects Versions: 1.9.0, 1.8.2, 1.10.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: image-2018-08-14-15-39-17-680.png
>
>
> Currently, sensitive variables are hidden in the Web UI. However, if the UI 
> is compromised, someone can export the variables, and all the sensitive 
> values are exported in plain text.
>  !image-2018-08-14-15-39-17-680.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (AIRFLOW-2899) Sensitive data exposed when Exporting Variables

2018-08-14 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2899:
---

 Summary: Sensitive data exposed when Exporting Variables
 Key: AIRFLOW-2899
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2899
 Project: Apache Airflow
  Issue Type: Task
  Components: security
Affects Versions: 1.9.0, 1.8.2, 1.10.0
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 2.0.0
 Attachments: image-2018-08-14-15-39-17-680.png

Currently, sensitive variables are hidden in the Web UI. However, if the UI is 
compromised, someone can export the variables, and all the sensitive values 
are exported in plain text.

 !image-2018-08-14-15-39-17-680.png! 





[jira] [Created] (AIRFLOW-2898) Task not entering queued state for pool

2018-08-14 Thread rana (JIRA)
rana created AIRFLOW-2898:
-

 Summary: Task not entering queued state for pool
 Key: AIRFLOW-2898
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2898
 Project: Apache Airflow
  Issue Type: Bug
  Components: pools, scheduler
Affects Versions: 1.9.0
Reporter: rana


I have a pool of size 3 and several jobs (over 10) which use the pool.

Tasks time out (after 10 minutes) while stuck in the scheduled state, when 
they should instead enter the queued state to wait for a pool slot.
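As a toy model of the expected behaviour (illustrative only, not Airflow code): with 3 pool slots and 12 eligible task instances, 3 should run and the remaining 9 should sit in the queued state rather than remain scheduled.

```python
# Toy model of pool-slot accounting; illustrative only, not Airflow code.
def assign_states(num_tasks, pool_slots):
    """Split eligible task instances into (running, queued) counts."""
    running = min(num_tasks, pool_slots)
    queued = num_tasks - running
    return running, queued

# With a pool of 3 and 12 eligible tasks: 3 run, 9 queue.
print(assign_states(12, 3))  # (3, 9)
```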





[GitHub] xoen commented on issue #3612: [AIRFLOW-2755] Added `kubernetes.worker_dags_folder` configuration

2018-08-14 Thread GitBox
xoen commented on issue #3612: [AIRFLOW-2755] Added 
`kubernetes.worker_dags_folder` configuration
URL: 
https://github.com/apache/incubator-airflow/pull/3612#issuecomment-412875792
 
 
   And thanks @r4vi for rebasing while I was away.




[GitHub] ashb edited a comment on issue #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook …

2018-08-14 Thread GitBox
ashb edited a comment on issue #3722: [AIRFLOW-2759] Add changes to extract 
proxy details at the base hook …
URL: 
https://github.com/apache/incubator-airflow/pull/3722#issuecomment-412797224
 
 
   Please address this point:
   
   > As Ravi mentioned on the Jira: it looks like httplib2 can use HTTP_PROXY 
and HTTPS_PROXY environment variables to set a proxy so it's probably best to 
just do that
   
   This PR is a lot of code for something that appears built-in, and I'm minded 
to reject this PR unless you can demonstrate a good reason why this code is 
needed.
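
   The environment-variable approach mentioned above could look roughly like this. The proxy URL is illustrative; httplib2 can build its proxy configuration from these variables (e.g. via `proxy_info_from_environment`, depending on the installed version), so no hook-level plumbing should be needed.

```python
import os

# Set the standard proxy environment variables; many HTTP clients,
# including httplib2, can pick these up. The proxy URL is illustrative.
os.environ['HTTP_PROXY'] = 'http://proxy.example.com:3128'
os.environ['HTTPS_PROXY'] = 'http://proxy.example.com:3128'

# With httplib2 installed, something along these lines should then work:
# import httplib2
# http = httplib2.Http(proxy_info=httplib2.proxy_info_from_environment('https'))
```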




[jira] [Commented] (AIRFLOW-2894) Allow Users to "bake-in" DAGs in Airflow images

2018-08-14 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579446#comment-16579446
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2894:


This sounds like a nice feature.

Just one problem: the Airflow team doesn't publish any images, so what are you 
proposing we change?

> Allow Users to "bake-in" DAGs in Airflow images
> ---
>
> Key: AIRFLOW-2894
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2894
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Daniel Imberman
>Assignee: Daniel Imberman
>Priority: Minor
>
> Multiple Users have asked that we offer the ability to have DAGs baked in to 
> their airflow images at launch (as opposed to using git-mode or a volume 
> claim). This will save start-up time and allow for versioned DAGs via docker.
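
A minimal sketch of what "baking in" could mean for a user-built image (the base image name is hypothetical; as noted, no official Airflow image is published at this point):

```python
# Hypothetical sketch: generate a Dockerfile that copies DAGs into a
# user-built image at build time, instead of git-sync or a volume claim.
# The base image name and paths are illustrative.
dockerfile = """\
FROM my-org/airflow-base:1.10
COPY dags/ /usr/local/airflow/dags/
"""

with open('Dockerfile', 'w') as f:
    f.write(dockerfile)

# A versioned image would then be built with:
#   docker build -t my-org/airflow-with-dags:v1 .
```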





[GitHub] bolkedebruin edited a comment on issue #3684: [AIRFLOW-2840] - add update connections cli option

2018-08-14 Thread GitBox
bolkedebruin edited a comment on issue #3684: [AIRFLOW-2840] - add update 
connections cli option
URL: 
https://github.com/apache/incubator-airflow/pull/3684#issuecomment-412798572
 
 
   I wouldn't put the CRUD operations in separate files. Just use 
api/common/experimental/connections.py for simplicity



