[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-08-12 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577696#comment-16577696
 ] 

ASF subversion and git services commented on AIRFLOW-2524:
--

Commit 4d2f83b19af3489d6c9563d51210a3dab2f38b26 in incubator-airflow's branch 
refs/heads/master from Keliang Chen
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=4d2f83b ]

[AIRFLOW-2524] Add Amazon SageMaker Training (#3658)

Add SageMaker Hook, Training Operator & Sensor
Co-authored-by: srrajeev-aws 

> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using
> Airflow?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko closed pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training

2018-08-12 Thread GitBox
Fokko closed pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/sagemaker_hook.py b/airflow/contrib/hooks/sagemaker_hook.py
new file mode 100644
index 00..8b8e2e41e7
--- /dev/null
+++ b/airflow/contrib/hooks/sagemaker_hook.py
@@ -0,0 +1,241 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+import time
+from botocore.exceptions import ClientError
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+    """
+    Interact with Amazon SageMaker.
+    sagemaker_conn_id is required for using
+    the config stored in db for training/tuning
+    """
+
+    def __init__(self,
+                 sagemaker_conn_id=None,
+                 use_db_config=False,
+                 region_name=None,
+                 check_interval=5,
+                 max_ingestion_time=None,
+                 *args, **kwargs):
+        super(SageMakerHook, self).__init__(*args, **kwargs)
+        self.sagemaker_conn_id = sagemaker_conn_id
+        self.use_db_config = use_db_config
+        self.region_name = region_name
+        self.check_interval = check_interval
+        self.max_ingestion_time = max_ingestion_time
+        self.conn = self.get_conn()
+
+    def check_for_url(self, s3url):
+        """
+        check if the s3url exists
+        :param s3url: S3 url
+        :type s3url: str
+        :return: bool
+        """
+        bucket, key = S3Hook.parse_s3_url(s3url)
+        s3hook = S3Hook(aws_conn_id=self.aws_conn_id)
+        if not s3hook.check_for_bucket(bucket_name=bucket):
+            raise AirflowException(
+                "The input S3 Bucket {} does not exist ".format(bucket))
+        if not s3hook.check_for_key(key=key, bucket_name=bucket):
+            raise AirflowException("The input S3 Key {} does not exist in the Bucket"
+                                   .format(s3url, bucket))
+        return True
+
+    def check_valid_training_input(self, training_config):
+        """
+        Run checks before a training starts
+        :param training_config: training_config
+        :type training_config: dict
+        :return: None
+        """
+        for channel in training_config['InputDataConfig']:
+            self.check_for_url(channel['DataSource']
+                               ['S3DataSource']['S3Uri'])
+
+    def check_valid_tuning_input(self, tuning_config):
+        """
+        Run checks before a tuning job starts
+        :param tuning_config: tuning_config
+        :type tuning_config: dict
+        :return: None
+        """
+        for channel in tuning_config['TrainingJobDefinition']['InputDataConfig']:
+            self.check_for_url(channel['DataSource']
+                               ['S3DataSource']['S3Uri'])
+
+    def check_status(self, non_terminal_states,
+                     failed_state, key,
+                     describe_function, *args):
+        """
+        :param non_terminal_states: the set of non_terminal states
+        :type non_terminal_states: dict
+        :param failed_state: the set of failed states
+        :type failed_state: dict
+        :param key: the key of the response dict
+        that points to the state
+        :type key: string
+        :param describe_function: the function used to retrieve the status
+        :type describe_function: python callable
+        :param args: the arguments for the function
+        :return: None
+        """
+        sec = 0
+        running = True
+
+        while running:
+
+            sec = sec + self.check_interval
+
+            if self.max_ingestion_time and sec > self.max_ingestion_time:
+                # ensure that the job gets killed if the max ingestion time is exceeded
+                raise 

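The archived diff above is cut off by the mailing list. As a rough, stand-alone illustration of the polling pattern that the check_status docstring describes (re-check every check_interval seconds, give up after max_ingestion_time), here is a minimal sketch; the function name, argument order, and exception type are assumptions, not the PR's actual code:

```
import time


def poll_status(describe_function, describe_args, key,
                non_terminal_states, failed_states,
                check_interval=5, max_ingestion_time=None):
    """Poll describe_function until the state stored under `key` leaves the
    non-terminal set; raise if it enters a failed state or exceeds the limit."""
    sec = 0
    while True:
        response = describe_function(*describe_args)
        status = response[key]
        if status in failed_states:
            raise RuntimeError("Job reached failed state: %s" % status)
        if status not in non_terminal_states:
            return response  # terminal and not failed: the job is done
        if max_ingestion_time and sec > max_ingestion_time:
            raise RuntimeError("Job ran longer than %s seconds" % max_ingestion_time)
        time.sleep(check_interval)
        sec += check_interval
```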
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-08-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577695#comment-16577695
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

Fokko closed pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658
 
 
   


[GitHub] Fokko commented on issue #3658: [AIRFLOW-2524] Add Amazon SageMaker Training

2018-08-12 Thread GitBox
Fokko commented on issue #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
URL: 
https://github.com/apache/incubator-airflow/pull/3658#issuecomment-412360645
 
 
   Thanks @troychen728. Looks good, thanks for the PR. Merging to master!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3

2018-08-12 Thread GitBox
Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop 
snakebite in favour of hdfs3
URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r209463470
 
 

 ##
 File path: airflow/sensors/hdfs_sensor.py
 ##
 @@ -17,103 +17,231 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import re
-import sys
-from builtins import str
+import posixpath
 
 from airflow import settings
-from airflow.hooks.hdfs_hook import HDFSHook
+from airflow.hooks.hdfs_hook import HdfsHook
 from airflow.sensors.base_sensor_operator import BaseSensorOperator
 from airflow.utils.decorators import apply_defaults
-from airflow.utils.log.logging_mixin import LoggingMixin
 
 
-class HdfsSensor(BaseSensorOperator):
-    """
-    Waits for a file or folder to land in HDFS
+class HdfsFileSensor(BaseSensorOperator):
+    """Sensor that waits for files matching a specific (glob) pattern to land in HDFS.
+
+    :param str file_pattern: Glob pattern to match.
+    :param str conn_id: Connection to use.
+    :param Iterable[FilePathFilter] filters: Optional list of filters that can be
+        used to apply further filtering to any file paths matching the glob pattern.
+        Any files that fail a filter are dropped from consideration.
+    :param int min_size: Minimum size (in MB) for files to be considered. Can be used
+        to filter any intermediate files that are below the expected file size.
+    :param Set[str] ignore_exts: File extensions to ignore. By default, files with
+        a '_COPYING_' extension are ignored, as these represent temporary files.
     """
-    template_fields = ('filepath',)
-    ui_color = settings.WEB_COLORS['LIGHTBLUE']
+
+    template_fields = ("_pattern",)
+    ui_color = settings.WEB_COLORS["LIGHTBLUE"]
 
     @apply_defaults
-    def __init__(self,
-                 filepath,
-                 hdfs_conn_id='hdfs_default',
-                 ignored_ext=None,
-                 ignore_copying=True,
-                 file_size=None,
-                 hook=HDFSHook,
-                 *args,
-                 **kwargs):
-        super(HdfsSensor, self).__init__(*args, **kwargs)
-        if ignored_ext is None:
-            ignored_ext = ['_COPYING_']
-        self.filepath = filepath
-        self.hdfs_conn_id = hdfs_conn_id
-        self.file_size = file_size
-        self.ignored_ext = ignored_ext
-        self.ignore_copying = ignore_copying
-        self.hook = hook
-
-    @staticmethod
-    def filter_for_filesize(result, size=None):
-        """
-        Will test the filepath result and test if its size is at least self.filesize
-
-        :param result: a list of dicts returned by Snakebite ls
-        :param size: the file size in MB a file should be at least to trigger True
-        :return: (bool) depending on the matching criteria
-        """
-        if size:
-            log = LoggingMixin().log
-            log.debug(
-                'Filtering for file size >= %s in files: %s',
-                size, map(lambda x: x['path'], result)
-            )
-            size *= settings.MEGABYTE
-            result = [x for x in result if x['length'] >= size]
-            log.debug('HdfsSensor.poke: after size filter result is %s', result)
-        return result
-
-    @staticmethod
-    def filter_for_ignored_ext(result, ignored_ext, ignore_copying):
-        """
-        Will filter if instructed to do so the result to remove matching criteria
-
-        :param result: (list) of dicts returned by Snakebite ls
-        :param ignored_ext: (list) of ignored extensions
-        :param ignore_copying: (bool) shall we ignore ?
-        :return: (list) of dicts which were not removed
-        """
-        if ignore_copying:
-            log = LoggingMixin().log
-            regex_builder = "^.*\.(%s$)$" % '$|'.join(ignored_ext)
-            ignored_extentions_regex = re.compile(regex_builder)
-            log.debug(
-                'Filtering result for ignored extensions: %s in files %s',
-                ignored_extentions_regex.pattern, map(lambda x: x['path'], result)
-            )
-            result = [x for x in result if not ignored_extentions_regex.match(x['path'])]
-            log.debug('HdfsSensor.poke: after ext filter result is %s', result)
-        return result
+    def __init__(
+        self,
+        pattern,
+        conn_id="hdfs_default",
+        filters=None,
+        min_size=None,
+        ignore_exts=("_COPYING_",),
+        **kwargs
+    ):
+        super(HdfsFileSensor, self).__init__(**kwargs)
+
+        # Min-size and ignore-ext filters are added via
+        # arguments for backwards compatibility.
+        filters = list(filters or [])
+
+        if min_size:
+            filters.append(SizeFilter(min_size=min_size))
+
+        if ignore_exts:
+            filters.append(ExtFilter(exts=ignore_exts))
+
+        self._pattern = pattern
+        self._conn_id = conn_id
+

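For orientation, a hypothetical usage of the HdfsFileSensor constructor shown in the (truncated) diff above; the glob pattern, sizes, and timeouts are invented values, and the import path assumes the file layout proposed in PR #3560:

```
from airflow.sensors.hdfs_sensor import HdfsFileSensor

wait_for_parts = HdfsFileSensor(
    task_id="wait_for_parts",
    pattern="/data/incoming/part-*.parquet",  # glob pattern to match
    conn_id="hdfs_default",
    min_size=1,                               # ignore files smaller than 1 MB
    ignore_exts=("_COPYING_",),               # skip files still being copied
    poke_interval=60,
    timeout=60 * 60,
)
```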
[GitHub] Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3

2018-08-12 Thread GitBox
Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop 
snakebite in favour of hdfs3
URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r209463485
 
 

 ##
 File path: tests/contrib/sensors/test_hdfs_sensor.py
 ##
 @@ -7,247 +7,199 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-import logging
-import unittest
 
+import datetime as dt
 import re
-from datetime import timedelta
+import unittest
+import warnings
 
-from airflow.contrib.sensors.hdfs_sensor import HdfsSensorFolder, HdfsSensorRegex
-from airflow.exceptions import AirflowSensorTimeout
+from airflow import models
+from airflow.contrib.sensors.hdfs_sensor import (HdfsSensorFolder,
+                                                 HdfsSensorRegex,
+                                                 HdfsRegexFileSensor)
 
+from tests.sensors.test_hdfs_sensor import MockHdfs3Client
+
+
+class HdfsRegexFileSensorTests(unittest.TestCase):
+    """Tests for the HdfsRegexFileSensor class."""
 
-class HdfsSensorFolderTests(unittest.TestCase):
-    def setUp(self):
-        from tests.core import FakeHDFSHook
-        self.hook = FakeHDFSHook
-        self.log = logging.getLogger()
-        self.log.setLevel(logging.DEBUG)
-
-    def test_should_be_empty_directory(self):
-        """
-        test the empty directory behaviour
-        :return:
-        """
-        # Given
-        self.log.debug('#' * 10)
-        self.log.debug('Running %s', self._testMethodName)
-        self.log.debug('#' * 10)
-        task = HdfsSensorFolder(task_id='Should_be_empty_directory',
-                                filepath='/datadirectory/empty_directory',
-                                be_empty=True,
-                                timeout=1,
-                                retry_delay=timedelta(seconds=1),
-                                poke_interval=1,
-                                hook=self.hook)
-
-        # When
-        task.execute(None)
-
-        # Then
-        # Nothing happens, nothing is raised exec is ok
-
-    def test_should_be_empty_directory_fail(self):
-        """
-        test the empty directory behaviour
-        :return:
-        """
-        # Given
-        self.log.debug('#' * 10)
-        self.log.debug('Running %s', self._testMethodName)
-        self.log.debug('#' * 10)
-        task = HdfsSensorFolder(task_id='Should_be_empty_directory_fail',
-                                filepath='/datadirectory/not_empty_directory',
-                                be_empty=True,
-                                timeout=1,
-                                retry_delay=timedelta(seconds=1),
-                                poke_interval=1,
-                                hook=self.hook)
-
-        # When
-        # Then
-        with self.assertRaises(AirflowSensorTimeout):
-            task.execute(None)
-
-    def test_should_be_a_non_empty_directory(self):
-        """
-        test the empty directory behaviour
-        :return:
-        """
-        # Given
-        self.log.debug('#' * 10)
-        self.log.debug('Running %s', self._testMethodName)
-        self.log.debug('#' * 10)
-        task = HdfsSensorFolder(task_id='Should_be_non_empty_directory',
-                                filepath='/datadirectory/not_empty_directory',
-                                timeout=1,
-                                retry_delay=timedelta(seconds=1),
-                                poke_interval=1,
-                                hook=self.hook)
-
-        # When
-        task.execute(None)
-
-        # Then
-        # Nothing happens, nothing is raised exec is ok
-
-    def test_should_be_non_empty_directory_fail(self):
-        """
-        test the empty directory behaviour
-        :return:
-        """
-        # Given
-        self.log.debug('#' * 10)
-        self.log.debug('Running %s', self._testMethodName)
-        self.log.debug('#' * 10)
-        task = HdfsSensorFolder(task_id='Should_be_empty_directory_fail',
-                                filepath='/datadirectory/empty_directory',
-                                timeout=1,
-                                retry_delay=timedelta(seconds=1),
-                                poke_interval=1,
-                                hook=self.hook)
-
-        # When
-        # Then
-        with self.assertRaises(AirflowSensorTimeout):
-            task.execute(None)
-
-
-class HdfsSensorRegexTests(unittest.TestCase):
 def 

[GitHub] Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3

2018-08-12 Thread GitBox
Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop 
snakebite in favour of hdfs3
URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r209463504
 
 

 ##
 File path: tests/sensors/test_hdfs_sensor.py
 ##
 @@ -7,85 +7,368 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-import unittest
 
 from datetime import timedelta
+import fnmatch
+import posixpath
+import unittest
+
+import mock
+
+from airflow import models
+from airflow.sensors.hdfs_sensor import HdfsFileSensor, HdfsFolderSensor, HdfsHook
+
+
+class MockHdfs3Client(object):
+    """Mock hdfs3 client for testing purposes."""
+
+    def __init__(self, file_details):
+        self._file_details = {entry["name"]: entry for entry in file_details}
+
+    @classmethod
+    def from_file_details(cls, file_details, test_instance, conn_id="hdfs_default"):
+        """Builds mock hdsf3 client using file_details."""
+
+        mock_client = cls(file_details)
+        mock_params = models.Connection(conn_id=conn_id)
+
+        # Setup mock for get_connection.
+        patcher = mock.patch.object(
+            HdfsHook, "get_connection", return_value=mock_params
+        )
+        test_instance.addCleanup(patcher.stop)
+        patcher.start()
+
+        # Setup mock for get_conn.
+        patcher = mock.patch.object(HdfsHook, "get_conn", return_value=mock_client)
+        test_instance.addCleanup(patcher.stop)
+        patcher.start()
+
+        return mock_client, mock_params
+
+    def glob(self, pattern):
+        """Returns glob of files matching pattern."""
+
+        # Implements a non-recursive glob on file names.
+        pattern_dir = posixpath.dirname(pattern)
+        pattern_base = posixpath.basename(pattern)
+
+        # Ensure pattern_dir ends with '/'.
+        pattern_dir = posixpath.join(pattern_dir, "")
+
+        for file_path in self._file_details.keys():
+            file_basename = posixpath.basename(file_path)
+            if file_path.startswith(pattern_dir) and \
+                    fnmatch.fnmatch(file_basename, pattern_base):
+                yield file_path
+
+    def isdir(self, path):
+        """Returns true if path is a directory."""
+
+        try:
+            details = self._file_details[path]
+        except KeyError:
+            raise IOError()
 
-from airflow import configuration
-from airflow.exceptions import AirflowSensorTimeout
-from airflow.sensors.hdfs_sensor import HdfsSensor
-from airflow.utils.timezone import datetime
-from tests.core import FakeHDFSHook
+        return details["kind"] == "directory"
 
-configuration.load_test_config()
+    def info(self, path):
+        """Returns info for given file path."""
 
-DEFAULT_DATE = datetime(2015, 1, 1)
-TEST_DAG_ID = 'unit_test_dag'
+        try:
+            return self._file_details[path]
+        except KeyError:
+            raise IOError()
 
 
-class HdfsSensorTests(unittest.TestCase):
+class HdfsFileSensorTests(unittest.TestCase):
+    """Tests for the HdfsFileSensor class."""
 
     def setUp(self):
-        self.hook = FakeHDFSHook
+        file_details = [
+            {"kind": "directory", "name": "/data/empty", "size": 0},
+            {"kind": "directory", "name": "/data/not_empty", "size": 0},
+            {"kind": "file", "name": "/data/not_empty/small.txt", "size": 10},
+            {"kind": "file", "name": "/data/not_empty/large.txt", "size": 1000},
+            {"kind": "file", "name": "/data/not_empty/file.txt._COPYING_"},
+        ]
 
-    def test_legacy_file_exist(self):
-        """
-        Test the legacy behaviour
-        :return:
-        """
-        # When
-        task = HdfsSensor(task_id='Should_be_file_legacy',
-                          filepath='/datadirectory/datafile',
-                          timeout=1,
-                          retry_delay=timedelta(seconds=1),
-                          poke_interval=1,
-                          hook=self.hook)
-        task.execute(None)
-
-        # Then
-        # Nothing happens, nothing is raised exec is ok
-
-    def test_legacy_file_exist_but_filesize(self):
+        self._mock_client, self._mock_params = MockHdfs3Client.from_file_details(
+            file_details, test_instance=self
+        )
+
+        self._default_task_kws = {
+            "timeout": 1,
+            "retry_delay": timedelta(seconds=1),
+            "poke_interval": 1,
+        }
+
+    def test_existing_file(self):
+        """Tests poking 

[GitHub] Fokko commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env

2018-08-12 Thread GitBox
Fokko commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken 
RTD env 
URL: https://github.com/apache/incubator-airflow/pull/3703#discussion_r209463769
 
 

 ##
 File path: setup.py
 ##
 @@ -161,6 +164,7 @@ def write_version(filename=os.path.join(*['airflow',
 databricks = ['requests>=2.5.1, <3']
 datadog = ['datadog>=0.14.0']
 doc = [
+'mock',
 
 Review comment:
   Just learned something, thanks @tedmiston !


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] anne-chu opened a new pull request #3742: [AIRFLOW-2892] Add CC, BCC for SLA/Retry/Failure Emailing

2018-08-12 Thread GitBox
anne-chu opened a new pull request #3742: [AIRFLOW-2892] Add CC,BCC for 
SLA/Retry/Failure Emailing
URL: https://github.com/apache/incubator-airflow/pull/3742
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2892
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Add CC,BCC for SLA/Retry/Failure Emailing. Currently, Airflow only supports 
"TO", no "CC" or "BCC".
   
   This may be useful in some business scenarios.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-2892) Add CC for SLA/Retry/Failure Email Notification

2018-08-12 Thread Anne (JIRA)
Anne created AIRFLOW-2892:
-

 Summary: Add CC for SLA/Retry/Failure Email Notification
 Key: AIRFLOW-2892
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2892
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Anne
Assignee: Anne






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko commented on a change in pull request #3741: [AIRFLOW-1368] Add auto_remove for DockerOperator

2018-08-12 Thread GitBox
Fokko commented on a change in pull request #3741: [AIRFLOW-1368] Add 
auto_remove for DockerOperator
URL: https://github.com/apache/incubator-airflow/pull/3741#discussion_r209462359
 
 

 ##
 File path: airflow/operators/docker_operator.py
 ##
 @@ -115,6 +117,7 @@ def __init__(
             force_pull=False,
             mem_limit=None,
             network_mode=None,
+            auto_remove=False,
 
 Review comment:
   Could you add the argument at the very end? This is to minimize the impact, 
and keep backward compatibility as much as possible :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #3475: [AIRFLOW-2315] Improve S3Hook

2018-08-12 Thread GitBox
Fokko commented on issue #3475: [AIRFLOW-2315] Improve S3Hook
URL: 
https://github.com/apache/incubator-airflow/pull/3475#issuecomment-412361667
 
 
   @jbacon can you also squash your commits? Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #3475: [AIRFLOW-2315] Improve S3Hook

2018-08-12 Thread GitBox
Fokko commented on a change in pull request #3475: [AIRFLOW-2315] Improve S3Hook
URL: https://github.com/apache/incubator-airflow/pull/3475#discussion_r209463662
 
 

 ##
 File path: airflow/hooks/S3_hook.py
 ##
 @@ -275,7 +275,8 @@ def load_file(self,
                   key,
                   bucket_name=None,
                   replace=False,
-                  encrypt=False):
+                  encrypt=False,
+                  upload_args={}):
 
 Review comment:
   This can be quite dangerous indeed if you're not familiar with its 
behaviour: 
   ```
   MacBook-Pro-van-Fokko:~ fokkodriesprong$ python3
   Python 3.7.0 (default, Jun 29 2018, 20:13:13) 
   [Clang 9.1.0 (clang-902.0.39.2)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> def some_function(a={}):
   ... return a
   ... 
   >>> some_function()
   {}
   >>> some_function()['lid'] = True
   >>> some_function()
   {'lid': True}
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
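A conventional remedy for the shared mutable default shown in the review comment above (not part of the thread itself) is to default to None and build the dict inside the function; this stand-alone sketch mirrors the reviewer's REPL example:

```
def some_function(a=None):
    # A fresh dict is created on every call, so one caller's mutation
    # can no longer leak into later calls.
    if a is None:
        a = {}
    return a


some_function()['lid'] = True
assert some_function() == {}  # unaffected by the previous call
```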


[jira] [Commented] (AIRFLOW-1368) Automatically remove the container when it exits

2018-08-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577586#comment-16577586
 ] 

ASF GitHub Bot commented on AIRFLOW-1368:
-

Wouter-M opened a new pull request #3741: [AIRFLOW-1368] Add auto_remove for 
DockerOperator
URL: https://github.com/apache/incubator-airflow/pull/3741
 
 
   Dear Airflow maintainers,
   
   Please take this PR into consideration. It adds the ability for containers 
created by the `DockerOperator` to be removed automatically after execution. 
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-1368
   
   Other issues which seem to be duplicate:
   - https://issues.apache.org/jira/browse/AIRFLOW-516
   - https://issues.apache.org/jira/browse/AIRFLOW-465
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR adds back the `auto_remove` option to the `DockerOperator`, which 
will automatically delete the Docker container once it exits. This was added 
before in PR #2411, but something went wrong there and it somehow got lost.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   I have changed one of the unit tests to work with the change, ran the test 
cases for the docker_operator, and testing airflow locally to check the docker 
container gets removed if `auto_remove=True` and not removed if 
`auto_remove=False`.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   I updated the doc comment.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Automatically remove the container when it exits
> 
>
> Key: AIRFLOW-1368
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1368
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docker, operators
> Environment: MacOS Sierra Version 10.12.5
> Docker Community Edition Version 17.06.0-ce-mac18 Stable Channel
> Docker Base Image: puckel/docker-airflow
>Reporter: Nathaniel Varona
>Assignee: Nathaniel Varona
>Priority: Major
>  Labels: docker
> Fix For: 1.9.0, 1.8.3
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A container should automatically remove when it exits for short-term 
> foreground processes.
> Manual Example:
> {{$ docker run -i --rm busybox echo 'Hello World!'}}
> {{> Hello World!}}
> {{$ docker ps -a}}
> Command output should have a list of clean running processes without having 
> an {{Exited (0)}} status.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
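For illustration, a hypothetical DAG snippet using the auto_remove flag added by the pull request above; the DAG id, image, and command are placeholder values:

```
from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator

with DAG(dag_id="docker_auto_remove_example",
         start_date=datetime(2018, 8, 1),
         schedule_interval=None) as dag:
    hello = DockerOperator(
        task_id="hello_world",
        image="busybox",
        command="echo 'Hello World!'",
        auto_remove=True,  # container is deleted once it exits, like `docker run --rm`
    )
```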


[GitHub] Wouter-M opened a new pull request #3741: [AIRFLOW-1368] Add auto_remove for DockerOperator

2018-08-12 Thread GitBox
Wouter-M opened a new pull request #3741: [AIRFLOW-1368] Add auto_remove for 
DockerOperator
URL: https://github.com/apache/incubator-airflow/pull/3741
 
 
   Dear Airflow maintainers,
   
   Please take this PR into consideration. It adds the ability for containers 
created by the `DockerOperator` to be removed automatically after execution. 
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-1368
   
   Other issues which seem to be duplicate:
   - https://issues.apache.org/jira/browse/AIRFLOW-516
   - https://issues.apache.org/jira/browse/AIRFLOW-465
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR adds back the `auto_remove` option to the `DockerOperator`, which 
will automatically delete the Docker container once it exits. This was added 
before in PR #2411, but something went wrong there and it somehow got lost.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   I have changed one of the unit tests to work with the change, ran the test 
cases for the docker_operator, and testing airflow locally to check the docker 
container gets removed if `auto_remove=True` and not removed if 
`auto_remove=False`.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   I updated the doc comment.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY

2018-08-12 Thread GitBox
Fokko commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: 
https://github.com/apache/incubator-airflow/pull/3738#issuecomment-412359813
 
 
   Are people using clusters of webservers? I've never seen such a setup to be 
honest.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-08-12 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2524.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using
> Airflow?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2799) Filtering UI objects by datetime is broken

2018-08-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577794#comment-16577794
 ] 

ASF GitHub Bot commented on AIRFLOW-2799:
-

kevcampb opened a new pull request #3743: [AIRFLOW-2799] Filtering UI objects 
by datetime is broken
URL: https://github.com/apache/incubator-airflow/pull/3743
 
 
   This change treats the datetimes supplied by the UI as being in UTC. This 
only
   works correctly where the system is using UTC throughout, but is an 
improvement
   over the current behavior (crash).
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2799
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   Yes there is a description
   
   ### Tests
   
   No tests as this involves UI code. 
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   Not required
   
   ### Code Quality
   
   - [No] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   
   utils.py never passed flake8 in the first place


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filtering UI objects by datetime is broken 
> ---
>
> Key: AIRFLOW-2799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2799
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui, webserver
>Affects Versions: 2.0.0
> Environment: Debian Stretch, Python 3.5.3
>Reporter: Kevin Campbell
>Priority: Major
>
> On master (49fd23a3ee0269e2b974648f4a823c1d0b6c12ec) searching objects via 
> the user interface is broken for datetime fields.
> Create a new installation
>  Create a test dag (example_bash_operator)
>  Start webserver and scheduler
>  Enable dag
> On web UI, go to Browse > Task Instances
>  Search for task instances with execution_date greater than 5 days ago
>  You will get an exception
> {code:java}
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: wave.diffractive.io
> ---
> Traceback (most recent call last):
>   File 
> 
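A minimal sketch of the approach described in the PR text above (treat the naive datetime coming from a UI filter as UTC before comparing it with timezone-aware values); the helper name and the format string are assumptions:

```
from datetime import datetime, timezone


def coerce_ui_datetime(value, fmt="%Y-%m-%d %H:%M:%S"):
    # Parse the naive string from the filter widget and label it as UTC.
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


print(coerce_ui_datetime("2018-08-07 00:00:00"))  # 2018-08-07 00:00:00+00:00
```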

[GitHub] caddac edited a comment on issue #3684: [AIRFLOW-2840] - add update connections cli option

2018-08-12 Thread GitBox
caddac edited a comment on issue #3684: [AIRFLOW-2840] - add update connections 
cli option
URL: 
https://github.com/apache/incubator-airflow/pull/3684#issuecomment-412351673
 
 
   @ashb @bolkedebruin  I've been working on refactoring the connection api and 
wanted to run the implementation by you before going too far. My thought is in 
`api/common/experimental/` there will be 4 files: `add_connection.py`, 
`list_connection.py`, `delete_connection.py`, `update_connection.py` where the 
DB access will occur. These functions will take explicitly defined arguments 
based on what they are doing (not dicts or something) and return `Connections` 
(or in the case of delete, `conn_id`(s)). 
   
   Then each client (cli, local, json) would be responsible for parsing 
arguments and constructing meaningful print/return statements to the user (e.g. 
the cli client would formulate the message `Successfully added 
`conn_id`={conn_id} : {uri}` whereas the json client would just return http 
200 and the connection as json). 
   
   thoughts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
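A hypothetical sketch of one of the proposed modules, api/common/experimental/add_connection.py, following the description above (explicit arguments in, a Connection out); the function signature and session handling are assumptions, not merged code:

```
from airflow import settings
from airflow.models import Connection


def add_connection(conn_id, uri, extra=None):
    """Create and persist a Connection, returning the stored object."""
    session = settings.Session()
    conn = Connection(conn_id=conn_id, uri=uri)
    if extra is not None:
        conn.set_extra(extra)
    session.add(conn)
    session.commit()
    return conn
```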


[GitHub] XD-DENG commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY

2018-08-12 Thread GitBox
XD-DENG commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
URL: 
https://github.com/apache/incubator-airflow/pull/3738#issuecomment-412386245
 
 
   @Fokko @feng-tao :
   If users are running a cluster for `webservers`, we can leave it to users to 
ensure configuration settings being homogeneous across the cluster (just like 
how users set `sql_alchemy_conn` or `broker_url`).
   
   In a more common scenario, users have `webserver` on a single node (starting 
multiple processes as multiple workers), the solution in this PR should be able 
to improve the security in terms of SECRET_KEY without `CSRF` issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-2894) Allow Users to "bake-in" DAGs in Airflow images

2018-08-12 Thread Daniel Imberman (JIRA)
Daniel Imberman created AIRFLOW-2894:


 Summary: Allow Users to "bake-in" DAGs in Airflow images
 Key: AIRFLOW-2894
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2894
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Daniel Imberman
Assignee: Daniel Imberman


Multiple users have asked that we offer the ability to have DAGs baked into 
their Airflow images at launch (as opposed to using git-mode or a volume 
claim). This will save start-up time and allow for versioned DAGs via Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2893) Stuck dataflow job due to jobName mismatch.

2018-08-12 Thread Feng Lu (JIRA)
Feng Lu created AIRFLOW-2893:


 Summary: Stuck dataflow job due to jobName mismatch.
 Key: AIRFLOW-2893
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2893
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib
Affects Versions: 1.10.0, 1.9.1
Reporter: Feng Lu
Assignee: Feng Lu
 Fix For: 2.0.0


Depending on which version of the Dataflow SDK is used, the jobName supplied in the 
gcp_dataflow_hook may or may not be the one accepted by the Cloud Dataflow 
service. As a result, the Dataflow job polling will run forever because it cannot 
find a matching job_id for the given job_name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] bolkedebruin commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-12 Thread GitBox
bolkedebruin commented on issue #3740: [AIRFLOW-2888] Remove shell=True and 
bash from task launch
URL: 
https://github.com/apache/incubator-airflow/pull/3740#issuecomment-412364062
 
 
   @Fokko backwards compatibility isn’t really required I think. I haven’t 
heard about executors in the wild. A real bash task runner should (maybe it is 
not even required) join the args into a string but I don’t see a need for a 
bash runner tbh


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-2891) Option to set container_name for DockerOperator

2018-08-12 Thread Wouter Menninga (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wouter Menninga updated AIRFLOW-2891:
-
Description: It would be nice if the name of a Docker container created 
using the {{DockerOperator}} could be specified by passing it as an argument to 
{{DockerOperator}}, instead of having Docker generate its random name for the 
container.  (was: It would be nice if the name of a Docker container created 
using the {{DockerOperator}} could be specified by passing it as an argument to 
{{DockerOperator}}, instead of having Docker generate it's random name for the 
container.)

> Option to set container_name for DockerOperator
> ---
>
> Key: AIRFLOW-2891
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2891
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docker, operators
>Reporter: Wouter Menninga
>Priority: Minor
>
> It would be nice if the name of a Docker container created using the 
> {{DockerOperator}} could be specified by passing it as an argument to 
> {{DockerOperator}}, instead of having Docker generate its random name for the 
> container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2891) Option to set container_name for DockerOperator

2018-08-12 Thread Wouter Menninga (JIRA)
Wouter Menninga created AIRFLOW-2891:


 Summary: Option to set container_name for DockerOperator
 Key: AIRFLOW-2891
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2891
 Project: Apache Airflow
  Issue Type: Improvement
  Components: docker, operators
Reporter: Wouter Menninga


It would be nice if the name of a Docker container created using the 
{{DockerOperator}} could be specified by passing it as an argument to 
{{DockerOperator}}, instead of having Docker generate it's random name for the 
container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] caddac commented on issue #3684: [AIRFLOW-2840] - add update connections cli option

2018-08-12 Thread GitBox
caddac commented on issue #3684: [AIRFLOW-2840] - add update connections cli 
option
URL: 
https://github.com/apache/incubator-airflow/pull/3684#issuecomment-412351673
 
 
   @ashb I've been working on refactoring the connection api and wanted to run 
the implementation by you before going too far. My thought is in 
`api/common/experimental/` there will be 4 files: `add_connection.py`, 
`list_connection.py`, `delete_connection.py`, `update_connection.py` where the 
DB access will occur. These functions will take explicitly defined arguments 
based on what they are doing (not dicts or something) and return `Connections` 
(or in the case of delete, `conn_id`(s)). 
   
   Then each client (cli, local, json) would be responsible for parsing 
arguments and constructing meaningful print/return statements to the user (e.g. 
the cli client would formulate the message `Successfully added 
`conn_id`={conn_id} : {uri}` whereas the json client would just return http 
200 and the connection as json). 
   
   thoughts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch

2018-08-12 Thread GitBox
Fokko commented on a change in pull request #3740: [AIRFLOW-2888] Remove 
shell=True and bash from task launch
URL: https://github.com/apache/incubator-airflow/pull/3740#discussion_r209462759
 
 

 ##
 File path: airflow/task/task_runner/base_task_runner.py
 ##
 @@ -106,7 +106,7 @@ def _read_task_logs(self, stream):
                           self._task_instance.job_id, self._task_instance.task_id,
                           line.rstrip('\n'))
 
-    def run_command(self, run_with, join_args=False):
+    def run_command(self, run_with=[], join_args=False):
 
 Review comment:
   This is a bad idea. @kaxil fixed some lately. But the `[]` is initialized 
just once in Python. Therefore if you change the object later on, it will be 
changed because it is by reference:
   
   ```
   MacBook-Pro-van-Fokko:~ fokkodriesprong$ python3
   Python 3.7.0 (default, Jun 29 2018, 20:13:13) 
   [Clang 9.1.0 (clang-902.0.39.2)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> def some_function(a=[]):
   ... return a
   ... 
   >>> 
   >>> b = some_function()
   >>> b += 'vo'
   >>> 
   >>> some_function()
   ['v', 'o']
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
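The usual fix is the same as for the dict default discussed earlier in this digest: default to None and build a fresh list per call. A minimal sketch follows; the class name and command pieces are placeholders, not the real task runner:

```
class BaseTaskRunnerSketch(object):
    """Sketch only: the surrounding class is hypothetical and trimmed."""

    _command = ["airflow", "run"]

    def run_command(self, run_with=None, join_args=False):
        # None is the sentinel; the list is created inside the call, so the
        # default object is never shared between invocations.
        run_with = list(run_with) if run_with is not None else []
        full_cmd = run_with + self._command
        return " ".join(full_cmd) if join_args else full_cmd
```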


[GitHub] linevahe commented on issue #2955: [AIRFLOW-1966] fix PasswordUser.password setter

2018-08-12 Thread GitBox
linevahe commented on issue #2955: [AIRFLOW-1966] fix PasswordUser.password 
setter
URL: 
https://github.com/apache/incubator-airflow/pull/2955#issuecomment-412324297
 
 
   @eschwartz Hello, I have the same problem as you. Could you tell me how you 
solved it?
   
 / (  ()   )  \___
/( (  (  )   _))  )   )\
  (( (   )()  )   (   )  )
((/  ( _(   )   (   _) ) (  () )  )
   ( (  ( (_)   (((   )  .((_ ) .  )_
  ( (  )(  (  ))   ) . ) (   )
 (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
 ( (  (   ) (  )   (  )) ) _)(   )  )  )
( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
 (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
 ((  (   )(( _)   _) _(_ (  (_ )
  (_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
  ((__)\\||lll|l||///  \_))
   (   /(/ (  )  ) )\   )
 (( ( ( | | ) ) )\   )
  (   /(| / ( )) ) ) )) )
( ( _(|)_) )
 (  ||\(|(|)|/|| )
   (|(||(||))
 ( //|/l|||)|\\ \ )
   (/ / //  /|//\\  \ \  \ _)
   
---
   Node: ###
   
---
   Traceback (most recent call last):
 File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 
1988, in wsgi_app
   response = self.full_dispatch_request()
 File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 
1641, in full_dispatch_request
   rv = self.handle_user_exception(e)
 File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 
1544, in handle_user_exception
   reraise(exc_type, exc_value, tb)
 File "/home/helin/.local/lib/python3.6/site-packages/flask/_compat.py", 
line 33, in reraise
   raise value
 File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 
1639, in full_dispatch_request
   rv = self.dispatch_request()
 File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 
1625, in dispatch_request
   return self.view_functions[rule.endpoint](**req.view_args)
 File "/home/helin/.local/lib/python3.6/site-packages/flask_admin/base.py", 
line 69, in inner
   return self._run_view(f, *args, **kwargs)
 File "/home/helin/.local/lib/python3.6/site-packages/flask_admin/base.py", 
line 368, in _run_view
   return fn(self, *args, **kwargs)
 File 
"/home/helin/.local/lib/python3.6/site-packages/airflow/www/views.py", line 
650, in login
   return airflow.login.login(self, request)
 File 
"/home/helin/.local/lib/python3.6/site-packages/airflow/contrib/auth/backends/password_auth.py",
 line 137, in login
   if not user.authenticate(password):
 File 
"/home/helin/.local/lib/python3.6/site-packages/airflow/contrib/auth/backends/password_auth.py",
 line 68, in authenticate
   return check_password_hash(self._password, plaintext)
 File "/home/helin/.local/lib/python3.6/site-packages/flask_bcrypt.py", 
line 67, in check_password_hash
   return Bcrypt().check_password_hash(pw_hash, password)
 File "/home/helin/.local/lib/python3.6/site-packages/flask_bcrypt.py", 
line 193, in check_password_hash
   return safe_str_cmp(bcrypt.hashpw(password, pw_hash), pw_hash)
 File "/home/helin/.local/lib/python3.6/site-packages/bcrypt/__init__.py", 
line 87, in hashpw
   raise ValueError("Invalid salt")
   ValueError: Invalid salt


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ron819 commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor

2018-08-12 Thread GitBox
ron819 commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor
URL: 
https://github.com/apache/incubator-airflow/pull/3702#issuecomment-412326321
 
 
   I must say that it would be nice if there were a way for the "blackout" not to 
be Airflow-wide. For example, it would be helpful to be able to exclude specific 
pools that would continue to be scheduled.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-12 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-2870.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> 
>
> Key: AIRFLOW-2870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Running migrations from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
> context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance 
> from the current code version, which has changes to the task_instance table 
> that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
> executor_config column that does not exist as of when 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.
> It is worth noting that this will not be observed for new installs because 
> the migration branches on table existence/non-existence at a point that will 
> hide the issue from new installs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)