[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577696#comment-16577696 ] ASF subversion and git services commented on AIRFLOW-2524: -- Commit 4d2f83b19af3489d6c9563d51210a3dab2f38b26 in incubator-airflow's branch refs/heads/master from Keliang Chen [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=4d2f83b ] [AIRFLOW-2524] Add Amazon SageMaker Training (#3658) Add SageMaker Hook, Training Operator & Sensor Co-authored-by: srrajeev-aws > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > Time Spent: 10m > Remaining Estimate: 0h > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko closed pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
Fokko closed pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/sagemaker_hook.py b/airflow/contrib/hooks/sagemaker_hook.py new file mode 100644 index 00..8b8e2e41e7 --- /dev/null +++ b/airflow/contrib/hooks/sagemaker_hook.py @@ -0,0 +1,241 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +import copy +import time +from botocore.exceptions import ClientError + +from airflow.exceptions import AirflowException +from airflow.contrib.hooks.aws_hook import AwsHook +from airflow.hooks.S3_hook import S3Hook + + +class SageMakerHook(AwsHook): +""" +Interact with Amazon SageMaker. +sagemaker_conn_id is required for using +the config stored in db for training/tuning +""" + +def __init__(self, + sagemaker_conn_id=None, + use_db_config=False, + region_name=None, + check_interval=5, + max_ingestion_time=None, + *args, **kwargs): +super(SageMakerHook, self).__init__(*args, **kwargs) +self.sagemaker_conn_id = sagemaker_conn_id +self.use_db_config = use_db_config +self.region_name = region_name +self.check_interval = check_interval +self.max_ingestion_time = max_ingestion_time +self.conn = self.get_conn() + +def check_for_url(self, s3url): +""" +check if the s3url exists +:param s3url: S3 url +:type s3url:str +:return: bool +""" +bucket, key = S3Hook.parse_s3_url(s3url) +s3hook = S3Hook(aws_conn_id=self.aws_conn_id) +if not s3hook.check_for_bucket(bucket_name=bucket): +raise AirflowException( +"The input S3 Bucket {} does not exist ".format(bucket)) +if not s3hook.check_for_key(key=key, bucket_name=bucket): +raise AirflowException("The input S3 Key {} does not exist in the Bucket" + .format(s3url, bucket)) +return True + +def check_valid_training_input(self, training_config): +""" +Run checks before a training starts +:param training_config: training_config +:type training_config: dict +:return: None +""" +for channel in training_config['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_valid_tuning_input(self, tuning_config): +""" +Run checks before a tuning job starts +:param tuning_config: tuning_config +:type tuning_config: dict +:return: None +""" +for channel in tuning_config['TrainingJobDefinition']['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_status(self, non_terminal_states, + failed_state, key, + describe_function, *args): +""" +:param non_terminal_states: the set of non_terminal states +:type non_terminal_states: dict +:param failed_state: the set of failed states +:type failed_state: dict +:param key: the key of the response dict +that points to the state +:type key: string +:param describe_function: the function used to retrieve the status +:type describe_function: python callable +:param args: the arguments for the function +:return: None +""" +sec = 0 +running = True + +while running: + +sec = sec + self.check_interval + +if self.max_ingestion_time and sec > self.max_ingestion_time: +# ensure that the job gets killed if the max ingestion time is exceeded +raise
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577695#comment-16577695 ] ASF GitHub Bot commented on AIRFLOW-2524: - Fokko closed pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/sagemaker_hook.py b/airflow/contrib/hooks/sagemaker_hook.py new file mode 100644 index 00..8b8e2e41e7 --- /dev/null +++ b/airflow/contrib/hooks/sagemaker_hook.py @@ -0,0 +1,241 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +import copy +import time +from botocore.exceptions import ClientError + +from airflow.exceptions import AirflowException +from airflow.contrib.hooks.aws_hook import AwsHook +from airflow.hooks.S3_hook import S3Hook + + +class SageMakerHook(AwsHook): +""" +Interact with Amazon SageMaker. +sagemaker_conn_id is required for using +the config stored in db for training/tuning +""" + +def __init__(self, + sagemaker_conn_id=None, + use_db_config=False, + region_name=None, + check_interval=5, + max_ingestion_time=None, + *args, **kwargs): +super(SageMakerHook, self).__init__(*args, **kwargs) +self.sagemaker_conn_id = sagemaker_conn_id +self.use_db_config = use_db_config +self.region_name = region_name +self.check_interval = check_interval +self.max_ingestion_time = max_ingestion_time +self.conn = self.get_conn() + +def check_for_url(self, s3url): +""" +check if the s3url exists +:param s3url: S3 url +:type s3url:str +:return: bool +""" +bucket, key = S3Hook.parse_s3_url(s3url) +s3hook = S3Hook(aws_conn_id=self.aws_conn_id) +if not s3hook.check_for_bucket(bucket_name=bucket): +raise AirflowException( +"The input S3 Bucket {} does not exist ".format(bucket)) +if not s3hook.check_for_key(key=key, bucket_name=bucket): +raise AirflowException("The input S3 Key {} does not exist in the Bucket" + .format(s3url, bucket)) +return True + +def check_valid_training_input(self, training_config): +""" +Run checks before a training starts +:param training_config: training_config +:type training_config: dict +:return: None +""" +for channel in training_config['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_valid_tuning_input(self, tuning_config): +""" +Run checks before a tuning job starts +:param tuning_config: tuning_config +:type tuning_config: dict +:return: None +""" +for channel in tuning_config['TrainingJobDefinition']['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_status(self, non_terminal_states, + failed_state, key, + describe_function, *args): +""" +:param non_terminal_states: the set of non_terminal states +:type non_terminal_states: dict +:param failed_state: the set of failed states +:type failed_state: dict +:param key: the key of the response dict +that points to the state +:type key: string +:param describe_function: the function used to retrieve the status +:type describe_function: python callable +:param args: the arguments for the function +:return: None +""" +sec = 0 +running = True + +while running: + +
[GitHub] Fokko commented on issue #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
Fokko commented on issue #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#issuecomment-412360645 Thanks @troychen728. Looks good, thanks for the PR. Merging to master! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3
Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3 URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r209463470 ## File path: airflow/sensors/hdfs_sensor.py ## @@ -17,103 +17,231 @@ # specific language governing permissions and limitations # under the License. -import re -import sys -from builtins import str +import posixpath from airflow import settings -from airflow.hooks.hdfs_hook import HDFSHook +from airflow.hooks.hdfs_hook import HdfsHook from airflow.sensors.base_sensor_operator import BaseSensorOperator from airflow.utils.decorators import apply_defaults -from airflow.utils.log.logging_mixin import LoggingMixin -class HdfsSensor(BaseSensorOperator): -""" -Waits for a file or folder to land in HDFS +class HdfsFileSensor(BaseSensorOperator): +"""Sensor that waits for files matching a specific (glob) pattern to land in HDFS. + +:param str file_pattern: Glob pattern to match. +:param str conn_id: Connection to use. +:param Iterable[FilePathFilter] filters: Optional list of filters that can be +used to apply further filtering to any file paths matching the glob pattern. +Any files that fail a filter are dropped from consideration. +:param int min_size: Minimum size (in MB) for files to be considered. Can be used +to filter any intermediate files that are below the expected file size. +:param Set[str] ignore_exts: File extensions to ignore. By default, files with +a '_COPYING_' extension are ignored, as these represent temporary files. """ -template_fields = ('filepath',) -ui_color = settings.WEB_COLORS['LIGHTBLUE'] + +template_fields = ("_pattern",) +ui_color = settings.WEB_COLORS["LIGHTBLUE"] @apply_defaults -def __init__(self, - filepath, - hdfs_conn_id='hdfs_default', - ignored_ext=None, - ignore_copying=True, - file_size=None, - hook=HDFSHook, - *args, - **kwargs): -super(HdfsSensor, self).__init__(*args, **kwargs) -if ignored_ext is None: -ignored_ext = ['_COPYING_'] -self.filepath = filepath -self.hdfs_conn_id = hdfs_conn_id -self.file_size = file_size -self.ignored_ext = ignored_ext -self.ignore_copying = ignore_copying -self.hook = hook - -@staticmethod -def filter_for_filesize(result, size=None): -""" -Will test the filepath result and test if its size is at least self.filesize - -:param result: a list of dicts returned by Snakebite ls -:param size: the file size in MB a file should be at least to trigger True -:return: (bool) depending on the matching criteria -""" -if size: -log = LoggingMixin().log -log.debug( -'Filtering for file size >= %s in files: %s', -size, map(lambda x: x['path'], result) -) -size *= settings.MEGABYTE -result = [x for x in result if x['length'] >= size] -log.debug('HdfsSensor.poke: after size filter result is %s', result) -return result - -@staticmethod -def filter_for_ignored_ext(result, ignored_ext, ignore_copying): -""" -Will filter if instructed to do so the result to remove matching criteria - -:param result: (list) of dicts returned by Snakebite ls -:param ignored_ext: (list) of ignored extensions -:param ignore_copying: (bool) shall we ignore ? -:return: (list) of dicts which were not removed -""" -if ignore_copying: -log = LoggingMixin().log -regex_builder = "^.*\.(%s$)$" % '$|'.join(ignored_ext) -ignored_extentions_regex = re.compile(regex_builder) -log.debug( -'Filtering result for ignored extensions: %s in files %s', -ignored_extentions_regex.pattern, map(lambda x: x['path'], result) -) -result = [x for x in result if not ignored_extentions_regex.match(x['path'])] -log.debug('HdfsSensor.poke: after ext filter result is %s', result) -return result +def __init__( +self, +pattern, +conn_id="hdfs_default", +filters=None, +min_size=None, +ignore_exts=("_COPYING_",), +**kwargs +): +super(HdfsFileSensor, self).__init__(**kwargs) + +# Min-size and ignore-ext filters are added via +# arguments for backwards compatibility. +filters = list(filters or []) + +if min_size: +filters.append(SizeFilter(min_size=min_size)) + +if ignore_exts: +filters.append(ExtFilter(exts=ignore_exts)) + +self._pattern = pattern +self._conn_id = conn_id +
[GitHub] Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3
Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3 URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r209463485 ## File path: tests/contrib/sensors/test_hdfs_sensor.py ## @@ -7,247 +7,199 @@ # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY # KIND, either express or implied. See the License for the # specific language governing permissions and limitations # under the License. -import logging -import unittest +import datetime as dt import re -from datetime import timedelta +import unittest +import warnings -from airflow.contrib.sensors.hdfs_sensor import HdfsSensorFolder, HdfsSensorRegex -from airflow.exceptions import AirflowSensorTimeout +from airflow import models +from airflow.contrib.sensors.hdfs_sensor import (HdfsSensorFolder, + HdfsSensorRegex, + HdfsRegexFileSensor) +from tests.sensors.test_hdfs_sensor import MockHdfs3Client + + +class HdfsRegexFileSensorTests(unittest.TestCase): +"""Tests for the HdfsRegexFileSensor class.""" -class HdfsSensorFolderTests(unittest.TestCase): -def setUp(self): -from tests.core import FakeHDFSHook -self.hook = FakeHDFSHook -self.log = logging.getLogger() -self.log.setLevel(logging.DEBUG) - -def test_should_be_empty_directory(self): -""" -test the empty directory behaviour -:return: -""" -# Given -self.log.debug('#' * 10) -self.log.debug('Running %s', self._testMethodName) -self.log.debug('#' * 10) -task = HdfsSensorFolder(task_id='Should_be_empty_directory', -filepath='/datadirectory/empty_directory', -be_empty=True, -timeout=1, -retry_delay=timedelta(seconds=1), -poke_interval=1, -hook=self.hook) - -# When -task.execute(None) - -# Then -# Nothing happens, nothing is raised exec is ok - -def test_should_be_empty_directory_fail(self): -""" -test the empty directory behaviour -:return: -""" -# Given -self.log.debug('#' * 10) -self.log.debug('Running %s', self._testMethodName) -self.log.debug('#' * 10) -task = HdfsSensorFolder(task_id='Should_be_empty_directory_fail', -filepath='/datadirectory/not_empty_directory', -be_empty=True, -timeout=1, -retry_delay=timedelta(seconds=1), -poke_interval=1, -hook=self.hook) - -# When -# Then -with self.assertRaises(AirflowSensorTimeout): -task.execute(None) - -def test_should_be_a_non_empty_directory(self): -""" -test the empty directory behaviour -:return: -""" -# Given -self.log.debug('#' * 10) -self.log.debug('Running %s', self._testMethodName) -self.log.debug('#' * 10) -task = HdfsSensorFolder(task_id='Should_be_non_empty_directory', -filepath='/datadirectory/not_empty_directory', -timeout=1, -retry_delay=timedelta(seconds=1), -poke_interval=1, -hook=self.hook) - -# When -task.execute(None) - -# Then -# Nothing happens, nothing is raised exec is ok - -def test_should_be_non_empty_directory_fail(self): -""" -test the empty directory behaviour -:return: -""" -# Given -self.log.debug('#' * 10) -self.log.debug('Running %s', self._testMethodName) -self.log.debug('#' * 10) -task = HdfsSensorFolder(task_id='Should_be_empty_directory_fail', -filepath='/datadirectory/empty_directory', -timeout=1, -retry_delay=timedelta(seconds=1), -poke_interval=1, -hook=self.hook) - -# When -# Then -with self.assertRaises(AirflowSensorTimeout): -task.execute(None) - - -class HdfsSensorRegexTests(unittest.TestCase): def
[GitHub] Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3
Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3 URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r209463504 ## File path: tests/sensors/test_hdfs_sensor.py ## @@ -7,85 +7,368 @@ # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY # KIND, either express or implied. See the License for the # specific language governing permissions and limitations # under the License. -import unittest from datetime import timedelta +import fnmatch +import posixpath +import unittest + +import mock + +from airflow import models +from airflow.sensors.hdfs_sensor import HdfsFileSensor, HdfsFolderSensor, HdfsHook + + +class MockHdfs3Client(object): +"""Mock hdfs3 client for testing purposes.""" + +def __init__(self, file_details): +self._file_details = {entry["name"]: entry for entry in file_details} + +@classmethod +def from_file_details(cls, file_details, test_instance, conn_id="hdfs_default"): +"""Builds mock hdsf3 client using file_details.""" + +mock_client = cls(file_details) +mock_params = models.Connection(conn_id=conn_id) + +# Setup mock for get_connection. +patcher = mock.patch.object( +HdfsHook, "get_connection", return_value=mock_params +) +test_instance.addCleanup(patcher.stop) +patcher.start() + +# Setup mock for get_conn. +patcher = mock.patch.object(HdfsHook, "get_conn", return_value=mock_client) +test_instance.addCleanup(patcher.stop) +patcher.start() + +return mock_client, mock_params + +def glob(self, pattern): +"""Returns glob of files matching pattern.""" + +# Implements a non-recursive glob on file names. +pattern_dir = posixpath.dirname(pattern) +pattern_base = posixpath.basename(pattern) + +# Ensure pattern_dir ends with '/'. +pattern_dir = posixpath.join(pattern_dir, "") + +for file_path in self._file_details.keys(): +file_basename = posixpath.basename(file_path) +if file_path.startswith(pattern_dir) and \ +fnmatch.fnmatch(file_basename, pattern_base): +yield file_path + +def isdir(self, path): +"""Returns true if path is a directory.""" + +try: +details = self._file_details[path] +except KeyError: +raise IOError() -from airflow import configuration -from airflow.exceptions import AirflowSensorTimeout -from airflow.sensors.hdfs_sensor import HdfsSensor -from airflow.utils.timezone import datetime -from tests.core import FakeHDFSHook +return details["kind"] == "directory" -configuration.load_test_config() +def info(self, path): +"""Returns info for given file path.""" -DEFAULT_DATE = datetime(2015, 1, 1) -TEST_DAG_ID = 'unit_test_dag' +try: +return self._file_details[path] +except KeyError: +raise IOError() -class HdfsSensorTests(unittest.TestCase): +class HdfsFileSensorTests(unittest.TestCase): +"""Tests for the HdfsFileSensor class.""" def setUp(self): -self.hook = FakeHDFSHook +file_details = [ +{"kind": "directory", "name": "/data/empty", "size": 0}, +{"kind": "directory", "name": "/data/not_empty", "size": 0}, +{"kind": "file", "name": "/data/not_empty/small.txt", "size": 10}, +{"kind": "file", "name": "/data/not_empty/large.txt", "size": 1000}, +{"kind": "file", "name": "/data/not_empty/file.txt._COPYING_"}, +] -def test_legacy_file_exist(self): -""" -Test the legacy behaviour -:return: -""" -# When -task = HdfsSensor(task_id='Should_be_file_legacy', - filepath='/datadirectory/datafile', - timeout=1, - retry_delay=timedelta(seconds=1), - poke_interval=1, - hook=self.hook) -task.execute(None) - -# Then -# Nothing happens, nothing is raised exec is ok - -def test_legacy_file_exist_but_filesize(self): +self._mock_client, self._mock_params = MockHdfs3Client.from_file_details( +file_details, test_instance=self +) + +self._default_task_kws = { +"timeout": 1, +"retry_delay": timedelta(seconds=1), +"poke_interval": 1, +} + +def test_existing_file(self): +"""Tests poking
[GitHub] Fokko commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env
Fokko commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env URL: https://github.com/apache/incubator-airflow/pull/3703#discussion_r209463769 ## File path: setup.py ## @@ -161,6 +164,7 @@ def write_version(filename=os.path.join(*['airflow', databricks = ['requests>=2.5.1, <3'] datadog = ['datadog>=0.14.0'] doc = [ +'mock', Review comment: Just learned something, thanks @tedmiston ! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] anne-chu opened a new pull request #3742: [AIRFLOW-2892] Add CC, BCC for SLA/Retry/Failure Emailing
anne-chu opened a new pull request #3742: [AIRFLOW-2892] Add CC,BCC for SLA/Retry/Failure Emailing URL: https://github.com/apache/incubator-airflow/pull/3742 ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2892 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Add CC,BCC for SLA/Retry/Failure Emailing. Currently, Airflow only supports "TO", no "CC" or "BCC". This may be useful in some business scenarios. ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-2892) Add CC for SLA/Retry/Failure Email Notification
Anne created AIRFLOW-2892: - Summary: Add CC for SLA/Retry/Failure Email Notification Key: AIRFLOW-2892 URL: https://issues.apache.org/jira/browse/AIRFLOW-2892 Project: Apache Airflow Issue Type: Improvement Reporter: Anne Assignee: Anne -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko commented on a change in pull request #3741: [AIRFLOW-1368] Add auto_remove for DockerOperator
Fokko commented on a change in pull request #3741: [AIRFLOW-1368] Add auto_remove for DockerOperator URL: https://github.com/apache/incubator-airflow/pull/3741#discussion_r209462359 ## File path: airflow/operators/docker_operator.py ## @@ -115,6 +117,7 @@ def __init__( force_pull=False, mem_limit=None, network_mode=None, +auto_remove=False, Review comment: Could you add the argument at the very end? This is to minimize the impact, and keep backward compatibility as much as possible :-) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on issue #3475: [AIRFLOW-2315] Improve S3Hook
Fokko commented on issue #3475: [AIRFLOW-2315] Improve S3Hook URL: https://github.com/apache/incubator-airflow/pull/3475#issuecomment-412361667 @jbacon can you also squash your commits? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #3475: [AIRFLOW-2315] Improve S3Hook
Fokko commented on a change in pull request #3475: [AIRFLOW-2315] Improve S3Hook URL: https://github.com/apache/incubator-airflow/pull/3475#discussion_r209463662 ## File path: airflow/hooks/S3_hook.py ## @@ -275,7 +275,8 @@ def load_file(self, key, bucket_name=None, replace=False, - encrypt=False): + encrypt=False, + upload_args={}): Review comment: This can be quite dangerous indeed if you're not familiar with its behaviour: ``` MacBook-Pro-van-Fokko:~ fokkodriesprong$ python3 Python 3.7.0 (default, Jun 29 2018, 20:13:13) [Clang 9.1.0 (clang-902.0.39.2)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> def some_function(a={}): ... return a ... >>> some_function() {} >>> some_function()['lid'] = True >>> some_function() {'lid': True} ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-1368) Automatically remove the container when it exits
[ https://issues.apache.org/jira/browse/AIRFLOW-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577586#comment-16577586 ] ASF GitHub Bot commented on AIRFLOW-1368: - Wouter-M opened a new pull request #3741: [AIRFLOW-1368] Add auto_remove for DockerOperator URL: https://github.com/apache/incubator-airflow/pull/3741 Dear Airflow maintainers, Please take this PR into consideration. It adds the ability for containers created by the `DockerOperator` to be removed automatically after execution. Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-1368 Other issues which seem to be duplicate: - https://issues.apache.org/jira/browse/AIRFLOW-516 - https://issues.apache.org/jira/browse/AIRFLOW-465 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: This PR adds back the `auto_remove` option to the `DockerOperator`, which will automatically delete the Docker container once it exits. This was added before in PR #2411, but something went wrong there and it somehow got lost. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: I have changed one of the unit tests to work with the change, ran the test cases for the docker_operator, and testing airflow locally to check the docker container gets removed if `auto_remove=True` and not removed if `auto_remove=False`. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. I updated the doc comment. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Automatically remove the container when it exits > > > Key: AIRFLOW-1368 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1368 > Project: Apache Airflow > Issue Type: Improvement > Components: docker, operators > Environment: MacOS Sierra Version 10.12.5 > Docker Community Edition Version 17.06.0-ce-mac18 Stable Channel > Docker Base Image: puckel/docker-airflow >Reporter: Nathaniel Varona >Assignee: Nathaniel Varona >Priority: Major > Labels: docker > Fix For: 1.9.0, 1.8.3 > > Original Estimate: 24h > Remaining Estimate: 24h > > A container should automatically remove when it exits for short-term > foreground processes. > Manual Example: > {{$ docker run -i --rm busybox echo 'Hello World!'}} > {{> Hello World!}} > {{$ docker ps -a}} > Command output should have a list of clean running processes without having > an {{Exited (0)}} status. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Wouter-M opened a new pull request #3741: [AIRFLOW-1368] Add auto_remove for DockerOperator
Wouter-M opened a new pull request #3741: [AIRFLOW-1368] Add auto_remove for DockerOperator URL: https://github.com/apache/incubator-airflow/pull/3741 Dear Airflow maintainers, Please take this PR into consideration. It adds the ability for containers created by the `DockerOperator` to be removed automatically after execution. Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-1368 Other issues which seem to be duplicate: - https://issues.apache.org/jira/browse/AIRFLOW-516 - https://issues.apache.org/jira/browse/AIRFLOW-465 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: This PR adds back the `auto_remove` option to the `DockerOperator`, which will automatically delete the Docker container once it exits. This was added before in PR #2411, but something went wrong there and it somehow got lost. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: I have changed one of the unit tests to work with the change, ran the test cases for the docker_operator, and testing airflow locally to check the docker container gets removed if `auto_remove=True` and not removed if `auto_remove=False`. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. I updated the doc comment. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
Fokko commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY URL: https://github.com/apache/incubator-airflow/pull/3738#issuecomment-412359813 Are people using clusters of webservers? I've never seen such a setup to be honest. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Resolved] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2524. --- Resolution: Fixed Fix Version/s: 2.0.0 > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2799) Filtering UI objects by datetime is broken
[ https://issues.apache.org/jira/browse/AIRFLOW-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577794#comment-16577794 ] ASF GitHub Bot commented on AIRFLOW-2799: - kevcampb opened a new pull request #3743: [AIRFLOW-2799] Filtering UI objects by datetime is broken URL: https://github.com/apache/incubator-airflow/pull/3743 This change treats the datetimes supplied by the UI as being in UTC. This only works correctly where the system is using UTC throughout, but is an improvement over the current behavior (crash). Make sure you have checked _all_ steps below. ### Jira - [X] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2799 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description Yes there is a description ### Tests No tests as this involves UI code. ### Commits - [X] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [X] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. Not required ### Code Quality - [No] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` utils.py never passed flake8 in the first place This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Filtering UI objects by datetime is broken > --- > > Key: AIRFLOW-2799 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2799 > Project: Apache Airflow > Issue Type: Bug > Components: ui, webserver >Affects Versions: 2.0.0 > Environment: Debian Stretch, Python 3.5.3 >Reporter: Kevin Campbell >Priority: Major > > On master (49fd23a3ee0269e2b974648f4a823c1d0b6c12ec) searching objects via > the user interface is broken for datetime fields. > Create a new installation > Create a test dag (example_bash_operator) > Start webserver and scheduler > Enable dag > On web UI, go to Browse > Task Instances > Search for task instances with execution_date greater than 5 days ago > You will get an exception > {code:java} > / ( () ) \___ > /( ( ( ) _)) ) )\ >(( ( )() ) ( ) ) > ((/ ( _( ) ( _) ) ( () ) ) > ( ( ( (_) ((( ) .((_ ) . )_ >( ( )( ( )) ) . ) ( ) > ( ( ( ( ) ( _ ( _) ). ) . ) ) ( ) > ( ( ( ) ( ) ( )) ) _)( ) ) ) > ( ( ( \ ) ((_ ( ) ( ) ) ) ) )) ( ) > ( ( ( ( (_ ( ) ( _) ) ( ) ) ) > ( ( ( ( ( ) (_ ) ) ) _) ) _( ( ) > (( ( )(( _) _) _(_ ( (_ ) >(_((__(_(__(( ( ( | ) ) ) )_))__))_)___) >((__)\\||lll|l||/// \_)) > ( /(/ ( ) ) )\ ) > (( ( ( | | ) ) )\ ) >( /(| / ( )) ) ) )) ) > ( ( _(|)_) ) > ( ||\(|(|)|/|| ) > (|(||(||)) > ( //|/l|||)|\\ \ ) > (/ / // /|//\\ \ \ \ _) > --- > Node: wave.diffractive.io > --- > Traceback (most recent call last): > File >
[GitHub] caddac edited a comment on issue #3684: [AIRFLOW-2840] - add update connections cli option
caddac edited a comment on issue #3684: [AIRFLOW-2840] - add update connections cli option URL: https://github.com/apache/incubator-airflow/pull/3684#issuecomment-412351673 @ashb @bolkedebruin I've been working on refactoring the connection api and wanted to run the implementation by you before going too far. My thought is in `api/common/experimental/` there will be 4 files: `add_connection.py`, `list_connection.py`, `delete_connection.py`, `update_connection.py` where the DB access will occur. These functions will take explicitly defined arguments based on what they are doing (not dicts or something) and return `Connections` (or in the case of delete, `conn_id`(s)). Then each client (cli, local, json) would be responsible for parsing arguments and constructing meaningful print/return statements to the user (e.g. the cli client would formulate the message `Successfully added `conn_id`={conn_id} : {uri}` where as the json client would just return http 200 and the connection as json). thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY
XD-DENG commented on issue #3738: [AIRFLOW-2886] Secure Flask SECRET_KEY URL: https://github.com/apache/incubator-airflow/pull/3738#issuecomment-412386245 @Fokko @feng-tao : If users are running a cluster for `webservers`, we can leave it to users to ensure configuration settings being homogeneous across the cluster (just like how users set `sql_alchemy_conn` or `broker_url`). In a more common scenario, users have `webserver` on a single node (starting multiple processes as multiple workers), the solution in this PR should be able to improve the security in terms of SECRET_KEY without `CSRF` issue. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-2894) Allow Users to "bake-in" DAGs in Airflow images
Daniel Imberman created AIRFLOW-2894: Summary: Allow Users to "bake-in" DAGs in Airflow images Key: AIRFLOW-2894 URL: https://issues.apache.org/jira/browse/AIRFLOW-2894 Project: Apache Airflow Issue Type: New Feature Reporter: Daniel Imberman Assignee: Daniel Imberman Multiple Users have asked that we offer the ability to have DAGs baked in to their airflow images at launch (as opposed to using git-mode or a volume claim). This will save start-up time and allow for versioned DAGs via docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2893) Stuck dataflow job due to jobName mismatch.
Feng Lu created AIRFLOW-2893: Summary: Stuck dataflow job due to jobName mismatch. Key: AIRFLOW-2893 URL: https://issues.apache.org/jira/browse/AIRFLOW-2893 Project: Apache Airflow Issue Type: Bug Components: contrib Affects Versions: 1.10.0, 1.9.1 Reporter: Feng Lu Assignee: Feng Lu Fix For: 2.0.0 Depending on which version of dataflow SDK is used, the jobName supplied in the gcp_dataflow_hook may or may not be the one accepted by the cloud dataflow service. As a result, the dataflow job polling will run forever as it couldn't find a matched job_id for the gvien job_name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] bolkedebruin commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch
bolkedebruin commented on issue #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch URL: https://github.com/apache/incubator-airflow/pull/3740#issuecomment-412364062 @Fokko backwards compatibility isn’t really required I think. I haven’t heard about executors in the wild. A real bash task runner should (maybe it is not even required) join the args into a string but I don’t see a need for a bash runner tbh This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2891) Option to set container_name for DockerOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wouter Menninga updated AIRFLOW-2891: - Description: It would be nice if the name of a Docker container created using the {{DockerOperator}} could be specified by passing it as an argument to {{DockerOperator}}, instead of having Docker generate its random name for the container. (was: It would be nice if the name of a Docker container created using the {{DockerOperator}} could be specified by passing it as an argument to {{DockerOperator}}, instead of having Docker generate it's random name for the container.) > Option to set container_name for DockerOperator > --- > > Key: AIRFLOW-2891 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2891 > Project: Apache Airflow > Issue Type: Improvement > Components: docker, operators >Reporter: Wouter Menninga >Priority: Minor > > It would be nice if the name of a Docker container created using the > {{DockerOperator}} could be specified by passing it as an argument to > {{DockerOperator}}, instead of having Docker generate its random name for the > container. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2891) Option to set container_name for DockerOperator
Wouter Menninga created AIRFLOW-2891: Summary: Option to set container_name for DockerOperator Key: AIRFLOW-2891 URL: https://issues.apache.org/jira/browse/AIRFLOW-2891 Project: Apache Airflow Issue Type: Improvement Components: docker, operators Reporter: Wouter Menninga It would be nice if the name of a Docker container created using the {{DockerOperator}} could be specified by passing it as an argument to {{DockerOperator}}, instead of having Docker generate it's random name for the container. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] caddac commented on issue #3684: [AIRFLOW-2840] - add update connections cli option
caddac commented on issue #3684: [AIRFLOW-2840] - add update connections cli option URL: https://github.com/apache/incubator-airflow/pull/3684#issuecomment-412351673 @ashb I've been working on refactoring the connection api and wanted to run the implementation by you before going too far. My thought is in `api/common/experimental/` there will be 4 files: `add_connection.py`, `list_connection.py`, `delete_connection.py`, `update_connection.py` where the DB access will occur. These functions will take explicitly defined arguments based on what they are doing (not dicts or something) and return `Connections` (or in the case of delete, `conn_id`(s)). Then each client (cli, local, json) would be responsible for parsing arguments and constructing meaningful print/return statements to the user (e.g. the cli client would formulate the message `Successfully added `conn_id`={conn_id} : {uri}` where as the json client would just return http 200 and the connection as json). thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch
Fokko commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch URL: https://github.com/apache/incubator-airflow/pull/3740#discussion_r209462759 ## File path: airflow/task/task_runner/base_task_runner.py ## @@ -106,7 +106,7 @@ def _read_task_logs(self, stream): self._task_instance.job_id, self._task_instance.task_id, line.rstrip('\n')) -def run_command(self, run_with, join_args=False): +def run_command(self, run_with=[], join_args=False): Review comment: This is a bad idea. @kaxil fixed some lately. But the `[]` is initialized just once in Python. Therefore if you change the object later on, it will be changed because it is by reference: ``` MacBook-Pro-van-Fokko:~ fokkodriesprong$ python3 Python 3.7.0 (default, Jun 29 2018, 20:13:13) [Clang 9.1.0 (clang-902.0.39.2)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> def some_function(a=[]): ... return a ... >>> >>> b = some_function() >>> b += 'vo' >>> >>> some_function() ['v', 'o'] ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] linevahe commented on issue #2955: [AIRFLOW-1966] fix PasswordUser.password setter
linevahe commented on issue #2955: [AIRFLOW-1966] fix PasswordUser.password setter URL: https://github.com/apache/incubator-airflow/pull/2955#issuecomment-412324297 @eschwartz Hello, I have the same problem as you. Could you tell me how you solved it / ( () ) \___ /( ( ( ) _)) ) )\ (( ( )() ) ( ) ) ((/ ( _( ) ( _) ) ( () ) ) ( ( ( (_) ((( ) .((_ ) . )_ ( ( )( ( )) ) . ) ( ) ( ( ( ( ) ( _ ( _) ). ) . ) ) ( ) ( ( ( ) ( ) ( )) ) _)( ) ) ) ( ( ( \ ) ((_ ( ) ( ) ) ) ) )) ( ) ( ( ( ( (_ ( ) ( _) ) ( ) ) ) ( ( ( ( ( ) (_ ) ) ) _) ) _( ( ) (( ( )(( _) _) _(_ ( (_ ) (_((__(_(__(( ( ( | ) ) ) )_))__))_)___) ((__)\\||lll|l||/// \_)) ( /(/ ( ) ) )\ ) (( ( ( | | ) ) )\ ) ( /(| / ( )) ) ) )) ) ( ( _(|)_) ) ( ||\(|(|)|/|| ) (|(||(||)) ( //|/l|||)|\\ \ ) (/ / // /|//\\ \ \ \ _) --- Node: ### --- Traceback (most recent call last): File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 1988, in wsgi_app response = self.full_dispatch_request() File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 1641, in full_dispatch_request rv = self.handle_user_exception(e) File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 1544, in handle_user_exception reraise(exc_type, exc_value, tb) File "/home/helin/.local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise raise value File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 1639, in full_dispatch_request rv = self.dispatch_request() File "/home/helin/.local/lib/python3.6/site-packages/flask/app.py", line 1625, in dispatch_request return self.view_functions[rule.endpoint](**req.view_args) File "/home/helin/.local/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner return self._run_view(f, *args, **kwargs) File "/home/helin/.local/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view return fn(self, *args, **kwargs) File "/home/helin/.local/lib/python3.6/site-packages/airflow/www/views.py", line 650, in login return airflow.login.login(self, request) File "/home/helin/.local/lib/python3.6/site-packages/airflow/contrib/auth/backends/password_auth.py", line 137, in login if not user.authenticate(password): File "/home/helin/.local/lib/python3.6/site-packages/airflow/contrib/auth/backends/password_auth.py", line 68, in authenticate return check_password_hash(self._password, plaintext) File "/home/helin/.local/lib/python3.6/site-packages/flask_bcrypt.py", line 67, in check_password_hash return Bcrypt().check_password_hash(pw_hash, password) File "/home/helin/.local/lib/python3.6/site-packages/flask_bcrypt.py", line 193, in check_password_hash return safe_str_cmp(bcrypt.hashpw(password, pw_hash), pw_hash) File "/home/helin/.local/lib/python3.6/site-packages/bcrypt/__init__.py", line 87, in hashpw raise ValueError("Invalid salt") ValueError: Invalid salt This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ron819 commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor
ron819 commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor URL: https://github.com/apache/incubator-airflow/pull/3702#issuecomment-412326321 I must say that it would be nice if there will be a way that the "blackout" won't be Airflow wide.. For example if it would be possible to exclude specific pools that will continue to be scheduled. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin closed AIRFLOW-2870. --- Resolution: Fixed Fix Version/s: 1.10.0 > Migrations fail when upgrading from below > cc1e65623dc7_add_max_tries_column_to_task_instance > > > Key: AIRFLOW-2870 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2870 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Priority: Blocker > Fix For: 1.10.0 > > > Running migrations from below > cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with: > {noformat} > INFO [alembic.runtime.migration] Context impl PostgresqlImpl. > INFO [alembic.runtime.migration] Will assume transactional DDL. > INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> > cc1e65623dc7, add max tries column to task instance > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", > line 1182, in _execute_context > context) > File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", > line 470, in do_execute > cursor.execute(statement, parameters) > psycopg2.ProgrammingError: column task_instance.executor_config does not exist > LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta... > {noformat} > The failure is occurring because > cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance > from the current code version, which has changes to the task_instance table > that are not expected by the migration. > Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an > executor_config column that does not exist as of when > cc1e65623dc7_add_max_tries_column_to_task_instance.py is run. > It is worth noting that this will not be observed for new installs because > the migration branches on table existence/non-existence at a point that will > hide the issue from new installs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)