[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files
[ https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565368#comment-16565368 ]

ASF GitHub Bot commented on AIRFLOW-2832:
-----------------------------------------

tedmiston commented on issue #3670: [AIRFLOW-2832] Lint and resolve inconsistencies in Markdown files
URL: https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409585654

@Fokko Thanks for the quick merge! I'll make a note to look into linting the bash code in Airflow and see if we have enough for a PR there.

----
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Inconsistencies and linter errors across markdown files
> -------------------------------------------------------
>
>                 Key: AIRFLOW-2832
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: docs, Documentation
>            Reporter: Taylor Edmiston
>            Assignee: Taylor Edmiston
>            Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the
> Airflow project. Most of these are simple formatting issues easily fixed by
> linting (e.g., with mdl).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2658) Add GKE specific Kubernetes Pod Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565575#comment-16565575 ]

ASF GitHub Bot commented on AIRFLOW-2658:
-----------------------------------------

Noremac201 commented on issue #3532: [AIRFLOW-2658] Add GCP specific k8s pod operator
URL: https://github.com/apache/incubator-airflow/pull/3532#issuecomment-409633871

Looks like Travis isn't posting, here's my personal Travis build:
https://travis-ci.org/Noremac201/incubator-airflow/builds/410543165

> Add GKE specific Kubernetes Pod Operator
> ----------------------------------------
>
>                 Key: AIRFLOW-2658
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2658
>             Project: Apache Airflow
>          Issue Type: New Feature
>            Reporter: Cameron Moberg
>            Assignee: Cameron Moberg
>            Priority: Minor
>
> Currently there is a Kubernetes Pod operator, but it is not really easy to
> have it work with GCP Kubernetes Engine; it would be nice to have one.
[jira] [Commented] (AIRFLOW-2829) Brush up the CI script for minikube
[ https://issues.apache.org/jira/browse/AIRFLOW-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565363#comment-16565363 ]

ASF GitHub Bot commented on AIRFLOW-2829:
-----------------------------------------

codecov-io commented on issue #3676: [AIRFLOW-2829] Brush up the CI script for minikube
URL: https://github.com/apache/incubator-airflow/pull/3676#issuecomment-40958

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=h1) Report
> Merging [#3676](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/c37fc0b6ba19e3fe5656ae37cef9b59cef3c29e8?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3676/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #3676   +/-   ##
=======================================
  Coverage    77.5%    77.5%
=======================================
  Files         205      205
  Lines       15753    15753
=======================================
  Hits        12210    12210
  Misses       3543     3543
```

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=footer). Last update [c37fc0b...bc5fa06](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

> Brush up the CI script for minikube
> -----------------------------------
>
>                 Key: AIRFLOW-2829
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2829
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: ci
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>
> Ran {{scripts/ci/kubernetes/minikube/start_minikube.sh}} locally and found
> some points that can be improved:
> - minikube version is hard-coded
> - Defined but unused variables: {{$_HELM_VERSION}}, {{$_VM_DRIVER}}
> - Undefined variables: {{$unameOut}}
> - The following lines cause warnings if download is skipped:
> {code}
> 69 sudo mv bin/minikube /usr/local/bin/minikube
> 70 sudo mv bin/kubectl /usr/local/bin/kubectl
> {code}
> - The {{return}}s at lines 81 and 96 won't work since they are outside of a function
> - To run this script as a non-root user, {{-E}} is required for {{sudo}}. See
> https://github.com/kubernetes/minikube/issues/1883.
> {code}
> 105 _MINIKUBE="sudo PATH=$PATH minikube"
> 106
> 107 $_MINIKUBE config set bootstrapper localkube
> 108 $_MINIKUBE start --kubernetes-version=${_KUBERNETES_VERSION} --vm-driver=none
> 109 $_MINIKUBE update-context
> {code}
[jira] [Commented] (AIRFLOW-2658) Add GKE specific Kubernetes Pod Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567409#comment-16567409 ]

ASF GitHub Bot commented on AIRFLOW-2658:
-----------------------------------------

kaxil closed pull request #3532: [AIRFLOW-2658] Add GCP specific k8s pod operator
URL: https://github.com/apache/incubator-airflow/pull/3532

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance (as a foreign pull request from a fork, the diff won't show otherwise):

diff --git a/airflow/contrib/operators/gcp_container_operator.py b/airflow/contrib/operators/gcp_container_operator.py
index 5648b4d8a0..615eac8a0f 100644
--- a/airflow/contrib/operators/gcp_container_operator.py
+++ b/airflow/contrib/operators/gcp_container_operator.py
@@ -17,8 +17,13 @@
 # specific language governing permissions and limitations
 # under the License.
 #
+import os
+import subprocess
+import tempfile
+
 from airflow import AirflowException
 from airflow.contrib.hooks.gcp_container_hook import GKEClusterHook
+from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
 from airflow.models import BaseOperator
 from airflow.utils.decorators import apply_defaults
@@ -170,3 +175,147 @@ def execute(self, context):
         hook = GKEClusterHook(self.project_id, self.location)
         create_op = hook.create_cluster(cluster=self.body)
         return create_op
+
+
+KUBE_CONFIG_ENV_VAR = "KUBECONFIG"
+G_APP_CRED = "GOOGLE_APPLICATION_CREDENTIALS"
+
+
+class GKEPodOperator(KubernetesPodOperator):
+    template_fields = ('project_id', 'location',
+                       'cluster_name') + KubernetesPodOperator.template_fields
+
+    @apply_defaults
+    def __init__(self,
+                 project_id,
+                 location,
+                 cluster_name,
+                 gcp_conn_id='google_cloud_default',
+                 *args,
+                 **kwargs):
+        """
+        Executes a task in a Kubernetes pod in the specified Google Kubernetes
+        Engine cluster
+
+        This Operator assumes that the system has gcloud installed and either
+        has working default application credentials or has configured a
+        connection id with a service account.
+
+        The **minimum** required to define a cluster to create are the variables
+        ``task_id``, ``project_id``, ``location``, ``cluster_name``, ``name``,
+        ``namespace``, and ``image``
+
+        **Operator Creation**: ::
+
+            operator = GKEPodOperator(task_id='pod_op',
+                                      project_id='my-project',
+                                      location='us-central1-a',
+                                      cluster_name='my-cluster-name',
+                                      name='task-name',
+                                      namespace='default',
+                                      image='perl')
+
+        .. seealso::
+            For more detail about application authentication have a look at the reference:
+            https://cloud.google.com/docs/authentication/production#providing_credentials_to_your_application
+
+        :param project_id: The Google Developers Console project id
+        :type project_id: str
+        :param location: The name of the Google Kubernetes Engine zone in which the
+            cluster resides, e.g. 'us-central1-a'
+        :type location: str
+        :param cluster_name: The name of the Google Kubernetes Engine cluster the pod
+            should be spawned in
+        :type cluster_name: str
+        :param gcp_conn_id: The google cloud connection id to use. This allows for
+            users to specify a service account.
+        :type gcp_conn_id: str
+        """
+        super(GKEPodOperator, self).__init__(*args, **kwargs)
+        self.project_id = project_id
+        self.location = location
+        self.cluster_name = cluster_name
+        self.gcp_conn_id = gcp_conn_id
+
+    def execute(self, context):
+        # Specifying a service account file allows the user to use non-default
+        # authentication for creating a Kubernetes Pod. This is done by setting the
+        # environment variable `GOOGLE_APPLICATION_CREDENTIALS` that gcloud looks at.
+        key_file = None
+
+        # If gcp_conn_id is not specified gcloud will use the default
+        # service account credentials.
+        if self.gcp_conn_id:
+            from airflow.hooks.base_hook import BaseHook
+            # extras is a deserialized json object
+            extras = BaseHook.get_connection(self.gcp_conn_id).extra_dejson
+            # key_file only gets set if a json file is created from a JSON string in
+            # the web ui, else none
+            key_file = self._set_env_from_extras(extras=extras)
+
+        # Write config to a temp
[jira] [Commented] (AIRFLOW-2238) Update dev/airflow-pr to work with github for merge targets
[ https://issues.apache.org/jira/browse/AIRFLOW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567407#comment-16567407 ]

ASF GitHub Bot commented on AIRFLOW-2238:
-----------------------------------------

kaxil closed pull request #3680: [AIRFLOW-2238] Use SSH protocol for pushing to Github
URL: https://github.com/apache/incubator-airflow/pull/3680

> Update dev/airflow-pr to work with github for merge targets
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-2238
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2238
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: PR tool
>            Reporter: Ash Berlin-Taylor
>            Priority: Major
>
> We are planning on migrating to the Apache "GitBox" project, which lets
> committers work directly on GitHub. This will mean we might not _need_ to use
> the PR tool, but we should update it so that it merges and pushes back to
> GitHub, not the ASF repo.
> I think we need to do this before we ask the ASF infra team to migrate our
> repo over.
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567954#comment-16567954 ]

ASF GitHub Bot commented on AIRFLOW-2814:
-----------------------------------------

kaxil closed pull request #3669: Revert [AIRFLOW-2814] - Change `min_file_process_interval` to 0
URL: https://github.com/apache/incubator-airflow/pull/3669

> Default Arg "file_process_interval" for class SchedulerJob is inconsistent
> with doc
> --------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2814
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Critical
>             Fix For: 2.0.0
>
> h2. Background
> In https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592,
> it is mentioned that the default value of the argument *file_process_interval*
> should be 3 minutes (*file_process_interval*: Parse and schedule each file no
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the
> default config template, its value is 0 rather than 180 seconds
> (https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432).
> h2. Issue
> This means that each file is actually parsed and scheduled without letting
> Airflow "rest". This conflicts with the design intent (by default, 180
> seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.
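The proposal amounts to a one-value change in `airflow/config_templates/default_airflow.cfg`. A sketch of the intended setting follows; the `[scheduler]` section name matches the stock template, but treat the exact placement as an assumption inferred from the linked line:

```ini
[scheduler]
# Parse and schedule each DAG file no faster than this interval, in seconds.
# The template shipped with 0, which let the scheduler re-parse files
# continuously; 180 restores the 3-minute default documented in jobs.py.
min_file_process_interval = 180
```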
[jira] [Commented] (AIRFLOW-2843) ExternalTaskSensor: Add option to cease waiting immediately if the external task doesn't exist
[ https://issues.apache.org/jira/browse/AIRFLOW-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568260#comment-16568260 ]

ASF GitHub Bot commented on AIRFLOW-2843:
-----------------------------------------

XD-DENG opened a new pull request #3688: [AIRFLOW-2843] ExternalTaskSensor-check if external task exists
URL: https://github.com/apache/incubator-airflow/pull/3688

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2843
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:

  **Background**

  `ExternalTaskSensor` will keep waiting (given restrictions of retries, poke_interval, etc.), even if the external task specified doesn't exist at all. In some cases, this waiting may still make sense, as a new DAG may backfill. But it may be good to provide an option to cease waiting immediately if the external task specified doesn't exist.

  **Proposal**

  Provide an argument `check_existence`. Set to `True` to check if the external task exists, and immediately cease waiting if the external task does not exist. **The default value is set to `False` (no check or ceasing will happen), so it will not affect any existing DAGs or user expectation.**

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

> ExternalTaskSensor: Add option to cease waiting immediately if the external
> task doesn't exist
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2843
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2843
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Minor
>
> h2. Background
> *ExternalTaskSensor* will keep waiting (given restrictions of retries,
> poke_interval, etc.), even if the external task specified doesn't exist at
> all. In some cases, this waiting may still make sense, as a new DAG may backfill.
> But it may be good to provide an option to cease waiting immediately if the
> external task specified doesn't exist.
> h2. Proposal
> Provide an argument "check_existence". Set to *True* to check if the external
> task exists, and immediately cease waiting if the external task does not
> exist.
> The default value is set to *False* (no check or ceasing will happen) so it
> will not affect any existing DAGs or user expectation.
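The fail-fast behaviour proposed for `check_existence` can be sketched without Airflow. The function below is a simplified stand-in for the sensor's poke step, not the PR's actual implementation: the DAG bag is modelled as a plain dict of dag_id to task-id sets, and all names here are illustrative.

```python
class ExternalTaskNotFound(Exception):
    """Raised to stop waiting when the external task does not exist."""


def poke_external_task(dag_bag, external_dag_id, external_task_id,
                       completed, check_existence=False):
    """Return True once the external task has completed.

    With check_existence=True, raise immediately instead of waiting
    forever on a task that is not registered at all (the default False
    preserves the existing keep-waiting behaviour).
    """
    if check_existence:
        tasks = dag_bag.get(external_dag_id)
        if tasks is None or external_task_id not in tasks:
            raise ExternalTaskNotFound(
                "The external task %s in DAG %s does not exist"
                % (external_task_id, external_dag_id))
    # Otherwise behave like a normal poke: done or not done yet.
    return (external_dag_id, external_task_id) in completed
```

With `check_existence=False` a missing task simply pokes forever; with `True` the first poke raises and the sensor ceases waiting.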
[jira] [Commented] (AIRFLOW-2796) Improve code coverage for utils/helpers.py
[ https://issues.apache.org/jira/browse/AIRFLOW-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568205#comment-16568205 ]

ASF GitHub Bot commented on AIRFLOW-2796:
-----------------------------------------

Fokko closed pull request #3686: [AIRFLOW-2796] Expand code coverage for utils/helpers.py
URL: https://github.com/apache/incubator-airflow/pull/3686

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance (as a foreign pull request from a fork, the diff won't show otherwise):

diff --git a/tests/utils/test_helpers.py b/tests/utils/test_helpers.py
index 1005671e9e..b2e79560f4 100644
--- a/tests/utils/test_helpers.py
+++ b/tests/utils/test_helpers.py
@@ -117,5 +117,62 @@ def test_reduce_in_chunks(self):
             14)
 
 
+class HelpersTest(unittest.TestCase):
+    def test_as_tuple_iter(self):
+        test_list = ['test_str']
+        as_tup = helpers.as_tuple(test_list)
+        self.assertTupleEqual(tuple(test_list), as_tup)
+
+    def test_as_tuple_no_iter(self):
+        test_str = 'test_str'
+        as_tup = helpers.as_tuple(test_str)
+        self.assertTupleEqual((test_str,), as_tup)
+
+    def test_is_in(self):
+        from airflow.utils import helpers
+        # `is_in` expects an object, and a list as input
+
+        test_dict = {'test': 1}
+        test_list = ['test', 1, dict()]
+        small_i = 3
+        big_i = 2 ** 31
+        test_str = 'test_str'
+        test_tup = ('test', 'tuple')
+
+        test_container = [test_dict, test_list, small_i, big_i, test_str, test_tup]
+
+        # Test that integers are referenced as the same object
+        self.assertTrue(helpers.is_in(small_i, test_container))
+        self.assertTrue(helpers.is_in(3, test_container))
+
+        # python caches small integers, so `i is 3` will be True,
+        # but `big_i is 2 ** 31` is False.
+        self.assertTrue(helpers.is_in(big_i, test_container))
+        self.assertFalse(helpers.is_in(2 ** 31, test_container))
+
+        self.assertTrue(helpers.is_in(test_dict, test_container))
+        self.assertFalse(helpers.is_in({'test': 1}, test_container))
+
+        self.assertTrue(helpers.is_in(test_list, test_container))
+        self.assertFalse(helpers.is_in(['test', 1, dict()], test_container))
+
+        self.assertTrue(helpers.is_in(test_str, test_container))
+        self.assertTrue(helpers.is_in('test_str', test_container))
+        bad_str = 'test_'
+        bad_str += 'str'
+        self.assertFalse(helpers.is_in(bad_str, test_container))
+
+        self.assertTrue(helpers.is_in(test_tup, test_container))
+        self.assertFalse(helpers.is_in(('test', 'tuple'), test_container))
+        bad_tup = ('test', 'tuple', 'hello')
+        self.assertFalse(helpers.is_in(bad_tup[:2], test_container))
+
+    def test_is_container(self):
+        self.assertTrue(helpers.is_container(['test_list']))
+        self.assertFalse(helpers.is_container('test_str_not_iterable'))
+        # Pass an object that is not iter nor a string.
+        self.assertFalse(helpers.is_container(10))
+
+
 if __name__ == '__main__':
     unittest.main()

> Improve code coverage for utils/helpers.py
> ------------------------------------------
>
>                 Key: AIRFLOW-2796
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2796
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Andy Cooper
>            Priority: Trivial
>
> Improve code coverage by adding tests for
> * is_container
> * is_in
> * as_tuple
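The identity semantics these tests pin down can be demonstrated without Airflow. The `is_in` below is a minimal re-implementation of the helper (assumed, as the tests imply, to compare with `is` rather than `==`); the integer behaviour shown is CPython-specific.

```python
def is_in(obj, container):
    """Membership by object identity (`is`), not equality (`==`)."""
    return any(obj is item for item in container)


# Building ints with int() at runtime defeats constant folding, so the
# interning behaviour is observable: CPython caches small integers
# (-5..256) but creates a fresh object for large ones.
small_a, small_b = int('3'), int('3')
large_a, large_b = int('2147483648'), int('2147483648')

container = [small_a, large_a]
assert is_in(small_b, container)      # same cached object as small_a
assert not is_in(large_b, container)  # equal value, but a distinct object
assert is_in(large_a, container)      # the identical object is found
```

This is exactly why the PR's test expects `is_in(big_i, ...)` to pass while `is_in(2 ** 31, ...)` fails.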
[jira] [Commented] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568223#comment-16568223 ]

ASF GitHub Bot commented on AIRFLOW-2836:
-----------------------------------------

Fokko closed pull request #3674: [AIRFLOW-2836] Minor improvement of contrib.sensors.FileSensor
URL: https://github.com/apache/incubator-airflow/pull/3674

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance (as a foreign pull request from a fork, the diff won't show otherwise):

diff --git a/airflow/contrib/sensors/file_sensor.py b/airflow/contrib/sensors/file_sensor.py
index 3f7bb24e08..3e49abdfb5 100644
--- a/airflow/contrib/sensors/file_sensor.py
+++ b/airflow/contrib/sensors/file_sensor.py
@@ -46,7 +46,7 @@ class FileSensor(BaseSensorOperator):
     @apply_defaults
     def __init__(self,
                  filepath,
-                 fs_conn_id='fs_default2',
+                 fs_conn_id='fs_default',
                  *args, **kwargs):
         super(FileSensor, self).__init__(*args, **kwargs)
@@ -56,7 +56,7 @@ def __init__(self,
     def poke(self, context):
         hook = FSHook(self.fs_conn_id)
         basepath = hook.get_path()
-        full_path = "/".join([basepath, self.filepath])
+        full_path = os.path.join(basepath, self.filepath)
         self.log.info('Poking for file {full_path}'.format(**locals()))
         try:
             if stat.S_ISDIR(os.stat(full_path).st_mode):
diff --git a/tests/contrib/sensors/test_file_sensor.py b/tests/contrib/sensors/test_file_sensor.py
index d78400e317..0bb0007c60 100644
--- a/tests/contrib/sensors/test_file_sensor.py
+++ b/tests/contrib/sensors/test_file_sensor.py
@@ -125,6 +125,18 @@ def test_file_in_dir(self):
         finally:
             shutil.rmtree(dir)
 
+    def test_default_fs_conn_id(self):
+        with tempfile.NamedTemporaryFile() as tmp:
+            task = FileSensor(
+                task_id="test",
+                filepath=tmp.name[1:],
+                dag=self.dag,
+                timeout=0,
+            )
+            task._hook = self.hook
+            task.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE,
+                     ignore_ti_state=True)
+
 
 if __name__ == '__main__':
     unittest.main()

> Minor improvement of contrib.sensors.FileSensor
> -----------------------------------------------
>
>                 Key: AIRFLOW-2836
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2836
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Minor
>
> h4. *Background*
> The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'.
> However, when we initiate the database
> (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88),
> there isn't such an entry. It doesn't exist anywhere else.
> h4. *Issue*
> The purpose of _contrib.sensors.FileSensor_ is mainly checking the local file
> system (it can also be used for NAS). For that, the path ("/") from the default
> connection 'fs_default' would suffice.
> However, given that the default value for *fs_conn_id* in
> contrib.sensors.FileSensor is "fs_default2" (a value that doesn't exist), it
> makes the situation much more complex.
> When users intend to check the local file system only, they should be able to
> leave *fs_conn_id* at its default, instead of setting up another connection
> separately.
> h4. *Proposal*
> Change the default value for *fs_conn_id* in contrib.sensors.FileSensor from
> "fs_default2" to "fs_default" (in the related tests, *fs_conn_id* is already
> specified as "fs_default").
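The `"/".join` to `os.path.join` change in the diff above is more than cosmetic; the two differ at the edges. A short demonstration (using `posixpath`, which is what `os.path` resolves to on the Linux systems the sensor targets):

```python
import posixpath

base = "/data"

# "/".join glues strings blindly; posixpath.join understands separators
# and absolute paths.
assert "/".join([base, "sub/file.txt"]) == "/data/sub/file.txt"
assert posixpath.join(base, "sub/file.txt") == "/data/sub/file.txt"

# The difference shows at the edges: an absolute second argument makes
# posixpath.join discard the base entirely (POSIX semantics).
assert "/".join([base, "/etc/passwd"]) == "/data//etc/passwd"
assert posixpath.join(base, "/etc/passwd") == "/etc/passwd"
```

This is also why the new test passes `tmp.name[1:]`, stripping the leading "/" so the temp file's path stays relative to the hook's base path instead of replacing it.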
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562270#comment-16562270 ]

ASF GitHub Bot commented on AIRFLOW-2524:
-----------------------------------------

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273134

## File path: airflow/contrib/hooks/sagemaker_hook.py
##
@@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+    """
+    Interact with Amazon SageMaker.
+    sagemaker_conn_is is required for using
+    the config stored in db for training/tuning
+    """
+
+    def __init__(self,
+                 sagemaker_conn_id=None,
+                 use_db_config=False,
+                 region_name=None,
+                 *args, **kwargs):
+        self.sagemaker_conn_id = sagemaker_conn_id
+        self.use_db_config = use_db_config
+        self.region_name = region_name
+        super(SageMakerHook, self).__init__(*args, **kwargs)

Review comment:
    You are right, fixed.

> Airflow integration with AWS Sagemaker
> --------------------------------------
>
>                 Key: AIRFLOW-2524
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: aws, contrib
>            Reporter: Rajeev Srinivasan
>            Assignee: Yang Yu
>            Priority: Major
>              Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS Sagemaker job using
> Airflow?
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562269#comment-16562269 ]

ASF GitHub Bot commented on AIRFLOW-2524:
-----------------------------------------

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273081

## File path: airflow/contrib/hooks/sagemaker_hook.py
##
@@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+    """
+    Interact with Amazon SageMaker.
+    sagemaker_conn_is is required for using

Review comment:
    Fixed.

> Airflow integration with AWS Sagemaker
> --------------------------------------
>
>                 Key: AIRFLOW-2524
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: aws, contrib
>            Reporter: Rajeev Srinivasan
>            Assignee: Yang Yu
>            Priority: Major
>              Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS Sagemaker job using
> Airflow?
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562273#comment-16562273 ]

ASF GitHub Bot commented on AIRFLOW-2524:
-----------------------------------------

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273440

## File path: airflow/contrib/hooks/sagemaker_hook.py
##
@@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+    """
+    Interact with Amazon SageMaker.
+    sagemaker_conn_is is required for using
+    the config stored in db for training/tuning
+    """
+
+    def __init__(self,
+                 sagemaker_conn_id=None,

Review comment:
    No, it doesn't. It's only used if the user wants to use the config stored
    in the db. The SageMaker hook still uses aws_conn_id to get credentials.

> Airflow integration with AWS Sagemaker
> --------------------------------------
>
>                 Key: AIRFLOW-2524
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: aws, contrib
>            Reporter: Rajeev Srinivasan
>            Assignee: Yang Yu
>            Priority: Major
>              Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS Sagemaker job using
> Airflow?
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562276#comment-16562276 ] ASF GitHub Bot commented on AIRFLOW-2524: - troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273826

## File path: airflow/contrib/sensors/sagemaker_base_sensor.py ##

@@ -0,0 +1,63 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerBaseSensor(BaseSensorOperator):
+    """
+    Contains general sensor behavior for SageMaker.
+    Subclasses should implement get_emr_response() and state_from_response() methods.
+    Subclasses should also implement NON_TERMINAL_STATES and FAILED_STATE constants.

Review comment: I replaced the constant with a method that raises an error if not implemented. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
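The pattern mentioned in the review reply — replacing a required class constant with a method that raises if not overridden — can be sketched like this (hypothetical names, not the PR's exact code):

```python
class BaseJobSensor:
    """Base sensor: subclasses must say which job states are non-terminal."""

    def non_terminal_states(self):
        # Raising here fails loudly the moment a subclass forgets to
        # provide the states, instead of silently using a wrong default.
        raise NotImplementedError('Subclasses must implement non_terminal_states().')

class TrainingJobSensor(BaseJobSensor):
    def non_terminal_states(self):
        return {'InProgress', 'Stopping'}

sensor = TrainingJobSensor()
print('InProgress' in sensor.non_terminal_states())  # True
```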
[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562458#comment-16562458 ] ASF GitHub Bot commented on AIRFLOW-2825: - feng-tao commented on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due to case URL: https://github.com/apache/incubator-airflow/pull/3665#issuecomment-408999062 could you add a test? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase > ext in S3 > --- > > Key: AIRFLOW-2825 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2825 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > Because upper/lower case was not considered in the extension check, the > S3ToHiveTransfer operator may decide a GZIP file with the uppercase ext `.GZ` is > not a GZIP file and raise an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
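The fix described in the issue amounts to normalizing case before checking the extension. A minimal sketch of such a check (hypothetical helper, not the operator's actual code):

```python
def is_gzip(key_name):
    """Return True if an S3 key looks like a GZIP file, ignoring case."""
    # Lowercasing first makes '.GZ', '.Gz', and '.gz' all match.
    return key_name.lower().endswith('.gz')

print(is_gzip('exports/data.GZ'))   # True
print(is_gzip('exports/data.csv'))  # False
```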
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562468#comment-16562468 ] ASF GitHub Bot commented on AIRFLOW-2803: - codecov-io edited a comment on issue #3656: [AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-408503531 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=h1) Report > Merging [#3656](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/a338f3276835af45765d24a6e6d43ad4ba4d66ba?src=pr=desc) will **increase** coverage by `0.39%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3656/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree) ```diff @@Coverage Diff @@ ## master#3656 +/- ## == + Coverage 77.12% 77.51% +0.39% == Files 206 205 -1 Lines 1577215751 -21 == + Hits1216412210 +46 + Misses 3608 3541 -67 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5) | `99.01% <0%> (-0.99%)` | :arrow_down: | | [airflow/minihivecluster.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9taW5paGl2ZWNsdXN0ZXIucHk=) | | | | [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.58% <0%> (+0.04%)` | :arrow_up: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `89.87% <0%> (+0.42%)` | :arrow_up: | | 
[airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==) | `100% <0%> (+100%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=footer). Last update [a338f32...b65388a](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout
[ https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562663#comment-16562663 ] ASF GitHub Bot commented on AIRFLOW-2670: - codecov-io commented on issue #3666: [AIRFLOW-2670] Update SSH Operator's Hook to respect timeout URL: https://github.com/apache/incubator-airflow/pull/3666#issuecomment-409045376 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=h1) Report > Merging [#3666](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3666/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#3666 +/- ## === Coverage 77.51% 77.51% === Files 205 205 Lines 1575115751 === Hits1221012210 Misses 3541 3541 ``` -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=footer). Last update [dfa7b26...42b907c](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > SSHOperator's timeout parameter doesn't affect SSHHook timeout > - > > Key: AIRFLOW-2670 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2670 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Affects Versions: Airflow 2.0 >Reporter: jin zhang >Priority: Major > > When I use SSHOperator, its timeout parameter isn't passed to the SSHHook > and only affects exec_command. > old version: > self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id) > I changed it to: > self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
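The change quoted in the issue is simply forwarding the operator's timeout when the hook is built. A stand-in sketch with hypothetical classes (not the real SSHOperator/SSHHook):

```python
class FakeSSHHook:
    """Stand-in for SSHHook: accepts an optional connect timeout (seconds)."""
    def __init__(self, ssh_conn_id, timeout=10):
        self.ssh_conn_id = ssh_conn_id
        self.timeout = timeout

class FakeSSHOperator:
    """Stand-in for SSHOperator: forwards its timeout to the hook it creates."""
    def __init__(self, ssh_conn_id, timeout=10):
        self.timeout = timeout
        # Before the fix, the hook was built without the timeout, so the
        # operator's value silently had no effect on the SSH connection.
        self.ssh_hook = FakeSSHHook(ssh_conn_id=ssh_conn_id,
                                    timeout=self.timeout)

op = FakeSSHOperator('ssh_default', timeout=60)
print(op.ssh_hook.timeout)  # 60
```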
[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout
[ https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562541#comment-16562541 ] ASF GitHub Bot commented on AIRFLOW-2670: - Noremac201 opened a new pull request #3666: [AIRFLOW-2670] Update SSH Operator's Hook to respect timeout URL: https://github.com/apache/incubator-airflow/pull/3666 ### JIRA - [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2670 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Previously the SSH operator was not respecting the passed in timeout to the operator. Changed the Operator to pass the timeout to hook, as well as add a test to make sure the hook is being created correctly. Extension of #3553, mistakenly closed after I thought it was fixed elsewhere. ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > SSHOperator's timeout parameter doesn't affect SSHHook timeout > - > > Key: AIRFLOW-2670 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2670 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Affects Versions: Airflow 2.0 >Reporter: jin zhang >Priority: Major > > When I use SSHOperator, its timeout parameter isn't passed to the SSHHook > and only affects exec_command. > old version: > self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id) > I changed it to: > self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2795) Oracle to Oracle Transfer Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563058#comment-16563058 ] ASF GitHub Bot commented on AIRFLOW-2795: - marcusrehm commented on issue #3639: [AIRFLOW-2795] Oracle to Oracle Transfer Operator URL: https://github.com/apache/incubator-airflow/pull/3639#issuecomment-409075763 Just bumping up This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Oracle to Oracle Transfer Operator > --- > > Key: AIRFLOW-2795 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2795 > Project: Apache Airflow > Issue Type: New Feature > Components: operators >Reporter: Marcus Rehm >Assignee: Marcus Rehm >Priority: Trivial > > This operator should help transfer data from one Oracle instance to > another, or between tables in the same instance. It's suitable for use cases > where you don't want to, or aren't allowed to, use a dblink. > The operator needs a SQL query and a destination table in order to work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563098#comment-16563098 ] ASF GitHub Bot commented on AIRFLOW-2825: - XD-DENG commented on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due to case URL: https://github.com/apache/incubator-airflow/pull/3665#issuecomment-409081714 Hi @feng-tao, thanks for suggesting this. I have updated the related test. Instead of adding separate testing items, I updated the existing ones. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase > ext in S3 > --- > > Key: AIRFLOW-2825 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2825 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > Because upper/lower case was not considered in the extension check, the > S3ToHiveTransfer operator may decide a GZIP file with the uppercase ext `.GZ` is > not a GZIP file and raise an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563096#comment-16563096 ] ASF GitHub Bot commented on AIRFLOW-2825: - codecov-io edited a comment on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due to case URL: https://github.com/apache/incubator-airflow/pull/3665#issuecomment-408920953 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=h1) Report > Merging [#3665](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6?src=pr=desc) will **decrease** coverage by `<.01%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3665/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=tree) ```diff @@Coverage Diff @@ ## master#3665 +/- ## == - Coverage 77.51% 77.51% -0.01% == Files 205 205 Lines 1575115751 == - Hits1221012209 -1 - Misses 3541 3542 +1 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/operators/s3\_to\_hive\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3665/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfdG9faGl2ZV9vcGVyYXRvci5weQ==) | `93.96% <ø> (ø)` | :arrow_up: | | [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3665/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.54% <0%> (-0.05%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=footer). 
Last update [dfa7b26...c7e5446](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase > ext in S3 > --- > > Key: AIRFLOW-2825 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2825 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > Because upper/lower case was not considered in the extension check, the > S3ToHiveTransfer operator may decide a GZIP file with the uppercase ext `.GZ` is > not a GZIP file and raise an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564269#comment-16564269 ] ASF GitHub Bot commented on AIRFLOW-2814: - kaxil opened a new pull request #3669: Revert [AIRFLOW-2814] - Change `min_file_process_interval` to 0 URL: https://github.com/apache/incubator-airflow/pull/3669 Make sure you have checked _all_ steps below. ### JIRA - [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Background > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it was mentioned that the default value of the argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that each file is actually parsed and scheduled without > letting Airflow "rest". This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
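The effect of the proposed change can be illustrated with a toy rate check — parse a file only once the configured interval has elapsed (a simplified sketch, not the scheduler's actual logic):

```python
def should_parse(last_parsed_at, now, file_process_interval=180):
    """Parse and schedule each file no faster than this interval (seconds)."""
    return (now - last_parsed_at) >= file_process_interval

# With the template's value of 0, a file is re-parsed immediately:
print(should_parse(1000.0, 1000.0, file_process_interval=0))    # True
# With the documented 180s default, the scheduler "rests" between parses:
print(should_parse(1000.0, 1060.0, file_process_interval=180))  # False
print(should_parse(1000.0, 1180.0, file_process_interval=180))  # True
```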
[jira] [Commented] (AIRFLOW-2795) Oracle to Oracle Transfer Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564244#comment-16564244 ] ASF GitHub Bot commented on AIRFLOW-2795: - Fokko closed pull request #3639: [AIRFLOW-2795] Oracle to Oracle Transfer Operator URL: https://github.com/apache/incubator-airflow/pull/3639 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/oracle_to_oracle_transfer.py b/airflow/contrib/operators/oracle_to_oracle_transfer.py
new file mode 100644
index 00..31eb89b7dd
--- /dev/null
+++ b/airflow/contrib/operators/oracle_to_oracle_transfer.py
@@ -0,0 +1,90 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.hooks.oracle_hook import OracleHook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class OracleToOracleTransfer(BaseOperator):
+    """
+    Moves data from Oracle to Oracle.
+
+    :param oracle_destination_conn_id: destination Oracle connection.
+    :type oracle_destination_conn_id: str
+    :param destination_table: destination table to insert rows.
+    :type destination_table: str
+    :param oracle_source_conn_id: source Oracle connection.
+    :type oracle_source_conn_id: str
+    :param source_sql: SQL query to execute against the source Oracle
+        database. (templated)
+    :type source_sql: str
+    :param source_sql_params: Parameters to use in sql query. (templated)
+    :type source_sql_params: dict
+    :param rows_chunk: number of rows per chunk to commit.
+    :type rows_chunk: int
+    """
+
+    template_fields = ('source_sql', 'source_sql_params')
+    ui_color = '#e08c8c'
+
+    @apply_defaults
+    def __init__(
+            self,
+            oracle_destination_conn_id,
+            destination_table,
+            oracle_source_conn_id,
+            source_sql,
+            source_sql_params={},
+            rows_chunk=5000,
+            *args, **kwargs):
+        super(OracleToOracleTransfer, self).__init__(*args, **kwargs)
+        self.oracle_destination_conn_id = oracle_destination_conn_id
+        self.destination_table = destination_table
+        self.oracle_source_conn_id = oracle_source_conn_id
+        self.source_sql = source_sql
+        self.source_sql_params = source_sql_params
+        self.rows_chunk = rows_chunk
+
+    def _execute(self, src_hook, dest_hook, context):
+        with src_hook.get_conn() as src_conn:
+            cursor = src_conn.cursor()
+            self.log.info("Querying data from source: {0}".format(
+                self.oracle_source_conn_id))
+            cursor.execute(self.source_sql, self.source_sql_params)
+            target_fields = list(map(lambda field: field[0], cursor.description))
+
+            rows_total = 0
+            rows = cursor.fetchmany(self.rows_chunk)
+            while len(rows) > 0:
+                rows_total = rows_total + len(rows)
+                dest_hook.bulk_insert_rows(self.destination_table, rows,
+                                           target_fields=target_fields,
+                                           commit_every=self.rows_chunk)
+                rows = cursor.fetchmany(self.rows_chunk)
+                self.log.info("Total inserted: {0} rows".format(rows_total))
+
+            self.log.info("Finished data transfer.")
+            cursor.close()
+
+    def execute(self, context):
+        src_hook = OracleHook(oracle_conn_id=self.oracle_source_conn_id)
+        dest_hook = OracleHook(oracle_conn_id=self.oracle_destination_conn_id)
+        self._execute(src_hook, dest_hook, context)

diff --git a/docs/code.rst b/docs/code.rst
index 4f1b301711..f4f55b7b38 100644
--- a/docs/code.rst
+++ b/docs/code.rst
@@ -172,6 +172,7 @@ Operators
 .. autoclass:: airflow.contrib.operators.mongo_to_s3.MongoToS3Operator
 .. autoclass:: airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator
 ..
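The core of the merged operator is its chunked fetch/insert loop. A self-contained sketch of the same pattern with generic callables standing in for the cursor and the destination hook (not the operator itself):

```python
def transfer_in_chunks(fetchmany, insert_rows, rows_chunk=5000):
    """Copy rows chunk by chunk; returns the total number of rows moved."""
    total = 0
    rows = fetchmany(rows_chunk)
    while len(rows) > 0:
        total += len(rows)
        insert_rows(rows)            # e.g. dest_hook.bulk_insert_rows(...)
        rows = fetchmany(rows_chunk)
    return total

# Demo with an in-memory "cursor" of 12 rows, fetched 5 at a time:
data = [(i,) for i in range(12)]

def fake_fetchmany(n, _state={'i': 0}):
    start = _state['i']
    _state['i'] += n
    return data[start:start + n]

moved = []
print(transfer_in_chunks(fake_fetchmany, moved.extend, rows_chunk=5))  # 12
```

Bounding memory by the chunk size is the point of the design: the source result set is never materialized all at once.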
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564262#comment-16564262 ] ASF GitHub Bot commented on AIRFLOW-2524: - Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206654107 ## File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py ## @@ -0,0 +1,98 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from airflow.contrib.hooks.sagemaker_hook import SageMakerHook +from airflow.models import BaseOperator +from airflow.utils import apply_defaults +from airflow.exceptions import AirflowException + + +class SageMakerCreateTrainingJobOperator(BaseOperator): + +""" + Initiate a SageMaker training + + This operator returns The ARN of the model created in Amazon SageMaker + + :param training_job_config: + The configuration necessary to start a training job (templated) + :type training_job_config: dict + :param region_name: The AWS region_name + :type region_name: string + :param sagemaker_conn_id: The SageMaker connection ID to use. 
+ :type aws_conn_id: string Review comment: Should be `sagemaker_conn_id` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564264#comment-16564264 ] ASF GitHub Bot commented on AIRFLOW-2524: - Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206655197
## File path: tests/contrib/hooks/test_sagemaker_hook.py
## @@ -0,0 +1,341 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+
+import json
+import unittest
+import copy
+try:
+    from unittest import mock
+except ImportError:
+    try:
+        import mock
+    except ImportError:
+        mock = None
+
+from airflow import configuration
+from airflow import models
+from airflow.utils import db
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.hooks.S3_hook import S3Hook
+from airflow.exceptions import AirflowException
+
+
+role = 'test-role'
+
+bucket = 'test-bucket'
+
+key = 'test/data'
+data_url = 's3://{}/{}'.format(bucket, key)
+
+job_name = 'test-job-name'
+
+image = 'test-image'
+
+test_arn_return = {'TrainingJobArn': 'testarn'}
+
+test_list_training_job_return = {
+    'TrainingJobSummaries': [
+        {
+            'TrainingJobName': job_name,
+            'TrainingJobStatus': 'InProgress'
+        },
+    ],
+    'NextToken': 'test-token'
+}
+
+test_list_tuning_job_return = {
+    'TrainingJobSummaries': [
+        {
+            'TrainingJobName': job_name,
+            'TrainingJobArn': 'testarn',
+            'TunedHyperParameters': {
+                'k': '3'
+            },
+            'TrainingJobStatus': 'InProgress'
+        },
+    ],
+    'NextToken': 'test-token'
+}
+
+output_url = 's3://{}/test/output'.format(bucket)
+create_training_params = \
+    {
+        'AlgorithmSpecification': {
+            'TrainingImage': image,
+            'TrainingInputMode': 'File'
+        },
+        'RoleArn': role,
+        'OutputDataConfig': {
+            'S3OutputPath': output_url
+        },
+        'ResourceConfig': {
+            'InstanceCount': 2,
+            'InstanceType': 'ml.c4.8xlarge',
+            'VolumeSizeInGB': 50
+        },
+        'TrainingJobName': job_name,
+        'HyperParameters': {
+            'k': '10',
+            'feature_dim': '784',
+            'mini_batch_size': '500',
+            'force_dense': 'True'
+        },
+        'StoppingCondition': {
+            'MaxRuntimeInSeconds': 60 * 60
+        },
+        'InputDataConfig': [
+            {
+                'ChannelName': 'train',
+                'DataSource': {
+                    'S3DataSource': {
+                        'S3DataType': 'S3Prefix',
+                        'S3Uri': data_url,
+                        'S3DataDistributionType': 'FullyReplicated'
+                    }
+                },
+                'CompressionType': 'None',
+                'RecordWrapperType': 'None'
+            }
+        ]
+    }
+
+create_tuning_params = {'HyperParameterTuningJobName': job_name,
+                        'HyperParameterTuningJobConfig': {
+                            'Strategy': 'Bayesian',
+                            'HyperParameterTuningJobObjective': {
+                                'Type': 'Maximize',
+                                'MetricName': 'test_metric'
+                            },
+                            'ResourceLimits': {
+                                'MaxNumberOfTrainingJobs': 123,
+                                'MaxParallelTrainingJobs': 123
+                            },
+                            'ParameterRanges': {
+                                'IntegerParameterRanges': [
+                                    {
+                                        'Name': 'k',
+                                        'MinValue': '2',
+                                        'MaxValue': '10'
+                                    },
+                                ]
+                            }
+                        },
+                        'TrainingJobDefinition': {
+
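The fixtures in the quoted test file mirror the request shape of SageMaker's CreateTrainingJob API. As a hedged illustration (the key set below is taken from the quoted fixture, not from the full API contract, and the helper name is illustrative), a minimal shape check over such a config might look like:

```python
# Minimal sanity check for a training config shaped like the fixture above.
# REQUIRED_KEYS comes from the quoted test data, not an exhaustive API spec.
REQUIRED_KEYS = {
    'AlgorithmSpecification', 'RoleArn', 'OutputDataConfig',
    'ResourceConfig', 'TrainingJobName', 'HyperParameters',
    'StoppingCondition', 'InputDataConfig',
}


def missing_training_keys(config):
    """Return the required top-level keys absent from ``config``."""
    return REQUIRED_KEYS - set(config)
```

A hook could run such a check before calling the AWS API, failing fast on a malformed config instead of surfacing a remote validation error.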
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564263#comment-16564263 ] ASF GitHub Bot commented on AIRFLOW-2524: - Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206654727 ## File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py ## @@ -0,0 +1,98 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from airflow.contrib.hooks.sagemaker_hook import SageMakerHook +from airflow.models import BaseOperator +from airflow.utils import apply_defaults +from airflow.exceptions import AirflowException + + +class SageMakerCreateTrainingJobOperator(BaseOperator): + +""" + Initiate a SageMaker training + + This operator returns The ARN of the model created in Amazon SageMaker + + :param training_job_config: + The configuration necessary to start a training job (templated) + :type training_job_config: dict + :param region_name: The AWS region_name + :type region_name: string + :param sagemaker_conn_id: The SageMaker connection ID to use. 
+ :type aws_conn_id: string + :param use_db_config: Whether or not to use db config + associated with sagemaker_conn_id. + If set to true, will automatically update the training config + with what's in db, so the db config doesn't need to + included everything, but what's there does replace the ones + in the training_job_config, so be careful + :type use_db_config: + :param aws_conn_id: The AWS connection ID to use. + :type aws_conn_id: string + + **Example**: + The following operator would start a training job when executed + +sagemaker_training = + SageMakerCreateTrainingJobOperator( + task_id='sagemaker_training', + training_job_config=config, + use_db_config=True, + region_name='us-west-2' + sagemaker_conn_id='sagemaker_customers_conn', + aws_conn_id='aws_customers_conn' + ) + """ + +template_fields = ['training_job_config'] +template_ext = () +ui_color = '#ededed' + +@apply_defaults +def __init__(self, + sagemaker_conn_id=None, Review comment: Please make the order of the arguments congruent with the docstring, or the other way around This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
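Note that the usage example quoted in the docstring above is not valid Python as written (it drops the comma after `region_name='us-west-2'`). A hedged, Airflow-free sketch of the same keyword arguments (the training config contents here are placeholders, not a working job definition):

```python
# The operator keyword arguments from the docstring example, written as a
# plain dict so the snippet runs without Airflow installed; the config
# value is illustrative only.
sagemaker_training_kwargs = dict(
    task_id='sagemaker_training',
    training_job_config={'TrainingJobName': 'test-job-name'},
    use_db_config=True,
    region_name='us-west-2',
    sagemaker_conn_id='sagemaker_customers_conn',
    aws_conn_id='aws_customers_conn',
)
```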
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564265#comment-16564265 ] ASF GitHub Bot commented on AIRFLOW-2524: - Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206654353 ## File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py ## @@ -0,0 +1,98 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from airflow.contrib.hooks.sagemaker_hook import SageMakerHook +from airflow.models import BaseOperator +from airflow.utils import apply_defaults +from airflow.exceptions import AirflowException + + +class SageMakerCreateTrainingJobOperator(BaseOperator): + +""" + Initiate a SageMaker training + + This operator returns The ARN of the model created in Amazon SageMaker + + :param training_job_config: + The configuration necessary to start a training job (templated) + :type training_job_config: dict + :param region_name: The AWS region_name + :type region_name: string + :param sagemaker_conn_id: The SageMaker connection ID to use. 
+ :type aws_conn_id: string + :param use_db_config: Whether or not to use db config + associated with sagemaker_conn_id. Review comment: Missing `:type use_db_config: bool` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564270#comment-16564270 ] ASF GitHub Bot commented on AIRFLOW-2814: - kaxil commented on issue #3669: Revert [AIRFLOW-2814] - Change `min_file_process_interval` to 0 URL: https://github.com/apache/incubator-airflow/pull/3669#issuecomment-409342022 @Fokko PTAL. Also, shouldn't we be reducing `dag_dir_list_interval` as well? It is 5 mins by default. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Backgrond > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it was mentioned the default value of argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that actually that each file is parsed and scheduled without > letting Airflow "rest". This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564316#comment-16564316 ] ASF GitHub Bot commented on AIRFLOW-2814: - kaxil commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config URL: https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409351337 Agreed with everyone. Do you guys think we should decrease the time duration for `dag_dir_list_interval` as well? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Backgrond > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it was mentioned the default value of argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that actually that each file is parsed and scheduled without > letting Airflow "rest". This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
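For context, both settings discussed in this thread live in the `[scheduler]` section of `airflow.cfg`. A sketch of the proposed change (180 comes from the issue description; 300 is the 5-minute `dag_dir_list_interval` default mentioned above):

```
[scheduler]
# Parse and schedule each DAG file no faster than this interval (seconds).
# The shipped template had 0; the SchedulerJob docstring documents 180.
min_file_process_interval = 180
# How often (in seconds) to scan the DAGs directory for new files.
dag_dir_list_interval = 300
```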
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563849#comment-16563849 ] ASF GitHub Bot commented on AIRFLOW-2803: - ashb commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409266779 FWIW I too am in favour of atomic/fixup! commits that then get squashed pre merge. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563847#comment-16563847 ] ASF GitHub Bot commented on AIRFLOW-2803: - tedmiston commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409266326 @verdan Sure! Typically I keep atomic commits while I'm working so everyone can follow small changes instead of one big diff, then squash down to one commit at the end. I updated the title to make it clear this is WIP. Since you're doing most of the reviewing here, do you have a preference on squashing throughout working or just thinking about preparing for merge? I should have an update later today btw. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563848#comment-16563848 ] ASF GitHub Bot commented on AIRFLOW-2803: - tedmiston edited a comment on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409266326 @verdan Sure! Typically I keep atomic commits while I'm working so everyone can follow small changes instead of one big diff, then squash down to one commit at the end. I updated the title to make it clear this is WIP. Since you're doing most of the reviewing here, do you have a preference on squashing throughout working vs just thinking about preparing for the merge with squashing at the end? I should have an update later today btw. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2800) Remove airflow/ low-hanging linting errors
[ https://issues.apache.org/jira/browse/AIRFLOW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563857#comment-16563857 ] ASF GitHub Bot commented on AIRFLOW-2800: - r39132 commented on issue #3638: [AIRFLOW-2800] Remove low-hanging linting errors URL: https://github.com/apache/incubator-airflow/pull/3638#issuecomment-409269190 Cool. Running `flake8 airflow | wc -l` on master and this PR branch, I see a decrease from `458` down to `235`! Thanks for making these changes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove airflow/ low-hanging linting errors > -- > > Key: AIRFLOW-2800 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2800 > Project: Apache Airflow > Issue Type: Bug >Reporter: Andy Cooper >Assignee: Andy Cooper >Priority: Major > > Removing low hanging linting errors from airflow directory > Focuses on > * E226 > * W291 > as well as *some* E501 (line too long) where it did not risk reducing > readability -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564226#comment-16564226 ] ASF GitHub Bot commented on AIRFLOW-2814: - Fokko commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config URL: https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409335193 I would keep it at 0 by default. 3 minutes is definitely too high. 1 would also work for me as a compromise. Making changes to your dag, and not see them in the UI would feel awkward to me. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Backgrond > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it was mentioned the default value of argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that actually that each file is parsed and scheduled without > letting Airflow "rest". This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564228#comment-16564228 ] ASF GitHub Bot commented on AIRFLOW-2825: - Fokko commented on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due to case URL: https://github.com/apache/incubator-airflow/pull/3665#issuecomment-409335560 LGTM, thanks @XD-DENG This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > S3ToHiveTransfer operator may not may able to handle GZIP file with uppercase > ext in S3 > --- > > Key: AIRFLOW-2825 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2825 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > Because upper/lower case was not considered in the extension check, > S3ToHiveTransfer operator may think a GZIP file with uppercase ext `.GZ` is > not a GZIP file and raise exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
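The fix merged here is a case-insensitive extension comparison. A self-contained sketch of the check (the helper name is illustrative; the operator itself applies `os.path.splitext` and `.lower()` the same way):

```python
import os


def is_gzip_key(s3_key):
    """True if the S3 key has a .gz extension, regardless of case."""
    _, ext = os.path.splitext(s3_key)
    # Lower-casing the extension makes '.GZ' and '.gz' equivalent,
    # which is exactly the bug this PR fixes.
    return ext.lower() == '.gz'
```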
[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout
[ https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564246#comment-16564246 ] ASF GitHub Bot commented on AIRFLOW-2670: - Fokko commented on issue #3666: [AIRFLOW-2670] Update SSH Operator's Hook to respect timeout URL: https://github.com/apache/incubator-airflow/pull/3666#issuecomment-409338606 Nice one @Noremac201 Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > SSHOperator's timeout parameter doesn't affect SSHook timeoot > - > > Key: AIRFLOW-2670 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2670 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Affects Versions: Airflow 2.0 >Reporter: jin zhang >Priority: Major > > when I use SSHOperator, SSHOperator's timeout parameter can't set in SSHHook > and it's just effect exce_command. > old version: > self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id) > I change it to : > self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout
[ https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564247#comment-16564247 ] ASF GitHub Bot commented on AIRFLOW-2670: - Fokko closed pull request #3666: [AIRFLOW-2670] Update SSH Operator's Hook to respect timeout URL: https://github.com/apache/incubator-airflow/pull/3666 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/operators/ssh_operator.py b/airflow/contrib/operators/ssh_operator.py index 2e890f463e..747ad04ff0 100644 --- a/airflow/contrib/operators/ssh_operator.py +++ b/airflow/contrib/operators/ssh_operator.py @@ -69,16 +69,17 @@ def __init__(self, def execute(self, context): try: if self.ssh_conn_id and not self.ssh_hook: -self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id) +self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, +timeout=self.timeout) if not self.ssh_hook: -raise AirflowException("can not operate without ssh_hook or ssh_conn_id") +raise AirflowException("Cannot operate without ssh_hook or ssh_conn_id.") if self.remote_host is not None: self.ssh_hook.remote_host = self.remote_host if not self.command: -raise AirflowException("no command specified so nothing to execute here.") +raise AirflowException("SSH command not specified. Aborting.") with self.ssh_hook.get_conn() as ssh_client: # Auto apply tty when its required in case of sudo diff --git a/tests/contrib/operators/test_ssh_operator.py b/tests/contrib/operators/test_ssh_operator.py index b97ba84a01..7ddd24b2ac 100644 --- a/tests/contrib/operators/test_ssh_operator.py +++ b/tests/contrib/operators/test_ssh_operator.py @@ -7,9 +7,9 @@ # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. 
You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -58,6 +58,23 @@ def setUp(self): self.hook = hook self.dag = dag +def test_hook_created_correctly(self): +TIMEOUT = 20 +SSH_ID = "ssh_default" +task = SSHOperator( +task_id="test", +command="echo -n airflow", +dag=self.dag, +timeout=TIMEOUT, +ssh_conn_id="ssh_default" +) +self.assertIsNotNone(task) + +task.execute(None) + +self.assertEquals(TIMEOUT, task.ssh_hook.timeout) +self.assertEquals(SSH_ID, task.ssh_hook.ssh_conn_id) + def test_json_command_execution(self): configuration.conf.set("core", "enable_xcom_pickling", "False") task = SSHOperator( This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > SSHOperator's timeout parameter doesn't affect SSHook timeoot > - > > Key: AIRFLOW-2670 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2670 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Affects Versions: Airflow 2.0 >Reporter: jin zhang >Priority: Major > > when I use SSHOperator, SSHOperator's timeout parameter can't set in SSHHook > and it's just effect exce_command. > old version: > self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id) > I change it to : > self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
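The diff above fixes AIRFLOW-2670 by forwarding the operator's `timeout` into the hook it lazily builds. A hedged, Airflow-free sketch of that pattern (both classes are illustrative stubs that only mirror the PR's names, not the real implementations):

```python
class SSHHook:
    # Illustrative stub of airflow.contrib.hooks.ssh_hook.SSHHook:
    # it simply records the connection id and timeout it was given.
    def __init__(self, ssh_conn_id=None, timeout=10):
        self.ssh_conn_id = ssh_conn_id
        self.timeout = timeout


class SSHOperator:
    # Illustrative stub: like the real operator, it creates its hook
    # lazily in execute() when only a connection id was supplied.
    def __init__(self, ssh_conn_id=None, timeout=10):
        self.ssh_conn_id = ssh_conn_id
        self.timeout = timeout
        self.ssh_hook = None

    def execute(self):
        if self.ssh_conn_id and not self.ssh_hook:
            # The fix: pass the operator's timeout through to the hook
            # instead of silently falling back to the hook's default.
            self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id,
                                    timeout=self.timeout)
```

The added test in the diff asserts the same thing: after `execute()`, the hook's timeout equals the operator's.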
[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running
[ https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564274#comment-16564274 ] ASF GitHub Bot commented on AIRFLOW-1104: - kaxil commented on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does not over schedule tasks URL: https://github.com/apache/incubator-airflow/pull/3568#issuecomment-409343719 @dan-sf Can you please resolve the conflicts? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Concurrency check in scheduler should count queued tasks as well as running > --- > > Key: AIRFLOW-1104 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1104 > Project: Apache Airflow > Issue Type: Bug > Environment: see https://github.com/apache/incubator-airflow/pull/2221 > "Tasks with the QUEUED state should also be counted below, but for now we > cannot count them. This is because there is no guarantee that queued tasks in > failed dagruns will or will not eventually run and queued tasks that will > never run will consume slots and can stall a DAG. Once we can guarantee that > all queued tasks in failed dagruns will never run (e.g. make sure that all > running/newly queued TIs have running dagruns), then we can include QUEUED > tasks here, with the constraint that they are in running dagruns." >Reporter: Alex Guziel >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2658) Add GKE specific Kubernetes Pod Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564085#comment-16564085 ] ASF GitHub Bot commented on AIRFLOW-2658: - fenglu-g commented on a change in pull request #3532: [AIRFLOW-2658] Add GCP specific k8s pod operator URL: https://github.com/apache/incubator-airflow/pull/3532#discussion_r206629560 ## File path: airflow/contrib/operators/gcp_container_operator.py ## @@ -170,3 +175,147 @@ def execute(self, context): hook = GKEClusterHook(self.project_id, self.location) create_op = hook.create_cluster(cluster=self.body) return create_op + + +KUBE_CONFIG_ENV_VAR = "KUBECONFIG" +G_APP_CRED = "GOOGLE_APPLICATION_CREDENTIALS" + + +class GKEPodOperator(KubernetesPodOperator): +template_fields = ('project_id', 'location', + 'cluster_name') + KubernetesPodOperator.template_fields + +@apply_defaults +def __init__(self, + project_id, + location, + cluster_name, + gcp_conn_id='google_cloud_default', + *args, + **kwargs): +""" +Executes a task in a Kubernetes pod in the specified Google Kubernetes +Engine cluster + +This Operator assumes that the system has gcloud installed and either +has working default application credentials or has configured a +connection id with a service account. + +The **minimum** required to define a cluster to create are the variables +``task_id``, ``project_id``, ``location``, ``cluster_name``, ``name``, +``namespace``, and ``image`` + +**Operator Creation**: :: + +operator = GKEPodOperator(task_id='pod_op', + project_id='my-project', + location='us-central1-a', + cluster_name='my-cluster-name', + name='task-name', + namespace='default', + image='perl') + +.. 
seealso:: +For more detail about application authentication have a look at the reference: + https://cloud.google.com/docs/authentication/production#providing_credentials_to_your_application + +:param project_id: The Google Developers Console project id +:type project_id: str +:param location: The name of the Google Kubernetes Engine zone in which the +cluster resides, e.g. 'us-central1-a' +:type location: str +:param cluster_name: The name of the Google Kubernetes Engine cluster the pod +should be spawned in +:type cluster_name: str +:param gcp_conn_id: The google cloud connection id to use. This allows for +users to specify a service account. +:type gcp_conn_id: str +""" +super(GKEPodOperator, self).__init__(*args, **kwargs) +self.project_id = project_id +self.location = location +self.cluster_name = cluster_name +self.gcp_conn_id = gcp_conn_id + +def execute(self, context): +# Specifying a service account file allows the user to using non default +# authentication for creating a Kubernetes Pod. This is done by setting the +# environment variable `GOOGLE_APPLICATION_CREDENTIALS` that gcloud looks at. +key_file = None + +# If gcp_conn_id is not specified gcloud will use the default +# service account credentials. +if self.gcp_conn_id: +from airflow.hooks.base_hook import BaseHook +# extras is a deserialized json object +extras = BaseHook.get_connection(self.gcp_conn_id).extra_dejson +# key_file only gets set if a json file is created from a JSON string in +# the web ui, else none +key_file = self._set_env_from_extras(extras=extras) + +# Write config to a temp file and set the environment variable to point to it. 
+# This is to avoid race conditions of reading/writing a single file +with tempfile.NamedTemporaryFile() as conf_file: +os.environ[KUBE_CONFIG_ENV_VAR] = conf_file.name +# Attempt to get/update credentials +# We call gcloud directly instead of using google-cloud-python api +# because there is no way to write kubernetes config to a file, which is +# required by KubernetesPodOperator. +# The gcloud command looks at the env variable `KUBECONFIG` for where to save +# the kubernetes config file. +subprocess.check_call( +["gcloud", "container", "clusters", "get-credentials", + self.cluster_name, + "--zone", self.location, +
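The race the quoted comment describes — concurrent tasks reading and writing one shared kubeconfig — is avoided by pointing `KUBECONFIG` at a per-task temporary file before shelling out. An Airflow-free sketch of that pattern (cluster name and zone are illustrative; the command is only built here, not executed, since running it requires an installed, authenticated `gcloud`):

```python
import os
import tempfile


def build_get_credentials_env_and_cmd(cluster_name, zone):
    """Return (env, cmd) for fetching GKE cluster credentials into a
    private kubeconfig; a caller would pass both to subprocess.check_call.
    """
    conf_file = tempfile.NamedTemporaryFile(delete=False)
    # gcloud writes the kubernetes config wherever KUBECONFIG points.
    env = dict(os.environ, KUBECONFIG=conf_file.name)
    cmd = ["gcloud", "container", "clusters", "get-credentials",
           cluster_name, "--zone", zone]
    return env, cmd
```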
[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564241#comment-16564241 ] ASF GitHub Bot commented on AIRFLOW-2825: - Fokko closed pull request #3665: [AIRFLOW-2825] Fix S3ToHiveTransfer bug due to case URL: https://github.com/apache/incubator-airflow/pull/3665 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/operators/s3_to_hive_operator.py b/airflow/operators/s3_to_hive_operator.py
index 09eb8363c0..5faaf916b7 100644
--- a/airflow/operators/s3_to_hive_operator.py
+++ b/airflow/operators/s3_to_hive_operator.py
@@ -153,7 +153,7 @@ def execute(self, context):
             root, file_ext = os.path.splitext(s3_key_object.key)
             if (self.select_expression and self.input_compressed and
-                    file_ext != '.gz'):
+                    file_ext.lower() != '.gz'):
                 raise AirflowException("GZIP is the only compression " +
                                        "format Amazon S3 Select supports")
diff --git a/tests/operators/s3_to_hive_operator.py b/tests/operators/s3_to_hive_operator.py
index 482e7fefc8..6ca6274a2c 100644
--- a/tests/operators/s3_to_hive_operator.py
+++ b/tests/operators/s3_to_hive_operator.py
@@ -89,6 +89,11 @@ def setUp(self):
                           mode="wb") as f_gz_h:
             self._set_fn(fn_gz, '.gz', True)
             f_gz_h.writelines([header, line1, line2])
+        fn_gz_upper = self._get_fn('.txt', True) + ".GZ"
+        with gzip.GzipFile(filename=fn_gz_upper,
+                           mode="wb") as f_gz_upper_h:
+            self._set_fn(fn_gz_upper, '.GZ', True)
+            f_gz_upper_h.writelines([header, line1, line2])
         fn_bz2 = self._get_fn('.txt', True) + '.bz2'
         with bz2.BZ2File(filename=fn_bz2, mode="wb") as f_bz2_h:
@@ -105,6 +110,11 @@ def setUp(self):
                           mode="wb") as f_gz_nh:
             self._set_fn(fn_gz, '.gz', False)
             f_gz_nh.writelines([line1, line2])
+        fn_gz_upper = self._get_fn('.txt', False) + ".GZ"
+        with gzip.GzipFile(filename=fn_gz_upper,
+                           mode="wb") as f_gz_upper_nh:
+            self._set_fn(fn_gz_upper, '.GZ', False)
+            f_gz_upper_nh.writelines([line1, line2])
         fn_bz2 = self._get_fn('.txt', False) + '.bz2'
         with bz2.BZ2File(filename=fn_bz2, mode="wb") as f_bz2_nh:
@@ -143,7 +153,7 @@ def _check_file_equality(self, fn_1, fn_2, ext):
         # gz files contain mtime and filename in the header that
         # causes filecmp to return False even if contents are identical
         # Hence decompress to test for equality
-        if(ext == '.gz'):
+        if(ext.lower() == '.gz'):
             with gzip.GzipFile(fn_1, 'rb') as f_1,\
                 NamedTemporaryFile(mode='wb') as f_txt_1,\
                 gzip.GzipFile(fn_2, 'rb') as f_2,\
@@ -220,14 +230,14 @@ def test_execute(self, mock_hiveclihook):
         conn.create_bucket(Bucket='bucket')
         # Testing txt, zip, bz2 files with and without header row
-        for (ext, has_header) in product(['.txt', '.gz', '.bz2'], [True, False]):
+        for (ext, has_header) in product(['.txt', '.gz', '.bz2', '.GZ'], [True, False]):
             self.kwargs['headers'] = has_header
             self.kwargs['check_headers'] = has_header
             logging.info("Testing {0} format {1} header".
                          format(ext, ('with' if has_header else 'without'))
                          )
-            self.kwargs['input_compressed'] = ext != '.txt'
+            self.kwargs['input_compressed'] = ext.lower() != '.txt'
             self.kwargs['s3_key'] = 's3://bucket/' + self.s3_key + ext
             ip_fn = self._get_fn(ext, self.kwargs['headers'])
             op_fn = self._get_fn(ext, False)
@@ -260,8 +270,8 @@ def test_execute_with_select_expression(self, mock_hiveclihook):
         # Only testing S3ToHiveTransfer calls S3Hook.select_key with
         # the right parameters and its execute method succeeds here,
         # since Moto doesn't support select_object_content as of 1.3.2.
-        for (ext, has_header) in product(['.txt', '.gz'], [True, False]):
-            input_compressed = ext != '.txt'
+        for (ext, has_header) in product(['.txt', '.gz', '.GZ'], [True, False]):
+            input_compressed = ext.lower() != '.txt'
             key = self.s3_key + ext
             self.kwargs['check_headers'] = False

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
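The one-line fix above boils down to lowercasing the extension before comparing it. A minimal standalone sketch of the same idea (`is_gzip_key` is a hypothetical helper for illustration, not part of the operator):

```python
import os


def is_gzip_key(s3_key):
    """Return True when the key's extension is GZIP, ignoring case.

    Mirrors the fix in #3665: compare the lowercased extension so a
    key like 'data.GZ' is treated the same as 'data.gz'.
    """
    _, file_ext = os.path.splitext(s3_key)
    return file_ext.lower() == '.gz'


print(is_gzip_key('s3://bucket/data.GZ'))   # True
print(is_gzip_key('s3://bucket/data.gz'))   # True
print(is_gzip_key('s3://bucket/data.txt'))  # False
```

The new `.GZ` test fixtures in the diff exercise exactly this path: the uppercase variant must round-trip through the same gzip branch as the lowercase one.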
[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running
[ https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564311#comment-16564311 ] ASF GitHub Bot commented on AIRFLOW-1104: - dan-sf commented on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does not over schedule tasks URL: https://github.com/apache/incubator-airflow/pull/3568#issuecomment-409350564 @kaxil Conflicts have been updated This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Concurrency check in scheduler should count queued tasks as well as running > --- > > Key: AIRFLOW-1104 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1104 > Project: Apache Airflow > Issue Type: Bug > Environment: see https://github.com/apache/incubator-airflow/pull/2221 > "Tasks with the QUEUED state should also be counted below, but for now we > cannot count them. This is because there is no guarantee that queued tasks in > failed dagruns will or will not eventually run and queued tasks that will > never run will consume slots and can stall a DAG. Once we can guarantee that > all queued tasks in failed dagruns will never run (e.g. make sure that all > running/newly queued TIs have running dagruns), then we can include QUEUED > tasks here, with the constraint that they are in running dagruns." >Reporter: Alex Guziel >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564312#comment-16564312 ] ASF GitHub Bot commented on AIRFLOW-2814: - feng-tao commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config URL: https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409350792 +1 on keeping 0. 180 seconds is surely too high... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Background > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it was mentioned that the default value of the argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that each file is actually parsed and scheduled without > letting Airflow "rest". This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
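The proposal amounts to a one-line change in the default config template. An illustrative sketch, assuming the option is spelled `min_file_process_interval` under `[scheduler]` (check config_templates/default_airflow.cfg for the exact key name):

```ini
[scheduler]
# Parse and schedule each DAG file no faster than this interval (seconds).
# The template shipped 0 (no rest between parses); the proposal sets 180.
min_file_process_interval = 180
```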
[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running
[ https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564313#comment-16564313 ] ASF GitHub Bot commented on AIRFLOW-1104: - kaxil commented on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does not over schedule tasks URL: https://github.com/apache/incubator-airflow/pull/3568#issuecomment-409350840 Can you squash your commits as well? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Concurrency check in scheduler should count queued tasks as well as running > --- > > Key: AIRFLOW-1104 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1104 > Project: Apache Airflow > Issue Type: Bug > Environment: see https://github.com/apache/incubator-airflow/pull/2221 > "Tasks with the QUEUED state should also be counted below, but for now we > cannot count them. This is because there is no guarantee that queued tasks in > failed dagruns will or will not eventually run and queued tasks that will > never run will consume slots and can stall a DAG. Once we can guarantee that > all queued tasks in failed dagruns will never run (e.g. make sure that all > running/newly queued TIs have running dagruns), then we can include QUEUED > tasks here, with the constraint that they are in running dagruns." >Reporter: Alex Guziel >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563923#comment-16563923 ] ASF GitHub Bot commented on AIRFLOW-2803: - r39132 commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409282209 @verdan once @tedmiston is done, please provide your +1 and notify some of the committers on this PR that the PR is ready for validation and merge. Thx for your help on reviewing this PR! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563963#comment-16563963 ] ASF GitHub Bot commented on AIRFLOW-2803: - tedmiston edited a comment on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409266326 @verdan Sure! Typically I keep atomic commits while I'm working so everyone can follow small changes instead of one big diff, then squash down to one commit at the end. I updated the title to make it clear this is WIP. Since you're doing most of the reviewing here, do you have a preference on squashing throughout working vs just squashing pre-merge? I should have an update later today btw. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564022#comment-16564022 ] ASF GitHub Bot commented on AIRFLOW-2803: - codecov-io edited a comment on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-408503531

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=h1) Report

> Merging [#3656](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/a338f3276835af45765d24a6e6d43ad4ba4d66ba?src=pr=desc) will **increase** coverage by `0.38%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3656/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3656      +/-   ##
==========================================
+ Coverage   77.12%   77.51%   +0.38%
==========================================
  Files         206      205       -1
  Lines       15772    15751      -21
==========================================
+ Hits        12164    12209      +45
+ Misses       3608     3542      -66
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5) | `99.01% <0%> (-0.99%)` | :arrow_down: |
| [airflow/plugins\_manager.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9wbHVnaW5zX21hbmFnZXIucHk=) | `92.59% <0%> (ø)` | :arrow_up: |
| [airflow/www/validators.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmFsaWRhdG9ycy5weQ==) | `100% <0%> (ø)` | :arrow_up: |
| [airflow/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9fX2luaXRfXy5weQ==) | `80.43% <0%> (ø)` | :arrow_up: |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.74% <0%> (ø)` | :arrow_up: |
| [airflow/minihivecluster.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9taW5paGl2ZWNsdXN0ZXIucHk=) | | |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `89.87% <0%> (+0.42%)` | :arrow_up: |
| [airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==) | `100% <0%> (+100%)` | :arrow_up: |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=footer). Last update [a338f32...ecbc873](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2800) Remove airflow/ low-hanging linting errors
[ https://issues.apache.org/jira/browse/AIRFLOW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563861#comment-16563861 ] ASF GitHub Bot commented on AIRFLOW-2800: - r39132 closed pull request #3638: [AIRFLOW-2800] Remove low-hanging linting errors URL: https://github.com/apache/incubator-airflow/pull/3638 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/__init__.py b/airflow/__init__.py
index f40b08aab5..bc6a7bbe19 100644
--- a/airflow/__init__.py
+++ b/airflow/__init__.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License. You may obtain a copy of the License at
-#
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-#
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -80,11 +80,12 @@ class AirflowMacroPlugin(object):
     def __init__(self, namespace):
         self.namespace = namespace

-from airflow import operators
+
+from airflow import operators  # noqa: E402
 from airflow import sensors  # noqa: E402
-from airflow import hooks
-from airflow import executors
-from airflow import macros
+from airflow import hooks  # noqa: E402
+from airflow import executors  # noqa: E402
+from airflow import macros  # noqa: E402

 operators._integrate_plugins()
 sensors._integrate_plugins()  # noqa: E402
diff --git a/airflow/contrib/auth/backends/ldap_auth.py b/airflow/contrib/auth/backends/ldap_auth.py
index eefaa1263b..516e121c9b 100644
--- a/airflow/contrib/auth/backends/ldap_auth.py
+++ b/airflow/contrib/auth/backends/ldap_auth.py
@@ -62,7 +62,7 @@ def get_ldap_connection(dn=None, password=None):
         cacert = configuration.conf.get("ldap", "cacert")
         tls_configuration = Tls(validate=ssl.CERT_REQUIRED,
                                 ca_certs_file=cacert)
         use_ssl = True
-    except:
+    except Exception:
         pass
     server = Server(configuration.conf.get("ldap", "uri"), use_ssl,
                     tls_configuration)
@@ -94,7 +94,7 @@ def groups_user(conn, search_base, user_filter, user_name_att, username):
     search_filter = "(&({0})({1}={2}))".format(user_filter, user_name_att,
                                                username)
     try:
         memberof_attr = configuration.conf.get("ldap", "group_member_attr")
-    except:
+    except Exception:
         memberof_attr = "memberOf"
     res = conn.search(native(search_base), native(search_filter),
                       attributes=[native(memberof_attr)])
diff --git a/airflow/contrib/hooks/aws_hook.py b/airflow/contrib/hooks/aws_hook.py
index 69a1b0bed3..8ca1f3d744 100644
--- a/airflow/contrib/hooks/aws_hook.py
+++ b/airflow/contrib/hooks/aws_hook.py
@@ -72,7 +72,7 @@ def _parse_s3_config(config_file_name, config_format='boto', profile=None):
     try:
         access_key = config.get(cred_section, key_id_option)
         secret_key = config.get(cred_section, secret_key_option)
-    except:
+    except Exception:
         logging.warning("Option Error in parsing s3 config file")
         raise
     return access_key, secret_key
diff --git a/airflow/contrib/operators/awsbatch_operator.py b/airflow/contrib/operators/awsbatch_operator.py
index a5c86afce6..353fbbb0a0 100644
--- a/airflow/contrib/operators/awsbatch_operator.py
+++ b/airflow/contrib/operators/awsbatch_operator.py
@@ -139,7 +139,7 @@ def _wait_for_task_ended(self):
             if response['jobs'][-1]['status'] in ['SUCCEEDED', 'FAILED']:
                 retry = False

-            sleep( 1 + pow(retries * 0.1, 2))
+            sleep(1 + pow(retries * 0.1, 2))
             retries += 1

     def _check_success_task(self):
diff --git a/airflow/contrib/operators/mlengine_prediction_summary.py b/airflow/contrib/operators/mlengine_prediction_summary.py
index 17fc2c0903..4efe81e641 100644
--- a/airflow/contrib/operators/mlengine_prediction_summary.py
+++ b/airflow/contrib/operators/mlengine_prediction_summary.py
@@ -112,14 +112,14 @@ def decode(self, x):
 @beam.ptransform_fn
 def MakeSummary(pcoll, metric_fn, metric_keys):  # pylint: disable=invalid-name
     return (
-        pcoll
-        | "ApplyMetricFnPerInstance" >> beam.Map(metric_fn)
-        | "PairWith1" >> beam.Map(lambda tup: tup + (1,))
-        | "SumTuple" >> beam.CombineGlobally(beam.combiners.TupleCombineFn(
-            *([sum] * (len(metric_keys) + 1
-        | "AverageAndMakeDict" >> beam.Map(
+        pcoll |
+        "ApplyMetricFnPerInstance" >> beam.Map(metric_fn) |
+        "PairWith1" >> beam.Map(lambda tup: tup + (1,)) |
+        "SumTuple" >>
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563965#comment-16563965 ] ASF GitHub Bot commented on AIRFLOW-2803: - tedmiston commented on a change in pull request #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206602944 ## File path: airflow/www_rbac/static/js/clock.js ## @@ -18,24 +18,25 @@ */ require('./jqClock.min'); -$(document).ready(function () { - x = new Date(); +$(document).ready(() => { Review comment: Sounds good. I will stick with the ES5 for now for this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2310) Enable AWS Glue Job Integration
[ https://issues.apache.org/jira/browse/AIRFLOW-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564040#comment-16564040 ] ASF GitHub Bot commented on AIRFLOW-2310: - suma-ps commented on issue #3504: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow URL: https://github.com/apache/incubator-airflow/pull/3504#issuecomment-409303864 @OElesin Do you plan to resolve the merge issues soon? Looking forward to using the Glue operator soon, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enable AWS Glue Job Integration > --- > > Key: AIRFLOW-2310 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2310 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Olalekan Elesin >Assignee: Olalekan Elesin >Priority: Major > Labels: AWS > > Would it be possible to integrate AWS Glue into Airflow, such that Glue jobs > and ETL pipelines can be orchestrated with Airflow -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running
[ https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564330#comment-16564330 ] ASF GitHub Bot commented on AIRFLOW-1104: - kaxil closed pull request #3568: AIRFLOW-1104 Update jobs.py so Airflow does not over schedule tasks URL: https://github.com/apache/incubator-airflow/pull/3568 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/jobs.py b/airflow/jobs.py
index 224ff185fb..a4252473cd 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1075,9 +1075,6 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
         :type states: Tuple[State]
         :return: List[TaskInstance]
         """
-        # TODO(saguziel): Change this to include QUEUED, for concurrency
-        # purposes we may want to count queued tasks
-        states_to_count_as_running = [State.RUNNING]
         executable_tis = []

         # Get all the queued task instances from associated with scheduled
@@ -1123,6 +1120,7 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
         for task_instance in task_instances_to_examine:
             pool_to_task_instances[task_instance.pool].append(task_instance)

+        states_to_count_as_running = [State.RUNNING, State.QUEUED]
         task_concurrency_map = self.__get_task_concurrency_map(
             states=states_to_count_as_running, session=session)

@@ -1173,7 +1171,6 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
                 simple_dag = simple_dag_bag.get_dag(dag_id)
                 if dag_id not in dag_id_to_possibly_running_task_count:
-                    # TODO(saguziel): also check against QUEUED state, see AIRFLOW-1104
                     dag_id_to_possibly_running_task_count[dag_id] = \
                         DAG.get_num_task_instances(
                             dag_id,
diff --git a/tests/jobs.py b/tests/jobs.py
index 93f6574df4..c701214f1e 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -1493,6 +1493,39 @@ def test_find_executable_task_instances_concurrency(self):

         self.assertEqual(0, len(res))

+    def test_find_executable_task_instances_concurrency_queued(self):
+        dag_id = 'SchedulerJobTest.test_find_executable_task_instances_concurrency_queued'
+        dag = DAG(dag_id=dag_id, start_date=DEFAULT_DATE, concurrency=3)
+        task1 = DummyOperator(dag=dag, task_id='dummy1')
+        task2 = DummyOperator(dag=dag, task_id='dummy2')
+        task3 = DummyOperator(dag=dag, task_id='dummy3')
+        dagbag = self._make_simple_dag_bag([dag])
+
+        scheduler = SchedulerJob()
+        session = settings.Session()
+        dag_run = scheduler.create_dag_run(dag)
+
+        ti1 = TI(task1, dag_run.execution_date)
+        ti2 = TI(task2, dag_run.execution_date)
+        ti3 = TI(task3, dag_run.execution_date)
+        ti1.state = State.RUNNING
+        ti2.state = State.QUEUED
+        ti3.state = State.SCHEDULED
+
+        session.merge(ti1)
+        session.merge(ti2)
+        session.merge(ti3)
+
+        session.commit()
+
+        res = scheduler._find_executable_task_instances(
+            dagbag,
+            states=[State.SCHEDULED],
+            session=session)
+
+        self.assertEqual(1, len(res))
+        self.assertEqual(res[0].key, ti3.key)
+
     def test_find_executable_task_instances_task_concurrency(self):
         dag_id = 'SchedulerJobTest.test_find_executable_task_instances_task_concurrency'
         task_id_1 = 'dummy'

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Concurrency check in scheduler should count queued tasks as well as running > --- > > Key: AIRFLOW-1104 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1104 > Project: Apache Airflow > Issue Type: Bug > Environment: see https://github.com/apache/incubator-airflow/pull/2221 > "Tasks with the QUEUED state should also be counted below, but for now we > cannot count them. This is because there is no guarantee that queued tasks in > failed dagruns will or will not eventually run and queued tasks that will > never run will consume slots and can stall a DAG. Once we can guarantee that > all queued tasks in failed dagruns will never run (e.g. make sure that all > running/newly
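The behavioural change in the diff above is easiest to see in miniature: QUEUED task instances now occupy concurrency slots alongside RUNNING ones, so the scheduler no longer over-schedules. A toy sketch with simplified names and plain strings (not the scheduler's real data structures):

```python
RUNNING = "running"
QUEUED = "queued"
SCHEDULED = "scheduled"

# After the change, QUEUED tasks occupy a slot too; previously only
# RUNNING was counted, so a DAG could exceed its `concurrency` cap.
STATES_TO_COUNT_AS_RUNNING = (RUNNING, QUEUED)


def open_slots(task_states, concurrency):
    """Toy version of the scheduler's per-DAG concurrency check."""
    occupied = sum(1 for s in task_states if s in STATES_TO_COUNT_AS_RUNNING)
    return max(concurrency - occupied, 0)


# Mirrors the new test case: concurrency=3 with one RUNNING, one QUEUED
# and one SCHEDULED task leaves exactly one slot for execution.
print(open_slots([RUNNING, QUEUED, SCHEDULED], 3))  # 1
```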
[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running
[ https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564359#comment-16564359 ] ASF GitHub Bot commented on AIRFLOW-1104: - codecov-io edited a comment on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does not over schedule tasks URL: https://github.com/apache/incubator-airflow/pull/3568#issuecomment-401878707

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=h1) Report

> Merging [#3568](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/3b35d360f6ff8694b6fb4387901c182ca39160b5?src=pr=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3568/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3568      +/-   ##
==========================================
+ Coverage   77.51%   77.51%   +<.01%
==========================================
  Files         205      205
  Lines       15751    15751
==========================================
+ Hits        12209    12210       +1
+ Misses       3542     3541       -1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3568/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.74% <100%> (ø)` | :arrow_up: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3568/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.58% <0%> (+0.04%)` | :arrow_up: |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=footer). 
Last update [3b35d36...b04c9b1](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Concurrency check in scheduler should count queued tasks as well as running > --- > > Key: AIRFLOW-1104 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1104 > Project: Apache Airflow > Issue Type: Bug > Environment: see https://github.com/apache/incubator-airflow/pull/2221 > "Tasks with the QUEUED state should also be counted below, but for now we > cannot count them. This is because there is no guarantee that queued tasks in > failed dagruns will or will not eventually run and queued tasks that will > never run will consume slots and can stall a DAG. Once we can guarantee that > all queued tasks in failed dagruns will never run (e.g. make sure that all > running/newly queued TIs have running dagruns), then we can include QUEUED > tasks here, with the constraint that they are in running dagruns." >Reporter: Alex Guziel >Priority: Minor > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564370#comment-16564370 ] ASF GitHub Bot commented on AIRFLOW-2803: - ashb commented on a change in pull request #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206684313 ## File path: airflow/www_rbac/templates/airflow/circles.html ## @@ -28,117 +28,111 @@ Airflow 404 = lots of circles
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564387#comment-16564387 ] ASF GitHub Bot commented on AIRFLOW-2803: - tedmiston commented on a change in pull request #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206688518 ## File path: airflow/www_rbac/templates/airflow/circles.html ## @@ -28,117 +28,111 @@ Airflow 404 = lots of circles
[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files
[ https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564395#comment-16564395 ] ASF GitHub Bot commented on AIRFLOW-2832: - codecov-io commented on issue #3670: [AIRFLOW-2832] Lint and resolve inconsistencies in Markdown files URL: https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409376218

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=h1) Report

> Merging [#3670](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/ed972042a864cd010137190e0bbb1d25a9dcfe83?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3670/graphs/tree.svg?width=650=pr=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #3670   +/-   ##
=======================================
  Coverage   77.51%   77.51%
=======================================
  Files         205      205
  Lines       15751    15751
=======================================
  Hits        12210    12210
  Misses       3541     3541
```

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=footer). Last update [ed97204...eef6fc8](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Inconsistencies and linter errors across markdown files > --- > > Key: AIRFLOW-2832 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2832 > Project: Apache Airflow > Issue Type: Improvement > Components: docs, Documentation >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > There are a number of inconsistencies within and across markdown files in the > Airflow project. Most of these are simple formatting issues easily fixed by > linting (e.g., with mdl). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2658) Add GKE specific Kubernetes Pod Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564402#comment-16564402 ] ASF GitHub Bot commented on AIRFLOW-2658: - fenglu-g commented on issue #3532: [AIRFLOW-2658] Add GCP specific k8s pod operator URL: https://github.com/apache/incubator-airflow/pull/3532#issuecomment-409378846 @Noremac201 please fix travis-ci, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add GKE specific Kubernetes Pod Operator > > > Key: AIRFLOW-2658 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2658 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > > Currently there is a Kubernetes Pod operator, but it is not really easy to > have it work with GCP Kubernetes Engine, it would be nice to have one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running
[ https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564327#comment-16564327 ] ASF GitHub Bot commented on AIRFLOW-1104: - dan-sf commented on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does not over schedule tasks URL: https://github.com/apache/incubator-airflow/pull/3568#issuecomment-409355510 Sure, the changes have been rebased on master This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Concurrency check in scheduler should count queued tasks as well as running > --- > > Key: AIRFLOW-1104 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1104 > Project: Apache Airflow > Issue Type: Bug > Environment: see https://github.com/apache/incubator-airflow/pull/2221 > "Tasks with the QUEUED state should also be counted below, but for now we > cannot count them. This is because there is no guarantee that queued tasks in > failed dagruns will or will not eventually run and queued tasks that will > never run will consume slots and can stall a DAG. Once we can guarantee that > all queued tasks in failed dagruns will never run (e.g. make sure that all > running/newly queued TIs have running dagruns), then we can include QUEUED > tasks here, with the constraint that they are in running dagruns." >Reporter: Alex Guziel >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
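The check described above can be sketched as follows. This is an illustrative stand-in for the scheduler logic, not Airflow's actual API: the point is that QUEUED task instances occupy concurrency slots just like RUNNING ones, so the scheduler cannot over-schedule.

```python
# Simplified sketch of the concurrency check: count QUEUED task instances
# as well as RUNNING ones against the DAG's concurrency limit.
# These state names and the function are illustrative, not Airflow's API.

RUNNING = "running"
QUEUED = "queued"

def open_slots(task_states, concurrency_limit):
    """Slots left for a DAG once running *and* queued tasks are counted."""
    occupied = sum(1 for state in task_states if state in (RUNNING, QUEUED))
    return max(concurrency_limit - occupied, 0)

# Two running + two queued tasks against a limit of 4: no slots remain,
# so no additional task should be queued for this DAG.
states = [RUNNING, RUNNING, QUEUED, QUEUED, "success"]
print(open_slots(states, 4))  # → 0
```

Counting only RUNNING tasks here would report two free slots and allow the scheduler to queue tasks beyond the limit, which is the over-scheduling the PR addresses.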
[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files
[ https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564342#comment-16564342 ] ASF GitHub Bot commented on AIRFLOW-2832: - tedmiston commented on issue #3670: [AIRFLOW-2832] Lint and resolve inconsistencies in Markdown files URL: https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409358478 This PR is now squashed and ready for review.
[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files
[ https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564354#comment-16564354 ] ASF GitHub Bot commented on AIRFLOW-2832: - tedmiston edited a comment on issue #3670: [AIRFLOW-2832] Lint and resolve inconsistencies in Markdown files URL: https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409358478 This PR is now squashed and ready for review. I'm not sure that there's any one best person to review these changes but in a git log, I see that @bolkedebruin, @Fokko, and @r39132 have modified some of these files in recent history.
[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files
[ https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564341#comment-16564341 ] ASF GitHub Bot commented on AIRFLOW-2832: - tedmiston opened a new pull request #3670: [AIRFLOW-2832] Lint and resolve inconsistencies in Markdown files URL: https://github.com/apache/incubator-airflow/pull/3670 Make sure you have checked _all_ steps below. ### JIRA - [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2832 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: - Inspired by other recent issues related to linter errors in Python and JS (AIRFLOW-2783, AIRFLOW-2800, AIRFLOW-2803) - This PR does a few things: - Resolves linter errors in markdown files across the project (ignores errors that aren't super useful on GitHub such as line wrapping and putting `` in brackets) - Clarifies that commit message length of 50 characters doesn't include the Jira issue tag - Replaces usage of JIRA with Jira the way it's styled nowadays by [Atlassian](https://www.atlassian.com/software/jira) and [Wikipedia](https://en.wikipedia.org/wiki/Jira_(software)) - Makes code block formatting consistent ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: The changes in this PR are restricted to linting documentation. ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. 
Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. n/a ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Inconsistencies and linter errors across markdown files > --- > > Key: AIRFLOW-2832 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2832 > Project: Apache Airflow > Issue Type: Improvement > Components: docs, Documentation >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > There are a number of inconsistencies within and across markdown files in the > Airflow project. Most of these are simple formatting issues easily fixed by > linting (e.g., with mdl). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564400#comment-16564400 ] ASF GitHub Bot commented on AIRFLOW-2814: - XD-DENG commented on issue #3669: Revert [AIRFLOW-2814] - Change `min_file_process_interval` to 0 URL: https://github.com/apache/incubator-airflow/pull/3669#issuecomment-409378082 Hi @kaxil, please remember to update the comment in https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L592 as well; otherwise the comment will be inconsistent with the configuration value again. > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Background > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it was mentioned that the default value of argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that each file is actually parsed and scheduled without > letting Airflow "rest". This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180.
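The throttling behavior the issue describes can be illustrated with a toy check. This is a simplified sketch, not Airflow's scheduler internals: a file is only re-parsed once the configured interval has elapsed since its last parse, so an interval of 0 means continuous re-parsing.

```python
# Toy illustration of min_file_process_interval: skip re-parsing a DAG file
# until `interval` seconds have passed since its last parse.
# The function name and signature are illustrative, not Airflow's API.
import time

def should_parse(last_parse_time, interval, now=None):
    """Return True when the file may be parsed again."""
    now = time.time() if now is None else now
    return (now - last_parse_time) >= interval

# interval=0 (the config-template value the issue flags): every check passes,
# so files are re-parsed continuously with no "rest".
assert should_parse(last_parse_time=100.0, interval=0, now=100.0)

# interval=180 (the documented 3-minute default): the file is skipped until
# 180 seconds have elapsed since the last parse.
assert not should_parse(last_parse_time=100.0, interval=180, now=150.0)
assert should_parse(last_parse_time=100.0, interval=180, now=280.0)
```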
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564422#comment-16564422 ] ASF GitHub Bot commented on AIRFLOW-2524: - troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206700100 ## File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py ## @@ -0,0 +1,98 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from airflow.contrib.hooks.sagemaker_hook import SageMakerHook +from airflow.models import BaseOperator +from airflow.utils import apply_defaults +from airflow.exceptions import AirflowException + + +class SageMakerCreateTrainingJobOperator(BaseOperator): + +""" + Initiate a SageMaker training + + This operator returns The ARN of the model created in Amazon SageMaker + + :param training_job_config: + The configuration necessary to start a training job (templated) + :type training_job_config: dict + :param region_name: The AWS region_name + :type region_name: string + :param sagemaker_conn_id: The SageMaker connection ID to use. 
+ :type aws_conn_id: string Review comment: Hi Fokko, thank you so much for your review; I really appreciate your feedback. I couldn't figure out how to reply to your request directly, so I'll reply here. The main reason I split this into an operator and a sensor is that the success of the training job has two stages: successfully kicking off the training job, and the training job finishing successfully. The operator reports the first status, and the sensor reports the latter. Also, since a training job runs on an AWS instance rather than the instance hosting Airflow, other operators can set the operator (rather than the sensor) as their upstream if they don't depend on the model actually being created. Finally, with a sensor, users can set parameters like poke_interval, which makes more sense for a sensor than for an operator. > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow.
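The operator/sensor split argued for above can be sketched as below. All class, method, and field names here are illustrative stand-ins, not Airflow's or the AWS SDK's real API: the operator succeeds as soon as the job is kicked off, while the sensor's `poke` is polled until the remote job reports completion.

```python
# Hedged sketch of the two-stage pattern: start the job (operator), then
# poll for completion (sensor). Names are hypothetical, not real APIs.

class StartTrainingOperator:
    """Succeeds as soon as the remote training job is kicked off."""
    def __init__(self, training_job_config):
        self.training_job_config = training_job_config

    def execute(self, client):
        # Returns immediately; downstream tasks that do not need the trained
        # model can depend on this operator instead of the sensor.
        return client.create_training_job(self.training_job_config)


class TrainingCompleteSensor:
    """Succeeds only once the remote training job reports completion."""
    def __init__(self, job_name, poke_interval=30):
        self.job_name = job_name
        self.poke_interval = poke_interval  # seconds between polls

    def poke(self, client):
        # Polled repeatedly until it returns True.
        status = client.describe_training_job(self.job_name)["TrainingJobStatus"]
        return status == "Completed"
```

In a DAG, the sensor would be set downstream of the operator, and only model-dependent tasks would hang off the sensor, which is exactly the dependency flexibility the comment describes.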
[jira] [Commented] (AIRFLOW-2835) Remove python-selinux
[ https://issues.apache.org/jira/browse/AIRFLOW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564864#comment-16564864 ] ASF GitHub Bot commented on AIRFLOW-2835: - Fokko opened a new pull request #3673: [AIRFLOW-2835] Remove python-selinux URL: https://github.com/apache/incubator-airflow/pull/3673 This package is not used and it sometimes breaks the CI because it is not available. Therefore it makes sense to just remove it :-) Example failed builds on the master branch: https://travis-ci.org/apache/incubator-airflow/jobs/410483664 https://travis-ci.org/apache/incubator-airflow/jobs/410483665 https://travis-ci.org/apache/incubator-airflow/jobs/410484305 Make sure you have checked _all_ steps below. ### JIRA - [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-2835\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-2835\], code changes always need a JIRA issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. 
- When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove python-selinux > - > > Key: AIRFLOW-2835 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2835 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > > This package sometimes crashes the CI and is not required. Therefore it does > not make sense to install it since it will take ci-time and make things > brittle. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files
[ https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564908#comment-16564908 ] ASF GitHub Bot commented on AIRFLOW-2832: - Fokko closed pull request #3670: [AIRFLOW-2832] Lint and resolve inconsistencies in Markdown files URL: https://github.com/apache/incubator-airflow/pull/3670 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 6000d0e5ff..90452d954b 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,33 +1,34 @@ Make sure you have checked _all_ steps below. -### JIRA -- [ ] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" -- https://issues.apache.org/jira/browse/AIRFLOW-XXX -- In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue. +### Jira +- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" + - https://issues.apache.org/jira/browse/AIRFLOW-XXX + - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. 
### Description -- [ ] Here are some details about my PR, including screenshots of any UI changes: +- [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests -- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: +- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits -- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": -1. Subject is separated from body by a blank line -2. Subject is limited to 50 characters -3. Subject does not end with a period -4. Subject uses the imperative mood ("add", not "adding") -5. Body wraps at 72 characters -6. Body explains "what" and "why", not "how" +- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": + 1. Subject is separated from body by a blank line + 1. Subject is limited to 50 characters (not including Jira issue reference) + 1. Subject does not end with a period + 1. Subject uses the imperative mood ("add", not "adding") + 1. Body wraps at 72 characters + 1. Body explains "what" and "why", not "how" ### Documentation -- [ ] In case of new functionality, my PR adds documentation that describes how to use it. -- When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. +- [ ] In case of new functionality, my PR adds documentation that describes how to use it. + - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. 
### Code Quality + - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 47a1a80549..2cf8e0218e 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -3,22 +3,21 @@ Contributions are welcome and are greatly appreciated! Every little bit helps, and credit will always be given. - -# Table of Contents - * [TOC](#table-of-contents) - * [Types of Contributions](#types-of-contributions) - - [Report Bugs](#report-bugs) - - [Fix Bugs](#fix-bugs) - - [Implement Features](#implement-features) - - [Improve Documentation](#improve-documentation) - - [Submit Feedback](#submit-feedback) - * [Documentation](#documentation) - * [Development and Testing](#development-and-testing) - - [Setting up a development environment](#setting-up-a-development-environment) - - [Pull requests guidelines](#pull-request-guidelines) - - [Testing Locally](#testing-locally) - * [Changing the Metadata Database](#changing-the-metadata-database) - +## Table of Contents + +- [TOC](#table-of-contents) +- [Types of Contributions](#types-of-contributions) + - [Report Bugs](#report-bugs) + - [Fix Bugs](#fix-bugs) + - [Implement
[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files
[ https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564907#comment-16564907 ] ASF GitHub Bot commented on AIRFLOW-2832: - Fokko commented on a change in pull request #3670: [AIRFLOW-2832] Lint and resolve inconsistencies in Markdown files URL: https://github.com/apache/incubator-airflow/pull/3670#discussion_r206783822 ## File path: dev/README.md ## @@ -72,25 +76,33 @@ origin https://github.com//airflow (push) ``` JIRA + Users should set environment variables `JIRA_USERNAME` and `JIRA_PASSWORD` corresponding to their ASF JIRA login. This will allow the tool to automatically close issues. If they are not set, the user will be prompted every time. GitHub OAuth Token + Unauthenticated users can only make 60 requests/hour to the Github API. If you get an error about exceeding the rate, you will need to set a `GITHUB_OAUTH_KEY` environment variable that contains a token value. Users can generate tokens from their GitHub profile. ## Airflow release signing tool + The release signing tool can be used to create the SHA512/MD5 and ASC files that required for Apache releases. ### Execution -To create a release tar ball execute following command from Airflow's root. -`python setup.py compile_assets sdist --formats=gztar` +To create a release tarball execute following command from Airflow's root. -*Note: `compile_assets` command build the frontend assets (JS and CSS) files for the +```bash +python setup.py compile_assets sdist --formats=gztar +``` + +*Note: `compile_assets` command build the frontend assets (JS and CSS) files for the Web UI using webpack and npm. Please make sure you have `npm` installed on your local machine globally. Details on how to install `npm` can be found in CONTRIBUTING.md file.* After that navigate to relative directory i.e., `cd dist` and sign the release files. 
-`../dev/sign.sh Inconsistencies and linter errors across markdown files > --- > > Key: AIRFLOW-2832 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2832 > Project: Apache Airflow > Issue Type: Improvement > Components: docs, Documentation >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > There are a number of inconsistencies within and across markdown files in the > Airflow project. Most of these are simple formatting issues easily fixed by > linting (e.g., with mdl). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564941#comment-16564941 ] ASF GitHub Bot commented on AIRFLOW-2803: - verdan commented on a change in pull request #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206791865 ## File path: airflow/www_rbac/templates/airflow/circles.html ## @@ -28,117 +28,111 @@ Airflow 404 = lots of circles
[jira] [Commented] (AIRFLOW-2835) Remove python-selinux
[ https://issues.apache.org/jira/browse/AIRFLOW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564947#comment-16564947 ] ASF GitHub Bot commented on AIRFLOW-2835: - bolkedebruin closed pull request #3673: [AIRFLOW-2835] Remove python-selinux URL: https://github.com/apache/incubator-airflow/pull/3673 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/.travis.yml b/.travis.yml index 81e43fb4b8..4e490c74e1 100644 --- a/.travis.yml +++ b/.travis.yml @@ -40,7 +40,6 @@ addons: - krb5-kdc - krb5-admin-server - oracle-java8-installer - - python-selinux postgresql: "9.2" python: - "2.7" This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove python-selinux > - > > Key: AIRFLOW-2835 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2835 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > > This package sometimes crashes the CI and is not required. Therefore it does > not make sense to install it since it will take ci-time and make things > brittle. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2756) Marking DAG run does not set start_time and end_time correctly
[ https://issues.apache.org/jira/browse/AIRFLOW-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564958#comment-16564958 ] ASF GitHub Bot commented on AIRFLOW-2756: - kaxil closed pull request #3606: [AIRFLOW-2756] Fix bug in set DAG run state workflow URL: https://github.com/apache/incubator-airflow/pull/3606 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/api/common/experimental/mark_tasks.py b/airflow/api/common/experimental/mark_tasks.py
index 681864dfbe..88c5275f5a 100644
--- a/airflow/api/common/experimental/mark_tasks.py
+++ b/airflow/api/common/experimental/mark_tasks.py
@@ -206,7 +206,10 @@ def _set_dag_run_state(dag_id, execution_date, state, session=None):
         DR.execution_date == execution_date
     ).one()
     dr.state = state
-    dr.end_date = timezone.utcnow()
+    if state == State.RUNNING:
+        dr.start_date = timezone.utcnow()
+    else:
+        dr.end_date = timezone.utcnow()
     session.commit()
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 00ede5451d..70891ab4c3 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1023,8 +1023,7 @@ def _change_state_for_tis_without_dagrun(self,
                 models.TaskInstance.dag_id == subq.c.dag_id,
                 models.TaskInstance.task_id == subq.c.task_id,
                 models.TaskInstance.execution_date ==
-                    subq.c.execution_date,
-                models.TaskInstance.task_id == subq.c.task_id)) \
+                    subq.c.execution_date)) \
             .update({models.TaskInstance.state: new_state},
                     synchronize_session=False)
         session.commit()
diff --git a/airflow/www/views.py b/airflow/www/views.py
index d37c0db45d..1ee5a2df86 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -2741,7 +2741,8 @@ def after_model_change(self, form, dagrun, is_created, session=None):
             altered_tis = set_dag_run_state_to_success(
                 dagbag.get_dag(dagrun.dag_id),
                 dagrun.execution_date,
-                commit=True)
+                commit=True,
+                session=session)
         elif dagrun.state == State.FAILED:
             altered_tis = set_dag_run_state_to_failed(
                 dagbag.get_dag(dagrun.dag_id),
diff --git a/tests/api/common/experimental/mark_tasks.py b/tests/api/common/experimental/mark_tasks.py
index 181d10d8a1..9bba91bee0 100644
--- a/tests/api/common/experimental/mark_tasks.py
+++ b/tests/api/common/experimental/mark_tasks.py
@@ -267,11 +267,25 @@ def _create_test_dag_run(self, state, date):
     def _verify_dag_run_state(self, dag, date, state):
         drs = models.DagRun.find(dag_id=dag.dag_id, execution_date=date)
         dr = drs[0]
+
         self.assertEqual(dr.get_state(), state)

+    def _verify_dag_run_dates(self, dag, date, state, middle_time):
+        # When target state is RUNNING, we should set start_date,
+        # otherwise we should set end_date.
+        drs = models.DagRun.find(dag_id=dag.dag_id, execution_date=date)
+        dr = drs[0]
+        if state == State.RUNNING:
+            self.assertGreater(dr.start_date, middle_time)
+            self.assertIsNone(dr.end_date)
+        else:
+            self.assertLess(dr.start_date, middle_time)
+            self.assertGreater(dr.end_date, middle_time)
+
     def test_set_running_dag_run_to_success(self):
         date = self.execution_dates[0]
         dr = self._create_test_dag_run(State.RUNNING, date)
+        middle_time = timezone.utcnow()
         self._set_default_task_instance_states(dr)

         altered = set_dag_run_state_to_success(self.dag1, date, commit=True)
@@ -280,10 +294,12 @@ def test_set_running_dag_run_to_success(self):
         self.assertEqual(len(altered), 5)
         self._verify_dag_run_state(self.dag1, date, State.SUCCESS)
         self._verify_task_instance_states(self.dag1, date, State.SUCCESS)
+        self._verify_dag_run_dates(self.dag1, date, State.SUCCESS, middle_time)

     def test_set_running_dag_run_to_failed(self):
         date = self.execution_dates[0]
         dr = self._create_test_dag_run(State.RUNNING, date)
+        middle_time = timezone.utcnow()
         self._set_default_task_instance_states(dr)

         altered = set_dag_run_state_to_failed(self.dag1, date, commit=True)
@@ -292,10 +308,12 @@ def test_set_running_dag_run_to_failed(self):
         self.assertEqual(len(altered), 1)
         self._verify_dag_run_state(self.dag1, date, State.FAILED)
         self.assertEqual(dr.get_task_instance('run_after_loop').state, State.FAILED)
+        self._verify_dag_run_dates(self.dag1, date, State.FAILED, middle_time)
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564918#comment-16564918 ] ASF GitHub Bot commented on AIRFLOW-2524: - Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206786344

## File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
## @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+    """
+    Initiate a SageMaker training
+
+    This operator returns The ARN of the model created in Amazon SageMaker
+
+    :param training_job_config:
+        The configuration necessary to start a training job (templated)
+    :type training_job_config: dict
+    :param region_name: The AWS region_name
+    :type region_name: string
+    :param sagemaker_conn_id: The SageMaker connection ID to use.
+    :type aws_conn_id: string

Review comment: Hi Keliang, thanks for explaining the SageMaker process. I think it is very similar to, for example, the Druid hook that we have: https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/druid_hook.py#L93 This hook will kick off a job using an HTTP POST of a JSON document to the Druid cluster, and make sure that it receives an HTTP 200. It will then continue to poll the job by invoking the API periodically. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
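The submit-then-poll pattern Fokko describes (POST a JSON job spec, require an HTTP 200, then poll the job's status endpoint until it reaches a terminal state) can be sketched generically. This is an illustrative sketch only: the response fields (`task`, `status`), the status URL shape, and the terminal state names are assumptions, not the actual Druid or SageMaker API.

```python
import time


class JobSubmitError(Exception):
    """Raised when submission fails or the job never finishes in time."""


def submit_and_poll(session, submit_url, payload, poll_interval=10, timeout=3600):
    # Kick off the job with an HTTP POST of a JSON document and
    # insist on an HTTP 200, as the Druid hook does.
    resp = session.post(submit_url, json=payload)
    if resp.status_code != 200:
        raise JobSubmitError("submission failed with HTTP %s" % resp.status_code)
    job_id = resp.json()["task"]  # response field name is an assumption

    # Then poll the job's status periodically until it is terminal.
    status_url = "%s/%s/status" % (submit_url, job_id)  # assumed URL shape
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = session.get(status_url).json()["status"]  # assumed shape
        if status in ("SUCCESS", "FAILED"):
            return status
        time.sleep(poll_interval)
    raise JobSubmitError("job %s did not finish within %ss" % (job_id, timeout))
```

Any HTTP client exposing `post`/`get` methods that return response objects (for instance a `requests.Session`) can be passed as `session`.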
[jira] [Commented] (AIRFLOW-2835) Remove python-selinux
[ https://issues.apache.org/jira/browse/AIRFLOW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564919#comment-16564919 ] ASF GitHub Bot commented on AIRFLOW-2835: - codecov-io commented on issue #3673: [AIRFLOW-2835] Remove python-selinux URL: https://github.com/apache/incubator-airflow/pull/3673#issuecomment-409485914 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=h1) Report > Merging [#3673](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/ed972042a864cd010137190e0bbb1d25a9dcfe83?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3673/graphs/tree.svg?width=650=pr=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=tree)

```diff
@@          Coverage Diff           @@
##           master   #3673   +/-  ##
=====================================
  Coverage   77.51%   77.51%
=====================================
  Files         205      205
  Lines       15751    15751
=====================================
  Hits        12210    12210
  Misses       3541     3541
```

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=footer). Last update [ed97204...ed2a781](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove python-selinux > - > > Key: AIRFLOW-2835 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2835 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > > This package sometimes crashes the CI and is not required. Therefore it does > not make sense to install it since it will take ci-time and make things > brittle. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564938#comment-16564938 ] ASF GitHub Bot commented on AIRFLOW-2803: - verdan commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409491349 @tedmiston please tag me once it is ready for the next review. I see you're still working on this PR, e.g., the Jinja template tags, indentation, and some commented-out code. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 is merged into the master branch, please fix all the JavaScript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564514#comment-16564514 ] ASF GitHub Bot commented on AIRFLOW-2524: - codecov-io edited a comment on issue #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#issuecomment-408564225 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=h1) Report > Merging [#3658](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/096ba9ecd961cdaebd062599f408571ffb21165a?src=pr=desc) will **increase** coverage by `0.4%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3658/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master   #3658    +/-   ##
=======================================
+ Coverage   77.11%   77.51%   +0.4%
=======================================
  Files         206      205     -1
  Lines       15772    15751    -21
=======================================
+ Hits        12162    12210    +48
+ Misses       3610     3541    -69
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5) | `99.01% <0%> (-0.99%)` | :arrow_down: |
| [airflow/www/validators.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmFsaWRhdG9ycy5weQ==) | `100% <0%> (ø)` | :arrow_up: |
| [airflow/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9fX2luaXRfXy5weQ==) | `80.43% <0%> (ø)` | :arrow_up: |
| [airflow/plugins\_manager.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9wbHVnaW5zX21hbmFnZXIucHk=) | `92.59% <0%> (ø)` | :arrow_up: |
| [airflow/minihivecluster.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9taW5paGl2ZWNsdXN0ZXIucHk=) | | |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.74% <0%> (+0.26%)` | :arrow_up: |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `89.87% <0%> (+0.42%)` | :arrow_up: |
| [airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==) | `100% <0%> (+100%)` | :arrow_up: |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=footer). Last update [096ba9e...3f1e4b1](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564478#comment-16564478 ] ASF GitHub Bot commented on AIRFLOW-2524: - troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206711354

## File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
## @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+    """
+    Initiate a SageMaker training
+
+    This operator returns The ARN of the model created in Amazon SageMaker
+
+    :param training_job_config:
+        The configuration necessary to start a training job (templated)
+    :type training_job_config: dict
+    :param region_name: The AWS region_name
+    :type region_name: string
+    :param sagemaker_conn_id: The SageMaker connection ID to use.
+    :type aws_conn_id: string
+    :param use_db_config: Whether or not to use db config
+        associated with sagemaker_conn_id.

Review comment: Added This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564480#comment-16564480 ] ASF GitHub Bot commented on AIRFLOW-2814: - codecov-io commented on issue #3669: Revert [AIRFLOW-2814] - Change `min_file_process_interval` to 0 URL: https://github.com/apache/incubator-airflow/pull/3669#issuecomment-409396427 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=h1) Report > Merging [#3669](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/ed972042a864cd010137190e0bbb1d25a9dcfe83?src=pr=desc) will **increase** coverage by `0.27%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3669/graphs/tree.svg?token=WdLKlKHOAU=pr=650=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=tree)

```diff
@@            Coverage Diff            @@
##           master   #3669     +/-   ##
========================================
+ Coverage   77.51%   77.79%   +0.27%
========================================
  Files         205      205
  Lines       15751    16079    +328
========================================
+ Hits        12210    12508    +298
- Misses       3541     3571     +30
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3669/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `84.63% <ø> (+1.88%)` | :arrow_up: |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3669/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `89.45% <0%> (-0.43%)` | :arrow_down: |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=footer).
Last update [ed97204...1ee1fc4](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Background > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it is mentioned that the default value of argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that each file is parsed and scheduled without > letting Airflow "rest". This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564482#comment-16564482 ] ASF GitHub Bot commented on AIRFLOW-2524: - troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206711545

## File path: tests/contrib/hooks/test_sagemaker_hook.py
## @@ -0,0 +1,341 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+
+import json
+import unittest
+import copy
+try:
+    from unittest import mock
+except ImportError:
+    try:
+        import mock
+    except ImportError:
+        mock = None
+
+from airflow import configuration
+from airflow import models
+from airflow.utils import db
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.hooks.S3_hook import S3Hook
+from airflow.exceptions import AirflowException
+
+
+role = 'test-role'
+
+bucket = 'test-bucket'
+
+key = 'test/data'
+data_url = 's3://{}/{}'.format(bucket, key)
+
+job_name = 'test-job-name'
+
+image = 'test-image'
+
+test_arn_return = {'TrainingJobArn': 'testarn'}
+
+test_list_training_job_return = {
+    'TrainingJobSummaries': [
+        {
+            'TrainingJobName': job_name,
+            'TrainingJobStatus': 'InProgress'
+        },
+    ],
+    'NextToken': 'test-token'
+}
+
+test_list_tuning_job_return = {
+    'TrainingJobSummaries': [
+        {
+            'TrainingJobName': job_name,
+            'TrainingJobArn': 'testarn',
+            'TunedHyperParameters': {
+                'k': '3'
+            },
+            'TrainingJobStatus': 'InProgress'
+        },
+    ],
+    'NextToken': 'test-token'
+}
+
+output_url = 's3://{}/test/output'.format(bucket)
+create_training_params = \
+    {
+        'AlgorithmSpecification': {
+            'TrainingImage': image,
+            'TrainingInputMode': 'File'
+        },
+        'RoleArn': role,
+        'OutputDataConfig': {
+            'S3OutputPath': output_url
+        },
+        'ResourceConfig': {
+            'InstanceCount': 2,
+            'InstanceType': 'ml.c4.8xlarge',
+            'VolumeSizeInGB': 50
+        },
+        'TrainingJobName': job_name,
+        'HyperParameters': {
+            'k': '10',
+            'feature_dim': '784',
+            'mini_batch_size': '500',
+            'force_dense': 'True'
+        },
+        'StoppingCondition': {
+            'MaxRuntimeInSeconds': 60 * 60
+        },
+        'InputDataConfig': [
+            {
+                'ChannelName': 'train',
+                'DataSource': {
+                    'S3DataSource': {
+                        'S3DataType': 'S3Prefix',
+                        'S3Uri': data_url,
+                        'S3DataDistributionType': 'FullyReplicated'
+                    }
+                },
+                'CompressionType': 'None',
+                'RecordWrapperType': 'None'
+            }
+        ]
+    }
+
+create_tuning_params = {'HyperParameterTuningJobName': job_name,
+                        'HyperParameterTuningJobConfig': {
+                            'Strategy': 'Bayesian',
+                            'HyperParameterTuningJobObjective': {
+                                'Type': 'Maximize',
+                                'MetricName': 'test_metric'
+                            },
+                            'ResourceLimits': {
+                                'MaxNumberOfTrainingJobs': 123,
+                                'MaxParallelTrainingJobs': 123
+                            },
+                            'ParameterRanges': {
+                                'IntegerParameterRanges': [
+                                    {
+                                        'Name': 'k',
+                                        'MinValue': '2',
+                                        'MaxValue': '10'
+                                    },
+                                ]
+                            }
+                        },
+                        'TrainingJobDefinition': {
+
[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker
[ https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564481#comment-16564481 ] ASF GitHub Bot commented on AIRFLOW-2524: - troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206711515

## File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
## @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+    """
+    Initiate a SageMaker training
+
+    This operator returns The ARN of the model created in Amazon SageMaker
+
+    :param training_job_config:
+        The configuration necessary to start a training job (templated)
+    :type training_job_config: dict
+    :param region_name: The AWS region_name
+    :type region_name: string
+    :param sagemaker_conn_id: The SageMaker connection ID to use.
+    :type aws_conn_id: string
+    :param use_db_config: Whether or not to use db config
+        associated with sagemaker_conn_id.
+        If set to true, will automatically update the training config
+        with what's in db, so the db config doesn't need to
+        included everything, but what's there does replace the ones
+        in the training_job_config, so be careful
+    :type use_db_config:
+    :param aws_conn_id: The AWS connection ID to use.
+    :type aws_conn_id: string
+
+    **Example**:
+        The following operator would start a training job when executed
+
+        sagemaker_training =
+            SageMakerCreateTrainingJobOperator(
+                task_id='sagemaker_training',
+                training_job_config=config,
+                use_db_config=True,
+                region_name='us-west-2'
+                sagemaker_conn_id='sagemaker_customers_conn',
+                aws_conn_id='aws_customers_conn'
+            )
+    """
+
+    template_fields = ['training_job_config']
+    template_ext = ()
+    ui_color = '#ededed'
+
+    @apply_defaults
+    def __init__(self,
+                 sagemaker_conn_id=None,

Review comment: Changed the order This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow integration with AWS Sagemaker > -- > > Key: AIRFLOW-2524 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2524 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib >Reporter: Rajeev Srinivasan >Assignee: Yang Yu >Priority: Major > Labels: AWS > > Would it be possible to orchestrate an end to end AWS Sagemaker job using > Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
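The `use_db_config` behaviour described in the docstring above (the config stored against `sagemaker_conn_id` is laid over `training_job_config`, with the db side winning on overlapping keys) amounts to a plain dict overlay. A sketch of that merge rule; the function name `merge_job_config` is an illustrative invention, not the operator's actual method:

```python
def merge_job_config(training_job_config, db_config):
    # Start from the operator-supplied config, then overlay the
    # connection's stored config. Overlapping keys are replaced by
    # the db side, which is why the docstring warns "so be careful".
    merged = dict(training_job_config)
    merged.update(db_config)
    return merged
```

The db config therefore only needs to carry the keys it wants to add or override; everything else passes through from `training_job_config` untouched.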
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564491#comment-16564491 ] ASF GitHub Bot commented on AIRFLOW-2814: - XD-DENG commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config URL: https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409398992 Hi all, thanks for the input. I agree with you on the desired value as well (the objective of this PR was to fix the inconsistency between `.cfg` and the comment in `jobs.py`, not to propose another value for this configuration item). Hi @kaxil, regarding `dag_dir_list_interval`, personally I think it should be reduced. 5 minutes is quite long for users to wait until a new DAG file is reflected. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Background > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it is mentioned that the default value of argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that each file is parsed and scheduled without > letting Airflow "rest".
This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
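Both intervals under discussion live in the `[scheduler]` section of airflow.cfg. A sketch of the relevant settings, using the 180-second value proposed in this issue and the 5-minute DAG-directory scan the comments refer to (exact key names and defaults should be checked against the config template for your Airflow release):

```ini
[scheduler]
# Parse and schedule each DAG file no faster than this interval (seconds);
# 0 means the scheduler re-parses files continuously, without "rest".
min_file_process_interval = 180

# How often (seconds) to scan the DAGs folder for new files; this is the
# 5-minute wait XD-DENG suggests reducing.
dag_dir_list_interval = 300
```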
[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency
[ https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565043#comment-16565043 ] ASF GitHub Bot commented on AIRFLOW-2817: - ashb commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency URL: https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409513668 If not, I think vendoring python-nvd3 and slugify to use the non-GPL option is probably the way to go. (Or perhaps replacing python-nvd3 entirely; that's a bigger job though. https://medium.com/@Elijah_Meeks/introducing-semiotic-for-data-visualization-88dc3c6b6926 looks interesting, but uses React, which is fine from a licensing PoV now.) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Force explicit choice on GPL dependency > --- > > Key: AIRFLOW-2817 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2817 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Bolke de Bruin >Priority: Major > > A more explicit choice on GPL dependency was required by the IPMC -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency
[ https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565046#comment-16565046 ] ASF GitHub Bot commented on AIRFLOW-2817: - ashb edited a comment on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency URL: https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409513668 If not, I think vendoring python-nvd3 and slugify to use the non-GPL option is probably the way to go. (Or perhaps replacing python-nvd3 entirely; that's a bigger job though. https://medium.com/@Elijah_Meeks/introducing-semiotic-for-data-visualization-88dc3c6b6926 looks interesting, but uses React, which is fine from a licensing PoV now.) Edit: If we did use this I wouldn't suggest React-ifying the whole app, just the chart part of the page itself. If that's possible. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Force explicit choice on GPL dependency > --- > > Key: AIRFLOW-2817 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2817 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Bolke de Bruin >Priority: Major > > A more explicit choice on GPL dependency was required by the IPMC -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2655) Default Kubernetes worker configurations are inconsistent
[ https://issues.apache.org/jira/browse/AIRFLOW-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565049#comment-16565049 ] ASF GitHub Bot commented on AIRFLOW-2655: - johnchenghk01 commented on issue #3529: [AIRFLOW-2655] Fix inconsistency of default config of kubernetes worker URL: https://github.com/apache/incubator-airflow/pull/3529#issuecomment-409515471 It will expose the DB password when doing a kubectl describe. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Kubernetes worker configurations are inconsistent > - > > Key: AIRFLOW-2655 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2655 > Project: Apache Airflow > Issue Type: Bug > Components: executor >Affects Versions: 1.10.0 >Reporter: Shintaro Murakami >Priority: Minor > Fix For: 2.0.0 > > > If the optional config `airflow_configmap` is not set, the worker is configured with > `LocalExecutor` and a sql_alchemy_conn starting with `sqlite`. > This combination is not allowed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
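The concern raised above is that a connection string embedded directly in the pod spec shows up in plain text under `kubectl describe pod`. A common mitigation is to source the value from a Kubernetes Secret instead. The sketch below builds such a pod-spec env entry as a plain dict; the names `airflow-secrets` and `sql_alchemy_conn` are illustrative, not the project's actual configuration.

```python
def secret_env_var(name, secret_name, secret_key):
    """Build a pod-spec env entry whose value comes from a Kubernetes Secret
    rather than being written into the spec in plain text."""
    return {
        "name": name,
        "valueFrom": {
            "secretKeyRef": {"name": secret_name, "key": secret_key},
        },
    }

# Hypothetical Secret name/key for illustration only.
env = [secret_env_var("AIRFLOW__CORE__SQL_ALCHEMY_CONN",
                      "airflow-secrets", "sql_alchemy_conn")]
print(env[0]["valueFrom"]["secretKeyRef"]["key"])
```

With this shape, `kubectl describe` shows only the Secret reference, not the password itself.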
[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency
[ https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565036#comment-16565036 ] ASF GitHub Bot commented on AIRFLOW-2817: - bolkedebruin closed pull request #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency URL: https://github.com/apache/incubator-airflow/pull/3660 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/.travis.yml b/.travis.yml
index 81e43fb4b8..e078d7c9ae 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -47,6 +47,7 @@ python:
   - "3.5"
 env:
   global:
+    - SLUGIFY_USES_TEXT_UNIDECODE=yes
     - TRAVIS_CACHE=$HOME/.travis_cache/
     - KRB5_CONFIG=/etc/krb5.conf
     - KRB5_KTNAME=/etc/airflow.keytab
diff --git a/INSTALL b/INSTALL
index 5c8f03eb66..596ce25814 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,13 +1,30 @@
-# INSTALL / BUILD instruction for Apache Airflow (incubating)
-# fetch the tarball and untar the source
+# INSTALL / BUILD instructions for Apache Airflow (incubating)
+
+# [required] fetch the tarball and untar the source
+# change into the directory that was untarred.

 # [optional] run Apache RAT (release audit tool) to validate license headers
-# RAT docs here: https://creadur.apache.org/rat/
+# RAT docs here: https://creadur.apache.org/rat/. Requires Java and Apache Rat
 java -jar apache-rat.jar -E ./.rat-excludes -d .

-# [optional] by default one of Apache Airflow's dependencies pulls in a GPL
-# library. If this is a concern issue (also every upgrade):
-# export SLUGIFY_USES_TEXT_UNIDECODE=yes
+# [optional] Airflow pulls in quite a lot of dependencies in order
+# to connect to other services. You might want to test or run Airflow
+# from a virtual env to make sure those dependencies are separated
+# from your system wide versions
+python -m my_env
+source my_env/bin/activate
+
+# [required] by default one of Apache Airflow's dependencies pulls in a GPL
+# library. Airflow will not install (and upgrade) without an explicit choice.
+#
+# To make sure not to install the GPL dependency:
+# export SLUGIFY_USES_TEXT_UNIDECODE=yes
+# In case you do not mind:
+# export GPL_UNIDECODE=yes
+
+# [required] building and installing
+# by pip (preferred)
+pip install .

-# install the release
+# or directly
 python setup.py install
diff --git a/UPDATING.md b/UPDATING.md
index da80f56fcb..ef29e1d3a4 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -5,6 +5,12 @@ assists users migrating to a new version.

 ## Airflow Master

+## Airflow 1.10
+
+Installation and upgrading requires setting `SLUGIFY_USES_TEXT_UNIDECODE=yes` in your environment or
+`AIRFLOW_GPL_UNIDECODE=yes`. In case of the latter a GPL runtime dependency will be installed due to a
+dependency (python-nvd3 -> python-slugify -> unidecode).
+
 ### Replace DataProcHook.await calls to DataProcHook.wait

 The method name was changed to be compatible with the Python 3.7 async/await keywords
diff --git a/scripts/ci/kubernetes/docker/Dockerfile b/scripts/ci/kubernetes/docker/Dockerfile
index 498c47b21a..93b20dbcd2 100644
--- a/scripts/ci/kubernetes/docker/Dockerfile
+++ b/scripts/ci/kubernetes/docker/Dockerfile
@@ -17,6 +17,8 @@

 FROM ubuntu:16.04

+ENV SLUGIFY_USES_TEXT_UNIDECODE=yes
+
 # install deps
 RUN apt-get update -y && apt-get install -y \
         wget \
@@ -33,7 +35,6 @@ RUN apt-get update -y && apt-get install -y \
         unzip \
     && apt-get clean

-
 RUN pip install --upgrade pip

 # Since we install vanilla Airflow, we also want to have support for Postgres and Kubernetes
diff --git a/setup.py b/setup.py
index 50af30944e..e69572c51d 100644
--- a/setup.py
+++ b/setup.py
@@ -35,6 +35,17 @@

 PY3 = sys.version_info[0] == 3

+# See LEGAL-362
+def verify_gpl_dependency():
+    if (not os.getenv("AIRFLOW_GPL_UNIDECODE")
+            and not os.getenv("SLUGIFY_USES_TEXT_UNIDECODE") == "yes"):
+        raise RuntimeError("By default one of Airflow's dependencies installs a GPL "
+                           "dependency (unidecode). To avoid this dependency set "
+                           "SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when you "
+                           "install or upgrade Airflow. To force installing the GPL "
+                           "version set AIRFLOW_GPL_UNIDECODE")
+
+
 class Tox(TestCommand):
     user_options = [('tox-args=', None, "Arguments to pass to tox")]

@@ -258,6 +269,7 @@ def write_version(filename=os.path.join(*['airflow',

 def do_setup():
+    verify_gpl_dependency()
     write_version()
     setup(
         name='apache-airflow',
@@ -376,6 +388,7 @@ def do_setup():
         'License :: OSI Approved :: Apache Software License',
         'Programming Language :: Python ::
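The heart of the patch above is the guard added to `setup.py`: installation refuses to proceed unless the user makes an explicit choice about the GPL `unidecode` dependency via one of two environment variables. A minimal self-contained sketch of that logic (the variable names follow the patch, but the function is a simplified stand-in, not the actual setup.py code):

```python
def check_gpl_choice(environ):
    """Raise unless an explicit GPL choice was made via environment variables.

    Mirrors the verify_gpl_dependency logic: either opt out of the GPL
    dependency (SLUGIFY_USES_TEXT_UNIDECODE=yes) or explicitly accept it
    (AIRFLOW_GPL_UNIDECODE).
    """
    if (not environ.get("AIRFLOW_GPL_UNIDECODE")
            and environ.get("SLUGIFY_USES_TEXT_UNIDECODE") != "yes"):
        raise RuntimeError(
            "Set SLUGIFY_USES_TEXT_UNIDECODE=yes to avoid the GPL dependency, "
            "or AIRFLOW_GPL_UNIDECODE=yes to accept it.")

check_gpl_choice({"SLUGIFY_USES_TEXT_UNIDECODE": "yes"})  # passes: non-GPL choice made
```

Passing a mapping rather than reading `os.environ` directly makes the check easy to exercise; the real code calls `os.getenv` at install time.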
[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency
[ https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565038#comment-16565038 ] ASF GitHub Bot commented on AIRFLOW-2817: - bolkedebruin commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency URL: https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409512201 Will see if we can address the issue with upstream This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Force explicit choice on GPL dependency > --- > > Key: AIRFLOW-2817 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2817 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Bolke de Bruin >Priority: Major > > A more explicit choice on GPL dependency was required by the IPMC -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency
[ https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565082#comment-16565082 ] ASF GitHub Bot commented on AIRFLOW-2817: - verdan commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency URL: https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409522600 @ashb I believe we can remove python-nvd3 entirely and use custom javascript to render the charts using the d3 and nvd3 JS libraries, just the way we are using the Graph View on the DAG detail page, i.e., sending all the data from python and implementing the charts on the frontend in templates. But as you said, it will take some time to implement on the frontend, and won't be ready for the 1.10 release. P.S: Yes, it is possible to make part of the application/page use React. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Force explicit choice on GPL dependency > --- > > Key: AIRFLOW-2817 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2817 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Bolke de Bruin >Priority: Major > > A more explicit choice on GPL dependency was required by the IPMC -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565025#comment-16565025 ] ASF GitHub Bot commented on AIRFLOW-2836: - XD-DENG opened a new pull request #3674: [AIRFLOW-2836] Minor improvement of contrib.sensors.FileSensor URL: https://github.com/apache/incubator-airflow/pull/3674 ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2836 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Background The default `fs_conn_id` in `contrib.sensors.FileSensor` is 'fs_default2'. However, when we initiate the database (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88), there isn't such an entry. It doesn't exist anywhere else. Issue The purpose of `contrib.sensors.FileSensor` is mainly for checking local file system (of course can also be used for NAS). Then the path ("/") from default connection 'fs_default' would suffice. However, given the default value for fs_conn_id in contrib.sensors.FileSensor is "fs_default2" (a value doesn't exist), it will make the situation much more complex. When users intend to check local file system only, they should be able to leave fs_conn_id default directly, instead of going setting up another connection separately. Proposal Change default value for `fs_conn_id` in `contrib.sensors.FileSensor` from "fs_default2" to "fs_default" (actually in the related test, the `fs_conn_id` are all specified to be "fs_default"). 
### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Minor improvement of contrib.sensors.FileSensor > --- > > Key: AIRFLOW-2836 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2836 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > > h4. *Background* > The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. > However, when we initiate the database > (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88), > there isn't such an entry. It doesn't exist anywhere else. > h4. *Issue* > The purpose of _contrib.sensors.FileSensor_ is mainly for checking local file > system (of course can also be used for NAS). 
Then the path ("/") from default > connection 'fs_default' would suffice. > However, given the default value for *fs_conn_id* in > contrib.sensors.FileSensor is "fs_default2" (a value doesn't exist), it will > make the situation much more complex. > When users intend to check local file system only, they should be able to > leave *fs_conn_id* default directly, instead of going setting up another > connection separately. > h4. Proposal > Change default value for *fs_conn_id* in contrib.sensors.FileSensor from > "fs_default2" to "fs_default" (actually in the related test, the *fs_conn_id* > are all specified to be "fs_default"). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
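The problem described above is that the sensor's default connection id points at an entry that is never created. A small self-contained sketch of why that matters: a FileSensor-style `poke` resolves its base path through a connection id, so a default of `fs_default` (mapped to `/`) works out of the box, while `fs_default2` has no entry and the lookup fails. The `CONNECTIONS` dict here is an illustrative stand-in for Airflow's connection table, not the real hook machinery.

```python
import os
import tempfile

# Stand-in for the connection table created at db init: only 'fs_default' exists.
CONNECTIONS = {"fs_default": "/"}

def poke(filepath, fs_conn_id="fs_default"):
    """Check whether filepath exists under the connection's base path."""
    base = CONNECTIONS[fs_conn_id]  # an unknown id like 'fs_default2' raises KeyError
    return os.path.exists(os.path.join(base, filepath.lstrip("/")))

with tempfile.NamedTemporaryFile() as f:
    print(poke(f.name))  # True while the temp file exists
```

With the default changed to `fs_default`, users checking the local filesystem can omit `fs_conn_id` entirely instead of setting up a separate connection.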
[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency
[ https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565070#comment-16565070 ] ASF GitHub Bot commented on AIRFLOW-2817: - ashb commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency URL: https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409521040 Something about the logic isn't right - everything on Travis is failing on the env check. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Force explicit choice on GPL dependency > --- > > Key: AIRFLOW-2817 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2817 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Bolke de Bruin >Priority: Major > > A more explicit choice on GPL dependency was required by the IPMC -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563309#comment-16563309 ] ASF GitHub Bot commented on AIRFLOW-2814: - kaxil commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config URL: https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409144039 @bolkedebruin @Fokko Thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Arg "file_process_interval" for class SchedulerJob is inconsistent > with doc > --- > > Key: AIRFLOW-2814 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2814 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > > h2. Background > In > [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592] > , it was mentioned the default value of argument *file_process_interval* > should be 3 minutes (*file_process_interval:* Parse and schedule each file no > faster than this interval). > The value is normally parsed from the default configuration. However, in the > default config_template, its value is 0 rather than 180 seconds > ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432] > ). > h2. Issue > This means that each file is actually parsed and scheduled without > letting Airflow "rest". This conflicts with the design purpose (by default > let it be 180 seconds) and may affect performance significantly. > h2. My Proposal > Change the value in the config template from 0 to 180. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
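The throttle this setting controls can be sketched in a few lines. This is a simplified stand-in for the scheduler's behavior, not its actual code: each DAG file should be parsed no faster than once per `file_process_interval` seconds, so the template's value of 0 means the check always passes and files are re-parsed on every loop, while 180 enforces the intended three-minute rest.

```python
def should_parse(last_parsed_at, now, file_process_interval):
    """Return True if enough time has passed to parse the file again.

    Times are in seconds; file_process_interval=0 never throttles.
    """
    return (now - last_parsed_at) >= file_process_interval

print(should_parse(100, 101, 0))    # True: interval 0 never throttles
print(should_parse(100, 200, 180))  # False: only 100 seconds have elapsed
print(should_parse(100, 280, 180))  # True: the full 180 seconds have elapsed
```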
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563319#comment-16563319 ] ASF GitHub Bot commented on AIRFLOW-2803: - verdan commented on issue #3656: [AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409147448 @tedmiston can you please make sure: - you squash your commits - your commit message adheres the [commit guidelines](https://github.com/apache/incubator-airflow/blob/master/.github/PULL_REQUEST_TEMPLATE.md#commits) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues
[ https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563312#comment-16563312 ] ASF GitHub Bot commented on AIRFLOW-2803: - verdan commented on a change in pull request #3656: [AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206443837 ## File path: airflow/www_rbac/static/js/clock.js ## @@ -18,24 +18,25 @@ */ require('./jqClock.min'); -$(document).ready(function () { - x = new Date(); +$(document).ready(() => { Review comment: Please note that most of the custom JS is written inline in .html files, and we are not yet considering that javascript in webpack, that means, we won't be able to transpile that javascript to ES5. (which is fine for now) I am working on another issue to extract all inline JS from html files to separate .js files. https://issues.apache.org/jira/browse/AIRFLOW-2804 My suggestion would be to implement the ES6->ES5 tranpilation as part of this issue. And once this PR gets merged, we'll be able to extract all inline JS into separate .js files. We already have a JIRA issue for that: https://issues.apache.org/jira/browse/AIRFLOW-2730 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix all ESLint issues > - > > Key: AIRFLOW-2803 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2803 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Taylor Edmiston >Priority: Major > > Most of the JS code in Apache Airflow has linting issues which are > highlighted after the integration of ESLint. > Once AIRFLOW-2783 merged in master branch, please fix all the javascript > styling issues that we have in .js and .html files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2834) can not see the dag page after build from the newest code in github
[ https://issues.apache.org/jira/browse/AIRFLOW-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565151#comment-16565151 ] ASF GitHub Bot commented on AIRFLOW-2834: - yeluolei opened a new pull request #3675: [AIRFLOW-2834] fix build script for k8s docker URL: https://github.com/apache/incubator-airflow/pull/3675 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2834 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description The Kubernetes Docker image builds Airflow without RBAC support, but the configmap needs RBAC, so the build script needs to be changed to build the JS and CSS files. Currently, when opening the Airflow web UI deployed in Kubernetes, the webpage is blank and some files are missing. - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. 
- When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > can not see the dag page after build from the newest code in github > --- > > Key: AIRFLOW-2834 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2834 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 2.0 >Reporter: Rurui Ye >Assignee: Rurui Ye >Priority: Blocker > Attachments: image-2018-08-01-14-20-09-256.png > > > After building and deploying the newest version of the code from GitHub, the web > server opened but the DAGs page was blank, with the following error when requesting > resources. > > !image-2018-08-01-14-20-09-256.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565171#comment-16565171 ] ASF GitHub Bot commented on AIRFLOW-2836: - XD-DENG commented on issue #3674: [AIRFLOW-2836] Minor improvement of contrib.sensors.FileSensor URL: https://github.com/apache/incubator-airflow/pull/3674#issuecomment-409545344 Thanks @ashb . Green now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Minor improvement of contrib.sensors.FileSensor > --- > > Key: AIRFLOW-2836 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2836 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > > h4. *Background* > The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. > However, when we initiate the database > (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88), > there isn't such an entry. It doesn't exist anywhere else. > h4. *Issue* > The purpose of _contrib.sensors.FileSensor_ is mainly for checking local file > system (of course can also be used for NAS). Then the path ("/") from default > connection 'fs_default' would suffice. > However, given the default value for *fs_conn_id* in > contrib.sensors.FileSensor is "fs_default2" (a value doesn't exist), it will > make the situation much more complex. > When users intend to check local file system only, they should be able to > leave *fs_conn_id* default directly, instead of going setting up another > connection separately. > h4. Proposal > Change default value for *fs_conn_id* in contrib.sensors.FileSensor from > "fs_default2" to "fs_default" (actually in the related test, the *fs_conn_id* > are all specified to be "fs_default"). 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565170#comment-16565170 ] ASF GitHub Bot commented on AIRFLOW-2836: - codecov-io commented on issue #3674: [AIRFLOW-2836] Minor improvement of contrib.sensors.FileSensor URL: https://github.com/apache/incubator-airflow/pull/3674#issuecomment-409544984 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=h1) Report > Merging [#3674](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/c37fc0b6ba19e3fe5656ae37cef9b59cef3c29e8?src=pr=desc) will **decrease** coverage by `<.01%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3674/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=tree)

```diff
@@           Coverage Diff            @@
##           master    #3674    +/-  ##
=======================================
- Coverage    77.5%    77.5%   -0.01%
=======================================
  Files         205      205
  Lines       15753    15753
=======================================
- Hits        12210    12209       -1
- Misses       3543     3544       +1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3674/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.54% <0%> (-0.05%)` | :arrow_down: |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=footer). Last update [c37fc0b...4d8abd8](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). 
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Minor improvement of contrib.sensors.FileSensor > --- > > Key: AIRFLOW-2836 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2836 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > > h4. *Background* > The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. > However, when we initiate the database > (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88), > there isn't such an entry. It doesn't exist anywhere else. > h4. *Issue* > The purpose of _contrib.sensors.FileSensor_ is mainly for checking local file > system (of course can also be used for NAS). Then the path ("/") from default > connection 'fs_default' would suffice. > However, given the default value for *fs_conn_id* in > contrib.sensors.FileSensor is "fs_default2" (a value doesn't exist), it will > make the situation much more complex. > When users intend to check local file system only, they should be able to > leave *fs_conn_id* default directly, instead of going setting up another > connection separately. > h4. Proposal > Change default value for *fs_conn_id* in contrib.sensors.FileSensor from > "fs_default2" to "fs_default" (actually in the related test, the *fs_conn_id* > are all specified to be "fs_default"). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2846) devel requirement is not sufficient to run tests
[ https://issues.apache.org/jira/browse/AIRFLOW-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568846#comment-16568846 ] ASF GitHub Bot commented on AIRFLOW-2846: - holdenk opened a new pull request #3691: [AIRFLOW-2846] Add missing python test dependency to setup.py URL: https://github.com/apache/incubator-airflow/pull/3691 Add missing python test dependency (tox) to setup.py dev requirement. Make sure you have checked _all_ steps below. ### Jira - [ X ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ X ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Adds test dependency. ### Commits - [ X ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. 
### Code Quality - [ X ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > devel requirement is not sufficient to run tests > > > Key: AIRFLOW-2846 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2846 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Reporter: holdenk >Assignee: holdenk >Priority: Trivial > > The devel requirement doesn't list tox, but `python setup.py test` requires > it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2845) Remove asserts from the contrib code (change to legal exceptions)
[ https://issues.apache.org/jira/browse/AIRFLOW-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568769#comment-16568769 ] ASF GitHub Bot commented on AIRFLOW-2845: - xnuinside opened a new pull request #3690: [AIRFLOW-2845] Remove asserts from the contrib package URL: https://github.com/apache/incubator-airflow/pull/3690 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [AIRFLOW-2845](https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2845) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: `assert` is used in the Airflow contrib package code, and given what asserts are really for, that's not correct. The documentation describes assert as a debug tool: https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement and it can also be disabled globally. So, I just want to change debug asserts to ValueError and TypeError. ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: It's covered by existing tests. No new features or important changes. ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. 
Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove asserts from the contrib code (change to legal exceptions) > -- > > Key: AIRFLOW-2845 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2845 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Affects Versions: 1.10.1 >Reporter: Iuliia Volkova >Assignee: Iuliia Volkova >Priority: Minor > Labels: easyfix > Fix For: 1.9.0 > > > Hi guys. `assert` statements are used in the Airflow contrib package code. > Given what asserts are actually meant for, this is not correct. > The documentation describes `assert` as a debugging tool: > [https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement] > and asserts can also be disabled globally (e.g., when Python runs with `-O`). > If you do not mind, I will be happy to prepare a PR to remove asserts from the > contrib module, replacing them with proper exceptions and messages rather than > just a bare "AssertionError". > I am talking only about src (not about asserts in tests). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
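The replacement this PR proposes — swapping debug-only asserts for explicit exceptions — can be sketched as follows; `set_retries` is a hypothetical example function for illustration, not actual contrib code:

```python
# Hypothetical validation function illustrating the proposed change;
# not actual Airflow contrib code.

# Before: debug-only asserts, silently skipped when Python runs with -O.
def set_retries_with_assert(retries):
    assert isinstance(retries, int), "retries must be an int"
    assert retries >= 0, "retries must be non-negative"
    return retries

# After: explicit exceptions that survive optimized mode and carry the
# conventional types (TypeError for a bad type, ValueError for a bad value).
def set_retries(retries):
    if not isinstance(retries, int):
        raise TypeError("retries must be an int, got %r" % type(retries))
    if retries < 0:
        raise ValueError("retries must be non-negative, got %d" % retries)
    return retries
```

Under `python -O` the first version performs no validation at all, which is why asserts are unsuitable for argument checking in library code.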
[jira] [Commented] (AIRFLOW-2849) devel requirement is not sufficient to check code quality locally
[ https://issues.apache.org/jira/browse/AIRFLOW-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569221#comment-16569221 ] ASF GitHub Bot commented on AIRFLOW-2849: - ashb closed pull request #3694: [AIRFLOW-2849] Add missing dependency flake8 to setup to allow running code quality checks locally URL: https://github.com/apache/incubator-airflow/pull/3694 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/setup.py b/setup.py index e69572c51d..d84c981ccb 100644 --- a/setup.py +++ b/setup.py @@ -246,7 +246,8 @@ def write_version(filename=os.path.join(*['airflow', 'pywinrm', 'qds-sdk>=1.9.6', 'rednose', -'requests_mock' +'requests_mock', +'flake8' ] if not PY3: This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > devel requirement is not sufficient to check code quality locally > - > > Key: AIRFLOW-2849 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2849 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Reporter: Eyal Trabelsi >Assignee: Eyal Trabelsi >Priority: Trivial > Fix For: 2.0.0 > > > The devel requirement doesn't list flake8, but in order to check code quality > locally one need to install it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2851) Canonicalize "as _..." etc imports
[ https://issues.apache.org/jira/browse/AIRFLOW-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569284#comment-16569284 ] ASF GitHub Bot commented on AIRFLOW-2851: - tedmiston opened a new pull request #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports URL: https://github.com/apache/incubator-airflow/pull/3696 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2851 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: This PR: 1. Replaces `import foo as _foo` style imports with the more common `import foo` used everywhere else across the codebase. I dug through history and couldn't find special reasons to maintain the as style imports here (I think it's just old code). Currently (33dd33c89d4b6454d224ca34bab5ae37fb9812a6), there are just a handful of import lines using `as _...` vs thousands not using it, so the goal here is to improve consistency. 2. It also simplifies `import foo.bar as bar` style imports to equivalent `from foo import bar`. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Coverage by existing tests. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. 
Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Canonicalize "as _..." etc imports > -- > > Key: AIRFLOW-2851 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2851 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > This PR: > 1. Replaces `import foo as _foo` style imports with the more common `import > foo` used everywhere else across the codebase. I dug through history and > couldn't find special reasons to maintain the as style imports here (I think > it's just old code). Currently (33dd33c89d4b6454d224ca34bab5ae37fb9812a6), > there are just a handful of import lines using `as _...` vs thousands not > using it, so the goal here is to improve consistency. > 2. It also simplifies `import foo.bar as bar` style imports to equivalent > `from foo import bar`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
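Both rewrites described in this PR are behavior-preserving; a quick demonstration with generic stdlib modules (these stand in for the actual Airflow import sites):

```python
# Illustrative demonstration that the rewrites are equivalent; generic
# stdlib modules stand in for the actual Airflow imports.

# Style being simplified:   import os.path as legacy_path
# Canonical replacement:    from os import path
import os.path as legacy_path
from os import path

# Both forms bind the very same module object.
assert path is legacy_path

# Likewise, `import json as _json` offers nothing over plain `import json`
# unless the underscore alias is deliberately hiding the module name.
import json
assert json.loads(json.dumps({"canonical": True})) == {"canonical": True}
```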
[jira] [Commented] (AIRFLOW-2850) Remove deprecated airflow.utils.apply_defaults
[ https://issues.apache.org/jira/browse/AIRFLOW-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569280#comment-16569280 ] ASF GitHub Bot commented on AIRFLOW-2850: - tedmiston opened a new pull request #3695: [AIRFLOW-2850] Remove deprecated airflow.utils.apply_defaults URL: https://github.com/apache/incubator-airflow/pull/3695 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2850 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: This PR removes the wrapper function `apply_defaults` that's had a deprecation warning since 2016. As similar "to be deprecated" stuff is removed for 2.0 in #3692, this felt like a good time to take care of related things. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Coverage by existing tests. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. 
- When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove deprecated airflow.utils.apply_defaults > -- > > Key: AIRFLOW-2850 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2850 > Project: Apache Airflow > Issue Type: Improvement > Components: utils >Affects Versions: 2.0.0 >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > This PR removes the wrapper function apply_defaults that's had a deprecation > warning since 2016. As similar "to be deprecated" stuff is removed for 2.0 > in #3692 ([AIRFLOW-2847]), this felt like a good time to take care of related > things. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2806) test_mark_success_no_kill test breaks intermittently on CI
[ https://issues.apache.org/jira/browse/AIRFLOW-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569314#comment-16569314 ] ASF GitHub Bot commented on AIRFLOW-2806: - tedmiston closed pull request #3646: [WIP][AIRFLOW-2806] test_mark_success_no_kill test breaks intermittently on CI URL: https://github.com/apache/incubator-airflow/pull/3646 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/.travis.yml b/.travis.yml index 81e43fb4b8..3f41d6525d 100644 --- a/.travis.yml +++ b/.travis.yml @@ -54,15 +54,15 @@ env: # does not work with python 3 - BOTO_CONFIG=/tmp/bogusvalue matrix: -- TOX_ENV=py27-backend_mysql -- TOX_ENV=py27-backend_sqlite -- TOX_ENV=py27-backend_postgres -- TOX_ENV=py35-backend_mysql -- TOX_ENV=py35-backend_sqlite +# - TOX_ENV=py27-backend_mysql +# - TOX_ENV=py27-backend_sqlite +# - TOX_ENV=py27-backend_postgres +# - TOX_ENV=py35-backend_mysql +# - TOX_ENV=py35-backend_sqlite - TOX_ENV=py35-backend_postgres -- TOX_ENV=flake8 -- TOX_ENV=py27-backend_postgres KUBERNETES_VERSION=v1.9.0 -- TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0 +# - TOX_ENV=flake8 +# - TOX_ENV=py27-backend_postgres KUBERNETES_VERSION=v1.9.0 +# - TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0 matrix: exclude: - python: "3.5" diff --git a/scripts/ci/kubernetes/docker/Dockerfile b/scripts/ci/kubernetes/docker/Dockerfile index 498c47b21a..ef72a6c08c 100644 --- a/scripts/ci/kubernetes/docker/Dockerfile +++ b/scripts/ci/kubernetes/docker/Dockerfile @@ -40,7 +40,7 @@ RUN pip install --upgrade pip RUN pip install -U setuptools && \ pip install kubernetes && \ pip install cryptography && \ -pip install psycopg2-binary==2.7.4 # I had issues with older versions of psycopg2, just a warning +pip install 
psycopg2-binary>=2.7.4 # I had issues with older versions of psycopg2, just a warning # install airflow COPY airflow.tar.gz /tmp/airflow.tar.gz diff --git a/setup.py b/setup.py index 50af30944e..bf4ce1d1cf 100644 --- a/setup.py +++ b/setup.py @@ -299,7 +299,7 @@ def do_setup(): 'python-nvd3==0.15.0', 'requests>=2.5.1, <3', 'setproctitle>=1.1.8, <2', -'sqlalchemy>=1.1.15, <1.2.0', +'sqlalchemy>=1.1.15, <1.3.0', 'sqlalchemy-utc>=0.9.0', 'tabulate>=0.7.5, <0.8.0', 'tenacity==4.8.0', diff --git a/tests/jobs.py b/tests/jobs.py index 93f6574df4..d4184236d8 100644 --- a/tests/jobs.py +++ b/tests/jobs.py @@ -1086,10 +1086,10 @@ def test_localtaskjob_heartbeat(self, mock_pid): mock_pid.return_value = 2 self.assertRaises(AirflowException, job1.heartbeat_callback) -@unittest.skipIf('mysql' in configuration.conf.get('core', 'sql_alchemy_conn'), - "flaky when run on mysql") -@unittest.skipIf('postgresql' in configuration.conf.get('core', 'sql_alchemy_conn'), - 'flaky when run on postgresql') +# @unittest.skipIf('mysql' in configuration.conf.get('core', 'sql_alchemy_conn'), +# "flaky when run on mysql") +# @unittest.skipIf('postgresql' in configuration.conf.get('core', 'sql_alchemy_conn'), +# 'flaky when run on postgresql') def test_mark_success_no_kill(self): """ Test that ensures that mark_success in the UI doesn't cause This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > test_mark_success_no_kill test breaks intermittently on CI > -- > > Key: AIRFLOW-2806 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2806 > Project: Apache Airflow > Issue Type: Bug >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > The test_mark_success_no_kill test is breaking intermittently on the CI for > some versions of Python and some databases, particularly Python 3.5 for both > PostgreSQL and MySQL. > A traceback of the error is > ([link|https://travis-ci.org/apache/incubator-airflow/jobs/407522994#L5668-L5701]): > {code:java} > 10) ERROR: test_mark_success_no_kill (tests.transplant_class..C) > -- > Traceback (most recent call last): > tests/jobs.py line 1116 in
[jira] [Commented] (AIRFLOW-2796) Improve code coverage for utils/helpers.py
[ https://issues.apache.org/jira/browse/AIRFLOW-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569327#comment-16569327 ] ASF GitHub Bot commented on AIRFLOW-2796: - feng-tao closed pull request #3637: [AIRFLOW-2796] Improve utils helpers code coverage URL: https://github.com/apache/incubator-airflow/pull/3637 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/tests/utils/test_helpers.py b/tests/utils/test_helpers.py index 1005671e9e..5fa941c55d 100644 --- a/tests/utils/test_helpers.py +++ b/tests/utils/test_helpers.py @@ -116,6 +116,43 @@ def test_reduce_in_chunks(self): 2), 14) +def test_is_in(self): +obj = ["list", "object"] +# Check for existence of a list object within a list +self.assertTrue( +helpers.is_in(obj, [obj]) +) + +# Check that an empty list returns false +self.assertFalse( +helpers.is_in(obj, []) +) + +# Check to ensure it handles None types +self.assertFalse( +helpers.is_in(None, [obj]) +) + +# Check to ensure true will be returned of multiple objects exist +self.assertTrue( +helpers.is_in(obj, [obj, obj]) +) + +def test_is_container(self): +self.assertFalse(helpers.is_container("a string is not a container")) +self.assertTrue(helpers.is_container(["a", "list", "is", "a", "container"])) + +def test_as_tuple(self): +self.assertEquals( +helpers.as_tuple("a string is not a container"), +("a string is not a container",) +) + +self.assertEquals( +helpers.as_tuple(["a", "list", "is", "a", "container"]), +("a", "list", "is", "a", "container") +) + if __name__ == '__main__': unittest.main() This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve code coverage for utils/helpers.py > -- > > Key: AIRFLOW-2796 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2796 > Project: Apache Airflow > Issue Type: Bug >Reporter: Andy Cooper >Assignee: Andy Cooper >Priority: Trivial > Fix For: 2.0.0 > > > Improve code coverage by adding tests for > * is_container > * is_in > * as_tuple -- This message was sent by Atlassian JIRA (v7.6.3#76005)
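For context, the three helpers exercised by the new tests behave roughly as below; these are simplified sketches, not the exact `airflow.utils.helpers` source:

```python
# Simplified sketches of the helpers under test; the real implementations
# live in airflow/utils/helpers.py and may differ in detail.

def is_container(obj):
    """True for iterable containers, but not for plain strings."""
    return hasattr(obj, '__iter__') and not isinstance(obj, str)

def is_in(obj, iterable):
    """Membership by identity (`is`), not by equality (`==`)."""
    return any(item is obj for item in iterable)

def as_tuple(obj):
    """Containers become tuples; anything else becomes a 1-tuple."""
    if is_container(obj):
        return tuple(obj)
    return (obj,)
```

Identity-based `is_in` is what makes the test's `None` case return False: `None` is not (by identity) an element of `[obj]`.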
[jira] [Commented] (AIRFLOW-1749) AirflowConfigParser fails to override has_option from ConfigParser, causing broken LDAP config
[ https://issues.apache.org/jira/browse/AIRFLOW-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569229#comment-16569229 ] ASF GitHub Bot commented on AIRFLOW-1749: - ashb closed pull request #2722: [AIRFLOW-1749] Fix has_option to consider environment and cmd overrides URL: https://github.com/apache/incubator-airflow/pull/2722 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/configuration.py b/airflow/configuration.py index ff81d9827b..adefb3fc20 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -175,10 +175,10 @@ def _get_cmd_option(self, section, key): # if this is a valid command key... if (section, key) in AirflowConfigParser.as_command_stdout: # if the original key is present, return it no matter what -if self.has_option(section, key): +if ConfigParser.has_option(self, section, key): return ConfigParser.get(self, section, key) # otherwise, execute the fallback key -elif self.has_option(section, fallback_key): +elif ConfigParser.has_option(self, section, fallback_key): command = self.get(section, fallback_key) return run_command(command) @@ -192,7 +192,7 @@ def get(self, section, key, **kwargs): return option # ...then the config file -if self.has_option(section, key): +if ConfigParser.has_option(self, section, key): return expand_env_var( ConfigParser.get(self, section, key, **kwargs)) @@ -229,6 +229,11 @@ def getint(self, section, key): def getfloat(self, section, key): return float(self.get(section, key)) +def has_option(self, section, key): +return ((self._get_env_var_option(section, key) is not None) or + ConfigParser.has_option(self, section, key) or + (self._get_cmd_option(section, key) is not None)) + def read(self, filenames): ConfigParser.read(self, filenames) 
self._validate() This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > AirflowConfigParser fails to override has_option from ConfigParser, causing > broken LDAP config > -- > > Key: AIRFLOW-1749 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1749 > Project: Apache Airflow > Issue Type: Bug > Components: configuration >Affects Versions: Airflow 2.0, Airflow 1.8 > Environment: Ubuntu 16.04 >Reporter: Nick McNutt >Priority: Minor > Labels: easyfix > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > In configuration.py, class {{AirflowConfigParser}} fails to override > {{has_option}} from {{ConfigParser}}. This breaks the following in > ldap_auth.py: > {{if configuration.has_option("ldap", "search_scope"): > search_scope = SUBTREE if configuration.get("ldap", > "search_scope") == "SUBTREE" else LEVEL}} > This code fails to consider whether any environment variable (e.g., > {{AIRFLOW__LDAP__SEARCH_SCOPE}}) or command override's are set, meaning that > LDAP configuration cannot be entirely set up through environment variables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
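The crux of the fix is that once `has_option` is overridden to consult environment variables, the internal "is it literally in the config file?" checks must call the base class explicitly. A minimal sketch of that pattern (the class name and the `AIRFLOW__SECTION__KEY` naming are illustrative, not the exact Airflow code):

```python
import os
from configparser import ConfigParser  # the ConfigParser module on Python 2


class EnvAwareConfigParser(ConfigParser):
    """Sketch of the override pattern from the patch above."""

    def _env_var_name(self, section, key):
        return 'AIRFLOW__{}__{}'.format(section.upper(), key.upper())

    def has_option(self, section, key):
        # An environment variable counts as the option being set...
        if os.environ.get(self._env_var_name(section, key)) is not None:
            return True
        # ...otherwise fall back to the base-class file check. Calling
        # ConfigParser.has_option explicitly avoids re-entering this
        # override from internal lookups that only mean "in the file".
        return ConfigParser.has_option(self, section, key)


parser = EnvAwareConfigParser()
parser.add_section('ldap')
os.environ['AIRFLOW__LDAP__SEARCH_SCOPE'] = 'SUBTREE'
assert parser.has_option('ldap', 'search_scope')  # satisfied via the env var
```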
[jira] [Commented] (AIRFLOW-763) Vertica Check Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569240#comment-16569240 ] ASF GitHub Bot commented on AIRFLOW-763: ashb closed pull request #1998: [AIRFLOW-763] Add contrib check operator for Vertica URL: https://github.com/apache/incubator-airflow/pull/1998 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/operators/vertica_check_operator.py b/airflow/contrib/operators/vertica_check_operator.py new file mode 100644 index 00..1f936cab3c --- /dev/null +++ b/airflow/contrib/operators/vertica_check_operator.py @@ -0,0 +1,125 @@ +# -*- coding: utf-8 -*- +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from airflow.contrib.hooks.vertica_hook import VerticaHook +from airflow.operators.check_operator import CheckOperator, ValueCheckOperator, IntervalCheckOperator +from airflow.utils.decorators import apply_defaults + +class VerticaCheckOperator(CheckOperator): +""" +Performs checks against Vertica. The ``VerticaCheckOperator`` expects +a sql query that will return a single row. Each value on that +first row is evaluated using python ``bool`` casting. If any of the +values return ``False`` the check is failed and errors out. 
+ +Note that Python bool casting evals the following as ``False``: + +* ``False`` +* ``0`` +* Empty string (``""``) +* Empty list (``[]``) +* Empty dictionary or set (``{}``) + +Given a query like ``SELECT COUNT(*) FROM foo``, it will fail only if +the count ``== 0``. You can craft much more complex query that could, +for instance, check that the table has the same number of rows as +the source table upstream, or that the count of today's partition is +greater than yesterday's partition, or that a set of metrics are less +than 3 standard deviation for the 7 day average. + +This operator can be used as a data quality check in your pipeline, and +depending on where you put it in your DAG, you have the choice to +stop the critical path, preventing from +publishing dubious data, or on the side and receive email alerts +without stopping the progress of the DAG. + +:param sql: the sql to be executed +:type sql: string +:param vertica_conn_id: reference to the Vertica database +:type vertica_conn_id: string +""" + +@apply_defaults +def __init__( +self, +sql, +vertica_conn_id='vertica_default', +*args, +**kwargs): +super(VerticaCheckOperator, self).__init__(sql=sql, *args, **kwargs) +self.vertica_conn_id = vertica_conn_id +self.sql = sql + +def get_db_hook(self): +return VerticaHook(vertica_conn_id=self.vertica_conn_id) + + +class VerticaValueCheckOperator(ValueCheckOperator): +""" +Performs a simple value check using sql code. 
+ +:param sql: the sql to be executed +:type sql: string +""" + +@apply_defaults +def __init__( +self, sql, pass_value, tolerance=None, +vertica_conn_id='vertica_default', +*args, **kwargs): +super(VerticaValueCheckOperator, self).__init__( +sql=sql, pass_value=pass_value, tolerance=tolerance, +*args, **kwargs) +self.vertica_conn_id = vertica_conn_id + +def get_db_hook(self): +return VerticaHook(vertica_conn_id=self.vertica_conn_id) + + +class VerticaIntervalCheckOperator(IntervalCheckOperator): +""" +Checks that the values of metrics given as SQL expressions are within +a certain tolerance of the ones from days_back before. + +This method constructs a query like so: + +SELECT {metrics_threshold_dict_key} FROM {table} +WHERE {date_filter_column}= + +:param table: the table name +:type table: str +:param days_back: number of days between ds and the ds we want to check +against. Defaults to 7 days +:type days_back: int +:param metrics_threshold: a dictionary of ratios indexed by metrics, for +example 'COUNT(*)': 1.5 would require a 50 percent or less difference +between the current day, and the prior
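The bool-casting rule described in the `VerticaCheckOperator` docstring can be sketched generically; this models the base `CheckOperator` semantics over a fetched result row, not the Vertica hook itself:

```python
def check_first_row(records):
    """Return True only if every value in the result row is truthy,
    mirroring the bool-casting rule described in the docstring."""
    if not records:
        return False
    return all(bool(value) for value in records)

# Each falsy value listed in the docstring fails the check:
for falsy in (False, 0, "", [], {}):
    assert not check_first_row([falsy])

# A `SELECT COUNT(*) FROM foo` returning a non-zero count passes:
assert check_first_row([42])
```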
[jira] [Commented] (AIRFLOW-661) Celery Task Result Expiry
[ https://issues.apache.org/jira/browse/AIRFLOW-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569238#comment-16569238 ] ASF GitHub Bot commented on AIRFLOW-661: ashb closed pull request #2143: [AIRFLOW-661] Add Celery broker_transport_options config URL: https://github.com/apache/incubator-airflow/pull/2143 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/configuration.py b/airflow/configuration.py index cfccbe9a25..47106e7347 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -319,6 +319,10 @@ def run_command(command): # information. broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow +# Celery broker transport options. Provide options in JSON format. Refer to +# the Celery documentation for more information. 
+broker_transport_options = {{}} + # Another key Celery setting celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py index 04414fbc08..a7d7114711 100644 --- a/airflow/executors/celery_executor.py +++ b/airflow/executors/celery_executor.py @@ -16,6 +16,7 @@ import logging import subprocess import time +import json from celery import Celery from celery import states as celery_states @@ -39,6 +40,7 @@ class CeleryConfig(object): CELERYD_PREFETCH_MULTIPLIER = 1 CELERY_ACKS_LATE = True BROKER_URL = configuration.get('celery', 'BROKER_URL') +BROKER_TRANSPORT_OPTIONS = json.loads(configuration.get('celery', 'BROKER_TRANSPORT_OPTIONS')) CELERY_RESULT_BACKEND = configuration.get('celery', 'CELERY_RESULT_BACKEND') CELERYD_CONCURRENCY = configuration.getint('celery', 'CELERYD_CONCURRENCY') CELERY_DEFAULT_QUEUE = DEFAULT_QUEUE This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Celery Task Result Expiry > - > > Key: AIRFLOW-661 > URL: https://issues.apache.org/jira/browse/AIRFLOW-661 > Project: Apache Airflow > Issue Type: Improvement > Components: celery, executor >Reporter: Robin Miller >Assignee: Robin Miller >Priority: Minor > > When using RabbitMQ as the Celery Results Backend, it is desirable to be able > to set the CELERY_TASK_RESULT_EXPIRES config option to reduce the time out > period of the task tombstones to less than a day. As such we should pull this > option from the airflow.cfg file and pass it through. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
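The patch feeds the raw config string through `json.loads`, so the option value must be valid JSON. A small illustration (the `visibility_timeout` key is just an example of a Celery transport option, not something the patch sets):

```python
import json

# The default added by the patch renders as "{}" (the doubled braces in
# the template escape str.format), i.e. no transport options.
assert json.loads('{}') == {}

# A user-supplied value such as this parses into the dict handed to
# Celery as BROKER_TRANSPORT_OPTIONS:
options = json.loads('{"visibility_timeout": 21600}')
assert options == {"visibility_timeout": 21600}
```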