[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565368#comment-16565368
 ] 

ASF GitHub Bot commented on AIRFLOW-2832:
-

tedmiston commented on issue #3670: [AIRFLOW-2832] Lint and resolve 
inconsistencies in Markdown files
URL: 
https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409585654
 
 
   @Fokko Thanks for the quick merge!  I'll make a note to look into linting 
the bash code in Airflow and see if we have enough for a PR there.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Inconsistencies and linter errors across markdown files
> ---
>
> Key: AIRFLOW-2832
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the 
> Airflow project.  Most of these are simple formatting issues easily fixed by 
> linting (e.g., with mdl).
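The mdl linter mentioned above is a Ruby tool; purely as an illustration (not part of the PR), one of the simple formatting checks such linters perform, flagging trailing whitespace, can be sketched in a few lines of Python:

```python
import re

def find_trailing_whitespace(lines):
    """Return 1-based line numbers that end with stray whitespace,
    one of the simple formatting issues a Markdown linter flags."""
    return [i for i, line in enumerate(lines, 1) if re.search(r"[ \t]+$", line)]

doc = ["# Title  ", "", "Some text.", "- item\t"]
print(find_trailing_whitespace(doc))  # [1, 4]
```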



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2658) Add GKE specific Kubernetes Pod Operator

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565575#comment-16565575
 ] 

ASF GitHub Bot commented on AIRFLOW-2658:
-

Noremac201 commented on issue #3532: [AIRFLOW-2658] Add GCP specific k8s pod 
operator
URL: 
https://github.com/apache/incubator-airflow/pull/3532#issuecomment-409633871
 
 
   Looks like Travis isn't posting, here's my personal Travis build:
   https://travis-ci.org/Noremac201/incubator-airflow/builds/410543165
   




> Add GKE specific Kubernetes Pod Operator
> 
>
> Key: AIRFLOW-2658
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2658
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>
> Currently there is a Kubernetes Pod operator, but it is not really easy to 
> have it work with GCP Kubernetes Engine; it would be nice to have one that does.





[jira] [Commented] (AIRFLOW-2829) Brush up the CI script for minikube

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565363#comment-16565363
 ] 

ASF GitHub Bot commented on AIRFLOW-2829:
-

codecov-io commented on issue #3676: [AIRFLOW-2829] Brush up the CI script for 
minikube
URL: 
https://github.com/apache/incubator-airflow/pull/3676#issuecomment-40958
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=h1)
 Report
   > Merging 
[#3676](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/c37fc0b6ba19e3fe5656ae37cef9b59cef3c29e8?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3676/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #3676   +/-   ##
   =======================================
     Coverage    77.5%    77.5%
   =======================================
     Files         205      205
     Lines       15753    15753
   =======================================
     Hits        12210    12210
     Misses       3543     3543
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=footer).
 Last update 
[c37fc0b...bc5fa06](https://codecov.io/gh/apache/incubator-airflow/pull/3676?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Brush up the CI script for minikube
> ---
>
> Key: AIRFLOW-2829
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2829
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ci
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
>
> Ran {{scripts/ci/kubernetes/minikube/start_minikube.sh}} locally and found 
> some points that can be improved:
> - minikube version is hard-coded
> - Defined but unused variables: {{$_HELM_VERSION}}, {{$_VM_DRIVER}}
> - Undefined variables: {{$unameOut}}
> - The following lines cause warnings if download is skipped:
> {code}
>  69 sudo mv bin/minikube /usr/local/bin/minikube
>  70 sudo mv bin/kubectl /usr/local/bin/kubectl
> {code}
> - The {{return}}s at lines 81 and 96 won't work since they're outside of a function
> - To run this script as a non-root user, {{-E}} is required for {{sudo}}. See 
> https://github.com/kubernetes/minikube/issues/1883.
> {code}
> 105 _MINIKUBE="sudo PATH=$PATH minikube"
> 106 
> 107 $_MINIKUBE config set bootstrapper localkube
> 108 $_MINIKUBE start --kubernetes-version=${_KUBERNETES_VERSION}  
> --vm-driver=none
> 109 $_MINIKUBE update-context
> {code}





[jira] [Commented] (AIRFLOW-2658) Add GKE specific Kubernetes Pod Operator

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567409#comment-16567409
 ] 

ASF GitHub Bot commented on AIRFLOW-2658:
-

kaxil closed pull request #3532: [AIRFLOW-2658] Add GCP specific k8s pod 
operator
URL: https://github.com/apache/incubator-airflow/pull/3532
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/gcp_container_operator.py b/airflow/contrib/operators/gcp_container_operator.py
index 5648b4d8a0..615eac8a0f 100644
--- a/airflow/contrib/operators/gcp_container_operator.py
+++ b/airflow/contrib/operators/gcp_container_operator.py
@@ -17,8 +17,13 @@
 # specific language governing permissions and limitations
 # under the License.
 #
+import os
+import subprocess
+import tempfile
+
 from airflow import AirflowException
 from airflow.contrib.hooks.gcp_container_hook import GKEClusterHook
+from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
 from airflow.models import BaseOperator
 from airflow.utils.decorators import apply_defaults
 
@@ -170,3 +175,147 @@ def execute(self, context):
         hook = GKEClusterHook(self.project_id, self.location)
         create_op = hook.create_cluster(cluster=self.body)
         return create_op
+
+
+KUBE_CONFIG_ENV_VAR = "KUBECONFIG"
+G_APP_CRED = "GOOGLE_APPLICATION_CREDENTIALS"
+
+
+class GKEPodOperator(KubernetesPodOperator):
+    template_fields = ('project_id', 'location',
+                       'cluster_name') + KubernetesPodOperator.template_fields
+
+    @apply_defaults
+    def __init__(self,
+                 project_id,
+                 location,
+                 cluster_name,
+                 gcp_conn_id='google_cloud_default',
+                 *args,
+                 **kwargs):
+        """
+        Executes a task in a Kubernetes pod in the specified Google Kubernetes
+        Engine cluster
+
+        This Operator assumes that the system has gcloud installed and either
+        has working default application credentials or has configured a
+        connection id with a service account.
+
+        The **minimum** required to define a cluster to create are the variables
+        ``task_id``, ``project_id``, ``location``, ``cluster_name``, ``name``,
+        ``namespace``, and ``image``
+
+        **Operator Creation**: ::
+
+            operator = GKEPodOperator(task_id='pod_op',
+                                      project_id='my-project',
+                                      location='us-central1-a',
+                                      cluster_name='my-cluster-name',
+                                      name='task-name',
+                                      namespace='default',
+                                      image='perl')
+
+        .. seealso::
+            For more detail about application authentication have a look at the reference:
+            https://cloud.google.com/docs/authentication/production#providing_credentials_to_your_application
+
+        :param project_id: The Google Developers Console project id
+        :type project_id: str
+        :param location: The name of the Google Kubernetes Engine zone in which the
+            cluster resides, e.g. 'us-central1-a'
+        :type location: str
+        :param cluster_name: The name of the Google Kubernetes Engine cluster the pod
+            should be spawned in
+        :type cluster_name: str
+        :param gcp_conn_id: The google cloud connection id to use. This allows for
+            users to specify a service account.
+        :type gcp_conn_id: str
+        """
+        super(GKEPodOperator, self).__init__(*args, **kwargs)
+        self.project_id = project_id
+        self.location = location
+        self.cluster_name = cluster_name
+        self.gcp_conn_id = gcp_conn_id
+
+    def execute(self, context):
+        # Specifying a service account file allows the user to use non default
+        # authentication for creating a Kubernetes Pod. This is done by setting the
+        # environment variable `GOOGLE_APPLICATION_CREDENTIALS` that gcloud looks at.
+        key_file = None
+
+        # If gcp_conn_id is not specified gcloud will use the default
+        # service account credentials.
+        if self.gcp_conn_id:
+            from airflow.hooks.base_hook import BaseHook
+            # extras is a deserialized json object
+            extras = BaseHook.get_connection(self.gcp_conn_id).extra_dejson
+            # key_file only gets set if a json file is created from a JSON string in
+            # the web ui, else none
+            key_file = self._set_env_from_extras(extras=extras)
+
+        # Write config to a temp 

[jira] [Commented] (AIRFLOW-2238) Update dev/airflow-pr to work with github for merge targets

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567407#comment-16567407
 ] 

ASF GitHub Bot commented on AIRFLOW-2238:
-

kaxil closed pull request #3680: [AIRFLOW-2238] Use SSH protocol for pushing to 
Github
URL: https://github.com/apache/incubator-airflow/pull/3680
 
 
   




> Update dev/airflow-pr to work with github for merge targets
> --
>
> Key: AIRFLOW-2238
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2238
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: PR tool
>Reporter: Ash Berlin-Taylor
>Priority: Major
>
> We are planning on migrating to the Apache "GitBox" project, which lets 
> committers work directly on GitHub. This will mean we might not _need_ to use 
> the PR tool, but we should update it so that it merges and pushes back to 
> GitHub, not the ASF repo.
> I think we need to do this before we ask the ASF infra team to migrate our 
> repo over.





[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567954#comment-16567954
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

kaxil closed pull request #3669: Revert [AIRFLOW-2814] - Change 
`min_file_process_interval` to 0
URL: https://github.com/apache/incubator-airflow/pull/3669
 
 
   




> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it was mentioned the default value of argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is actually parsed and scheduled without letting 
> Airflow "rest". This conflicts with the design intention (by default let it 
> be 180 seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.
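The fix is a one-line change in the config template; as a self-contained sketch (the section and option name follow the `default_airflow.cfg` template referenced above, and the inline config string is illustrative), the scheduler-style lookup would then pick up 180 instead of 0:

```python
import configparser

# Minimal stand-in for the default_airflow.cfg template; only the
# relevant option is shown.
cfg_text = """
[scheduler]
min_file_process_interval = 180
"""

parser = configparser.ConfigParser()
parser.read_string(cfg_text)
interval = parser.getint("scheduler", "min_file_process_interval")
print(interval)  # 180: each DAG file is parsed no faster than this
```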





[jira] [Commented] (AIRFLOW-2843) ExternalTaskSensor: Add option to cease waiting immediately if the external task doesn't exist

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568260#comment-16568260
 ] 

ASF GitHub Bot commented on AIRFLOW-2843:
-

XD-DENG opened a new pull request #3688: [AIRFLOW-2843] 
ExternalTaskSensor-check if external task exists
URL: https://github.com/apache/incubator-airflow/pull/3688
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2843
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   #### Background
   `ExternalTaskSensor` will keep waiting (given restrictions of retries, 
poke_interval, etc), even if the external task specified doesn't exist at all. 
In some cases, this waiting may still make sense as new DAG may backfill.
   
   But it may be good to provide an option to cease waiting immediately if the 
external task specified doesn't exist.
   
   #### Proposal
   Provide an argument `check_existence`. Set to `True` to check if the 
external task exists, and immediately cease waiting if the external task does 
not exist.
   
   **The default value is set to `False` (no check or ceasing will happen), so 
it will not affect any existing DAGs or user expectation.**
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> ExternalTaskSensor: Add option to cease waiting immediately if the external 
> task doesn't exist
> --
>
> Key: AIRFLOW-2843
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2843
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> h2. Background
> *ExternalTaskSensor* will keep waiting (given restrictions of retries, 
> poke_interval, etc), even if the external task specified doesn't exist at 
> all. In some cases, this waiting may still make sense as new DAG may backfill.
> But it may be good to provide an option to cease waiting immediately if the 
> external task specified doesn't exist.
> h2. Proposal
> Provide an argument "check_existence". Set to *True* to check if the external 
> task exists, and immediately cease waiting if the external task does not 
> exist.
> The default value is set to *False* (no check or ceasing will happen) so it 
> will not affect any existing DAGs or user expectation.
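The proposed behaviour can be illustrated with a small, self-contained sketch. Plain dicts stand in for Airflow's DAG objects, and `should_stop_waiting` is a hypothetical name for illustration, not the actual sensor API:

```python
def should_stop_waiting(external_dag, external_task_id, check_existence=False):
    """Mimics the proposed option: with check_existence=True the sensor
    gives up immediately when the external task is missing; with the
    default False it keeps poking, preserving existing behaviour."""
    if not check_existence:
        return False
    return external_task_id not in external_dag["task_ids"]

dag = {"dag_id": "upstream_dag", "task_ids": {"extract", "load"}}
print(should_stop_waiting(dag, "transform"))                        # False: default keeps waiting
print(should_stop_waiting(dag, "transform", check_existence=True))  # True: task missing, cease now
print(should_stop_waiting(dag, "extract", check_existence=True))    # False: task exists, keep poking
```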





[jira] [Commented] (AIRFLOW-2796) Improve code coverage for utils/helpers.py

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568205#comment-16568205
 ] 

ASF GitHub Bot commented on AIRFLOW-2796:
-

Fokko closed pull request #3686: [AIRFLOW-2796] Expand code coverage for 
utils/helpers.py
URL: https://github.com/apache/incubator-airflow/pull/3686
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/tests/utils/test_helpers.py b/tests/utils/test_helpers.py
index 1005671e9e..b2e79560f4 100644
--- a/tests/utils/test_helpers.py
+++ b/tests/utils/test_helpers.py
@@ -117,5 +117,62 @@ def test_reduce_in_chunks(self):
                          14)
 
 
+class HelpersTest(unittest.TestCase):
+    def test_as_tuple_iter(self):
+        test_list = ['test_str']
+        as_tup = helpers.as_tuple(test_list)
+        self.assertTupleEqual(tuple(test_list), as_tup)
+
+    def test_as_tuple_no_iter(self):
+        test_str = 'test_str'
+        as_tup = helpers.as_tuple(test_str)
+        self.assertTupleEqual((test_str,), as_tup)
+
+    def test_is_in(self):
+        from airflow.utils import helpers
+        # `is_in` expects an object, and a list as input
+
+        test_dict = {'test': 1}
+        test_list = ['test', 1, dict()]
+        small_i = 3
+        big_i = 2 ** 31
+        test_str = 'test_str'
+        test_tup = ('test', 'tuple')
+
+        test_container = [test_dict, test_list, small_i, big_i, test_str, test_tup]
+
+        # Test that integers are referenced as the same object
+        self.assertTrue(helpers.is_in(small_i, test_container))
+        self.assertTrue(helpers.is_in(3, test_container))
+
+        # python caches small integers, so i is 3 will be True,
+        # but `big_i is 2 ** 31` is False.
+        self.assertTrue(helpers.is_in(big_i, test_container))
+        self.assertFalse(helpers.is_in(2 ** 31, test_container))
+
+        self.assertTrue(helpers.is_in(test_dict, test_container))
+        self.assertFalse(helpers.is_in({'test': 1}, test_container))
+
+        self.assertTrue(helpers.is_in(test_list, test_container))
+        self.assertFalse(helpers.is_in(['test', 1, dict()], test_container))
+
+        self.assertTrue(helpers.is_in(test_str, test_container))
+        self.assertTrue(helpers.is_in('test_str', test_container))
+        bad_str = 'test_'
+        bad_str += 'str'
+        self.assertFalse(helpers.is_in(bad_str, test_container))
+
+        self.assertTrue(helpers.is_in(test_tup, test_container))
+        self.assertFalse(helpers.is_in(('test', 'tuple'), test_container))
+        bad_tup = ('test', 'tuple', 'hello')
+        self.assertFalse(helpers.is_in(bad_tup[:2], test_container))
+
+    def test_is_container(self):
+        self.assertTrue(helpers.is_container(['test_list']))
+        self.assertFalse(helpers.is_container('test_str_not_iterable'))
+        # Pass an object that is not iter nor a string.
+        self.assertFalse(helpers.is_container(10))
+
+
 if __name__ == '__main__':
     unittest.main()


 




> Improve code coverage for utils/helpers.py
> --
>
> Key: AIRFLOW-2796
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2796
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Andy Cooper
>Priority: Trivial
>
> Improve code coverage by adding tests for 
>  * is_container
>  * is_in
>  * as_tuple
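The identity semantics those tests exercise are easy to demonstrate. Below is a minimal re-implementation of the idea behind `helpers.is_in` (a sketch, not the Airflow source itself), showing why an equal-but-distinct dict is rejected:

```python
def is_in(obj, container):
    # Membership by identity (`is`), not equality (`==`) —
    # the behaviour the tests above rely on.
    return any(obj is item for item in container)

d = {"test": 1}
container = [d, "test_str"]
print(is_in(d, container))            # True: same object
print(is_in({"test": 1}, container))  # False: equal value, different object
print({"test": 1} in container)       # True: ordinary `in` compares by equality
```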





[jira] [Commented] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568223#comment-16568223
 ] 

ASF GitHub Bot commented on AIRFLOW-2836:
-

Fokko closed pull request #3674: [AIRFLOW-2836] Minor improvement of 
contrib.sensors.FileSensor
URL: https://github.com/apache/incubator-airflow/pull/3674
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/sensors/file_sensor.py b/airflow/contrib/sensors/file_sensor.py
index 3f7bb24e08..3e49abdfb5 100644
--- a/airflow/contrib/sensors/file_sensor.py
+++ b/airflow/contrib/sensors/file_sensor.py
@@ -46,7 +46,7 @@ class FileSensor(BaseSensorOperator):
     @apply_defaults
     def __init__(self,
                  filepath,
-                 fs_conn_id='fs_default2',
+                 fs_conn_id='fs_default',
                  *args,
                  **kwargs):
         super(FileSensor, self).__init__(*args, **kwargs)
@@ -56,7 +56,7 @@ def __init__(self,
     def poke(self, context):
         hook = FSHook(self.fs_conn_id)
         basepath = hook.get_path()
-        full_path = "/".join([basepath, self.filepath])
+        full_path = os.path.join(basepath, self.filepath)
         self.log.info('Poking for file {full_path}'.format(**locals()))
         try:
             if stat.S_ISDIR(os.stat(full_path).st_mode):
diff --git a/tests/contrib/sensors/test_file_sensor.py b/tests/contrib/sensors/test_file_sensor.py
index d78400e317..0bb0007c60 100644
--- a/tests/contrib/sensors/test_file_sensor.py
+++ b/tests/contrib/sensors/test_file_sensor.py
@@ -125,6 +125,18 @@ def test_file_in_dir(self):
         finally:
             shutil.rmtree(dir)
 
+    def test_default_fs_conn_id(self):
+        with tempfile.NamedTemporaryFile() as tmp:
+            task = FileSensor(
+                task_id="test",
+                filepath=tmp.name[1:],
+                dag=self.dag,
+                timeout=0,
+            )
+            task._hook = self.hook
+            task.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE,
+                     ignore_ti_state=True)
+
 
 if __name__ == '__main__':
     unittest.main()


 




> Minor improvement of contrib.sensors.FileSensor
> ---
>
> Key: AIRFLOW-2836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2836
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> h4. *Background*
> The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. 
> However, when we initialize the database 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88),
>  there isn't such an entry. It doesn't exist anywhere else.
> h4. *Issue*
> The purpose of _contrib.sensors.FileSensor_ is mainly to check the local file 
> system (of course it can also be used for NAS). The path ("/") from the default 
> connection 'fs_default' would suffice.
> However, given that the default value for *fs_conn_id* in 
> contrib.sensors.FileSensor is "fs_default2" (a value that doesn't exist), it 
> makes the situation much more complex. 
> When users intend to check the local file system only, they should be able to 
> leave *fs_conn_id* at its default, instead of setting up another connection 
> separately.
> h4. Proposal
> Change the default value for *fs_conn_id* in contrib.sensors.FileSensor from 
> "fs_default2" to "fs_default" (in the related tests, the *fs_conn_id* is 
> already specified as "fs_default").
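The accompanying PR also switches the sensor from `"/".join` to `os.path.join`, which fixes a subtle path bug; a couple of lines demonstrate the difference (the paths are illustrative):

```python
import os

basepath = "/data/"          # a connection's base path may carry a trailing slash
filepath = "incoming/f.txt"

print("/".join([basepath, filepath]))    # /data//incoming/f.txt: doubled slash
print(os.path.join(basepath, filepath))  # /data/incoming/f.txt: clean join
```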





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562270#comment-16562270
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273134
 
 

 ##
 File path: airflow/contrib/hooks/sagemaker_hook.py
 ##
 @@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+    """
+    Interact with Amazon SageMaker.
+    sagemaker_conn_is is required for using
+    the config stored in db for training/tuning
+    """
+
+    def __init__(self,
+                 sagemaker_conn_id=None,
+                 use_db_config=False,
+                 region_name=None,
+                 *args, **kwargs):
+        self.sagemaker_conn_id = sagemaker_conn_id
+        self.use_db_config = use_db_config
+        self.region_name = region_name
+        super(SageMakerHook, self).__init__(*args, **kwargs)
 
 Review comment:
   You are right, Fixed




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using 
> Airflow?





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562269#comment-16562269
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273081
 
 

 ##
 File path: airflow/contrib/hooks/sagemaker_hook.py
 ##
 @@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+    """
+    Interact with Amazon SageMaker.
+    sagemaker_conn_is is required for using
 
 Review comment:
   Fixed




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using 
> Airflow?





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562273#comment-16562273
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273440
 
 

 ##
 File path: airflow/contrib/hooks/sagemaker_hook.py
 ##
 @@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+"""
+Interact with Amazon SageMaker.
+sagemaker_conn_id is required for using
+the config stored in db for training/tuning
+"""
+
+def __init__(self,
+ sagemaker_conn_id=None,
 
 Review comment:
   No, it doesn't. It's only used if the user wants to use a config stored in the db. 
The SageMaker hook still uses aws_conn_id to get credentials. 




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using 
> Airflow?





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562276#comment-16562276
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273826
 
 

 ##
 File path: airflow/contrib/sensors/sagemaker_base_sensor.py
 ##
 @@ -0,0 +1,63 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerBaseSensor(BaseSensorOperator):
+"""
+Contains general sensor behavior for SageMaker.
+Subclasses should implement get_emr_response() and state_from_response() 
methods.
+Subclasses should also implement NON_TERMINAL_STATES and FAILED_STATE 
constants.
 
 Review comment:
   I replaced the constant with a method that raises an error if not 
implemented. 




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using 
> Airflow?





[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562458#comment-16562458
 ] 

ASF GitHub Bot commented on AIRFLOW-2825:
-

feng-tao commented on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due 
to case
URL: 
https://github.com/apache/incubator-airflow/pull/3665#issuecomment-408999062
 
 
   could you add a test?




> S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase 
> ext in S3
> ---
>
> Key: AIRFLOW-2825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2825
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> Because upper/lower case was not considered in the extension check, 
> S3ToHiveTransfer operator may think a GZIP file with uppercase ext `.GZ` is 
> not a GZIP file and raise exception.
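
The fix amounts to normalizing case before comparing extensions. A minimal sketch of such a check (a hypothetical helper, not the actual operator code):

```python
import os


def is_gzip(key):
    # Lower-case the extension before comparing, so '.GZ' and '.Gz'
    # are recognized the same way as '.gz'.
    _, ext = os.path.splitext(key)
    return ext.lower() == '.gz'
```

With this, keys such as `data.GZ` pass the GZIP check instead of raising an exception.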





[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562468#comment-16562468
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

codecov-io edited a comment on issue #3656: [AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-408503531
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=h1)
 Report
   > Merging 
[#3656](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/a338f3276835af45765d24a6e6d43ad4ba4d66ba?src=pr=desc)
 will **increase** coverage by `0.39%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3656/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3656  +/-   ##
   ==
   + Coverage   77.12%   77.51%   +0.39% 
   ==
 Files 206  205   -1 
 Lines   1577215751  -21 
   ==
   + Hits1216412210  +46 
   + Misses   3608 3541  -67
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5)
 | `99.01% <0%> (-0.99%)` | :arrow_down: |
   | 
[airflow/minihivecluster.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9taW5paGl2ZWNsdXN0ZXIucHk=)
 | | |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.58% <0%> (+0.04%)` | :arrow_up: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `89.87% <0%> (+0.42%)` | :arrow_up: |
   | 
[airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==)
 | `100% <0%> (+100%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=footer).
 Last update 
[a338f32...b65388a](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 merged in master branch, please fix all the javascript 
> styling issues that we have in .js and .html files. 





[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562663#comment-16562663
 ] 

ASF GitHub Bot commented on AIRFLOW-2670:
-

codecov-io commented on issue #3666: [AIRFLOW-2670] Update SSH Operator's Hook 
to respect timeout
URL: 
https://github.com/apache/incubator-airflow/pull/3666#issuecomment-409045376
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=h1)
 Report
   > Merging 
[#3666](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3666/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3666   +/-   ##
   ===
 Coverage   77.51%   77.51%   
   ===
 Files 205  205   
 Lines   1575115751   
   ===
 Hits1221012210   
 Misses   3541 3541
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=footer).
 Last update 
[dfa7b26...42b907c](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> SSHOperator's timeout parameter doesn't affect SSHHook timeout
> -
>
> Key: AIRFLOW-2670
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2670
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: Airflow 2.0
>Reporter: jin zhang
>Priority: Major
>
> When I use SSHOperator, the timeout parameter can't be set on the SSHHook; it 
> only affects exec_command. 
> old version:
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id)
> I change it to :
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout)
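
In outline, the change forwards the operator's timeout into the hook it constructs. A simplified stand-in (not the real Airflow classes) illustrates the pattern:

```python
class SSHHook:
    # Simplified stand-in for airflow.contrib.hooks.ssh_hook.SSHHook.
    def __init__(self, ssh_conn_id=None, timeout=10):
        self.ssh_conn_id = ssh_conn_id
        self.timeout = timeout


class SSHOperator:
    # Simplified stand-in: the fix is to pass the operator's timeout
    # through to the hook instead of leaving the hook at its default.
    def __init__(self, ssh_conn_id=None, timeout=10):
        self.ssh_conn_id = ssh_conn_id
        self.timeout = timeout
        self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id,
                                timeout=self.timeout)
```

After the change, a timeout passed to the operator reaches the underlying connection instead of being silently ignored.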





[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562541#comment-16562541
 ] 

ASF GitHub Bot commented on AIRFLOW-2670:
-

Noremac201 opened a new pull request #3666: [AIRFLOW-2670] Update SSH 
Operator's Hook to respect timeout
URL: https://github.com/apache/incubator-airflow/pull/3666
 
 
   ### JIRA
   - [x] My PR addresses the following [Airflow 
JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
   - https://issues.apache.org/jira/browse/AIRFLOW-2670
   
   ### Description
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Previously the SSH operator was not respecting the passed in timeout to the 
operator. Changed the Operator to pass the timeout to hook, as well as add a 
test to make sure the hook is being created correctly.
   
   Extension of #3553, mistakenly closed after I thought it was fixed elsewhere.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
   ### Code Quality
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`




> SSHOperator's timeout parameter doesn't affect SSHHook timeout
> -
>
> Key: AIRFLOW-2670
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2670
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: Airflow 2.0
>Reporter: jin zhang
>Priority: Major
>
> When I use SSHOperator, the timeout parameter can't be set on the SSHHook; it 
> only affects exec_command. 
> old version:
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id)
> I change it to :
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout)





[jira] [Commented] (AIRFLOW-2795) Oracle to Oracle Transfer Operator

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563058#comment-16563058
 ] 

ASF GitHub Bot commented on AIRFLOW-2795:
-

marcusrehm commented on issue #3639: [AIRFLOW-2795] Oracle to Oracle Transfer 
Operator
URL: 
https://github.com/apache/incubator-airflow/pull/3639#issuecomment-409075763
 
 
   Just bumping up




> Oracle to Oracle Transfer Operator 
> ---
>
> Key: AIRFLOW-2795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2795
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Reporter: Marcus Rehm
>Assignee: Marcus Rehm
>Priority: Trivial
>
> This operator should help transfer data from one Oracle instance to 
> another, or between tables in the same instance. It's suitable for use cases 
> where you don't want to, or are not allowed to, use a dblink.
> The operator needs a SQL query and a destination table in order to work.
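
The core of such an operator is a chunked fetch/insert loop. A generic sketch of that loop (callback-based, independent of the Oracle hooks; names are illustrative):

```python
def transfer_in_chunks(cursor, insert_rows, rows_chunk=5000):
    # Fetch rows from the source cursor in fixed-size chunks and hand
    # each chunk to the insert callback; returns the total rows moved.
    rows_total = 0
    rows = cursor.fetchmany(rows_chunk)
    while rows:
        rows_total += len(rows)
        insert_rows(rows)
        rows = cursor.fetchmany(rows_chunk)
    return rows_total
```

Committing per chunk (rather than per row or all at once) bounds memory use on large result sets while keeping transaction sizes manageable.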





[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563098#comment-16563098
 ] 

ASF GitHub Bot commented on AIRFLOW-2825:
-

XD-DENG commented on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due to 
case
URL: 
https://github.com/apache/incubator-airflow/pull/3665#issuecomment-409081714
 
 
   Hi @feng-tao, thanks for suggesting this.
   
   I have updated the related test. Instead of adding separate testing items, I 
updated the existing ones.




> S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase 
> ext in S3
> ---
>
> Key: AIRFLOW-2825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2825
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> Because upper/lower case was not considered in the extension check, 
> S3ToHiveTransfer operator may think a GZIP file with uppercase ext `.GZ` is 
> not a GZIP file and raise exception.





[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563096#comment-16563096
 ] 

ASF GitHub Bot commented on AIRFLOW-2825:
-

codecov-io edited a comment on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer 
bug due to case
URL: 
https://github.com/apache/incubator-airflow/pull/3665#issuecomment-408920953
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=h1)
 Report
   > Merging 
[#3665](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3665/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3665  +/-   ##
   ==
   - Coverage   77.51%   77.51%   -0.01% 
   ==
 Files 205  205  
 Lines   1575115751  
   ==
   - Hits1221012209   -1 
   - Misses   3541 3542   +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/operators/s3\_to\_hive\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3665/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfdG9faGl2ZV9vcGVyYXRvci5weQ==)
 | `93.96% <ø> (ø)` | :arrow_up: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3665/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.54% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=footer).
 Last update 
[dfa7b26...c7e5446](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase 
> ext in S3
> ---
>
> Key: AIRFLOW-2825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2825
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> Because upper/lower case was not considered in the extension check, 
> S3ToHiveTransfer operator may think a GZIP file with uppercase ext `.GZ` is 
> not a GZIP file and raise exception.





[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564269#comment-16564269
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

kaxil opened a new pull request #3669: Revert [AIRFLOW-2814] - Change 
`min_file_process_interval` to 0
URL: https://github.com/apache/incubator-airflow/pull/3669
 
 
   Make sure you have checked _all_ steps below.
   
   ### JIRA
   - [x] My PR addresses the following [Airflow 
JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
   - https://issues.apache.org/jira/browse/AIRFLOW-XXX
   - In case you are fixing a typo in the documentation you can prepend 
your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue.
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
   ### Documentation
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
   - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   
   ### Code Quality
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it was mentioned the default value of argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is parsed and scheduled without letting Airflow 
> "rest". This conflicts with the design intent (a default of 180 seconds) and 
> may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.
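
Assuming the standard airflow.cfg layout, the proposal corresponds to a change like this in the default config template (key name per the template; exact section placement may differ):

```ini
[scheduler]
# Parse and schedule each DAG file no faster than this interval (seconds).
min_file_process_interval = 180
```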





[jira] [Commented] (AIRFLOW-2795) Oracle to Oracle Transfer Operator

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564244#comment-16564244
 ] 

ASF GitHub Bot commented on AIRFLOW-2795:
-

Fokko closed pull request #3639: [AIRFLOW-2795] Oracle to Oracle Transfer 
Operator
URL: https://github.com/apache/incubator-airflow/pull/3639
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/oracle_to_oracle_transfer.py 
b/airflow/contrib/operators/oracle_to_oracle_transfer.py
new file mode 100644
index 00..31eb89b7dd
--- /dev/null
+++ b/airflow/contrib/operators/oracle_to_oracle_transfer.py
@@ -0,0 +1,90 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.hooks.oracle_hook import OracleHook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class OracleToOracleTransfer(BaseOperator):
+"""
+Moves data from Oracle to Oracle.
+
+
+:param oracle_destination_conn_id: destination Oracle connection.
+:type oracle_destination_conn_id: str
+:param destination_table: destination table to insert rows.
+:type destination_table: str
+:param oracle_source_conn_id: source Oracle connection.
+:type oracle_source_conn_id: str
+:param source_sql: SQL query to execute against the source Oracle
+database. (templated)
+:type source_sql: str
+:param source_sql_params: Parameters to use in sql query. (templated)
+:type source_sql_params: dict
+:param rows_chunk: number of rows per chunk to commit.
+:type rows_chunk: int
+"""
+
+template_fields = ('source_sql', 'source_sql_params')
+ui_color = '#e08c8c'
+
+@apply_defaults
+def __init__(
+self,
+oracle_destination_conn_id,
+destination_table,
+oracle_source_conn_id,
+source_sql,
+source_sql_params={},
+rows_chunk=5000,
+*args, **kwargs):
+super(OracleToOracleTransfer, self).__init__(*args, **kwargs)
+self.oracle_destination_conn_id = oracle_destination_conn_id
+self.destination_table = destination_table
+self.oracle_source_conn_id = oracle_source_conn_id
+self.source_sql = source_sql
+self.source_sql_params = source_sql_params
+self.rows_chunk = rows_chunk
+
+def _execute(self, src_hook, dest_hook, context):
+with src_hook.get_conn() as src_conn:
+cursor = src_conn.cursor()
+self.log.info("Querying data from source: {0}".format(
+self.oracle_source_conn_id))
+cursor.execute(self.source_sql, self.source_sql_params)
+target_fields = list(map(lambda field: field[0], 
cursor.description))
+
+rows_total = 0
+rows = cursor.fetchmany(self.rows_chunk)
+while len(rows) > 0:
+rows_total = rows_total + len(rows)
+dest_hook.bulk_insert_rows(self.destination_table, rows,
+   target_fields=target_fields,
+   commit_every=self.rows_chunk)
+rows = cursor.fetchmany(self.rows_chunk)
+self.log.info("Total inserted: {0} rows".format(rows_total))
+
+self.log.info("Finished data transfer.")
+cursor.close()
+
+def execute(self, context):
+src_hook = OracleHook(oracle_conn_id=self.oracle_source_conn_id)
+dest_hook = OracleHook(oracle_conn_id=self.oracle_destination_conn_id)
+self._execute(src_hook, dest_hook, context)
diff --git a/docs/code.rst b/docs/code.rst
index 4f1b301711..f4f55b7b38 100644
--- a/docs/code.rst
+++ b/docs/code.rst
@@ -172,6 +172,7 @@ Operators
 .. autoclass:: airflow.contrib.operators.mongo_to_s3.MongoToS3Operator
 .. autoclass:: 
airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator
 .. 

[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564262#comment-16564262
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon 
SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206654107
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+"""
+   Initiate a SageMaker training
+
+   This operator returns The ARN of the model created in Amazon SageMaker
+
+   :param training_job_config:
+   The configuration necessary to start a training job (templated)
+   :type training_job_config: dict
+   :param region_name: The AWS region_name
+   :type region_name: string
+   :param sagemaker_conn_id: The SageMaker connection ID to use.
+   :type aws_conn_id: string
 
 Review comment:
   Should be `sagemaker_conn_id`




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using 
> Airflow?





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564264#comment-16564264
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon 
SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206655197
 
 

 ##
 File path: tests/contrib/hooks/test_sagemaker_hook.py
 ##
 @@ -0,0 +1,341 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+
+import json
+import unittest
+import copy
+try:
+    from unittest import mock
+except ImportError:
+    try:
+        import mock
+    except ImportError:
+        mock = None
+
+from airflow import configuration
+from airflow import models
+from airflow.utils import db
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.hooks.S3_hook import S3Hook
+from airflow.exceptions import AirflowException
+
+
+role = 'test-role'
+
+bucket = 'test-bucket'
+
+key = 'test/data'
+data_url = 's3://{}/{}'.format(bucket, key)
+
+job_name = 'test-job-name'
+
+image = 'test-image'
+
+test_arn_return = {'TrainingJobArn': 'testarn'}
+
+test_list_training_job_return = {
+    'TrainingJobSummaries': [
+        {
+            'TrainingJobName': job_name,
+            'TrainingJobStatus': 'InProgress'
+        },
+    ],
+    'NextToken': 'test-token'
+}
+
+test_list_tuning_job_return = {
+    'TrainingJobSummaries': [
+        {
+            'TrainingJobName': job_name,
+            'TrainingJobArn': 'testarn',
+            'TunedHyperParameters': {
+                'k': '3'
+            },
+            'TrainingJobStatus': 'InProgress'
+        },
+    ],
+    'NextToken': 'test-token'
+}
+
+output_url = 's3://{}/test/output'.format(bucket)
+create_training_params = \
+    {
+        'AlgorithmSpecification': {
+            'TrainingImage': image,
+            'TrainingInputMode': 'File'
+        },
+        'RoleArn': role,
+        'OutputDataConfig': {
+            'S3OutputPath': output_url
+        },
+        'ResourceConfig': {
+            'InstanceCount': 2,
+            'InstanceType': 'ml.c4.8xlarge',
+            'VolumeSizeInGB': 50
+        },
+        'TrainingJobName': job_name,
+        'HyperParameters': {
+            'k': '10',
+            'feature_dim': '784',
+            'mini_batch_size': '500',
+            'force_dense': 'True'
+        },
+        'StoppingCondition': {
+            'MaxRuntimeInSeconds': 60 * 60
+        },
+        'InputDataConfig': [
+            {
+                'ChannelName': 'train',
+                'DataSource': {
+                    'S3DataSource': {
+                        'S3DataType': 'S3Prefix',
+                        'S3Uri': data_url,
+                        'S3DataDistributionType': 'FullyReplicated'
+                    }
+                },
+                'CompressionType': 'None',
+                'RecordWrapperType': 'None'
+            }
+        ]
+    }
+
+create_tuning_params = {'HyperParameterTuningJobName': job_name,
+                        'HyperParameterTuningJobConfig': {
+                            'Strategy': 'Bayesian',
+                            'HyperParameterTuningJobObjective': {
+                                'Type': 'Maximize',
+                                'MetricName': 'test_metric'
+                            },
+                            'ResourceLimits': {
+                                'MaxNumberOfTrainingJobs': 123,
+                                'MaxParallelTrainingJobs': 123
+                            },
+                            'ParameterRanges': {
+                                'IntegerParameterRanges': [
+                                    {
+                                        'Name': 'k',
+                                        'MinValue': '2',
+                                        'MaxValue': '10'
+                                    },
+                                ]
+                            }
+                        },
+                        'TrainingJobDefinition': {
+

[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564263#comment-16564263
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon 
SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206654727
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+"""
+   Initiate a SageMaker training
+
+   This operator returns the ARN of the model created in Amazon SageMaker
+
+   :param training_job_config:
+   The configuration necessary to start a training job (templated)
+   :type training_job_config: dict
+   :param region_name: The AWS region_name
+   :type region_name: string
+   :param sagemaker_conn_id: The SageMaker connection ID to use.
+   :type aws_conn_id: string
+   :param use_db_config: Whether or not to use db config
+   associated with sagemaker_conn_id.
+   If set to true, will automatically update the training config
+   with what's in db, so the db config doesn't need to
+   include everything, but what's there does replace the ones
+   in the training_job_config, so be careful
+   :type use_db_config:
+   :param aws_conn_id: The AWS connection ID to use.
+   :type aws_conn_id: string
+
+   **Example**:
+   The following operator would start a training job when executed
+
+sagemaker_training =
+   SageMakerCreateTrainingJobOperator(
+   task_id='sagemaker_training',
+   training_job_config=config,
+   use_db_config=True,
+   region_name='us-west-2',
+   sagemaker_conn_id='sagemaker_customers_conn',
+   aws_conn_id='aws_customers_conn'
+   )
+   """
+
+    template_fields = ['training_job_config']
+    template_ext = ()
+    ui_color = '#ededed'
+
+    @apply_defaults
+    def __init__(self,
+                 sagemaker_conn_id=None,
 
 Review comment:
   Please make the order of the arguments congruent with the docstring, or the 
other way around


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using 
> Airflow?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564265#comment-16564265
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon 
SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206654353
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+"""
+   Initiate a SageMaker training
+
+   This operator returns the ARN of the model created in Amazon SageMaker
+
+   :param training_job_config:
+   The configuration necessary to start a training job (templated)
+   :type training_job_config: dict
+   :param region_name: The AWS region_name
+   :type region_name: string
+   :param sagemaker_conn_id: The SageMaker connection ID to use.
+   :type aws_conn_id: string
+   :param use_db_config: Whether or not to use db config
+   associated with sagemaker_conn_id.
 
 Review comment:
   Missing `:type use_db_config: bool`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using 
> Airflow?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564270#comment-16564270
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

kaxil commented on issue #3669: Revert [AIRFLOW-2814] - Change 
`min_file_process_interval` to 0
URL: 
https://github.com/apache/incubator-airflow/pull/3669#issuecomment-409342022
 
 
   @Fokko PTAL. Also, shouldn't we be reducing `dag_dir_list_interval` as well? 
It is 5 mins by default.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it is mentioned that the default value of the argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is actually parsed and scheduled without 
> letting Airflow "rest". This conflicts with the design purpose (by default 
> let it be 180 seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.
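For reference, the settings under discussion live in the `[scheduler]` section of `airflow.cfg`. A sketch of the proposed state (key names from the 1.10-era config template; the 300-second `dag_dir_list_interval` value is the default questioned later in this thread):

```ini
[scheduler]
# Proposed change: parse and schedule each DAG file no faster than
# every 180 seconds (the template previously shipped 0, i.e. no rest)
min_file_process_interval = 180

# How often to scan the DAGs directory for new files
dag_dir_list_interval = 300
```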



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564316#comment-16564316
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

kaxil commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config
URL: 
https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409351337
 
 
   Agreed with everyone. Do you guys think we should decrease the time duration 
for `dag_dir_list_interval` as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it is mentioned that the default value of the argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is actually parsed and scheduled without 
> letting Airflow "rest". This conflicts with the design purpose (by default 
> let it be 180 seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563849#comment-16563849
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

ashb commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409266779
 
 
   FWIW I too am in favour of atomic/fixup! commits that then get squashed pre 
merge.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 merged in master branch, please fix all the javascript 
> styling issues that we have in .js and .html files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563847#comment-16563847
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

tedmiston commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409266326
 
 
   @verdan Sure!  Typically I keep atomic commits while I'm working so everyone 
can follow small changes instead of one big diff, then squash down to one 
commit at the end.  I updated the title to make it clear this is WIP.  Since 
you're doing most of the reviewing here, do you have a preference on squashing 
throughout working or just thinking about preparing for merge?
   
   I should have an update later today btw.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 merged in master branch, please fix all the javascript 
> styling issues that we have in .js and .html files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563848#comment-16563848
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

tedmiston edited a comment on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint 
issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409266326
 
 
   @verdan Sure!  Typically I keep atomic commits while I'm working so everyone 
can follow small changes instead of one big diff, then squash down to one 
commit at the end.  I updated the title to make it clear this is WIP.  Since 
you're doing most of the reviewing here, do you have a preference on squashing 
throughout working vs just thinking about preparing for the merge with 
squashing at the end?
   
   I should have an update later today btw.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 merged in master branch, please fix all the javascript 
> styling issues that we have in .js and .html files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2800) Remove airflow/ low-hanging linting errors

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563857#comment-16563857
 ] 

ASF GitHub Bot commented on AIRFLOW-2800:
-

r39132 commented on issue #3638: [AIRFLOW-2800] Remove low-hanging linting 
errors
URL: 
https://github.com/apache/incubator-airflow/pull/3638#issuecomment-409269190
 
 
   Cool. Running `flake8 airflow | wc -l` on master and this PR branch, I see a 
decrease from `458` down to `235`!
   
   Thanks for making these changes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove airflow/ low-hanging linting errors
> --
>
> Key: AIRFLOW-2800
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2800
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Andy Cooper
>Assignee: Andy Cooper
>Priority: Major
>
> Removing low hanging linting errors from airflow directory
> Focuses on
>  * E226
>  * W291
> as well as *some* E501 (line too long) where it did not risk reducing 
> readability



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564226#comment-16564226
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

Fokko commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config
URL: 
https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409335193
 
 
   I would keep it at 0 by default. 3 minutes is definitely too high. 1 would 
also work for me as a compromise. Making changes to your DAG and not seeing 
them in the UI would feel awkward to me. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it is mentioned that the default value of the argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is actually parsed and scheduled without 
> letting Airflow "rest". This conflicts with the design purpose (by default 
> let it be 180 seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564228#comment-16564228
 ] 

ASF GitHub Bot commented on AIRFLOW-2825:
-

Fokko commented on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due to 
case
URL: 
https://github.com/apache/incubator-airflow/pull/3665#issuecomment-409335560
 
 
   LGTM, thanks @XD-DENG 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase 
> ext in S3
> ---
>
> Key: AIRFLOW-2825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2825
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> Because upper/lower case was not considered in the extension check, the 
> S3ToHiveTransfer operator may treat a GZIP file with the uppercase extension 
> `.GZ` as not a GZIP file and raise an exception.
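The fix comes down to comparing file extensions case-insensitively. A minimal sketch of the idea (the helper name `is_gzip` is illustrative, not necessarily what the actual patch uses):

```python
import os


def is_gzip(key):
    """Return True when the S3 key has a .gz extension, in any case."""
    # Lower-casing the extension makes '.GZ' and '.gz' compare equal.
    return os.path.splitext(key)[1].lower() == '.gz'


print(is_gzip('data/input.GZ'))   # True
print(is_gzip('data/input.csv'))  # False
```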



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564246#comment-16564246
 ] 

ASF GitHub Bot commented on AIRFLOW-2670:
-

Fokko commented on issue #3666: [AIRFLOW-2670] Update SSH Operator's Hook to 
respect timeout
URL: 
https://github.com/apache/incubator-airflow/pull/3666#issuecomment-409338606
 
 
   Nice one @Noremac201 Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> SSHOperator's timeout parameter doesn't affect SSHHook timeout
> -
>
> Key: AIRFLOW-2670
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2670
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: Airflow 2.0
>Reporter: jin zhang
>Priority: Major
>
> When I use SSHOperator, its timeout parameter cannot be set on the SSHHook; 
> it only affects exec_command. 
> Old version:
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id)
> I changed it to:
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout)
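A minimal, self-contained sketch of the change described above (stub classes stand in for the real Airflow hook and operator; only the timeout forwarding is the point):

```python
class SSHHook:
    # Stub of the real hook: stores the connection id and timeout.
    def __init__(self, ssh_conn_id=None, timeout=10):
        self.ssh_conn_id = ssh_conn_id
        self.timeout = timeout


class SSHOperator:
    # Stub of the real operator, showing the fixed hook construction.
    def __init__(self, ssh_conn_id=None, timeout=10):
        self.ssh_conn_id = ssh_conn_id
        self.timeout = timeout
        self.ssh_hook = None

    def execute(self):
        if self.ssh_conn_id and not self.ssh_hook:
            # The fix: forward the operator-level timeout to the hook
            # instead of letting the hook fall back to its default.
            self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id,
                                    timeout=self.timeout)
        return self.ssh_hook


hook = SSHOperator(ssh_conn_id='ssh_default', timeout=20).execute()
print(hook.timeout)  # 20
```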



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564247#comment-16564247
 ] 

ASF GitHub Bot commented on AIRFLOW-2670:
-

Fokko closed pull request #3666: [AIRFLOW-2670] Update SSH Operator's Hook to 
respect timeout
URL: https://github.com/apache/incubator-airflow/pull/3666
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/ssh_operator.py 
b/airflow/contrib/operators/ssh_operator.py
index 2e890f463e..747ad04ff0 100644
--- a/airflow/contrib/operators/ssh_operator.py
+++ b/airflow/contrib/operators/ssh_operator.py
@@ -69,16 +69,17 @@ def __init__(self,
     def execute(self, context):
         try:
             if self.ssh_conn_id and not self.ssh_hook:
-                self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id)
+                self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id,
+                                        timeout=self.timeout)
 
             if not self.ssh_hook:
-                raise AirflowException("can not operate without ssh_hook or ssh_conn_id")
+                raise AirflowException("Cannot operate without ssh_hook or ssh_conn_id.")
 
             if self.remote_host is not None:
                 self.ssh_hook.remote_host = self.remote_host
 
             if not self.command:
-                raise AirflowException("no command specified so nothing to execute here.")
+                raise AirflowException("SSH command not specified. Aborting.")
 
             with self.ssh_hook.get_conn() as ssh_client:
                 # Auto apply tty when its required in case of sudo
diff --git a/tests/contrib/operators/test_ssh_operator.py 
b/tests/contrib/operators/test_ssh_operator.py
index b97ba84a01..7ddd24b2ac 100644
--- a/tests/contrib/operators/test_ssh_operator.py
+++ b/tests/contrib/operators/test_ssh_operator.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -58,6 +58,23 @@ def setUp(self):
         self.hook = hook
         self.dag = dag
 
+    def test_hook_created_correctly(self):
+        TIMEOUT = 20
+        SSH_ID = "ssh_default"
+        task = SSHOperator(
+            task_id="test",
+            command="echo -n airflow",
+            dag=self.dag,
+            timeout=TIMEOUT,
+            ssh_conn_id="ssh_default"
+        )
+        self.assertIsNotNone(task)
+
+        task.execute(None)
+
+        self.assertEquals(TIMEOUT, task.ssh_hook.timeout)
+        self.assertEquals(SSH_ID, task.ssh_hook.ssh_conn_id)
+
     def test_json_command_execution(self):
         configuration.conf.set("core", "enable_xcom_pickling", "False")
         task = SSHOperator(


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> SSHOperator's timeout parameter doesn't affect SSHHook timeout
> -
>
> Key: AIRFLOW-2670
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2670
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: Airflow 2.0
>Reporter: jin zhang
>Priority: Major
>
> When I use SSHOperator, its timeout parameter cannot be set on the SSHHook; 
> it only affects exec_command. 
> Old version:
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id)
> I changed it to:
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564274#comment-16564274
 ] 

ASF GitHub Bot commented on AIRFLOW-1104:
-

kaxil commented on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does not 
over schedule tasks
URL: 
https://github.com/apache/incubator-airflow/pull/3568#issuecomment-409343719
 
 
   @dan-sf Can you please resolve the conflicts?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Concurrency check in scheduler should count queued tasks as well as running
> ---
>
> Key: AIRFLOW-1104
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1104
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: see https://github.com/apache/incubator-airflow/pull/2221
> "Tasks with the QUEUED state should also be counted below, but for now we 
> cannot count them. This is because there is no guarantee that queued tasks in 
> failed dagruns will or will not eventually run and queued tasks that will 
> never run will consume slots and can stall a DAG. Once we can guarantee that 
> all queued tasks in failed dagruns will never run (e.g. make sure that all 
> running/newly queued TIs have running dagruns), then we can include QUEUED 
> tasks here, with the constraint that they are in running dagruns."
>Reporter: Alex Guziel
>Priority: Minor
>
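The check under discussion can be sketched with a toy model (this is not the scheduler's actual query; the state names merely mirror Airflow's):

```python
# Task-instance states, mirroring Airflow's state names.
RUNNING, QUEUED, SUCCESS = 'running', 'queued', 'success'


def occupied_slots(states):
    # Counting QUEUED alongside RUNNING prevents the scheduler from
    # handing out more slots than the DAG's concurrency limit allows;
    # counting only RUNNING lets it over-schedule.
    return sum(1 for s in states if s in (RUNNING, QUEUED))


print(occupied_slots([RUNNING, QUEUED, QUEUED, SUCCESS]))  # 3
```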




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2658) Add GKE specific Kubernetes Pod Operator

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564085#comment-16564085
 ] 

ASF GitHub Bot commented on AIRFLOW-2658:
-

fenglu-g commented on a change in pull request #3532: [AIRFLOW-2658] Add GCP 
specific k8s pod operator
URL: https://github.com/apache/incubator-airflow/pull/3532#discussion_r206629560
 
 

 ##
 File path: airflow/contrib/operators/gcp_container_operator.py
 ##
 @@ -170,3 +175,147 @@ def execute(self, context):
 hook = GKEClusterHook(self.project_id, self.location)
 create_op = hook.create_cluster(cluster=self.body)
 return create_op
+
+
+KUBE_CONFIG_ENV_VAR = "KUBECONFIG"
+G_APP_CRED = "GOOGLE_APPLICATION_CREDENTIALS"
+
+
+class GKEPodOperator(KubernetesPodOperator):
+template_fields = ('project_id', 'location',
+   'cluster_name') + KubernetesPodOperator.template_fields
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ location,
+ cluster_name,
+ gcp_conn_id='google_cloud_default',
+ *args,
+ **kwargs):
+"""
+Executes a task in a Kubernetes pod in the specified Google Kubernetes
+Engine cluster
+
+This Operator assumes that the system has gcloud installed and either
+has working default application credentials or has configured a
+connection id with a service account.
+
+The **minimum** required to define a cluster to create are the variables
+``task_id``, ``project_id``, ``location``, ``cluster_name``, ``name``,
+``namespace``, and ``image``
+
+**Operator Creation**: ::
+
+operator = GKEPodOperator(task_id='pod_op',
+  project_id='my-project',
+  location='us-central1-a',
+  cluster_name='my-cluster-name',
+  name='task-name',
+  namespace='default',
+  image='perl')
+
+.. seealso::
+For more detail about application authentication have a look at the reference:
+https://cloud.google.com/docs/authentication/production#providing_credentials_to_your_application
+
+:param project_id: The Google Developers Console project id
+:type project_id: str
+:param location: The name of the Google Kubernetes Engine zone in which the
+cluster resides, e.g. 'us-central1-a'
+:type location: str
+:param cluster_name: The name of the Google Kubernetes Engine cluster the pod
+should be spawned in
+:type cluster_name: str
+:param gcp_conn_id: The google cloud connection id to use. This allows for
+users to specify a service account.
+:type gcp_conn_id: str
+"""
+super(GKEPodOperator, self).__init__(*args, **kwargs)
+self.project_id = project_id
+self.location = location
+self.cluster_name = cluster_name
+self.gcp_conn_id = gcp_conn_id
+
+def execute(self, context):
+# Specifying a service account file allows the user to use non-default
+# authentication for creating a Kubernetes Pod. This is done by setting the
+# environment variable `GOOGLE_APPLICATION_CREDENTIALS` that gcloud looks at.
+key_file = None
+
+# If gcp_conn_id is not specified gcloud will use the default
+# service account credentials.
+if self.gcp_conn_id:
+from airflow.hooks.base_hook import BaseHook
+# extras is a deserialized json object
+extras = BaseHook.get_connection(self.gcp_conn_id).extra_dejson
+# key_file only gets set if a json file is created from a JSON string in
+# the web ui, else none
+key_file = self._set_env_from_extras(extras=extras)
+
+# Write config to a temp file and set the environment variable to point to it.
+# This is to avoid race conditions of reading/writing a single file
+with tempfile.NamedTemporaryFile() as conf_file:
+os.environ[KUBE_CONFIG_ENV_VAR] = conf_file.name
+# Attempt to get/update credentials
+# We call gcloud directly instead of using google-cloud-python api
+# because there is no way to write kubernetes config to a file, which is
+# required by KubernetesPodOperator.
+# The gcloud command looks at the env variable `KUBECONFIG` for where to save
+# the kubernetes config file.
+subprocess.check_call(
+["gcloud", "container", "clusters", "get-credentials",
+ self.cluster_name,
+ "--zone", self.location,
+  
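The get-credentials flow shown in the (truncated) diff above — point `KUBECONFIG` at a per-task temp file, then shell out to `gcloud container clusters get-credentials` — can be sketched as follows. This is a minimal sketch; the helper names are illustrative and not part of the operator itself.

```python
import os
import subprocess
import tempfile


def build_get_credentials_cmd(cluster_name, zone):
    # The gcloud invocation the operator shells out to; gcloud writes the
    # resulting kubeconfig to whatever path KUBECONFIG points at.
    return ["gcloud", "container", "clusters", "get-credentials",
            cluster_name, "--zone", zone]


def fetch_cluster_config(cluster_name, zone):
    # Use a per-task temp file so concurrent tasks do not race on a
    # single shared kubeconfig file.
    with tempfile.NamedTemporaryFile() as conf_file:
        os.environ["KUBECONFIG"] = conf_file.name
        subprocess.check_call(build_get_credentials_cmd(cluster_name, zone))
        return conf_file.name
```

Calling gcloud directly (rather than the google-cloud-python API) is, per the comments in the diff, the only way to obtain a kubeconfig file, which KubernetesPodOperator requires.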

[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564241#comment-16564241
 ] 

ASF GitHub Bot commented on AIRFLOW-2825:
-

Fokko closed pull request #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due to 
case
URL: https://github.com/apache/incubator-airflow/pull/3665
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/s3_to_hive_operator.py b/airflow/operators/s3_to_hive_operator.py
index 09eb8363c0..5faaf916b7 100644
--- a/airflow/operators/s3_to_hive_operator.py
+++ b/airflow/operators/s3_to_hive_operator.py
@@ -153,7 +153,7 @@ def execute(self, context):
 
 root, file_ext = os.path.splitext(s3_key_object.key)
 if (self.select_expression and self.input_compressed and
-file_ext != '.gz'):
+file_ext.lower() != '.gz'):
 raise AirflowException("GZIP is the only compression " +
"format Amazon S3 Select supports")
 
diff --git a/tests/operators/s3_to_hive_operator.py b/tests/operators/s3_to_hive_operator.py
index 482e7fefc8..6ca6274a2c 100644
--- a/tests/operators/s3_to_hive_operator.py
+++ b/tests/operators/s3_to_hive_operator.py
@@ -89,6 +89,11 @@ def setUp(self):
mode="wb") as f_gz_h:
 self._set_fn(fn_gz, '.gz', True)
 f_gz_h.writelines([header, line1, line2])
+fn_gz_upper = self._get_fn('.txt', True) + ".GZ"
+with gzip.GzipFile(filename=fn_gz_upper,
+   mode="wb") as f_gz_upper_h:
+self._set_fn(fn_gz_upper, '.GZ', True)
+f_gz_upper_h.writelines([header, line1, line2])
 fn_bz2 = self._get_fn('.txt', True) + '.bz2'
 with bz2.BZ2File(filename=fn_bz2,
  mode="wb") as f_bz2_h:
@@ -105,6 +110,11 @@ def setUp(self):
mode="wb") as f_gz_nh:
 self._set_fn(fn_gz, '.gz', False)
 f_gz_nh.writelines([line1, line2])
+fn_gz_upper = self._get_fn('.txt', False) + ".GZ"
+with gzip.GzipFile(filename=fn_gz_upper,
+   mode="wb") as f_gz_upper_nh:
+self._set_fn(fn_gz_upper, '.GZ', False)
+f_gz_upper_nh.writelines([line1, line2])
 fn_bz2 = self._get_fn('.txt', False) + '.bz2'
 with bz2.BZ2File(filename=fn_bz2,
  mode="wb") as f_bz2_nh:
@@ -143,7 +153,7 @@ def _check_file_equality(self, fn_1, fn_2, ext):
 # gz files contain mtime and filename in the header that
 # causes filecmp to return False even if contents are identical
 # Hence decompress to test for equality
-if(ext == '.gz'):
+if(ext.lower() == '.gz'):
 with gzip.GzipFile(fn_1, 'rb') as f_1,\
  NamedTemporaryFile(mode='wb') as f_txt_1,\
  gzip.GzipFile(fn_2, 'rb') as f_2,\
@@ -220,14 +230,14 @@ def test_execute(self, mock_hiveclihook):
 conn.create_bucket(Bucket='bucket')
 
 # Testing txt, zip, bz2 files with and without header row
-for (ext, has_header) in product(['.txt', '.gz', '.bz2'], [True, False]):
+for (ext, has_header) in product(['.txt', '.gz', '.bz2', '.GZ'], [True, False]):
 self.kwargs['headers'] = has_header
 self.kwargs['check_headers'] = has_header
 logging.info("Testing {0} format {1} header".
  format(ext,
 ('with' if has_header else 'without'))
  )
-self.kwargs['input_compressed'] = ext != '.txt'
+self.kwargs['input_compressed'] = ext.lower() != '.txt'
 self.kwargs['s3_key'] = 's3://bucket/' + self.s3_key + ext
 ip_fn = self._get_fn(ext, self.kwargs['headers'])
 op_fn = self._get_fn(ext, False)
@@ -260,8 +270,8 @@ def test_execute_with_select_expression(self, mock_hiveclihook):
 # Only testing S3ToHiveTransfer calls S3Hook.select_key with
 # the right parameters and its execute method succeeds here,
 # since Moto doesn't support select_object_content as of 1.3.2.
-for (ext, has_header) in product(['.txt', '.gz'], [True, False]):
-input_compressed = ext != '.txt'
+for (ext, has_header) in product(['.txt', '.gz', '.GZ'], [True, False]):
+input_compressed = ext.lower() != '.txt'
 key = self.s3_key + ext
 
 self.kwargs['check_headers'] = False


 


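The essence of the merged fix above is a case-insensitive extension comparison on the S3 key. A minimal standalone sketch (the helper name is illustrative):

```python
import os


def is_gzip_key(s3_key):
    # Lowercase the extension so keys like "data.GZ" are treated the
    # same as "data.gz", mirroring the file_ext.lower() != '.gz' change.
    _, ext = os.path.splitext(s3_key)
    return ext.lower() == '.gz'
```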

[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564311#comment-16564311
 ] 

ASF GitHub Bot commented on AIRFLOW-1104:
-

dan-sf commented on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does 
not over schedule tasks
URL: 
https://github.com/apache/incubator-airflow/pull/3568#issuecomment-409350564
 
 
   @kaxil Conflicts have been updated






[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564312#comment-16564312
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

feng-tao commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default 
config
URL: 
https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409350792
 
 
   +1 on keeping 0. 180 seconds is surely too high...




> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it was mentioned the default value of argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is actually parsed and scheduled without 
> letting Airflow "rest". This conflicts with the design purpose (by default 
> let it be 180 seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.



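For reference, the setting under discussion lives in the `[scheduler]` section of `airflow.cfg`. A sketch of the template entry as described above (the inline comment is ours, not from the template):

```ini
[scheduler]
# Parse and schedule each DAG file no faster than this interval (seconds).
# The config template ships 0, while the SchedulerJob docstring says 180.
min_file_process_interval = 0
```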


[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564313#comment-16564313
 ] 

ASF GitHub Bot commented on AIRFLOW-1104:
-

kaxil commented on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does not 
over schedule tasks
URL: 
https://github.com/apache/incubator-airflow/pull/3568#issuecomment-409350840
 
 
   Can you squash your commits as well?






[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563923#comment-16563923
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

r39132 commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409282209
 
 
   @verdan once @tedmiston is done, please provide your +1 and notify some of 
the committers on this PR that the PR is ready for validation and merge. Thx 
for your help on reviewing this PR!




> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 merged in master branch, please fix all the javascript 
> styling issues that we have in .js and .html files. 





[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563963#comment-16563963
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

tedmiston edited a comment on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint 
issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409266326
 
 
   @verdan Sure!  Typically I keep atomic commits while I'm working so everyone 
can follow small changes instead of one big diff, then squash down to one 
commit at the end.  I updated the title to make it clear this is WIP.  Since 
you're doing most of the reviewing here, do you have a preference on squashing 
throughout working vs just squashing pre-merge?
   
   I should have an update later today btw.






[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564022#comment-16564022
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

codecov-io edited a comment on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint 
issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-408503531
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=h1)
 Report
   > Merging 
[#3656](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/a338f3276835af45765d24a6e6d43ad4ba4d66ba?src=pr=desc)
 will **increase** coverage by `0.38%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3656/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3656      +/-   ##
   ==========================================
   + Coverage   77.12%   77.51%   +0.38%     
   ==========================================
     Files         206      205       -1     
     Lines       15772    15751      -21     
   ==========================================
   + Hits        12164    12209      +45     
   + Misses       3608     3542      -66
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5)
 | `99.01% <0%> (-0.99%)` | :arrow_down: |
   | 
[airflow/plugins\_manager.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9wbHVnaW5zX21hbmFnZXIucHk=)
 | `92.59% <0%> (ø)` | :arrow_up: |
   | 
[airflow/www/validators.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmFsaWRhdG9ycy5weQ==)
 | `100% <0%> (ø)` | :arrow_up: |
   | 
[airflow/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9fX2luaXRfXy5weQ==)
 | `80.43% <0%> (ø)` | :arrow_up: |
   | 
[airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5)
 | `82.74% <0%> (ø)` | :arrow_up: |
   | 
[airflow/minihivecluster.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9taW5paGl2ZWNsdXN0ZXIucHk=)
 | | |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `89.87% <0%> (+0.42%)` | :arrow_up: |
   | 
[airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==)
 | `100% <0%> (+100%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=footer).
 Last update 
[a338f32...ecbc873](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   






[jira] [Commented] (AIRFLOW-2800) Remove airflow/ low-hanging linting errors

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563861#comment-16563861
 ] 

ASF GitHub Bot commented on AIRFLOW-2800:
-

r39132 closed pull request #3638: [AIRFLOW-2800] Remove low-hanging linting 
errors
URL: https://github.com/apache/incubator-airflow/pull/3638
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/__init__.py b/airflow/__init__.py
index f40b08aab5..bc6a7bbe19 100644
--- a/airflow/__init__.py
+++ b/airflow/__init__.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -80,11 +80,12 @@ class AirflowMacroPlugin(object):
 def __init__(self, namespace):
 self.namespace = namespace
 
-from airflow import operators
+
+from airflow import operators  # noqa: E402
 from airflow import sensors  # noqa: E402
-from airflow import hooks
-from airflow import executors
-from airflow import macros
+from airflow import hooks  # noqa: E402
+from airflow import executors  # noqa: E402
+from airflow import macros  # noqa: E402
 
 operators._integrate_plugins()
 sensors._integrate_plugins()  # noqa: E402
diff --git a/airflow/contrib/auth/backends/ldap_auth.py b/airflow/contrib/auth/backends/ldap_auth.py
index eefaa1263b..516e121c9b 100644
--- a/airflow/contrib/auth/backends/ldap_auth.py
+++ b/airflow/contrib/auth/backends/ldap_auth.py
@@ -62,7 +62,7 @@ def get_ldap_connection(dn=None, password=None):
 cacert = configuration.conf.get("ldap", "cacert")
 tls_configuration = Tls(validate=ssl.CERT_REQUIRED, 
ca_certs_file=cacert)
 use_ssl = True
-except:
+except Exception:
 pass
 
 server = Server(configuration.conf.get("ldap", "uri"), use_ssl, 
tls_configuration)
@@ -94,7 +94,7 @@ def groups_user(conn, search_base, user_filter, user_name_att, username):
 search_filter = "(&({0})({1}={2}))".format(user_filter, user_name_att, 
username)
 try:
 memberof_attr = configuration.conf.get("ldap", "group_member_attr")
-except:
+except Exception:
 memberof_attr = "memberOf"
 res = conn.search(native(search_base), native(search_filter),
   attributes=[native(memberof_attr)])
diff --git a/airflow/contrib/hooks/aws_hook.py b/airflow/contrib/hooks/aws_hook.py
index 69a1b0bed3..8ca1f3d744 100644
--- a/airflow/contrib/hooks/aws_hook.py
+++ b/airflow/contrib/hooks/aws_hook.py
@@ -72,7 +72,7 @@ def _parse_s3_config(config_file_name, config_format='boto', profile=None):
 try:
 access_key = config.get(cred_section, key_id_option)
 secret_key = config.get(cred_section, secret_key_option)
-except:
+except Exception:
 logging.warning("Option Error in parsing s3 config file")
 raise
 return access_key, secret_key
diff --git a/airflow/contrib/operators/awsbatch_operator.py b/airflow/contrib/operators/awsbatch_operator.py
index a5c86afce6..353fbbb0a0 100644
--- a/airflow/contrib/operators/awsbatch_operator.py
+++ b/airflow/contrib/operators/awsbatch_operator.py
@@ -139,7 +139,7 @@ def _wait_for_task_ended(self):
 if response['jobs'][-1]['status'] in ['SUCCEEDED', 'FAILED']:
 retry = False
 
-sleep( 1 + pow(retries * 0.1, 2))
+sleep(1 + pow(retries * 0.1, 2))
 retries += 1
 
 def _check_success_task(self):
diff --git a/airflow/contrib/operators/mlengine_prediction_summary.py b/airflow/contrib/operators/mlengine_prediction_summary.py
index 17fc2c0903..4efe81e641 100644
--- a/airflow/contrib/operators/mlengine_prediction_summary.py
+++ b/airflow/contrib/operators/mlengine_prediction_summary.py
@@ -112,14 +112,14 @@ def decode(self, x):
 @beam.ptransform_fn
 def MakeSummary(pcoll, metric_fn, metric_keys):  # pylint: disable=invalid-name
 return (
-pcoll
-| "ApplyMetricFnPerInstance" >> beam.Map(metric_fn)
-| "PairWith1" >> beam.Map(lambda tup: tup + (1,))
-| "SumTuple" >> beam.CombineGlobally(beam.combiners.TupleCombineFn(
-*([sum] * (len(metric_keys) + 1
-| "AverageAndMakeDict" >> beam.Map(
+pcoll |
+"ApplyMetricFnPerInstance" >> beam.Map(metric_fn) |
+"PairWith1" >> beam.Map(lambda tup: tup + (1,)) |
+"SumTuple" >> 

[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563965#comment-16563965
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

tedmiston commented on a change in pull request #3656: [WIP][AIRFLOW-2803] Fix 
all ESLint issues
URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206602944
 
 

 ##
 File path: airflow/www_rbac/static/js/clock.js
 ##
 @@ -18,24 +18,25 @@
  */
 require('./jqClock.min');
 
-$(document).ready(function () {
-  x = new Date();
+$(document).ready(() => {
 
 Review comment:
   Sounds good.  I will stick with the ES5 for now for this PR.






[jira] [Commented] (AIRFLOW-2310) Enable AWS Glue Job Integration

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564040#comment-16564040
 ] 

ASF GitHub Bot commented on AIRFLOW-2310:
-

suma-ps commented on issue #3504: [AIRFLOW-2310]: Add AWS Glue Job 
Compatibility to Airflow
URL: 
https://github.com/apache/incubator-airflow/pull/3504#issuecomment-409303864
 
 
   @OElesin  Do you plan to resolve the merge issues soon? Looking forward to 
using the Glue operator soon, thanks!




> Enable AWS Glue Job Integration
> ---
>
> Key: AIRFLOW-2310
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2310
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Olalekan Elesin
>Assignee: Olalekan Elesin
>Priority: Major
>  Labels: AWS
>
> Would it be possible to integrate AWS Glue into Airflow, such that Glue jobs 
> and ETL pipelines can be orchestrated with Airflow





[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564330#comment-16564330
 ] 

ASF GitHub Bot commented on AIRFLOW-1104:
-

kaxil closed pull request #3568: AIRFLOW-1104 Update jobs.py so Airflow does 
not over schedule tasks
URL: https://github.com/apache/incubator-airflow/pull/3568
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index 224ff185fb..a4252473cd 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1075,9 +1075,6 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
 :type states: Tuple[State]
 :return: List[TaskInstance]
 """
-# TODO(saguziel): Change this to include QUEUED, for concurrency
-# purposes we may want to count queued tasks
-states_to_count_as_running = [State.RUNNING]
 executable_tis = []
 
 # Get all the queued task instances from associated with scheduled
@@ -1123,6 +1120,7 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
 for task_instance in task_instances_to_examine:
 pool_to_task_instances[task_instance.pool].append(task_instance)
 
+states_to_count_as_running = [State.RUNNING, State.QUEUED]
 task_concurrency_map = self.__get_task_concurrency_map(
 states=states_to_count_as_running, session=session)
 
@@ -1173,7 +1171,6 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
 simple_dag = simple_dag_bag.get_dag(dag_id)
 
 if dag_id not in dag_id_to_possibly_running_task_count:
-# TODO(saguziel): also check against QUEUED state, see AIRFLOW-1104
 dag_id_to_possibly_running_task_count[dag_id] = \
 DAG.get_num_task_instances(
 dag_id,
diff --git a/tests/jobs.py b/tests/jobs.py
index 93f6574df4..c701214f1e 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -1493,6 +1493,39 @@ def test_find_executable_task_instances_concurrency(self):
 
 self.assertEqual(0, len(res))
 
+def test_find_executable_task_instances_concurrency_queued(self):
+dag_id = 'SchedulerJobTest.test_find_executable_task_instances_concurrency_queued'
+dag = DAG(dag_id=dag_id, start_date=DEFAULT_DATE, concurrency=3)
+task1 = DummyOperator(dag=dag, task_id='dummy1')
+task2 = DummyOperator(dag=dag, task_id='dummy2')
+task3 = DummyOperator(dag=dag, task_id='dummy3')
+dagbag = self._make_simple_dag_bag([dag])
+
+scheduler = SchedulerJob()
+session = settings.Session()
+dag_run = scheduler.create_dag_run(dag)
+
+ti1 = TI(task1, dag_run.execution_date)
+ti2 = TI(task2, dag_run.execution_date)
+ti3 = TI(task3, dag_run.execution_date)
+ti1.state = State.RUNNING
+ti2.state = State.QUEUED
+ti3.state = State.SCHEDULED
+
+session.merge(ti1)
+session.merge(ti2)
+session.merge(ti3)
+
+session.commit()
+
+res = scheduler._find_executable_task_instances(
+dagbag,
+states=[State.SCHEDULED],
+session=session)
+
+self.assertEqual(1, len(res))
+self.assertEqual(res[0].key, ti3.key)
+
 def test_find_executable_task_instances_task_concurrency(self):
 dag_id = 
'SchedulerJobTest.test_find_executable_task_instances_task_concurrency'
 task_id_1 = 'dummy'


 




> Concurrency check in scheduler should count queued tasks as well as running
> ---
>
> Key: AIRFLOW-1104
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1104
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: see https://github.com/apache/incubator-airflow/pull/2221
> "Tasks with the QUEUED state should also be counted below, but for now we 
> cannot count them. This is because there is no guarantee that queued tasks in 
> failed dagruns will or will not eventually run and queued tasks that will 
> never run will consume slots and can stall a DAG. Once we can guarantee that 
> all queued tasks in failed dagruns will never run (e.g. make sure that all 
> running/newly 

[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564359#comment-16564359
 ] 

ASF GitHub Bot commented on AIRFLOW-1104:
-

codecov-io edited a comment on issue #3568: AIRFLOW-1104 Update jobs.py so 
Airflow does not over schedule tasks
URL: 
https://github.com/apache/incubator-airflow/pull/3568#issuecomment-401878707
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=h1)
 Report
   > Merging 
[#3568](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/3b35d360f6ff8694b6fb4387901c182ca39160b5?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3568/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3568  +/-   ##
   ==
   + Coverage   77.51%   77.51%   +<.01% 
   ==
 Files 205  205  
 Lines   1575115751  
   ==
   + Hits1220912210   +1 
   + Misses   3542 3541   -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3568/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5)
 | `82.74% <100%> (ø)` | :arrow_up: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3568/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.58% <0%> (+0.04%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=footer).
 Last update 
[3b35d36...b04c9b1](https://codecov.io/gh/apache/incubator-airflow/pull/3568?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Concurrency check in scheduler should count queued tasks as well as running
> ---
>
> Key: AIRFLOW-1104
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1104
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: see https://github.com/apache/incubator-airflow/pull/2221
> "Tasks with the QUEUED state should also be counted below, but for now we 
> cannot count them. This is because there is no guarantee that queued tasks in 
> failed dagruns will or will not eventually run and queued tasks that will 
> never run will consume slots and can stall a DAG. Once we can guarantee that 
> all queued tasks in failed dagruns will never run (e.g. make sure that all 
> running/newly queued TIs have running dagruns), then we can include QUEUED 
> tasks here, with the constraint that they are in running dagruns."
>Reporter: Alex Guziel
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564370#comment-16564370
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

ashb commented on a change in pull request #3656: [WIP][AIRFLOW-2803] Fix all 
ESLint issues
URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206684313
 
 

 ##
 File path: airflow/www_rbac/templates/airflow/circles.html
 ##
 @@ -28,117 +28,111 @@ Airflow 404 = lots of circles
 
 
 

[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564387#comment-16564387
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

tedmiston commented on a change in pull request #3656: [WIP][AIRFLOW-2803] Fix 
all ESLint issues
URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206688518
 
 

 ##
 File path: airflow/www_rbac/templates/airflow/circles.html
 ##
 @@ -28,117 +28,111 @@ Airflow 404 = lots of circles
 
 
 

[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564395#comment-16564395
 ] 

ASF GitHub Bot commented on AIRFLOW-2832:
-

codecov-io commented on issue #3670: [AIRFLOW-2832] Lint and resolve 
inconsistencies in Markdown files
URL: 
https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409376218
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=h1)
 Report
   > Merging 
[#3670](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ed972042a864cd010137190e0bbb1d25a9dcfe83?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3670/graphs/tree.svg?width=650=pr=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3670   +/-   ##
   ===
 Coverage   77.51%   77.51%   
   ===
 Files 205  205   
 Lines   1575115751   
   ===
 Hits1221012210   
 Misses   3541 3541
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=footer).
 Last update 
[ed97204...eef6fc8](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Inconsistencies and linter errors across markdown files
> ---
>
> Key: AIRFLOW-2832
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the 
> Airflow project.  Most of these are simple formatting issues easily fixed by 
> linting (e.g., with mdl).





[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564396#comment-16564396
 ] 

ASF GitHub Bot commented on AIRFLOW-2832:
-

codecov-io edited a comment on issue #3670: [AIRFLOW-2832] Lint and resolve 
inconsistencies in Markdown files
URL: 
https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409376218
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=h1)
 Report
   > Merging 
[#3670](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ed972042a864cd010137190e0bbb1d25a9dcfe83?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3670/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3670   +/-   ##
   ===
 Coverage   77.51%   77.51%   
   ===
 Files 205  205   
 Lines   1575115751   
   ===
 Hits1221012210   
 Misses   3541 3541
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=footer).
 Last update 
[ed97204...eef6fc8](https://codecov.io/gh/apache/incubator-airflow/pull/3670?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Inconsistencies and linter errors across markdown files
> ---
>
> Key: AIRFLOW-2832
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the 
> Airflow project.  Most of these are simple formatting issues easily fixed by 
> linting (e.g., with mdl).





[jira] [Commented] (AIRFLOW-2658) Add GKE specific Kubernetes Pod Operator

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564402#comment-16564402
 ] 

ASF GitHub Bot commented on AIRFLOW-2658:
-

fenglu-g commented on issue #3532: [AIRFLOW-2658] Add GCP specific k8s pod 
operator
URL: 
https://github.com/apache/incubator-airflow/pull/3532#issuecomment-409378846
 
 
   @Noremac201 please fix travis-ci, thanks. 




> Add GKE specific Kubernetes Pod Operator
> 
>
> Key: AIRFLOW-2658
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2658
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>
> Currently there is a Kubernetes Pod operator, but it is not really easy to 
> have it work with GCP Kubernetes Engine, it would be nice to have one.





[jira] [Commented] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564327#comment-16564327
 ] 

ASF GitHub Bot commented on AIRFLOW-1104:
-

dan-sf commented on issue #3568: AIRFLOW-1104 Update jobs.py so Airflow does 
not over schedule tasks
URL: 
https://github.com/apache/incubator-airflow/pull/3568#issuecomment-409355510
 
 
   Sure, the changes have been rebased on master




> Concurrency check in scheduler should count queued tasks as well as running
> ---
>
> Key: AIRFLOW-1104
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1104
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: see https://github.com/apache/incubator-airflow/pull/2221
> "Tasks with the QUEUED state should also be counted below, but for now we 
> cannot count them. This is because there is no guarantee that queued tasks in 
> failed dagruns will or will not eventually run and queued tasks that will 
> never run will consume slots and can stall a DAG. Once we can guarantee that 
> all queued tasks in failed dagruns will never run (e.g. make sure that all 
> running/newly queued TIs have running dagruns), then we can include QUEUED 
> tasks here, with the constraint that they are in running dagruns."
>Reporter: Alex Guziel
>Priority: Minor
>
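The guard this issue describes — counting QUEUED task instances alongside RUNNING ones before handing out concurrency slots — can be sketched framework-free as follows. The function names and dict shapes here are illustrative stand-ins, not Airflow's actual scheduler API:

```python
# Illustrative sketch of the concurrency guard discussed above; the
# state names mirror Airflow's State constants, but this is a
# simplified stand-in, not the real scheduler code.
RUNNING, QUEUED, SCHEDULED = "running", "queued", "scheduled"


def count_occupied_slots(task_instances, states=(RUNNING, QUEUED)):
    """Count task instances that consume concurrency slots.

    Counting QUEUED as well as RUNNING is the point of the fix:
    queued tasks will run soon, so ignoring them over-schedules.
    """
    return sum(1 for ti in task_instances if ti["state"] in states)


def find_executable(task_instances, concurrency_limit):
    """Return SCHEDULED tasks that fit under the concurrency limit."""
    occupied = count_occupied_slots(task_instances)
    free = max(0, concurrency_limit - occupied)
    candidates = [ti for ti in task_instances if ti["state"] == SCHEDULED]
    return candidates[:free]


if __name__ == "__main__":
    tis = [{"id": 1, "state": RUNNING},
           {"id": 2, "state": QUEUED},
           {"id": 3, "state": SCHEDULED}]
    # With concurrency=3 and one RUNNING plus one QUEUED task, only one
    # slot remains, so only the single SCHEDULED task is returned.
    print([ti["id"] for ti in find_executable(tis, concurrency_limit=3)])
```

This mirrors the structure of the test in the quoted diff: with a concurrency limit of 3 and one RUNNING plus one QUEUED task instance, only one SCHEDULED task may be picked up.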






[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564342#comment-16564342
 ] 

ASF GitHub Bot commented on AIRFLOW-2832:
-

tedmiston commented on issue #3670: [AIRFLOW-2832] Lint and resolve 
inconsistencies in Markdown files
URL: 
https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409358478
 
 
   This PR is now squashed and ready for review.




> Inconsistencies and linter errors across markdown files
> ---
>
> Key: AIRFLOW-2832
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the 
> Airflow project.  Most of these are simple formatting issues easily fixed by 
> linting (e.g., with mdl).





[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564354#comment-16564354
 ] 

ASF GitHub Bot commented on AIRFLOW-2832:
-

tedmiston edited a comment on issue #3670: [AIRFLOW-2832] Lint and resolve 
inconsistencies in Markdown files
URL: 
https://github.com/apache/incubator-airflow/pull/3670#issuecomment-409358478
 
 
   This PR is now squashed and ready for review.
   
   I'm not sure that there's any one best person to review these changes but in 
a git log, I see that @bolkedebruin, @Fokko, and @r39132 have modified some of 
these files in recent history.




> Inconsistencies and linter errors across markdown files
> ---
>
> Key: AIRFLOW-2832
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the 
> Airflow project.  Most of these are simple formatting issues easily fixed by 
> linting (e.g., with mdl).





[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564341#comment-16564341
 ] 

ASF GitHub Bot commented on AIRFLOW-2832:
-

tedmiston opened a new pull request #3670: [AIRFLOW-2832] Lint and resolve 
inconsistencies in Markdown files
URL: https://github.com/apache/incubator-airflow/pull/3670
 
 
   Make sure you have checked _all_ steps below.
   
   ### JIRA
   - [x] My PR addresses the following [Airflow 
JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
   - https://issues.apache.org/jira/browse/AIRFLOW-2832
   - In case you are fixing a typo in the documentation you can prepend 
your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue.
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   - Inspired by other recent issues related to linter errors in Python and JS 
(AIRFLOW-2783, AIRFLOW-2800, AIRFLOW-2803)
   - This PR does a few things:
 - Resolves linter errors in markdown files across the project (ignores 
errors that aren't super useful on GitHub such as line wrapping and putting 
`` in brackets)
 - Clarifies that commit message length of 50 characters doesn't include 
the Jira issue tag
 - Replaces usage of JIRA with Jira the way it's styled nowadays by 
[Atlassian](https://www.atlassian.com/software/jira) and 
[Wikipedia](https://en.wikipedia.org/wiki/Jira_(software))
 - Makes code block formatting consistent
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   The changes in this PR are restricted to linting documentation.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
   ### Documentation
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
   - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   n/a
   
   ### Code Quality
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Inconsistencies and linter errors across markdown files
> ---
>
> Key: AIRFLOW-2832
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the 
> Airflow project.  Most of these are simple formatting issues easily fixed by 
> linting (e.g., with mdl).





[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564400#comment-16564400
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

XD-DENG commented on issue #3669: Revert [AIRFLOW-2814] - Change 
`min_file_process_interval` to 0
URL: 
https://github.com/apache/incubator-airflow/pull/3669#issuecomment-409378082
 
 
   Hi @kaxil , please be reminded to update the comment in 
   https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L592 
as well, otherwise the comment will be inconsistent with the configuration 
value again.




> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it was mentioned the default value of argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that actually that each file is parsed and scheduled without 
> letting Airflow "rest". This conflicts with the design purpose (by default 
> let it be 180 seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.
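Concretely, the proposal corresponds to changing the scheduler section of the default config template along these lines (a sketch of the relevant entry only, not the full file):

```ini
[scheduler]
# Parse and schedule each DAG file no faster than this interval
# (seconds). 0 means files are re-parsed continuously, which is the
# inconsistency described above; 180 matches the documented default.
min_file_process_interval = 180
```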





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564422#comment-16564422
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206700100
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+"""
+    Initiate a SageMaker training job.
+
+    This operator returns the ARN of the model created in Amazon SageMaker.
+
+    :param training_job_config:
+        The configuration necessary to start a training job (templated)
+    :type training_job_config: dict
+    :param region_name: The AWS region_name
+    :type region_name: string
+    :param sagemaker_conn_id: The SageMaker connection ID to use.
+    :type sagemaker_conn_id: string
 
 Review comment:
   Hi Fokko, 
   Thank you so much for your review. I really appreciate your feedback. I 
didn't figure out how to reply to your request, so I'll just reply to you here. 
The main reason I separated this into an operator and a sensor is that the success of 
the training job has two stages: successfully kicking off the training job, and the 
training job successfully finishing. The operator reports the first status, 
and the sensor reports the latter. Also, since a training job runs on an 
AWS instance, not the instance hosting Airflow, other 
operators can set upstream to the operator, rather than the sensor, if they 
aren't dependent on the model actually being created. Also, by using the 
sensor, users can set parameters like poke_interval, which make more sense for 
a sensor than for an operator.
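The two-stage pattern this comment describes — an operator that only kicks off the job, and a sensor that polls until it finishes — can be sketched without Airflow as follows. The `FakeTrainingClient` and the class names are invented for illustration; the real operator and sensor live in Airflow's contrib package and talk to the actual SageMaker API:

```python
import time


class FakeTrainingClient:
    """Stand-in for a SageMaker-like API, invented for illustration."""

    def __init__(self, polls_until_done=2):
        self._remaining = polls_until_done

    def create_training_job(self, config):
        # Stage 1: the kick-off either succeeds or raises immediately.
        return {"TrainingJobArn": "arn:fake:training-job/" + config["name"]}

    def describe_training_job(self, name):
        # Stage 2: the job completes after a few polls.
        self._remaining -= 1
        status = "Completed" if self._remaining <= 0 else "InProgress"
        return {"TrainingJobStatus": status}


class CreateTrainingJobOperator:
    """Succeeds as soon as the job is accepted (stage 1)."""

    def __init__(self, client, config):
        self.client, self.config = client, config

    def execute(self):
        return self.client.create_training_job(self.config)["TrainingJobArn"]


class TrainingJobSensor:
    """Pokes until the job finishes (stage 2), like a poke_interval loop."""

    def __init__(self, client, job_name, poke_interval=0.01):
        self.client, self.job_name = client, job_name
        self.poke_interval = poke_interval

    def poke(self):
        desc = self.client.describe_training_job(self.job_name)
        return desc["TrainingJobStatus"] == "Completed"

    def execute(self):
        while not self.poke():
            time.sleep(self.poke_interval)


client = FakeTrainingClient()
arn = CreateTrainingJobOperator(client, {"name": "demo"}).execute()
TrainingJobSensor(client, "demo").execute()
```

In this split, downstream tasks that only need the job to have been submitted can depend on the operator, while tasks that need the trained model depend on the sensor — which is the design rationale given in the comment above.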




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using
> Airflow?





[jira] [Commented] (AIRFLOW-2835) Remove python-selinux

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564864#comment-16564864
 ] 

ASF GitHub Bot commented on AIRFLOW-2835:
-

Fokko opened a new pull request #3673: [AIRFLOW-2835] Remove python-selinux
URL: https://github.com/apache/incubator-airflow/pull/3673
 
 
   This package is not used and it sometimes breaks the CI because it is not 
available. Therefore it makes sense to just remove it :-)
   
   Example failed builds on the master branch:
   https://travis-ci.org/apache/incubator-airflow/jobs/410483664
   https://travis-ci.org/apache/incubator-airflow/jobs/410483665
   https://travis-ci.org/apache/incubator-airflow/jobs/410484305
   
   Make sure you have checked _all_ steps below.
   
   ### JIRA
   - [x] My PR addresses the following [Airflow 
JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-2835\] My Airflow PR"
   - https://issues.apache.org/jira/browse/AIRFLOW-XXX
   - In case you are fixing a typo in the documentation you can prepend 
your commit with \[AIRFLOW-2835\], code changes always need a JIRA issue.
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
   ### Documentation
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
   - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   
   ### Code Quality
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Remove python-selinux
> -
>
> Key: AIRFLOW-2835
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2835
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Priority: Major
>
> This package sometimes crashes the CI and is not required. Therefore it does
> not make sense to install it, since doing so takes CI time and makes things
> brittle.





[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564908#comment-16564908
 ] 

ASF GitHub Bot commented on AIRFLOW-2832:
-

Fokko closed pull request #3670: [AIRFLOW-2832] Lint and resolve 
inconsistencies in Markdown files
URL: https://github.com/apache/incubator-airflow/pull/3670
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 6000d0e5ff..90452d954b 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,33 +1,34 @@
 Make sure you have checked _all_ steps below.
 
-### JIRA
-- [ ] My PR addresses the following [Airflow 
JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
-- https://issues.apache.org/jira/browse/AIRFLOW-XXX
-- In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue.
+### Jira
 
+- [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
+  - https://issues.apache.org/jira/browse/AIRFLOW-XXX
+  - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 
 ### Description
-- [ ] Here are some details about my PR, including screenshots of any UI 
changes:
 
+- [ ] Here are some details about my PR, including screenshots of any UI 
changes:
 
 ### Tests
-- [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
 
+- [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
 
 ### Commits
-- [ ] My commits all reference JIRA issues in their subject lines, and I have 
squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
-1. Subject is separated from body by a blank line
-2. Subject is limited to 50 characters
-3. Subject does not end with a period
-4. Subject uses the imperative mood ("add", not "adding")
-5. Body wraps at 72 characters
-6. Body explains "what" and "why", not "how"
 
+- [ ] My commits all reference Jira issues in their subject lines, and I have 
squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
+  1. Subject is separated from body by a blank line
+  1. Subject is limited to 50 characters (not including Jira issue reference)
+  1. Subject does not end with a period
+  1. Subject uses the imperative mood ("add", not "adding")
+  1. Body wraps at 72 characters
+  1. Body explains "what" and "why", not "how"
 
 ### Documentation
-- [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
-- When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 
+- [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
+  - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 
 ### Code Quality
+
 - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 47a1a80549..2cf8e0218e 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -3,22 +3,21 @@
 Contributions are welcome and are greatly appreciated! Every
 little bit helps, and credit will always be given.
 
-
-# Table of Contents
-  * [TOC](#table-of-contents)
-  * [Types of Contributions](#types-of-contributions)
-  - [Report Bugs](#report-bugs)
-  - [Fix Bugs](#fix-bugs)
-  - [Implement Features](#implement-features)
-  - [Improve Documentation](#improve-documentation)
-  - [Submit Feedback](#submit-feedback)
-  * [Documentation](#documentation)
-  * [Development and Testing](#development-and-testing)
-  - [Setting up a development 
environment](#setting-up-a-development-environment)
-  - [Pull requests guidelines](#pull-request-guidelines)
-  - [Testing Locally](#testing-locally)
-  * [Changing the Metadata Database](#changing-the-metadata-database)
-
+## Table of Contents
+
+- [TOC](#table-of-contents)
+- [Types of Contributions](#types-of-contributions)
+  - [Report Bugs](#report-bugs)
+  - [Fix Bugs](#fix-bugs)
+  - [Implement 

[jira] [Commented] (AIRFLOW-2832) Inconsistencies and linter errors across markdown files

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564907#comment-16564907
 ] 

ASF GitHub Bot commented on AIRFLOW-2832:
-

Fokko commented on a change in pull request #3670: [AIRFLOW-2832] Lint and 
resolve inconsistencies in Markdown files
URL: https://github.com/apache/incubator-airflow/pull/3670#discussion_r206783822
 
 

 ##
 File path: dev/README.md
 ##
 @@ -72,25 +76,33 @@ origin https://github.com//airflow (push)
 ```
 
  JIRA
+
 Users should set environment variables `JIRA_USERNAME` and `JIRA_PASSWORD` 
corresponding to their ASF JIRA login. This will allow the tool to 
automatically close issues. If they are not set, the user will be prompted 
every time.
 
  GitHub OAuth Token
+
 Unauthenticated users can only make 60 requests/hour to the Github API. If you 
get an error about exceeding the rate, you will need to set a 
`GITHUB_OAUTH_KEY` environment variable that contains a token value. Users can 
generate tokens from their GitHub profile.
 
 ## Airflow release signing tool
+
 The release signing tool can be used to create the SHA512/MD5 and ASC files 
that required for Apache releases.
 
 ### Execution
-To create a release tar ball execute following command from Airflow's root. 
 
-`python setup.py compile_assets sdist --formats=gztar`
+To create a release tarball execute following command from Airflow's root.
 
-*Note: `compile_assets` command build the frontend assets (JS and CSS) files 
for the 
+```bash
+python setup.py compile_assets sdist --formats=gztar
+```
+
+*Note: `compile_assets` command build the frontend assets (JS and CSS) files 
for the
 Web UI using webpack and npm. Please make sure you have `npm` installed on 
your local machine globally.
 Details on how to install `npm` can be found in CONTRIBUTING.md file.*
 
 After that navigate to relative directory i.e., `cd dist` and sign the release 
files.
 
`../dev/sign.sh 

> Inconsistencies and linter errors across markdown files
> ---
>
> Key: AIRFLOW-2832
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2832
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> There are a number of inconsistencies within and across markdown files in the 
> Airflow project.  Most of these are simple formatting issues easily fixed by 
> linting (e.g., with mdl).





[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564941#comment-16564941
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

verdan commented on a change in pull request #3656: [WIP][AIRFLOW-2803] Fix all 
ESLint issues
URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206791865
 
 

 ##
 File path: airflow/www_rbac/templates/airflow/circles.html
 ##
 @@ -28,117 +28,111 @@ Airflow 404 = lots of circles
 
 
 

[jira] [Commented] (AIRFLOW-2835) Remove python-selinux

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564947#comment-16564947
 ] 

ASF GitHub Bot commented on AIRFLOW-2835:
-

bolkedebruin closed pull request #3673: [AIRFLOW-2835] Remove python-selinux
URL: https://github.com/apache/incubator-airflow/pull/3673
 
 
   


diff --git a/.travis.yml b/.travis.yml
index 81e43fb4b8..4e490c74e1 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -40,7 +40,6 @@ addons:
   - krb5-kdc
   - krb5-admin-server
   - oracle-java8-installer
-  - python-selinux
   postgresql: "9.2"
 python:
   - "2.7"


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove python-selinux
> -
>
> Key: AIRFLOW-2835
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2835
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Priority: Major
>
> This package sometimes crashes the CI and is not required. Therefore it does
> not make sense to install it, since it will take CI time and make things
> brittle.





[jira] [Commented] (AIRFLOW-2756) Marking DAG run does not set start_time and end_time correctly

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564958#comment-16564958
 ] 

ASF GitHub Bot commented on AIRFLOW-2756:
-

kaxil closed pull request #3606: [AIRFLOW-2756] Fix bug in set DAG run state 
workflow
URL: https://github.com/apache/incubator-airflow/pull/3606
 
 
   


diff --git a/airflow/api/common/experimental/mark_tasks.py 
b/airflow/api/common/experimental/mark_tasks.py
index 681864dfbe..88c5275f5a 100644
--- a/airflow/api/common/experimental/mark_tasks.py
+++ b/airflow/api/common/experimental/mark_tasks.py
@@ -206,7 +206,10 @@ def _set_dag_run_state(dag_id, execution_date, state, 
session=None):
 DR.execution_date == execution_date
 ).one()
 dr.state = state
-dr.end_date = timezone.utcnow()
+if state == State.RUNNING:
+dr.start_date = timezone.utcnow()
+else:
+dr.end_date = timezone.utcnow()
 session.commit()
 
 
diff --git a/airflow/jobs.py b/airflow/jobs.py
index 00ede5451d..70891ab4c3 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1023,8 +1023,7 @@ def _change_state_for_tis_without_dagrun(self,
 models.TaskInstance.dag_id == subq.c.dag_id,
 models.TaskInstance.task_id == subq.c.task_id,
 models.TaskInstance.execution_date ==
-subq.c.execution_date,
-models.TaskInstance.task_id == subq.c.task_id)) \
+subq.c.execution_date)) \
 .update({models.TaskInstance.state: new_state},
 synchronize_session=False)
 session.commit()
diff --git a/airflow/www/views.py b/airflow/www/views.py
index d37c0db45d..1ee5a2df86 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -2741,7 +2741,8 @@ def after_model_change(self, form, dagrun, is_created, 
session=None):
 altered_tis = set_dag_run_state_to_success(
 dagbag.get_dag(dagrun.dag_id),
 dagrun.execution_date,
-commit=True)
+commit=True,
+session=session)
 elif dagrun.state == State.FAILED:
 altered_tis = set_dag_run_state_to_failed(
 dagbag.get_dag(dagrun.dag_id),
diff --git a/tests/api/common/experimental/mark_tasks.py 
b/tests/api/common/experimental/mark_tasks.py
index 181d10d8a1..9bba91bee0 100644
--- a/tests/api/common/experimental/mark_tasks.py
+++ b/tests/api/common/experimental/mark_tasks.py
@@ -267,11 +267,25 @@ def _create_test_dag_run(self, state, date):
 def _verify_dag_run_state(self, dag, date, state):
 drs = models.DagRun.find(dag_id=dag.dag_id, execution_date=date)
 dr = drs[0]
+
 self.assertEqual(dr.get_state(), state)
 
+def _verify_dag_run_dates(self, dag, date, state, middle_time):
+# When target state is RUNNING, we should set start_date,
+# otherwise we should set end_date.
+drs = models.DagRun.find(dag_id=dag.dag_id, execution_date=date)
+dr = drs[0]
+if state == State.RUNNING:
+self.assertGreater(dr.start_date, middle_time)
+self.assertIsNone(dr.end_date)
+else:
+self.assertLess(dr.start_date, middle_time)
+self.assertGreater(dr.end_date, middle_time)
+
 def test_set_running_dag_run_to_success(self):
 date = self.execution_dates[0]
 dr = self._create_test_dag_run(State.RUNNING, date)
+middle_time = timezone.utcnow()
 self._set_default_task_instance_states(dr)
 
 altered = set_dag_run_state_to_success(self.dag1, date, commit=True)
@@ -280,10 +294,12 @@ def test_set_running_dag_run_to_success(self):
 self.assertEqual(len(altered), 5)
 self._verify_dag_run_state(self.dag1, date, State.SUCCESS)
 self._verify_task_instance_states(self.dag1, date, State.SUCCESS)
+self._verify_dag_run_dates(self.dag1, date, State.SUCCESS, middle_time)
 
 def test_set_running_dag_run_to_failed(self):
 date = self.execution_dates[0]
 dr = self._create_test_dag_run(State.RUNNING, date)
+middle_time = timezone.utcnow()
 self._set_default_task_instance_states(dr)
 
 altered = set_dag_run_state_to_failed(self.dag1, date, commit=True)
@@ -292,10 +308,12 @@ def test_set_running_dag_run_to_failed(self):
 self.assertEqual(len(altered), 1)
 self._verify_dag_run_state(self.dag1, date, State.FAILED)
 self.assertEqual(dr.get_task_instance('run_after_loop').state, 
State.FAILED)
+self._verify_dag_run_dates(self.dag1, date, State.FAILED, middle_time)
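The intent of the change to `_set_dag_run_state` above can be sketched standalone (using a plain dict as a stand-in for the DagRun model, not Airflow's actual API):

```python
from datetime import datetime, timezone

RUNNING = "running"  # stand-in for airflow.utils.state.State.RUNNING


def stamp_dag_run(dag_run, state):
    """Sketch of the fix above: marking a run as RUNNING sets a fresh
    start_date, while marking it with any terminal state sets end_date."""
    dag_run["state"] = state
    now = datetime.now(timezone.utc)
    if state == RUNNING:
        dag_run["start_date"] = now
    else:
        dag_run["end_date"] = now
    return dag_run
```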
 
  

[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564918#comment-16564918
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon 
SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206786344
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+"""
+   Initiate a SageMaker training
+
+   This operator returns The ARN of the model created in Amazon SageMaker
+
+   :param training_job_config:
+   The configuration necessary to start a training job (templated)
+   :type training_job_config: dict
+   :param region_name: The AWS region_name
+   :type region_name: string
+   :param sagemaker_conn_id: The SageMaker connection ID to use.
+   :type aws_conn_id: string
 
 Review comment:
  Hi Keliang, thanks for explaining the SageMaker process. I think it is very
similar to, for example, the Druid hook that we have:
https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/druid_hook.py#L93

  This hook will kick off a job using an HTTP POST of a JSON document to the
Druid cluster, and make sure that it receives an HTTP 200. It will then
continue to poll the job by invoking the API periodically.
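The submit-then-poll pattern described here can be sketched generically; the callables and status strings below are hypothetical, not the actual Druid or SageMaker hook API:

```python
import time


def submit_and_poll(submit, get_status, poll_interval=1.0, max_polls=60):
    """Submit a job, then poll its status until it reaches a terminal state.

    `submit` returns a job id; `get_status` maps that id to one of
    "RUNNING", "SUCCESS", or "FAILED" (hypothetical state names).
    """
    job_id = submit()
    for _ in range(max_polls):
        status = get_status(job_id)
        if status == "SUCCESS":
            return job_id
        if status == "FAILED":
            raise RuntimeError("job %s failed" % job_id)
        time.sleep(poll_interval)
    raise RuntimeError("job %s did not reach a terminal state" % job_id)
```

In the real hooks, `submit` would be the initial HTTP POST (checked for a 200) and `get_status` a periodic GET against the job's status endpoint.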




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.





[jira] [Commented] (AIRFLOW-2835) Remove python-selinux

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564919#comment-16564919
 ] 

ASF GitHub Bot commented on AIRFLOW-2835:
-

codecov-io commented on issue #3673: [AIRFLOW-2835] Remove python-selinux
URL: 
https://github.com/apache/incubator-airflow/pull/3673#issuecomment-409485914
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=h1)
 Report
   > Merging 
[#3673](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ed972042a864cd010137190e0bbb1d25a9dcfe83?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3673/graphs/tree.svg?width=650=pr=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3673   +/-   ##
   ===
 Coverage   77.51%   77.51%   
   ===
 Files 205  205   
 Lines   1575115751   
   ===
 Hits1221012210   
 Misses   3541 3541
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=footer).
 Last update 
[ed97204...ed2a781](https://codecov.io/gh/apache/incubator-airflow/pull/3673?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Remove python-selinux
> -
>
> Key: AIRFLOW-2835
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2835
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Priority: Major
>
> This package sometimes crashes the CI and is not required. Therefore it does
> not make sense to install it, since it will take CI time and make things
> brittle.





[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564938#comment-16564938
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

verdan commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409491349
 
 
   @tedmiston please tag me once it is ready for the next review. I see you're
still working on this PR, e.g. the Jinja template tags, indentation, and some
commented-out code.




> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 merged in master branch, please fix all the javascript 
> styling issues that we have in .js and .html files. 





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564514#comment-16564514
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

codecov-io edited a comment on issue #3658: [AIRFLOW-2524] Add Amazon SageMaker 
Training
URL: 
https://github.com/apache/incubator-airflow/pull/3658#issuecomment-408564225
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=h1)
 Report
   > Merging 
[#3658](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/096ba9ecd961cdaebd062599f408571ffb21165a?src=pr=desc)
 will **increase** coverage by `0.4%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3658/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ##   master#3658 +/-   ##
   =
   + Coverage   77.11%   77.51%   +0.4% 
   =
 Files 206  205  -1 
 Lines   1577215751 -21 
   =
   + Hits1216212210 +48 
   + Misses   3610 3541 -69
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5)
 | `99.01% <0%> (-0.99%)` | :arrow_down: |
   | 
[airflow/www/validators.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmFsaWRhdG9ycy5weQ==)
 | `100% <0%> (ø)` | :arrow_up: |
   | 
[airflow/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9fX2luaXRfXy5weQ==)
 | `80.43% <0%> (ø)` | :arrow_up: |
   | 
[airflow/plugins\_manager.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9wbHVnaW5zX21hbmFnZXIucHk=)
 | `92.59% <0%> (ø)` | :arrow_up: |
   | 
[airflow/minihivecluster.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9taW5paGl2ZWNsdXN0ZXIucHk=)
 | | |
   | 
[airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5)
 | `82.74% <0%> (+0.26%)` | :arrow_up: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `89.87% <0%> (+0.42%)` | :arrow_up: |
   | 
[airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==)
 | `100% <0%> (+100%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=footer).
 Last update 
[096ba9e...3f1e4b1](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564478#comment-16564478
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206711354
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+"""
+   Initiate a SageMaker training
+
+   This operator returns The ARN of the model created in Amazon SageMaker
+
+   :param training_job_config:
+   The configuration necessary to start a training job (templated)
+   :type training_job_config: dict
+   :param region_name: The AWS region_name
+   :type region_name: string
+   :param sagemaker_conn_id: The SageMaker connection ID to use.
+   :type aws_conn_id: string
+   :param use_db_config: Whether or not to use db config
+   associated with sagemaker_conn_id.
 
 Review comment:
   Added




> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.





[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564480#comment-16564480
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

codecov-io commented on issue #3669: Revert [AIRFLOW-2814] - Change 
`min_file_process_interval` to 0
URL: 
https://github.com/apache/incubator-airflow/pull/3669#issuecomment-409396427
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=h1)
 Report
   > Merging 
[#3669](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ed972042a864cd010137190e0bbb1d25a9dcfe83?src=pr=desc)
 will **increase** coverage by `0.27%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3669/graphs/tree.svg?token=WdLKlKHOAU=pr=650=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3669  +/-   ##
   ==
   + Coverage   77.51%   77.79%   +0.27% 
   ==
 Files 205  205  
 Lines   1575116079 +328 
   ==
   + Hits1221012508 +298 
   - Misses   3541 3571  +30
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3669/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5)
 | `84.63% <ø> (+1.88%)` | :arrow_up: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3669/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `89.45% <0%> (-0.43%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=footer).
 Last update 
[ed97204...1ee1fc4](https://codecov.io/gh/apache/incubator-airflow/pull/3669?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it was mentioned the default value of argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is actually parsed and scheduled without letting
> Airflow "rest". This conflicts with the design intent (a default of 180
> seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.
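
The proposal amounts to the following in the `[scheduler]` section of the config template (a sketch; the template key is `min_file_process_interval`, which feeds the `file_process_interval` argument of `SchedulerJob`):

```
[scheduler]
# Parse and schedule each DAG file no faster than this many seconds.
# The issue above proposes changing the template default from 0 to 180.
min_file_process_interval = 180
```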





[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564482#comment-16564482
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206711545
 
 

 ##
 File path: tests/contrib/hooks/test_sagemaker_hook.py
 ##
 @@ -0,0 +1,341 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+
+import json
+import unittest
+import copy
+try:
+    from unittest import mock
+except ImportError:
+    try:
+        import mock
+    except ImportError:
+        mock = None
+
+from airflow import configuration
+from airflow import models
+from airflow.utils import db
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.hooks.S3_hook import S3Hook
+from airflow.exceptions import AirflowException
+
+
+role = 'test-role'
+
+bucket = 'test-bucket'
+
+key = 'test/data'
+data_url = 's3://{}/{}'.format(bucket, key)
+
+job_name = 'test-job-name'
+
+image = 'test-image'
+
+test_arn_return = {'TrainingJobArn': 'testarn'}
+
+test_list_training_job_return = {
+    'TrainingJobSummaries': [
+        {
+            'TrainingJobName': job_name,
+            'TrainingJobStatus': 'InProgress'
+        },
+    ],
+    'NextToken': 'test-token'
+}
+
+test_list_tuning_job_return = {
+    'TrainingJobSummaries': [
+        {
+            'TrainingJobName': job_name,
+            'TrainingJobArn': 'testarn',
+            'TunedHyperParameters': {
+                'k': '3'
+            },
+            'TrainingJobStatus': 'InProgress'
+        },
+    ],
+    'NextToken': 'test-token'
+}
+
+output_url = 's3://{}/test/output'.format(bucket)
+create_training_params = \
+    {
+        'AlgorithmSpecification': {
+            'TrainingImage': image,
+            'TrainingInputMode': 'File'
+        },
+        'RoleArn': role,
+        'OutputDataConfig': {
+            'S3OutputPath': output_url
+        },
+        'ResourceConfig': {
+            'InstanceCount': 2,
+            'InstanceType': 'ml.c4.8xlarge',
+            'VolumeSizeInGB': 50
+        },
+        'TrainingJobName': job_name,
+        'HyperParameters': {
+            'k': '10',
+            'feature_dim': '784',
+            'mini_batch_size': '500',
+            'force_dense': 'True'
+        },
+        'StoppingCondition': {
+            'MaxRuntimeInSeconds': 60 * 60
+        },
+        'InputDataConfig': [
+            {
+                'ChannelName': 'train',
+                'DataSource': {
+                    'S3DataSource': {
+                        'S3DataType': 'S3Prefix',
+                        'S3Uri': data_url,
+                        'S3DataDistributionType': 'FullyReplicated'
+                    }
+                },
+                'CompressionType': 'None',
+                'RecordWrapperType': 'None'
+            }
+        ]
+    }
+
+create_tuning_params = {'HyperParameterTuningJobName': job_name,
+                        'HyperParameterTuningJobConfig': {
+                            'Strategy': 'Bayesian',
+                            'HyperParameterTuningJobObjective': {
+                                'Type': 'Maximize',
+                                'MetricName': 'test_metric'
+                            },
+                            'ResourceLimits': {
+                                'MaxNumberOfTrainingJobs': 123,
+                                'MaxParallelTrainingJobs': 123
+                            },
+                            'ParameterRanges': {
+                                'IntegerParameterRanges': [
+                                    {
+                                        'Name': 'k',
+                                        'MinValue': '2',
+                                        'MaxValue': '10'
+                                    },
+                                ]
+                            }
+                        },
+                        'TrainingJobDefinition': {
+

[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564481#comment-16564481
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206711515
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+"""
+   Initiate a SageMaker training
+
+   This operator returns The ARN of the model created in Amazon SageMaker
+
+   :param training_job_config:
+   The configuration necessary to start a training job (templated)
+   :type training_job_config: dict
+   :param region_name: The AWS region_name
+   :type region_name: string
+   :param sagemaker_conn_id: The SageMaker connection ID to use.
+   :type aws_conn_id: string
+   :param use_db_config: Whether or not to use db config
+   associated with sagemaker_conn_id.
+   If set to true, will automatically update the training config
+   with what's in db, so the db config doesn't need to
+   included everything, but what's there does replace the ones
+   in the training_job_config, so be careful
+   :type use_db_config:
+   :param aws_conn_id: The AWS connection ID to use.
+   :type aws_conn_id: string
+
+   **Example**:
+   The following operator would start a training job when executed
+
+sagemaker_training =
+   SageMakerCreateTrainingJobOperator(
+   task_id='sagemaker_training',
+   training_job_config=config,
+   use_db_config=True,
+   region_name='us-west-2'
+   sagemaker_conn_id='sagemaker_customers_conn',
+   aws_conn_id='aws_customers_conn'
+   )
+   """
+
+    template_fields = ['training_job_config']
+    template_ext = ()
+    ui_color = '#ededed'
+
+    @apply_defaults
+    def __init__(self,
+                 sagemaker_conn_id=None,
 
 Review comment:
   Changed the order


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end-to-end AWS SageMaker job using 
> Airflow?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
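The SageMaker hook tests quoted above stub out all AWS calls. A minimal sketch of that `unittest.mock` pattern is below; `FakeSageMakerHook`, `get_job_status`, and the `describe_training_job` response are simplified stand-ins, not the PR's actual code:

```python
from unittest import mock

class FakeSageMakerHook:
    """Stand-in for a boto3-backed hook; get_conn() would build a real client."""
    def get_conn(self):
        raise NotImplementedError  # would require AWS credentials

    def get_job_status(self, job_name):
        # Delegate to the (mocked) client, exactly like the real hook would.
        return self.get_conn().describe_training_job(
            TrainingJobName=job_name)['TrainingJobStatus']

# Patch get_conn so no network access or credentials are needed in tests.
with mock.patch.object(FakeSageMakerHook, 'get_conn') as mock_conn:
    mock_conn.return_value.describe_training_job.return_value = {
        'TrainingJobStatus': 'InProgress'}
    status = FakeSageMakerHook().get_job_status('test-job-name')

print(status)  # InProgress
```

This keeps the unit tests deterministic while still exercising the hook's logic around the client response.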


[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564491#comment-16564491
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

XD-DENG commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config
URL: 
https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409398992
 
 
   Hi all, thanks for the inputs. Agree with you on the desired value as well 
(the objective of this PR was to fix inconsistency between `.cfg` and comment 
in `jobs.py`, instead of proposing another value for this configuration item).
   
   Hi @kaxil , regarding `dag_dir_list_interval`, personally I think it should 
be reduced. 5 minutes is quite long for users to wait until new DAG file is 
reflected.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it was mentioned the default value of argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is actually parsed and scheduled without 
> letting Airflow "rest". This conflicts with the design purpose (by default 
> let it be 180 seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565043#comment-16565043
 ] 

ASF GitHub Bot commented on AIRFLOW-2817:
-

ashb commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL 
dependency
URL: 
https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409513668
 
 
   If not, I think vendoring python-nvd3 and slugify to use the non-GPL 
dependency is probably the way to go.
   
   (Or perhaps replacing python-nvd3 entirely. That's a bigger job though. 
https://medium.com/@Elijah_Meeks/introducing-semiotic-for-data-visualization-88dc3c6b6926
 looks interesting, but uses React, which is fine from a licensing PoV now.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Force explicit choice on GPL dependency
> ---
>
> Key: AIRFLOW-2817
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2817
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Major
>
> A more explicit choice on GPL dependency was required by the IPMC



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565046#comment-16565046
 ] 

ASF GitHub Bot commented on AIRFLOW-2817:
-

ashb edited a comment on issue #3660: [AIRFLOW-2817] Force explicit choice on 
GPL dependency
URL: 
https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409513668
 
 
   If not, I think vendoring python-nvd3 and slugify to use the non-GPL 
dependency is probably the way to go.
   
   (Or perhaps replacing python-nvd3 entirely. That's a bigger job though. 
https://medium.com/@Elijah_Meeks/introducing-semiotic-for-data-visualization-88dc3c6b6926
 looks interesting, but uses React, which is fine from a licensing PoV now.) 
Edit: If we did use this I wouldn't suggest React-ifying the whole app, just 
the chart part of the page itself. If that's possible.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Force explicit choice on GPL dependency
> ---
>
> Key: AIRFLOW-2817
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2817
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Major
>
> A more explicit choice on GPL dependency was required by the IPMC



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2655) Default Kubernetes worker configurations are inconsistent

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565049#comment-16565049
 ] 

ASF GitHub Bot commented on AIRFLOW-2655:
-

johnchenghk01 commented on issue #3529: [AIRFLOW-2655] Fix inconsistency of 
default config of kubernetes worker
URL: 
https://github.com/apache/incubator-airflow/pull/3529#issuecomment-409515471
 
 
   It will expose the DB password when doing a kubectl describe.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default Kubernetes worker configurations are inconsistent
> -
>
> Key: AIRFLOW-2655
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2655
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 1.10.0
>Reporter: Shintaro Murakami
>Priority: Minor
> Fix For: 2.0.0
>
>
> If the optional config `airflow_configmap` is not set, the worker starts 
> configured with `LocalExecutor` and a sql_alchemy_conn starting with `sqlite`.
> This combination is not allowed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
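The password-exposure concern above (a plain ConfigMap shows the DB connection string to anyone who can `kubectl describe` it) is usually avoided by sourcing the connection from an environment variable that a Kubernetes Secret populates. A sketch of that resolution logic, using Airflow's standard `AIRFLOW__CORE__SQL_ALCHEMY_CONN` variable name; the function itself is illustrative:

```python
def resolve_sql_alchemy_conn(env, default='sqlite:///airflow.db'):
    """Resolve the DB connection from the environment (e.g. injected by a
    Kubernetes Secret) rather than baking it into a ConfigMap, so the
    password never appears in `kubectl describe configmap` output."""
    return env.get('AIRFLOW__CORE__SQL_ALCHEMY_CONN', default)

# No variable set: falls back to the (disallowed-for-LocalExecutor) sqlite default.
print(resolve_sql_alchemy_conn({}))
# Secret-injected value wins:
print(resolve_sql_alchemy_conn(
    {'AIRFLOW__CORE__SQL_ALCHEMY_CONN': 'postgresql://user:pw@pg/airflow'}))
```

The fallback behavior also illustrates the bug in the issue: without an injected value, the worker silently ends up on sqlite.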


[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565036#comment-16565036
 ] 

ASF GitHub Bot commented on AIRFLOW-2817:
-

bolkedebruin closed pull request #3660: [AIRFLOW-2817] Force explicit choice on 
GPL dependency
URL: https://github.com/apache/incubator-airflow/pull/3660
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.travis.yml b/.travis.yml
index 81e43fb4b8..e078d7c9ae 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -47,6 +47,7 @@ python:
   - "3.5"
 env:
   global:
+- SLUGIFY_USES_TEXT_UNIDECODE=yes
 - TRAVIS_CACHE=$HOME/.travis_cache/
 - KRB5_CONFIG=/etc/krb5.conf
 - KRB5_KTNAME=/etc/airflow.keytab
diff --git a/INSTALL b/INSTALL
index 5c8f03eb66..596ce25814 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,13 +1,30 @@
-# INSTALL / BUILD instruction for Apache Airflow (incubating)
-# fetch the tarball and untar the source
+# INSTALL / BUILD instructions for Apache Airflow (incubating)
+
+# [required] fetch the tarball and untar the source
+# change into the directory that was untarred.
 
 # [optional] run Apache RAT (release audit tool) to validate license headers
-# RAT docs here: https://creadur.apache.org/rat/
+# RAT docs here: https://creadur.apache.org/rat/. Requires Java and Apache Rat
 java -jar apache-rat.jar -E ./.rat-excludes -d .
 
-# [optional] by default one of Apache Airflow's dependencies pulls in a GPL
-# library. If this is a concern issue (also every upgrade):
-# export SLUGIFY_USES_TEXT_UNIDECODE=yes
+# [optional] Airflow pulls in quite a lot of dependencies in order
+# to connect to other services. You might want to test or run Airflow
+# from a virtual env to make sure those dependencies are separated
+# from your system wide versions
+python -m venv my_env
+source my_env/bin/activate
+
+# [required] by default one of Apache Airflow's dependencies pulls in a GPL
+# library. Airflow will not install (and upgrade) without an explicit choice.
+#
+# To make sure not to install the GPL dependency:
+#   export SLUGIFY_USES_TEXT_UNIDECODE=yes
+# In case you do not mind:
+#   export AIRFLOW_GPL_UNIDECODE=yes
+
+# [required] building and installing
+# by pip (preferred)
+pip install .
 
-# install the release
+# or directly
 python setup.py install
diff --git a/UPDATING.md b/UPDATING.md
index da80f56fcb..ef29e1d3a4 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -5,6 +5,12 @@ assists users migrating to a new version.
 
 ## Airflow Master
 
+## Airflow 1.10
+
+Installation and upgrading requires setting `SLUGIFY_USES_TEXT_UNIDECODE=yes` in your environment or
+`AIRFLOW_GPL_UNIDECODE=yes`. In case of the latter a GPL runtime dependency will be installed due to a
+dependency (python-nvd3 -> python-slugify -> unidecode).
+
 ### Replace DataProcHook.await calls to DataProcHook.wait
 
 The method name was changed to be compatible with the Python 3.7 async/await keywords
diff --git a/scripts/ci/kubernetes/docker/Dockerfile b/scripts/ci/kubernetes/docker/Dockerfile
index 498c47b21a..93b20dbcd2 100644
--- a/scripts/ci/kubernetes/docker/Dockerfile
+++ b/scripts/ci/kubernetes/docker/Dockerfile
@@ -17,6 +17,8 @@
 
 FROM ubuntu:16.04
 
+ENV SLUGIFY_USES_TEXT_UNIDECODE=yes
+
 # install deps
 RUN apt-get update -y && apt-get install -y \
 wget \
@@ -33,7 +35,6 @@ RUN apt-get update -y && apt-get install -y \
 unzip \
 && apt-get clean
 
-
 RUN pip install --upgrade pip
 
 # Since we install vanilla Airflow, we also want to have support for Postgres and Kubernetes
diff --git a/setup.py b/setup.py
index 50af30944e..e69572c51d 100644
--- a/setup.py
+++ b/setup.py
@@ -35,6 +35,17 @@
 PY3 = sys.version_info[0] == 3
 
 
+# See LEGAL-362
+def verify_gpl_dependency():
+    if (not os.getenv("AIRFLOW_GPL_UNIDECODE")
+            and not os.getenv("SLUGIFY_USES_TEXT_UNIDECODE") == "yes"):
+        raise RuntimeError("By default one of Airflow's dependencies installs a GPL "
+                           "dependency (unidecode). To avoid this dependency set "
+                           "SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when you "
+                           "install or upgrade Airflow. To force installing the GPL "
+                           "version set AIRFLOW_GPL_UNIDECODE")
+
+
 class Tox(TestCommand):
     user_options = [('tox-args=', None, "Arguments to pass to tox")]
 
@@ -258,6 +269,7 @@ def write_version(filename=os.path.join(*['airflow',
 
 
 def do_setup():
+    verify_gpl_dependency()
     write_version()
     setup(
         name='apache-airflow',
@@ -376,6 +388,7 @@ def do_setup():
 'License :: OSI Approved :: Apache Software License',
 'Programming Language :: Python :: 

[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565038#comment-16565038
 ] 

ASF GitHub Bot commented on AIRFLOW-2817:
-

bolkedebruin commented on issue #3660: [AIRFLOW-2817] Force explicit choice on 
GPL dependency
URL: 
https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409512201
 
 
   Will see if we can address the issue with upstream


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Force explicit choice on GPL dependency
> ---
>
> Key: AIRFLOW-2817
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2817
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Major
>
> A more explicit choice on GPL dependency was required by the IPMC



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565082#comment-16565082
 ] 

ASF GitHub Bot commented on AIRFLOW-2817:
-

verdan commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL 
dependency
URL: 
https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409522600
 
 
   @ashb I believe we can remove python-nvd3 entirely and use custom 
JavaScript to render the charts with the d3 and nvd3 JS libraries, the way we 
already do for the Graph View on the DAG detail page, i.e., sending all the 
data from Python and implementing the charts on the frontend in templates. 
   But as you said, it will take some time to implement on the frontend, and 
won't be ready for release 1.10.
   
   P.S: Yes, it is possible to make a part of the application/page use the 
React. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Force explicit choice on GPL dependency
> ---
>
> Key: AIRFLOW-2817
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2817
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Major
>
> A more explicit choice on GPL dependency was required by the IPMC



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565025#comment-16565025
 ] 

ASF GitHub Bot commented on AIRFLOW-2836:
-

XD-DENG opened a new pull request #3674: [AIRFLOW-2836] Minor improvement of 
contrib.sensors.FileSensor
URL: https://github.com/apache/incubator-airflow/pull/3674
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2836
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   #### Background
   
   The default `fs_conn_id` in `contrib.sensors.FileSensor` is 'fs_default2'. 
However, when we initiate the database 
(https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88),
 there isn't such an entry. It doesn't exist anywhere else.
   
   #### Issue
   
   The purpose of `contrib.sensors.FileSensor` is mainly to check the local 
file system (of course it can also be used for NAS). For that, the path ("/") 
from the default connection 'fs_default' would suffice.
   
   However, given that the default value for fs_conn_id in 
contrib.sensors.FileSensor is "fs_default2" (a value that doesn't exist), this 
makes the situation much more complex. 
   
   When users intend to check the local file system only, they should be able 
to leave fs_conn_id at its default, instead of having to set up another 
connection separately.
   
   #### Proposal
   
   Change default value for `fs_conn_id` in `contrib.sensors.FileSensor` from 
"fs_default2" to "fs_default" (actually in the related test, the `fs_conn_id` 
are all specified to be "fs_default").
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Minor improvement of contrib.sensors.FileSensor
> ---
>
> Key: AIRFLOW-2836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2836
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> h4. *Background*
> The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. 
> However, when we initiate the database 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88),
>  there isn't such an entry. It doesn't exist anywhere else.
> h4. *Issue*
> The purpose of _contrib.sensors.FileSensor_ is mainly to check the local file 
> system (of course it can also be used for NAS). For that, the path ("/") from 
> the default connection 'fs_default' would suffice.
> However, given that the default value for *fs_conn_id* in 
> contrib.sensors.FileSensor is "fs_default2" (a value that doesn't exist), this 
> makes the situation much more complex. 
> When users intend to check the local file system only, they should be able to 
> leave *fs_conn_id* at its default, instead of having to set up another 
> connection separately.
> h4. Proposal
> Change default value for *fs_conn_id* in contrib.sensors.FileSensor from 
> "fs_default2" to "fs_default" (actually in the related test, the *fs_conn_id* 
> are all specified to be "fs_default").



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
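Once a filesystem connection resolves to the base path "/" (as 'fs_default' does), the sensor's check reduces to a path-existence test. A minimal sketch of that poke logic, assuming the simplification above (this is not FileSensor's actual implementation):

```python
import os
import tempfile

def poke(filepath):
    # Minimal sketch of what a file sensor's poke() boils down to:
    # succeed as soon as the target path exists.
    return os.path.exists(filepath)

# A real temp file is found; a bogus path is not.
with tempfile.NamedTemporaryFile() as f:
    print(poke(f.name))                     # True
print(poke('/no/such/path/for/this/demo'))  # False
```

With a working default connection, users watching a local path get this behavior out of the box instead of having to create a connection first.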


[jira] [Commented] (AIRFLOW-2817) Force explicit choice on GPL dependency

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565070#comment-16565070
 ] 

ASF GitHub Bot commented on AIRFLOW-2817:
-

ashb commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL 
dependency
URL: 
https://github.com/apache/incubator-airflow/pull/3660#issuecomment-409521040
 
 
   Something about the logic isn't right - everything on Travis is failing on 
the env check.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Force explicit choice on GPL dependency
> ---
>
> Key: AIRFLOW-2817
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2817
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Major
>
> A more explicit choice on GPL dependency was required by the IPMC



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
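Env-var gates like the one failing on Travis above are easy to get wrong in the boolean logic. A simplified, testable sketch of such a gate (it takes a dict instead of reading `os.environ`, and the error message is abbreviated; this is not the PR's exact function):

```python
def verify_gpl_dependency(env):
    """Fail unless the user made an explicit choice about the GPL dependency.

    Simplified sketch: accepts a dict so each branch can be unit-tested
    without mutating the real process environment.
    """
    if (not env.get("AIRFLOW_GPL_UNIDECODE")
            and env.get("SLUGIFY_USES_TEXT_UNIDECODE") != "yes"):
        raise RuntimeError(
            "set SLUGIFY_USES_TEXT_UNIDECODE=yes or AIRFLOW_GPL_UNIDECODE=yes")

# Exercising every branch catches the kind of logic slip seen on Travis.
verify_gpl_dependency({"SLUGIFY_USES_TEXT_UNIDECODE": "yes"})  # passes
verify_gpl_dependency({"AIRFLOW_GPL_UNIDECODE": "yes"})        # passes
try:
    verify_gpl_dependency({})
except RuntimeError:
    print("gate rejects an unset environment")
```

Testing all three cases (either variable set, neither set) before merging would have surfaced the CI-wide failure immediately.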


[jira] [Commented] (AIRFLOW-2814) Default Arg "file_process_interval" for class SchedulerJob is inconsistent with doc

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563309#comment-16563309
 ] 

ASF GitHub Bot commented on AIRFLOW-2814:
-

kaxil commented on issue #3659: [AIRFLOW-2814] Fix inconsistent default config
URL: 
https://github.com/apache/incubator-airflow/pull/3659#issuecomment-409144039
 
 
   @bolkedebruin @Fokko Thoughts? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default Arg "file_process_interval" for class SchedulerJob is inconsistent 
> with doc
> ---
>
> Key: AIRFLOW-2814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2814
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Fix For: 2.0.0
>
>
> h2. Background
> In 
> [https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/jobs.py#L592]
>  , it was mentioned the default value of argument *file_process_interval* 
> should be 3 minutes (*file_process_interval:* Parse and schedule each file no 
> faster than this interval).
> The value is normally parsed from the default configuration. However, in the 
> default config_template, its value is 0 rather than 180 seconds 
> ([https://github.com/XD-DENG/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L432]
>  ). 
> h2. Issue
> This means that each file is actually parsed and scheduled without 
> letting Airflow "rest". This conflicts with the design purpose (by default 
> let it be 180 seconds) and may affect performance significantly.
> h2. My Proposal
> Change the value in the config template from 0 to 180.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563319#comment-16563319
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

verdan commented on issue #3656: [AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-409147448
 
 
   @tedmiston can you please make sure:
   - you squash your commits 
   - your commit message adheres to the [commit 
guidelines](https://github.com/apache/incubator-airflow/blob/master/.github/PULL_REQUEST_TEMPLATE.md#commits)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 merged in master branch, please fix all the javascript 
> styling issues that we have in .js and .html files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563312#comment-16563312
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

verdan commented on a change in pull request #3656: [AIRFLOW-2803] Fix all 
ESLint issues
URL: https://github.com/apache/incubator-airflow/pull/3656#discussion_r206443837
 
 

 ##
 File path: airflow/www_rbac/static/js/clock.js
 ##
 @@ -18,24 +18,25 @@
  */
 require('./jqClock.min');
 
-$(document).ready(function () {
-  x = new Date();
+$(document).ready(() => {
 
 Review comment:
   Please note that most of the custom JS is written inline in .html files, and 
we have not yet included that JavaScript in webpack, which means we won't be 
able to transpile that JavaScript to ES5. (which is fine for now)
   I am working on another issue to extract all inline JS from html files to 
separate .js files. 
   https://issues.apache.org/jira/browse/AIRFLOW-2804
   
   My suggestion would be to implement the ES6->ES5 transpilation as part of 
this issue. And once this PR gets merged, we'll be able to extract all inline 
JS into separate .js files. 
   We already have a JIRA issue for that: 
https://issues.apache.org/jira/browse/AIRFLOW-2730


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues, which were 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 is merged into the master branch, please fix all of the 
> JavaScript styling issues that we have in the .js and .html files. 





[jira] [Commented] (AIRFLOW-2834) can not see the dag page after build from the newest code in github

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565151#comment-16565151
 ] 

ASF GitHub Bot commented on AIRFLOW-2834:
-

yeluolei opened a new pull request #3675: [AIRFLOW-2834] fix build script for 
k8s docker
URL: https://github.com/apache/incubator-airflow/pull/3675
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2834
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   The Kubernetes Docker image builds Airflow without RBAC support, but the 
configmap needs RBAC, so the build script needs to be changed to also build the 
JS and CSS files. 
   Currently, when you open an Airflow web UI deployed in Kubernetes, the page 
is blank because some files are missing.
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> can not see the dag page after build from the newest code in github
> ---
>
> Key: AIRFLOW-2834
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2834
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 2.0
>Reporter: Rurui Ye
>Assignee: Rurui Ye
>Priority: Blocker
> Attachments: image-2018-08-01-14-20-09-256.png
>
>
> After building and deploying the newest version of the code from GitHub, the 
> web server starts but the DAGs page is blank, with the following error for a 
> requested resource.
>  
> !image-2018-08-01-14-20-09-256.png!





[jira] [Commented] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565171#comment-16565171
 ] 

ASF GitHub Bot commented on AIRFLOW-2836:
-

XD-DENG commented on issue #3674: [AIRFLOW-2836] Minor improvement of 
contrib.sensors.FileSensor
URL: 
https://github.com/apache/incubator-airflow/pull/3674#issuecomment-409545344
 
 
   Thanks @ashb . Green now.




> Minor improvement of contrib.sensors.FileSensor
> ---
>
> Key: AIRFLOW-2836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2836
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> h4. *Background*
> The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. 
> However, when we initialize the database 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88),
>  there isn't such an entry. It doesn't exist anywhere else.
> h4. *Issue*
> The purpose of _contrib.sensors.FileSensor_ is mainly to check the local file 
> system (though it can also be used for NAS), for which the path ("/") from the 
> default connection 'fs_default' would suffice.
> However, given that the default value for *fs_conn_id* in 
> contrib.sensors.FileSensor is "fs_default2" (a value that doesn't exist), it 
> makes the situation much more complex. 
> When users intend to check the local file system only, they should be able to 
> leave *fs_conn_id* at its default, instead of having to set up another 
> connection separately.
> h4. Proposal
> Change the default value for *fs_conn_id* in contrib.sensors.FileSensor from 
> "fs_default2" to "fs_default" (in the related tests, the *fs_conn_id* values 
> are in fact all specified as "fs_default").
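Outside Airflow, the effect of the wrong default can be sketched with a minimal poke-style check (the `CONNECTIONS` dict and `poke` function below are illustrative stand-ins for Airflow's connection table and sensor API, not the real thing):

```python
import os
import tempfile

# Hypothetical stand-in for Airflow's connection table: the stock
# 'fs_default' entry points at the filesystem root ("/").
CONNECTIONS = {"fs_default": "/"}

def poke(filepath, fs_conn_id="fs_default"):
    """Mimic a FileSensor-style poke: join the connection's base path
    with `filepath` and report whether the file exists."""
    # A default of 'fs_default2' would raise KeyError here,
    # forcing users to create an extra connection first.
    base_path = CONNECTIONS[fs_conn_id]
    return os.path.exists(os.path.join(base_path, filepath))

# With the 'fs_default' base path of "/", any local path resolves directly.
with tempfile.NamedTemporaryFile() as f:
    print(poke(f.name.lstrip("/")))  # True while the temp file exists
print(poke("no/such/file"))  # False
```

This illustrates why 'fs_default' as the default lets local-filesystem checks work with no extra setup.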





[jira] [Commented] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565170#comment-16565170
 ] 

ASF GitHub Bot commented on AIRFLOW-2836:
-

codecov-io commented on issue #3674: [AIRFLOW-2836] Minor improvement of 
contrib.sensors.FileSensor
URL: 
https://github.com/apache/incubator-airflow/pull/3674#issuecomment-409544984
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=h1)
 Report
   > Merging 
[#3674](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/c37fc0b6ba19e3fe5656ae37cef9b59cef3c29e8?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3674/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3674      +/-   ##
   ==========================================
   - Coverage    77.5%    77.5%    -0.01%
   ==========================================
     Files         205      205
     Lines       15753    15753
   ==========================================
   - Hits        12210    12209        -1
   - Misses       3543     3544        +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3674/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.54% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=footer).
 Last update 
[c37fc0b...4d8abd8](https://codecov.io/gh/apache/incubator-airflow/pull/3674?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Minor improvement of contrib.sensors.FileSensor
> ---
>
> Key: AIRFLOW-2836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2836
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> h4. *Background*
> The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. 
> However, when we initialize the database 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88),
>  there isn't such an entry. It doesn't exist anywhere else.
> h4. *Issue*
> The purpose of _contrib.sensors.FileSensor_ is mainly to check the local file 
> system (though it can also be used for NAS), for which the path ("/") from the 
> default connection 'fs_default' would suffice.
> However, given that the default value for *fs_conn_id* in 
> contrib.sensors.FileSensor is "fs_default2" (a value that doesn't exist), it 
> makes the situation much more complex. 
> When users intend to check the local file system only, they should be able to 
> leave *fs_conn_id* at its default, instead of having to set up another 
> connection separately.
> h4. Proposal
> Change the default value for *fs_conn_id* in contrib.sensors.FileSensor from 
> "fs_default2" to "fs_default" (in the related tests, the *fs_conn_id* values 
> are in fact all specified as "fs_default").





[jira] [Commented] (AIRFLOW-2846) devel requirement is not sufficient to run tests

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568846#comment-16568846
 ] 

ASF GitHub Bot commented on AIRFLOW-2846:
-

holdenk opened a new pull request #3691: [AIRFLOW-2846] Add missing python test 
dependency to setup.py
URL: https://github.com/apache/incubator-airflow/pull/3691
 
 
   Add missing python test dependency (tox) to setup.py dev requirement.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ X ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ X ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Adds test dependency.
   
   ### Commits
   
   - [ X ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ X ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> devel requirement is not sufficient to run tests
> 
>
> Key: AIRFLOW-2846
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2846
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
>
> The devel requirement doesn't list tox, but `python setup.py test` requires 
> it.





[jira] [Commented] (AIRFLOW-2845) Remove asserts from the contrib code (change to legal exceptions)

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568769#comment-16568769
 ] 

ASF GitHub Bot commented on AIRFLOW-2845:
-

xnuinside opened a new pull request #3690: [AIRFLOW-2845] Remove asserts from 
the contrib package
URL: https://github.com/apache/incubator-airflow/pull/3690
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following 
[AIRFLOW-2845](https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2845)
 issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My 
Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   `assert` is used in the Airflow contrib package code, and given what asserts 
are really meant for, that is not correct.
   
   The documentation describes `assert` as a debugging tool 
(https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement), 
and asserts can also be disabled globally (e.g. when Python runs with `-O`). 
   
   So, I just want to change the debug asserts to ValueError and TypeError.
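The kind of rewrite proposed here can be sketched as follows (a hypothetical validator, not actual contrib code); the `assert` version silently vanishes under `python -O`, while the exception version always fires with a precise error type:

```python
# Before: debug-time checks that disappear entirely under `python -O`.
def set_timeout_debug(timeout):
    assert isinstance(timeout, int), "timeout must be an int"
    assert timeout > 0, "timeout must be positive"
    return timeout

# After: explicit exceptions that always fire, with the correct types
# (TypeError for a wrong type, ValueError for a bad value).
def set_timeout(timeout):
    if not isinstance(timeout, int):
        raise TypeError("timeout must be an int, got %r" % type(timeout))
    if timeout <= 0:
        raise ValueError("timeout must be positive, got %d" % timeout)
    return timeout

print(set_timeout(30))  # 30
try:
    set_timeout(-1)
except ValueError as e:
    print(e)  # timeout must be positive, got -1
```

Callers can now catch the specific exception instead of a bare AssertionError.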
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   It's covered by existing tests. No new features or important changes. 
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Remove asserts from the contrib code (change to legal exceptions) 
> --
>
> Key: AIRFLOW-2845
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2845
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.1
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.9.0
>
>
> Hi guys.  `assert` is used in the Airflow contrib package code, and given 
> what asserts are really meant for, that is not correct.
> The documentation describes `assert` as a debugging tool 
> ([https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement]), 
> and asserts can also be disabled globally (e.g. when Python runs with `-O`). 
> If you do not mind, I will be happy to prepare a PR to remove asserts from the 
> contrib module, changing them to raise the appropriate exception types with 
> clear messages rather than a bare "AssertionError".
> I talk only about src (not about asserts in tests). 
>  
>  





[jira] [Commented] (AIRFLOW-2849) devel requirement is not sufficient to check code quality locally

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569221#comment-16569221
 ] 

ASF GitHub Bot commented on AIRFLOW-2849:
-

ashb closed pull request #3694: [AIRFLOW-2849] Add missing dependency flake8 to 
setup to allow running code quality checks locally
URL: https://github.com/apache/incubator-airflow/pull/3694
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/setup.py b/setup.py
index e69572c51d..d84c981ccb 100644
--- a/setup.py
+++ b/setup.py
@@ -246,7 +246,8 @@ def write_version(filename=os.path.join(*['airflow',
 'pywinrm',
 'qds-sdk>=1.9.6',
 'rednose',
-'requests_mock'
+'requests_mock',
+'flake8'
 ]
 
 if not PY3:


 




> devel requirement is not sufficient to check code quality locally
> -
>
> Key: AIRFLOW-2849
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2849
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Reporter: Eyal Trabelsi
>Assignee: Eyal Trabelsi
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The devel requirement doesn't list flake8, but in order to check code quality 
> locally one needs to install it.





[jira] [Commented] (AIRFLOW-2851) Canonicalize "as _..." etc imports

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569284#comment-16569284
 ] 

ASF GitHub Bot commented on AIRFLOW-2851:
-

tedmiston opened a new pull request #3696: [AIRFLOW-2851] Canonicalize "as 
_..." etc imports
URL: https://github.com/apache/incubator-airflow/pull/3696
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2851
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR:
   
   1. Replaces `import foo as _foo` style imports with the more common `import 
foo` used everywhere else across the codebase.  I dug through history and 
couldn't find special reasons to maintain the as style imports here (I think 
it's just old code).  Currently (33dd33c89d4b6454d224ca34bab5ae37fb9812a6), 
there are just a handful of import lines using `as _...` vs thousands not using 
it, so the goal here is to improve consistency.
   
   2. It also simplifies `import foo.bar as bar` style imports to equivalent 
`from foo import bar`.
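   The equivalence claimed in point 2 is easy to verify: both spellings bind the very same module object (shown here with the stdlib's `os.path` purely as an illustration):

```python
import sys
import os.path as path_a       # older style being replaced
from os import path as path_b  # canonical equivalent

# Both names refer to the very same module object.
print(path_a is path_b)  # True
print(path_a is sys.modules["os.path"])  # True
```

So the change is purely stylistic; no runtime behavior shifts.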
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Coverage by existing tests.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Canonicalize "as _..." etc imports
> --
>
> Key: AIRFLOW-2851
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2851
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> This PR:
> 1. Replaces `import foo as _foo` style imports with the more common `import 
> foo` used everywhere else across the codebase. I dug through history and 
> couldn't find special reasons to maintain the as style imports here (I think 
> it's just old code). Currently (33dd33c89d4b6454d224ca34bab5ae37fb9812a6), 
> there are just a handful of import lines using `as _...` vs thousands not 
> using it, so the goal here is to improve consistency.
> 2. It also simplifies `import foo.bar as bar` style imports to equivalent 
> `from foo import bar`.





[jira] [Commented] (AIRFLOW-2850) Remove deprecated airflow.utils.apply_defaults

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569280#comment-16569280
 ] 

ASF GitHub Bot commented on AIRFLOW-2850:
-

tedmiston opened a new pull request #3695: [AIRFLOW-2850] Remove deprecated 
airflow.utils.apply_defaults
URL: https://github.com/apache/incubator-airflow/pull/3695
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2850
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR removes the wrapper function `apply_defaults` that's had a 
deprecation warning since 2016.  As similar "to be deprecated" stuff is removed 
for 2.0 in #3692, this felt like a good time to take care of related things.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Coverage by existing tests.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Remove deprecated airflow.utils.apply_defaults
> --
>
> Key: AIRFLOW-2850
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2850
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: utils
>Affects Versions: 2.0.0
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> This PR removes the wrapper function apply_defaults that's had a deprecation 
> warning since 2016.  As similar "to be deprecated" stuff is removed for 2.0 
> in #3692 ([AIRFLOW-2847]), this felt like a good time to take care of related 
> things.





[jira] [Commented] (AIRFLOW-2806) test_mark_success_no_kill test breaks intermittently on CI

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569314#comment-16569314
 ] 

ASF GitHub Bot commented on AIRFLOW-2806:
-

tedmiston closed pull request #3646: [WIP][AIRFLOW-2806] 
test_mark_success_no_kill test breaks intermittently on CI
URL: https://github.com/apache/incubator-airflow/pull/3646
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.travis.yml b/.travis.yml
index 81e43fb4b8..3f41d6525d 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -54,15 +54,15 @@ env:
 # does not work with python 3
 - BOTO_CONFIG=/tmp/bogusvalue
   matrix:
-- TOX_ENV=py27-backend_mysql
-- TOX_ENV=py27-backend_sqlite
-- TOX_ENV=py27-backend_postgres
-- TOX_ENV=py35-backend_mysql
-- TOX_ENV=py35-backend_sqlite
+# - TOX_ENV=py27-backend_mysql
+# - TOX_ENV=py27-backend_sqlite
+# - TOX_ENV=py27-backend_postgres
+# - TOX_ENV=py35-backend_mysql
+# - TOX_ENV=py35-backend_sqlite
 - TOX_ENV=py35-backend_postgres
-- TOX_ENV=flake8
-- TOX_ENV=py27-backend_postgres KUBERNETES_VERSION=v1.9.0
-- TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0
+# - TOX_ENV=flake8
+# - TOX_ENV=py27-backend_postgres KUBERNETES_VERSION=v1.9.0
+# - TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0
 matrix:
   exclude:
 - python: "3.5"
diff --git a/scripts/ci/kubernetes/docker/Dockerfile 
b/scripts/ci/kubernetes/docker/Dockerfile
index 498c47b21a..ef72a6c08c 100644
--- a/scripts/ci/kubernetes/docker/Dockerfile
+++ b/scripts/ci/kubernetes/docker/Dockerfile
@@ -40,7 +40,7 @@ RUN pip install --upgrade pip
 RUN pip install -U setuptools && \
 pip install kubernetes && \
 pip install cryptography && \
-pip install psycopg2-binary==2.7.4  # I had issues with older versions of 
psycopg2, just a warning
+pip install psycopg2-binary>=2.7.4  # I had issues with older versions of 
psycopg2, just a warning
 
 # install airflow
 COPY airflow.tar.gz /tmp/airflow.tar.gz
diff --git a/setup.py b/setup.py
index 50af30944e..bf4ce1d1cf 100644
--- a/setup.py
+++ b/setup.py
@@ -299,7 +299,7 @@ def do_setup():
 'python-nvd3==0.15.0',
 'requests>=2.5.1, <3',
 'setproctitle>=1.1.8, <2',
-'sqlalchemy>=1.1.15, <1.2.0',
+'sqlalchemy>=1.1.15, <1.3.0',
 'sqlalchemy-utc>=0.9.0',
 'tabulate>=0.7.5, <0.8.0',
 'tenacity==4.8.0',
diff --git a/tests/jobs.py b/tests/jobs.py
index 93f6574df4..d4184236d8 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -1086,10 +1086,10 @@ def test_localtaskjob_heartbeat(self, mock_pid):
 mock_pid.return_value = 2
 self.assertRaises(AirflowException, job1.heartbeat_callback)
 
-@unittest.skipIf('mysql' in configuration.conf.get('core', 
'sql_alchemy_conn'),
- "flaky when run on mysql")
-@unittest.skipIf('postgresql' in configuration.conf.get('core', 
'sql_alchemy_conn'),
- 'flaky when run on postgresql')
+# @unittest.skipIf('mysql' in configuration.conf.get('core', 
'sql_alchemy_conn'),
+#  "flaky when run on mysql")
+# @unittest.skipIf('postgresql' in configuration.conf.get('core', 
'sql_alchemy_conn'),
+#  'flaky when run on postgresql')
 def test_mark_success_no_kill(self):
 """
 Test that ensures that mark_success in the UI doesn't cause


 




> test_mark_success_no_kill test breaks intermittently on CI
> --
>
> Key: AIRFLOW-2806
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2806
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> The test_mark_success_no_kill test is breaking intermittently on the CI for 
> some versions of Python and some databases, particularly Python 3.5 for both 
> PostgreSQL and MySQL.
> A traceback of the error is 
> ([link|https://travis-ci.org/apache/incubator-airflow/jobs/407522994#L5668-L5701]):
> {code:java}
> 10) ERROR: test_mark_success_no_kill (tests.transplant_class.<locals>.C)
> --
>  Traceback (most recent call last):
>  tests/jobs.py line 1116 in 

[jira] [Commented] (AIRFLOW-2796) Improve code coverage for utils/helpers.py

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569327#comment-16569327
 ] 

ASF GitHub Bot commented on AIRFLOW-2796:
-

feng-tao closed pull request #3637: [AIRFLOW-2796] Improve utils helpers code 
coverage
URL: https://github.com/apache/incubator-airflow/pull/3637
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/tests/utils/test_helpers.py b/tests/utils/test_helpers.py
index 1005671e9e..5fa941c55d 100644
--- a/tests/utils/test_helpers.py
+++ b/tests/utils/test_helpers.py
@@ -116,6 +116,43 @@ def test_reduce_in_chunks(self):
                 2),
             14)
 
+    def test_is_in(self):
+        obj = ["list", "object"]
+        # Check for existence of a list object within a list
+        self.assertTrue(
+            helpers.is_in(obj, [obj])
+        )
+
+        # Check that an empty list returns false
+        self.assertFalse(
+            helpers.is_in(obj, [])
+        )
+
+        # Check to ensure it handles None types
+        self.assertFalse(
+            helpers.is_in(None, [obj])
+        )
+
+        # Check to ensure true will be returned if multiple objects exist
+        self.assertTrue(
+            helpers.is_in(obj, [obj, obj])
+        )
+
+    def test_is_container(self):
+        self.assertFalse(helpers.is_container("a string is not a container"))
+        self.assertTrue(helpers.is_container(["a", "list", "is", "a", "container"]))
+
+    def test_as_tuple(self):
+        self.assertEquals(
+            helpers.as_tuple("a string is not a container"),
+            ("a string is not a container",)
+        )
+
+        self.assertEquals(
+            helpers.as_tuple(["a", "list", "is", "a", "container"]),
+            ("a", "list", "is", "a", "container")
+        )
+
 
 if __name__ == '__main__':
     unittest.main()
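The helpers exercised by these tests can be sketched as a minimal reimplementation consistent with the assertions in the patch above (a sketch only, not the exact `airflow.utils.helpers` source):

```python
def is_container(obj):
    """True for iterable containers such as lists, but not for plain strings."""
    return hasattr(obj, '__iter__') and not isinstance(obj, str)


def is_in(obj, lst):
    """Identity-based membership test: is this exact object present in lst?"""
    return any(item is obj for item in lst)


def as_tuple(obj):
    """Convert a container to a tuple; wrap a non-container value in a 1-tuple."""
    if is_container(obj):
        return tuple(obj)
    return (obj,)
```

Note that `is_in` compares by identity (`is`), not equality, so an equal-but-distinct list would not match.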


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve code coverage for utils/helpers.py
> --
>
> Key: AIRFLOW-2796
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2796
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Andy Cooper
>Assignee: Andy Cooper
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Improve code coverage by adding tests for 
>  * is_container
>  * is_in
>  * as_tuple



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1749) AirflowConfigParser fails to override has_option from ConfigParser, causing broken LDAP config

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569229#comment-16569229
 ] 

ASF GitHub Bot commented on AIRFLOW-1749:
-

ashb closed pull request #2722: [AIRFLOW-1749] Fix has_option to consider 
environment and cmd overrides
URL: https://github.com/apache/incubator-airflow/pull/2722
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/configuration.py b/airflow/configuration.py
index ff81d9827b..adefb3fc20 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -175,10 +175,10 @@ def _get_cmd_option(self, section, key):
         # if this is a valid command key...
         if (section, key) in AirflowConfigParser.as_command_stdout:
             # if the original key is present, return it no matter what
-            if self.has_option(section, key):
+            if ConfigParser.has_option(self, section, key):
                 return ConfigParser.get(self, section, key)
             # otherwise, execute the fallback key
-            elif self.has_option(section, fallback_key):
+            elif ConfigParser.has_option(self, section, fallback_key):
                 command = self.get(section, fallback_key)
                 return run_command(command)
 
@@ -192,7 +192,7 @@ def get(self, section, key, **kwargs):
             return option
 
         # ...then the config file
-        if self.has_option(section, key):
+        if ConfigParser.has_option(self, section, key):
             return expand_env_var(
                 ConfigParser.get(self, section, key, **kwargs))
 
@@ -229,6 +229,11 @@ def getint(self, section, key):
     def getfloat(self, section, key):
         return float(self.get(section, key))
 
+    def has_option(self, section, key):
+        return ((self._get_env_var_option(section, key) is not None) or
+                ConfigParser.has_option(self, section, key) or
+                (self._get_cmd_option(section, key) is not None))
+
     def read(self, filenames):
         ConfigParser.read(self, filenames)
         self._validate()
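The pattern in this patch — overriding `has_option` so it consults extra sources before falling back to the config file — can be sketched generically. This is a simplified illustration, not Airflow's actual code; the `MYAPP__SECTION__KEY` naming scheme is a hypothetical stand-in for Airflow's `AIRFLOW__...` convention:

```python
import os
from configparser import ConfigParser


class LayeredConfigParser(ConfigParser):
    """Config parser that also treats MYAPP__SECTION__KEY env vars as options."""

    def _env_var_name(self, section, key):
        return 'MYAPP__{}__{}'.format(section.upper(), key.upper())

    def has_option(self, section, key):
        # An option "exists" if it is set in the environment OR in the file.
        if os.environ.get(self._env_var_name(section, key)) is not None:
            return True
        return ConfigParser.has_option(self, section, key)

    def get(self, section, key, **kwargs):
        # Environment variables take precedence over the file.
        env = os.environ.get(self._env_var_name(section, key))
        if env is not None:
            return env
        return ConfigParser.get(self, section, key, **kwargs)
```

Calling the base-class `ConfigParser.has_option` explicitly inside `get`-style helpers (as the patch does) is what avoids infinite recursion once the subclass override exists.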


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AirflowConfigParser fails to override has_option from ConfigParser, causing 
> broken LDAP config
> --
>
> Key: AIRFLOW-1749
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1749
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: Airflow 2.0, Airflow 1.8
> Environment: Ubuntu 16.04
>Reporter: Nick McNutt
>Priority: Minor
>  Labels: easyfix
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In configuration.py, class {{AirflowConfigParser}} fails to override 
> {{has_option}} from {{ConfigParser}}.  This breaks the following in 
> ldap_auth.py:
> {{if configuration.has_option("ldap", "search_scope"):
>     search_scope = SUBTREE if configuration.get("ldap", "search_scope") == "SUBTREE" else LEVEL}}
> This code fails to consider whether any environment variable (e.g., 
> {{AIRFLOW__LDAP__SEARCH_SCOPE}}) or command overrides are set, meaning that 
> LDAP configuration cannot be entirely set up through environment variables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-763) Vertica Check Operator

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569240#comment-16569240
 ] 

ASF GitHub Bot commented on AIRFLOW-763:


ashb closed pull request #1998: [AIRFLOW-763] Add contrib check operator for 
Vertica
URL: https://github.com/apache/incubator-airflow/pull/1998
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/vertica_check_operator.py b/airflow/contrib/operators/vertica_check_operator.py
new file mode 100644
index 00..1f936cab3c
--- /dev/null
+++ b/airflow/contrib/operators/vertica_check_operator.py
@@ -0,0 +1,125 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from airflow.contrib.hooks.vertica_hook import VerticaHook
+from airflow.operators.check_operator import CheckOperator, ValueCheckOperator, IntervalCheckOperator
+from airflow.utils.decorators import apply_defaults
+
+class VerticaCheckOperator(CheckOperator):
+    """
+    Performs checks against Vertica. The ``VerticaCheckOperator`` expects
+    a sql query that will return a single row. Each value on that
+    first row is evaluated using python ``bool`` casting. If any of the
+    values return ``False`` the check is failed and errors out.
+
+    Note that Python bool casting evals the following as ``False``:
+
+    * ``False``
+    * ``0``
+    * Empty string (``""``)
+    * Empty list (``[]``)
+    * Empty dictionary or set (``{}``)
+
+    Given a query like ``SELECT COUNT(*) FROM foo``, it will fail only if
+    the count ``== 0``. You can craft a much more complex query that could,
+    for instance, check that the table has the same number of rows as
+    the source table upstream, or that the count of today's partition is
+    greater than yesterday's partition, or that a set of metrics are less
+    than 3 standard deviations for the 7 day average.
+
+    This operator can be used as a data quality check in your pipeline, and
+    depending on where you put it in your DAG, you have the choice to
+    stop the critical path, preventing it from publishing dubious data,
+    or put it on the side and receive email alerts
+    without stopping the progress of the DAG.
+
+    :param sql: the sql to be executed
+    :type sql: string
+    :param vertica_conn_id: reference to the Vertica database
+    :type vertica_conn_id: string
+    """
+
+    @apply_defaults
+    def __init__(
+            self,
+            sql,
+            vertica_conn_id='vertica_default',
+            *args,
+            **kwargs):
+        super(VerticaCheckOperator, self).__init__(sql=sql, *args, **kwargs)
+        self.vertica_conn_id = vertica_conn_id
+        self.sql = sql
+
+    def get_db_hook(self):
+        return VerticaHook(vertica_conn_id=self.vertica_conn_id)
+
+
+class VerticaValueCheckOperator(ValueCheckOperator):
+    """
+    Performs a simple value check using sql code.
+
+    :param sql: the sql to be executed
+    :type sql: string
+    """
+
+    @apply_defaults
+    def __init__(
+            self, sql, pass_value, tolerance=None,
+            vertica_conn_id='vertica_default',
+            *args, **kwargs):
+        super(VerticaValueCheckOperator, self).__init__(
+            sql=sql, pass_value=pass_value, tolerance=tolerance,
+            *args, **kwargs)
+        self.vertica_conn_id = vertica_conn_id
+
+    def get_db_hook(self):
+        return VerticaHook(vertica_conn_id=self.vertica_conn_id)
+
+
+class VerticaIntervalCheckOperator(IntervalCheckOperator):
+    """
+    Checks that the values of metrics given as SQL expressions are within
+    a certain tolerance of the ones from days_back before.
+
+    This method constructs a query like so:
+
+        SELECT {metrics_threshold_dict_key} FROM {table}
+        WHERE {date_filter_column}=<date>
+
+    :param table: the table name
+    :type table: str
+    :param days_back: number of days between ds and the ds we want to check
+        against. Defaults to 7 days
+    :type days_back: int
+    :param metrics_threshold: a dictionary of ratios indexed by metrics, for
+        example 'COUNT(*)': 1.5 would require a 50 percent or less difference
+        between the current day, and the prior 
[jira] [Commented] (AIRFLOW-661) Celery Task Result Expiry

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569238#comment-16569238
 ] 

ASF GitHub Bot commented on AIRFLOW-661:


ashb closed pull request #2143: [AIRFLOW-661] Add Celery 
broker_transport_options config
URL: https://github.com/apache/incubator-airflow/pull/2143
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/configuration.py b/airflow/configuration.py
index cfccbe9a25..47106e7347 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -319,6 +319,10 @@ def run_command(command):
 # information.
 broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
 
+# Celery broker transport options. Provide options in JSON format. Refer to
+# the Celery documentation for more information.
+broker_transport_options = {{}}
+
 # Another key Celery setting
 celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow
 
diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py
index 04414fbc08..a7d7114711 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -16,6 +16,7 @@
 import logging
 import subprocess
 import time
+import json
 
 from celery import Celery
 from celery import states as celery_states
@@ -39,6 +40,7 @@ class CeleryConfig(object):
     CELERYD_PREFETCH_MULTIPLIER = 1
     CELERY_ACKS_LATE = True
     BROKER_URL = configuration.get('celery', 'BROKER_URL')
+    BROKER_TRANSPORT_OPTIONS = json.loads(configuration.get('celery', 'BROKER_TRANSPORT_OPTIONS'))
     CELERY_RESULT_BACKEND = configuration.get('celery', 'CELERY_RESULT_BACKEND')
     CELERYD_CONCURRENCY = configuration.getint('celery', 'CELERYD_CONCURRENCY')
     CELERY_DEFAULT_QUEUE = DEFAULT_QUEUE
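The JSON round-trip this patch relies on — storing a dict-valued setting as a JSON string in an INI-style config file and parsing it at load time — can be sketched with the standard library alone (the `{{}}` in the patch's default is template escaping that renders as `{}` in the generated airflow.cfg):

```python
import json
from configparser import ConfigParser

# A minimal config file fragment mimicking the patched [celery] section.
config = ConfigParser()
config.read_string("""
[celery]
broker_transport_options = {"visibility_timeout": 21600}
""")

# Parse the JSON string into the dict that Celery expects for its
# broker transport options setting.
options = json.loads(config.get('celery', 'broker_transport_options'))
print(options)  # a plain Python dict
```

The dict is then handed to Celery as-is, so any option the broker transport understands can be set without adding a dedicated Airflow config key for each one.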


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Celery Task Result Expiry
> -
>
> Key: AIRFLOW-661
> URL: https://issues.apache.org/jira/browse/AIRFLOW-661
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: celery, executor
>Reporter: Robin Miller
>Assignee: Robin Miller
>Priority: Minor
>
> When using RabbitMQ as the Celery Results Backend, it is desirable to be able 
> to set the CELERY_TASK_RESULT_EXPIRES config option to reduce the time out 
> period of the task tombstones to less than a day. As such we should pull this 
> option from the airflow.cfg file and pass it through.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

