[jira] [Commented] (AIRFLOW-1298) Airflow Clear Command does not clear tasks in UPSTREAM_FAILED state

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563151#comment-16563151
 ] 

ASF GitHub Bot commented on AIRFLOW-1298:
-

ubermen opened a new pull request #3667: [AIRFLOW-1298] Add clear option 
'only_failed_or_upstream_failed'
URL: https://github.com/apache/incubator-airflow/pull/3667
 
 
   ### Description
   In my operations team, we need to clear all affected tasks with a single 
command, because many schedules have to be reprocessed at once.
   However, 'clear -cdf ...' only clears failed tasks and skips those in 
upstream_failed, which does not fit our operators' needs.
   So I want to add a new option to clear both failed and upstream_failed tasks 
at once.
   
   
   ### Tests
   1. Clear a single schedule of a task: airflow clear -cp -s 
2018-07-22T05:00:00 -e 2018-07-22T05:00:00 -t ^task_name$ schedule_name
   2. Clear multiple schedules of a task: airflow clear -cp -s 
2018-07-22T05:00:00 -e 2018-07-22T09:00:00 -t ^task_name$ schedule_name
   3. Clear multiple schedules of a task together with their downstream tasks: 
airflow clear -cdp -s 2018-07-22T05:00:00 -e 2018-07-22T09:00:00 -t ^task_name$ 
schedule_name
   
   
   ### Documentation
   A new option for the clear command: the flag is '-p', taken from the 'p' in 
upstream_failed. Sample usage:
   airflow clear -cdp -s 2018-07-22T05:00:00 -e 2018-07-22T09:00:00 -t 
^task_name$ schedule_name


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow Clear Command does not clear tasks in UPSTREAM_FAILED state
> ---
>
> Key: AIRFLOW-1298
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1298
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: Airflow 1.8
> Environment: Ubuntu 12.04, Kernel: 3.13.0-113-generic, python 2.7.10
>Reporter: Aaditya Ramesh
>Assignee: Aaditya Ramesh
>Priority: Critical
>
> We are unable to clear airflow tasks that are in the UPSTREAM_FAILED state 
> using the command line. The fix is simple - just change the `clear` function 
> in models.py to also clear tasks in the UPSTREAM_FAILED state, not just FAILED.
> Diff:
> {noformat}
> diff --git a/airflow/models.py b/airflow/models.py
> index 30e18a44..e60d2918 100755
> --- a/airflow/models.py
> +++ b/airflow/models.py
> @@ -3180,7 +3180,7 @@ class DAG(BaseDag, LoggingMixin):
>          if end_date:
>              tis = tis.filter(TI.execution_date <= end_date)
>          if only_failed:
> -            tis = tis.filter(TI.state == State.FAILED)
> +            tis = tis.filter(TI.state.in_([State.FAILED,
> +                                           State.UPSTREAM_FAILED]))
>          if only_running:
>              tis = tis.filter(TI.state == State.RUNNING)
> {noformat}
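One subtlety when extending such a filter: inside a SQLAlchemy `filter()`, a plain Python `or` between two column comparisons does not build a SQL OR. The `==` operator on a column returns an expression object, and `or` just evaluates that object's truthiness and returns one of the two operands, so one condition is silently dropped. The snippet below mimics that mechanism with minimal stand-in classes (not SQLAlchemy itself); with real SQLAlchemy one would write `TI.state.in_([State.FAILED, State.UPSTREAM_FAILED])` or use `sqlalchemy.or_(...)`.

```python
class Column:
    """Stand-in mimicking an ORM column whose == builds an expression."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Like an ORM column, == returns an expression object, not a bool.
        return Expression("%s = %r" % (self.name, other))


class Expression:
    def __init__(self, sql):
        self.sql = sql

    def __bool__(self):
        # The truthiness of the first expression decides what `or` returns,
        # so `expr_a or expr_b` yields a single expression, not an OR clause.
        return True


state = Column("state")
combined = (state == "failed") or (state == "upstream_failed")
# `combined` is only the first expression; the second condition is lost.
```

The same silent-drop happens with real ORM expressions, which is why the list-membership form (`in_`) or an explicit `or_()` is the safe way to combine conditions.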



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563098#comment-16563098
 ] 

ASF GitHub Bot commented on AIRFLOW-2825:
-

XD-DENG commented on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due to 
case
URL: 
https://github.com/apache/incubator-airflow/pull/3665#issuecomment-409081714
 
 
   Hi @feng-tao, thanks for suggesting this.
   
   I have updated the related test. Instead of adding separate testing items, I 
updated the existing ones.




> S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase 
> ext in S3
> ---
>
> Key: AIRFLOW-2825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2825
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> Because upper/lower case was not considered in the extension check, the 
> S3ToHiveTransfer operator may decide that a GZIP file with the uppercase 
> extension `.GZ` is not a GZIP file and raise an exception.
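The fix amounts to comparing the extension case-insensitively. A minimal sketch of such a check (the helper name is illustrative, not the operator's actual method):

```python
import os


def is_gzip_key(key_name):
    """Return True if an S3 key looks like a GZIP file, ignoring case."""
    # Compare the extension case-insensitively so ".GZ" matches ".gz".
    _, ext = os.path.splitext(key_name)
    return ext.lower() == ".gz"
```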





[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563096#comment-16563096
 ] 

ASF GitHub Bot commented on AIRFLOW-2825:
-

codecov-io edited a comment on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer 
bug due to case
URL: 
https://github.com/apache/incubator-airflow/pull/3665#issuecomment-408920953
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=h1)
 Report
   > Merging 
[#3665](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3665/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3665      +/-   ##
   ==========================================
   - Coverage   77.51%   77.51%   -0.01%     
   ==========================================
     Files         205      205             
     Lines       15751    15751             
   ==========================================
   - Hits        12210    12209       -1     
   - Misses       3541     3542       +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/operators/s3\_to\_hive\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3665/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfdG9faGl2ZV9vcGVyYXRvci5weQ==)
 | `93.96% <ø> (ø)` | :arrow_up: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3665/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.54% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=footer).
 Last update 
[dfa7b26...c7e5446](https://codecov.io/gh/apache/incubator-airflow/pull/3665?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase 
> ext in S3
> ---
>
> Key: AIRFLOW-2825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2825
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> Because upper/lower case was not considered in the extension check, the 
> S3ToHiveTransfer operator may decide that a GZIP file with the uppercase 
> extension `.GZ` is not a GZIP file and raise an exception.





[jira] [Commented] (AIRFLOW-2795) Oracle to Oracle Transfer Operator

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563058#comment-16563058
 ] 

ASF GitHub Bot commented on AIRFLOW-2795:
-

marcusrehm commented on issue #3639: [AIRFLOW-2795] Oracle to Oracle Transfer 
Operator
URL: 
https://github.com/apache/incubator-airflow/pull/3639#issuecomment-409075763
 
 
   Just bumping up




> Oracle to Oracle Transfer Operator 
> ---
>
> Key: AIRFLOW-2795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2795
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Reporter: Marcus Rehm
>Assignee: Marcus Rehm
>Priority: Trivial
>
> This operator should help transfer data from one Oracle instance to another, 
> or between tables in the same instance. It's suitable for use cases where you 
> don't want to, or are not allowed to, use a dblink.
> The operator needs a SQL query and a destination table in order to work.
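The "query plus destination table" shape described above can be sketched as a plain DB-API transfer loop. sqlite3 stands in for the two Oracle connections here, and the function name and parameters are illustrative, not the PR's actual operator; a real Oracle version would use cx_Oracle connections and `:1`-style bind placeholders.

```python
import sqlite3


def transfer(source_conn, dest_conn, source_sql, destination_table, rows_chunk=1000):
    """Read rows with a SQL query and bulk-insert them into a destination table."""
    src = source_conn.cursor()
    src.execute(source_sql)
    dest = dest_conn.cursor()
    while True:
        rows = src.fetchmany(rows_chunk)
        if not rows:
            break
        # sqlite3 uses "?" placeholders; cx_Oracle would use ":1", ":2", ...
        placeholders = ", ".join("?" for _ in rows[0])
        dest.executemany(
            "INSERT INTO %s VALUES (%s)" % (destination_table, placeholders), rows
        )
    dest_conn.commit()
```

Chunked `fetchmany`/`executemany` keeps memory bounded for large tables, which matters when no dblink is available to push the copy into the database itself.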





[jira] [Created] (AIRFLOW-2829) Brush up the CI script for minikube

2018-07-30 Thread Kengo Seki (JIRA)
Kengo Seki created AIRFLOW-2829:
---

 Summary: Brush up the CI script for minikube
 Key: AIRFLOW-2829
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2829
 Project: Apache Airflow
  Issue Type: Bug
  Components: ci
Reporter: Kengo Seki
Assignee: Kengo Seki


Ran {{scripts/ci/kubernetes/minikube/start_minikube.sh}} locally and found some 
points that can be improved:

- The minikube version is hard-coded
- Defined but unused variables: {{$_HELM_VERSION}}, {{$_VM_DRIVER}}
- Undefined variable: {{$unameOut}}
- The following lines cause warnings if the download is skipped:

{code}
 69 sudo mv bin/minikube /usr/local/bin/minikube
 70 sudo mv bin/kubectl /usr/local/bin/kubectl
{code}

- The {{return}}s at lines 81 and 96 won't work since they are outside of a function

- To run this script as a non-root user, {{-E}} is required for {{sudo}}. See 
https://github.com/kubernetes/minikube/issues/1883.

{code}
105 _MINIKUBE="sudo PATH=$PATH minikube"
106 
107 $_MINIKUBE config set bootstrapper localkube
108 $_MINIKUBE start --kubernetes-version=${_KUBERNETES_VERSION}  
--vm-driver=none
109 $_MINIKUBE update-context
{code}
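For the download-skip warnings, one possible shape of a fix is to move a binary into place only when the download step actually produced it. The helper name below is hypothetical, not part of the script; the real script would prefix `sudo` (or `sudo -E` for non-root runs, per the minikube issue linked above) on the `mv`.

```shell
#!/bin/sh
# Hypothetical helper: install a binary only if it was actually downloaded,
# so a skipped download no longer makes `mv` warn about a missing source.
install_if_downloaded() {
  src="$1"
  dest="$2"
  if [ -f "$src" ]; then
    mv "$src" "$dest"
  fi
}

# Usage in the script would look like:
#   install_if_downloaded bin/minikube /usr/local/bin/minikube
#   install_if_downloaded bin/kubectl /usr/local/bin/kubectl
```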





[jira] [Assigned] (AIRFLOW-2760) DAG parsing loop coupled with scheduler loop

2018-07-30 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2760:
---

Assignee: Kevin Yang

> DAG parsing loop coupled with scheduler loop
> 
>
> Key: AIRFLOW-2760
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2760
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Currently the DAG parsing loop is coupled with the scheduler loop, meaning 
> that if the scheduler loop becomes slow, DAG parsing slows down as well.
> As a simple producer and consumer pattern, the two loops should be decoupled, 
> completely removing the scheduling bottleneck imposed by DAG parsing, which 
> has been identified at Airbnb as the current biggest bottleneck.
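The decoupling the ticket proposes can be sketched as a plain producer/consumer pair: the parser keeps filling a queue at its own pace, and the scheduler drains it at its own pace, so a slow consumer only grows the queue instead of stalling parsing. Names here are illustrative, not Airflow's actual classes.

```python
import queue
import threading


def parser_loop(dag_files, parsed_queue):
    # Producer: parse files at its own pace and hand results off.
    for dag_file in dag_files:
        parsed_queue.put("parsed:" + dag_file)  # stand-in for a parsed DagBag
    parsed_queue.put(None)  # sentinel: no more work


def scheduler_loop(parsed_queue, scheduled):
    # Consumer: drain results at its own pace; a slow consumer no longer
    # blocks the producer, it only lets the queue grow.
    while True:
        item = parsed_queue.get()
        if item is None:
            break
        scheduled.append(item)
```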





[jira] [Work started] (AIRFLOW-2760) DAG parsing loop coupled with scheduler loop

2018-07-30 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2760 started by Kevin Yang.
---
> DAG parsing loop coupled with scheduler loop
> 
>
> Key: AIRFLOW-2760
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2760
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Currently the DAG parsing loop is coupled with the scheduler loop, meaning 
> that if the scheduler loop becomes slow, DAG parsing slows down as well.
> As a simple producer and consumer pattern, the two loops should be decoupled, 
> completely removing the scheduling bottleneck imposed by DAG parsing, which 
> has been identified at Airbnb as the current biggest bottleneck.





[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562663#comment-16562663
 ] 

ASF GitHub Bot commented on AIRFLOW-2670:
-

codecov-io commented on issue #3666: [AIRFLOW-2670] Update SSH Operator's Hook 
to respect timeout
URL: 
https://github.com/apache/incubator-airflow/pull/3666#issuecomment-409045376
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=h1)
 Report
   > Merging 
[#3666](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3666/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##           master    #3666   +/-   ##
   ========================================
     Coverage   77.51%   77.51%           
   ========================================
     Files         205      205           
     Lines       15751    15751           
   ========================================
     Hits        12210    12210           
     Misses       3541     3541
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=footer).
 Last update 
[dfa7b26...42b907c](https://codecov.io/gh/apache/incubator-airflow/pull/3666?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> SSHOperator's timeout parameter doesn't affect SSHHook timeout
> -
>
> Key: AIRFLOW-2670
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2670
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: Airflow 2.0
>Reporter: jin zhang
>Priority: Major
>
> When I use SSHOperator, its timeout parameter can't be set on the SSHHook; it 
> only affects exec_command.
> old version:
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id)
> I change it to :
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout)
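The one-line change is the operator forwarding its timeout to the hook it constructs. A minimal sketch of that pattern, using stand-in classes whose names mimic Airflow's SSHOperator/SSHHook but are not the real implementations:

```python
class SSHHook:
    """Stand-in hook: the timeout here governs opening the connection."""

    def __init__(self, ssh_conn_id=None, timeout=10):
        self.ssh_conn_id = ssh_conn_id
        self.timeout = timeout


class SSHOperator:
    """Stand-in operator forwarding its timeout to the hook it builds."""

    def __init__(self, ssh_conn_id=None, timeout=10):
        self.timeout = timeout
        # Before the fix the hook was built without the timeout, so only
        # exec_command saw it; forwarding it lets the connection honor it too.
        self.ssh_hook = SSHHook(ssh_conn_id=ssh_conn_id, timeout=self.timeout)
```

The accompanying test in the PR reportedly checks exactly this: that the hook the operator creates carries the operator's timeout.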





[jira] [Commented] (AIRFLOW-2670) SSHOperator's timeout parameter doesn't affect SSHHook timeout

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562541#comment-16562541
 ] 

ASF GitHub Bot commented on AIRFLOW-2670:
-

Noremac201 opened a new pull request #3666: [AIRFLOW-2670] Update SSH 
Operator's Hook to respect timeout
URL: https://github.com/apache/incubator-airflow/pull/3666
 
 
   ### JIRA
   - [x] My PR addresses the following [Airflow 
JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
   - https://issues.apache.org/jira/browse/AIRFLOW-2670
   
   ### Description
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Previously the SSH operator was not respecting the passed in timeout to the 
operator. Changed the Operator to pass the timeout to hook, as well as add a 
test to make sure the hook is being created correctly.
   
   Extension of #3553, mistakenly closed after I thought it was fixed elsewhere.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
   ### Code Quality
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`




> SSHOperator's timeout parameter doesn't affect SSHHook timeout
> -
>
> Key: AIRFLOW-2670
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2670
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: Airflow 2.0
>Reporter: jin zhang
>Priority: Major
>
> When I use SSHOperator, its timeout parameter can't be set on the SSHHook; it 
> only affects exec_command.
> old version:
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id)
> I change it to :
> self.ssh_hook = SSHHook(ssh_conn_id=self.ssh_conn_id, timeout=self.timeout)





[jira] [Comment Edited] (AIRFLOW-1979) Redis celery backend does not work on 1.9.0 (configuration is ignored)

2018-07-30 Thread Sean Byrne (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562524#comment-16562524
 ] 

Sean Byrne edited comment on AIRFLOW-1979 at 7/30/18 9:21 PM:
--

I was content with the workaround but started digging into this today again 
after your comment.

DEFAULT_CELERY_CONFIG is a dictionary with lowercase keys. I changed the keys 
to uppercase config options and passed this through to the Celery constructor 
and it worked. It looks like celery only supports lowercase keys in version 4.0 
and above.

http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
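The comment's workaround can be sketched as a key transformation before handing the dict to the old Celery constructor. Upper-casing is what the comment reports worked; note that beyond case, Celery's docs also list settings whose names changed outright between the old and new schemes, so a blind `.upper()` only covers names that differ solely by case. The dict below is a stand-in, not Airflow's actual DEFAULT_CELERY_CONFIG.

```python
# Stand-in for a lowercase (Celery >= 4.0 style) config dict.
NEW_STYLE_CONFIG = {
    "broker_url": "redis://localhost:6379/0",
    "result_backend": "redis://localhost:6379/1",
}


def to_uppercase_keys(config):
    # Restore the pre-4.0 uppercase names for settings that differ only by
    # case; renamed settings would need Celery's full old/new mapping table.
    return {key.upper(): value for key, value in config.items()}
```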



> Redis celery backend does not work on 1.9.0 (configuration is ignored)
> -
>
> Key: AIRFLOW-1979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1979
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, worker
>Affects Versions: 1.9.0
>Reporter: Norio Akagi
>Priority: Major
>
> Worker tries to connect to RabbitMQ based on a default setting and shows an 
> error as below:
> {noformat}
> [2018-01-09 16:45:42,778] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/Grammar.txt
> [2018-01-09 16:45:42,802] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
> [2018-01-09 16:45:43,051] {configuration.py:206} WARNING - section/key 
> [celery/celery_ssl_active] not found in config
> [2018-01-09 16:45:43,051] {default_celery.py:41} WARNING - Celery Executor 
> will run without SSL
> [2018-01-09 16:45:43,052] {__init__.py:45} INFO - Using executor 
> CeleryExecutor
> [2018-01-09 16:45:43,140: WARNING/MainProcess] 
> /usr/local/lib/python2.7/dist-packages/celery/apps/worker.py:161: 
> CDeprecationWarning:
> Starting from version 3.2 Celery will refuse to accept pickle by default.
> The pickle serializer is a security concern as it may give attackers
> the ability to execute any command.  It's important to secure
> your broker from unauthorized access when using pickle, so we think
> that enabling pickle should require a deliberate action and not be
> the default choice.
> If you depend on pickle then you should set a setting to disable this
> warning and to be sure that everything will continue working
> when you upgrade to Celery 3.2::
> CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
> You must only enable the serializers that you will actually use.
>   warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
> [2018-01-09 16:45:43,240: ERROR/MainProcess] consumer: Cannot connect to 
> amqp://guest:**@127.0.0.1:5672//: [Errno 111] Connection refused.
> Trying again in 2.00 seconds...
> {noformat}
> I deploy Airflow on kubernetes so each component (web, scheduler, worker, and 
> flower) is containerized and distributed among nodes. I set 
> {{AIRFLOW__CELERY__CELERY_RESULT_BACKEND}}
>  and {{AIRFLOW__CELERY__BROKER_URL}} in environment variables and it can be 
> seen when I run {{printenv}} in a container, but it looks completely ignored.
> Moving these values to {{airflow.cfg}} doesn't work either.
> It worked perfectly on 1.8 and suddenly stopped working when I upgraded 
> Airflow to 1.9.
> Do you have any idea what may cause this configuration issue?





[jira] [Created] (AIRFLOW-2828) Adding named blocks around action links in the dag.html and dags.html templates to make it easier to add custom links

2018-07-30 Thread JIRA
Ricardo Bánffy created AIRFLOW-2828:
---

 Summary: Adding named blocks around action links in the dag.html 
and dags.html templates to make it easier to add custom links
 Key: AIRFLOW-2828
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2828
 Project: Apache Airflow
  Issue Type: Improvement
  Components: ui
Affects Versions: 1.9.0
Reporter: Ricardo Bánffy


I am in the process of building a component that will add a couple of actions 
one can perform on a DAG, mostly to make it easier for my analysts to avoid 
writing Python and focus on the SQL they need. One of the ideas is to add a 
couple of items to the actions for the "easy" editor and a testing feature. It 
would be best if I could extend the template and override the specific parts 
instead of replacing it completely.





[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562468#comment-16562468
 ] 

ASF GitHub Bot commented on AIRFLOW-2803:
-

codecov-io edited a comment on issue #3656: [AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-408503531
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=h1)
 Report
   > Merging 
[#3656](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/a338f3276835af45765d24a6e6d43ad4ba4d66ba?src=pr=desc)
 will **increase** coverage by `0.39%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3656/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3656      +/-   ##
   ==========================================
   + Coverage   77.12%   77.51%   +0.39%     
   ==========================================
     Files         206      205       -1     
     Lines       15772    15751      -21     
   ==========================================
   + Hits        12164    12210      +46     
   + Misses       3608     3541      -67
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5)
 | `99.01% <0%> (-0.99%)` | :arrow_down: |
   | 
[airflow/minihivecluster.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9taW5paGl2ZWNsdXN0ZXIucHk=)
 | | |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.58% <0%> (+0.04%)` | :arrow_up: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `89.87% <0%> (+0.42%)` | :arrow_up: |
   | 
[airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3656/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==)
 | `100% <0%> (+100%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=footer).
 Last update 
[a338f32...b65388a](https://codecov.io/gh/apache/incubator-airflow/pull/3656?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues, which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 is merged into the master branch, please fix all the 
> JavaScript styling issues that we have in .js and .html files. 





[jira] [Commented] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562458#comment-16562458
 ] 

ASF GitHub Bot commented on AIRFLOW-2825:
-

feng-tao commented on issue #3665: [AIRFLOW-2825]Fix S3ToHiveTransfer bug due 
to case
URL: 
https://github.com/apache/incubator-airflow/pull/3665#issuecomment-408999062
 
 
   Could you add a test?




> S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase 
> ext in S3
> ---
>
> Key: AIRFLOW-2825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2825
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> Because upper/lower case was not considered in the extension check, the 
> S3ToHiveTransfer operator may decide that a GZIP file with the uppercase 
> extension `.GZ` is not a GZIP file and raise an exception.





[jira] [Commented] (AIRFLOW-1979) Redis celery backend does not work on 1.9.0 (configuration is ignored)

2018-07-30 Thread Marcin Szymanski (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562389#comment-16562389
 ] 

Marcin Szymanski commented on AIRFLOW-1979:
---

If you migrated your config from 1.8, then check if this is in your config file
{code:java}
celery_config_options = 
airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
{code}
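A quick way to perform that check programmatically is to read the config file with the standard library. The helper name below is mine, not an Airflow API; the section and option names are the ones the snippet above uses, and the path should be whatever your AIRFLOW_HOME points at.

```python
import configparser


def has_celery_config_options(cfg_path):
    """Return True if airflow.cfg carries the celery_config_options line."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    # A config migrated from 1.8 may be missing this option entirely.
    return parser.has_option("celery", "celery_config_options")
```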

> Redis celery backend does not work on 1.9.0 (configuration is ignored)
> -
>
> Key: AIRFLOW-1979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1979
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, worker
>Affects Versions: 1.9.0
>Reporter: Norio Akagi
>Priority: Major
>
> Worker tries to connect to RabbitMQ based on a default setting and shows an 
> error as below:
> {noformat}
> [2018-01-09 16:45:42,778] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/Grammar.txt
> [2018-01-09 16:45:42,802] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
> [2018-01-09 16:45:43,051] {configuration.py:206} WARNING - section/key 
> [celery/celery_ssl_active] not found in config
> [2018-01-09 16:45:43,051] {default_celery.py:41} WARNING - Celery Executor 
> will run without SSL
> [2018-01-09 16:45:43,052] {__init__.py:45} INFO - Using executor 
> CeleryExecutor
> [2018-01-09 16:45:43,140: WARNING/MainProcess] 
> /usr/local/lib/python2.7/dist-packages/celery/apps/worker.py:161: 
> CDeprecationWarning:
> Starting from version 3.2 Celery will refuse to accept pickle by default.
> The pickle serializer is a security concern as it may give attackers
> the ability to execute any command.  It's important to secure
> your broker from unauthorized access when using pickle, so we think
> that enabling pickle should require a deliberate action and not be
> the default choice.
> If you depend on pickle then you should set a setting to disable this
> warning and to be sure that everything will continue working
> when you upgrade to Celery 3.2::
> CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
> You must only enable the serializers that you will actually use.
>   warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
> [2018-01-09 16:45:43,240: ERROR/MainProcess] consumer: Cannot connect to 
> amqp://guest:**@127.0.0.1:5672//: [Errno 111] Connection refused.
> Trying again in 2.00 seconds...
> {noformat}
> I deploy Airflow on kubernetes so each component (web, scheduler, worker, and 
> flower) is containerized and distributed among nodes. I set 
> {{AIRFLOW__CELERY__CELERY_RESULT_BACKEND}}
>  and {{AIRFLOW__CELERY__BROKER_URL}} in environment variables and it can be 
> seen when I run {{printenv}} in a container, but it looks completely ignored.
> Moving these values to {{airflow.cfg}} doesn't work either.
> It worked perfectly on 1.8 and suddenly stopped working when I upgraded 
> Airflow to 1.9.
> Do you have any idea what may cause this configuration issue?





[jira] [Updated] (AIRFLOW-2827) Tasks that fail with spurious Celery issues are not retried

2018-07-30 Thread James Davidheiser (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Davidheiser updated AIRFLOW-2827:
---
Description: 
We have a DAG with ~500 tasks, running on Airflow set up in Kubernetes with 
RabbitMQ, using a setup derived pretty heavily from 
[https://github.com/mumoshu/kube-airflow]. Occasionally, we will hit some 
spurious Celery execution failures (possibly related to 
https://issues.apache.org/jira/browse/AIRFLOW-2011), resulting in the Worker 
throwing errors that look like this:

 

 
{code:java}
[2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task 
airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6]
 raised unexpected: AirflowException('Celery command failed',)
 Traceback (most recent call last):
   File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 382, 
in trace_task
     R = retval = fun(*args, **kwargs)
   File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 641, 
in __protected_call__
     return self.run(*args, **kwargs)
   File 
"/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", 
line 55, in execute_command
     raise AirflowException('Celery command failed')
 AirflowException: Celery command failed{code}

When these tasks fail, they send a "task failed" email that has very little 
information about the state of the task failure.  The logs for the task run are 
empty, because the task never actually did anything and the error message was 
generated by the worker.  Also, the task does not retry, so if something goes 
wrong with Celery, the task simply fails outright instead of trying again.

 

This may be the same issue reported in  
https://issues.apache.org/jira/browse/AIRFLOW-1844, but I am not sure because 
there is not much detail there.

  was:
We have a DAG with ~500 tasks, running on Airflow set up in Kubernetes with 
RabbitMQ using a setup derived pretty heavily from 
[https://github.com/mumoshu/kube-airflow.]  Occasionally, we will hit some 
spurious Celery execution failures (possibly related to #2011 ), resulting in 
the Worker throwing errors that look like this:

 

```[2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task 
airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6]
 raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 382, 
in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 641, 
in __protected_call__
    return self.run(*args, **kwargs)
  File 
"/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", 
line 55, in execute_command
    raise AirflowException('Celery command failed')
AirflowException: Celery command failed```

 

When these tasks fail, they send a "task failed" email that has very little 
information about the state of the task failure.  The logs for the task run are 
empty, because the task never actually did anything and the error message was 
generated by the worker.  Also, the task does not retry, so if something goes 
wrong with Celery, the task simply fails outright instead of trying again.

 

This may be the same issue reported in #1844, but I am not sure because there 
is not much detail there.


> Tasks that fail with spurious Celery issues are not retried
> ---
>
> Key: AIRFLOW-2827
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2827
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: James Davidheiser
>Priority: Major
>
> We have a DAG with ~500 tasks, running on Airflow set up in Kubernetes with 
> RabbitMQ using a setup derived pretty heavily from 
> [https://github.com/mumoshu/kube-airflow.]  Occasionally, we will hit some 
> spurious Celery execution failures (possibly related to 
> https://issues.apache.org/jira/browse/AIRFLOW-2011 ), resulting in the Worker 
> throwing errors that look like this:
>  
>  
> {code:java}
> [2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task 
> airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6]
>  raised unexpected: AirflowException('Celery command failed',)
>  Traceback (most recent call last):
>    File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 
> 382, in trace_task
>      R = retval = fun(*args, **kwargs)
>    File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 
> 641, in __protected_call__
>      return self.run(*args, **kwargs)
>    File 
> "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py",
>  line 55, in execute_command
>      raise AirflowException('Celery command failed')
>  

[jira] [Created] (AIRFLOW-2827) Tasks that fail with spurious Celery issues are not retried

2018-07-30 Thread James Davidheiser (JIRA)
James Davidheiser created AIRFLOW-2827:
--

 Summary: Tasks that fail with spurious Celery issues are not 
retried
 Key: AIRFLOW-2827
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2827
 Project: Apache Airflow
  Issue Type: Wish
Reporter: James Davidheiser


We have a DAG with ~500 tasks, running on Airflow set up in Kubernetes with 
RabbitMQ using a setup derived pretty heavily from 
[https://github.com/mumoshu/kube-airflow.]  Occasionally, we will hit some 
spurious Celery execution failures (possibly related to #2011 ), resulting in 
the Worker throwing errors that look like this:

 

```[2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task 
airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6]
 raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 382, 
in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 641, 
in __protected_call__
    return self.run(*args, **kwargs)
  File 
"/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", 
line 55, in execute_command
    raise AirflowException('Celery command failed')
AirflowException: Celery command failed```

 

When these tasks fail, they send a "task failed" email that has very little 
information about the state of the task failure.  The logs for the task run are 
empty, because the task never actually did anything and the error message was 
generated by the worker.  Also, the task does not retry, so if something goes 
wrong with Celery, the task simply fails outright instead of trying again.

 

This may be the same issue reported in #1844, but I am not sure because there 
is not much detail there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2011) Airflow amqp pool maintains dead connections

2018-07-30 Thread James Davidheiser (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562316#comment-16562316
 ] 

James Davidheiser commented on AIRFLOW-2011:


Confirming that I am also running into this error - can this configuration 
change be made in the [celery] section of airflow.cfg?

> Airflow amqp pool maintains dead connections
> 
>
> Key: AIRFLOW-2011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2011
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, scheduler
>Affects Versions: 1.9.1
> Environment: OS: Ubuntu 16.04 LTS (debian)
> Python: 3.6.3
> Airflow: 1.9.1rc1
>Reporter: Kevin Reilly
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Airflow scheduler deadlocks on queue-up for tasks
> [2018-01-08 07:01:09,315] \{{celery_executor.py:101}} ERROR - Error syncing 
> the celery executor, ignoring it:
> [2018-01-08 07:01:09,315] \{{celery_executor.py:102}} ERROR - [Errno 104] 
> Connection reset by peer
> Traceback (most recent call last):
> File 
> "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py",
>  line 83, in
> state = async.state
> File "/usr/local/lib/python3.6/dist-packages/celery/result.py", line 436, in 
> state
> return self._get_task_meta()['status']
> File "/usr/local/lib/python3.6/dist-packages/celery/result.py", line 375, in 
> _get_task_meta
> return self._maybe_set_cache(self.backend.get_task_meta(self.id))
> File "/usr/local/lib/python3.6/dist-packages/celery/backends/rpc.py", line 
> 244, in get_task_meta
> for acc in self._slurp_from_queue(task_id, self.accept, backlog_limit):
> File "/usr/local/lib/python3.6/dist-packages/celery/backends/rpc.py", line 
> 278, in
> binding.declare()
> File "/usr/local/lib/python3.6/dist-packages/kombu/entity.py", line 605, in 
> declare
> self._create_queue(nowait=nowait, channel=channel)
> File "/usr/local/lib/python3.6/dist-packages/kombu/entity.py", line 614, in 
> _create_queue
> self.queue_declare(nowait=nowait, passive=False, channel=channel)
> File "/usr/local/lib/python3.6/dist-packages/kombu/entity.py", line 649, in 
> queue_declare
> nowait=nowait,
> File "/usr/local/lib/python3.6/dist-packages/amqp/channel.py", line 1147, in 
> queue_declare
> nowait, arguments),
> File "/usr/local/lib/python3.6/dist-packages/amqp/abstract_channel.py", line 
> 50, in send_method
> conn.frame_writer(1, self.channel_id, sig, args, content)
> File "/usr/local/lib/python3.6/dist-packages/amqp/method_framing.py", line 
> 166, in write_frame
> write(view[:offset])
> File "/usr/local/lib/python3.6/dist-packages/amqp/transport.py", line 258, in 
> write
> self._write(s)
> ConnectionResetError: [Errno 104] Connection reset by peer
> Editing the Celery settings file default_celery.py and adding
> "broker_pool_limit": None,
> between lines 37 and 38 would solve the issue.  This particular setting 
> requires Celery to create a new amqp connection each time it needs one, 
> thereby preventing the RabbitMQ server from disconnecting a connection 
> without the client's knowledge and leaving broken sockets open for use.
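The suggested edit can be sketched as follows. This is an abridged, illustrative fragment of the {{DEFAULT_CELERY_CONFIG}} dict in airflow/config_templates/default_celery.py; the surrounding keys are approximations, and only {{broker_pool_limit}} is the proposed change:

```python
# Abridged sketch of DEFAULT_CELERY_CONFIG in
# airflow/config_templates/default_celery.py (keys other than
# broker_pool_limit are illustrative).
DEFAULT_CELERY_CONFIG = {
    'accept_content': ['json', 'pickle'],
    'event_serializer': 'json',
    'result_serializer': 'pickle',
    # Proposed addition: disable the connection pool so Celery opens a
    # fresh amqp connection for every use instead of reusing a pooled
    # socket the broker may already have closed.
    'broker_pool_limit': None,
}
```

Per Celery's configuration documentation, setting {{broker_pool_limit}} to None (or 0) disables the pool, so connections are established and closed for every use rather than kept open indefinitely.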



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562276#comment-16562276
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273826
 
 

 ##
 File path: airflow/contrib/sensors/sagemaker_base_sensor.py
 ##
 @@ -0,0 +1,63 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerBaseSensor(BaseSensorOperator):
+"""
+Contains general sensor behavior for SageMaker.
+Subclasses should implement get_emr_response() and state_from_response() 
methods.
+Subclasses should also implement NON_TERMINAL_STATES and FAILED_STATE 
constants.
 
 Review comment:
   I replaced the constant with a method that raises an error if not 
implemented. 
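The pattern described, replacing overridable class constants with methods that fail loudly when missing, might look like this illustrative sketch (not the PR's actual code; class and state names here are hypothetical):

```python
class SensorBaseSketch(object):
    # Subclasses override this hook instead of defining a class constant;
    # forgetting to do so raises at call time rather than failing silently.
    def non_terminal_states(self):
        raise NotImplementedError('subclasses must implement non_terminal_states()')

class TrainingSensorSketch(SensorBaseSketch):
    def non_terminal_states(self):
        return {'InProgress', 'Stopping'}

print(TrainingSensorSketch().non_terminal_states())
```

The advantage over a class constant is that a subclass that forgets the override gets an explicit NotImplementedError naming the missing piece, instead of inheriting a stale or empty default.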


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562273#comment-16562273
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273440
 
 

 ##
 File path: airflow/contrib/hooks/sagemaker_hook.py
 ##
 @@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+"""
+Interact with Amazon SageMaker.
+sagemaker_conn_is is required for using
+the config stored in db for training/tuning
+"""
+
+def __init__(self,
+ sagemaker_conn_id=None,
 
 Review comment:
   No, it doesn't. It's only used if the user wants to use the config stored in 
the DB. The SageMaker hook still uses aws_conn_id to get credentials. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562270#comment-16562270
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273134
 
 

 ##
 File path: airflow/contrib/hooks/sagemaker_hook.py
 ##
 @@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+"""
+Interact with Amazon SageMaker.
+sagemaker_conn_is is required for using
+the config stored in db for training/tuning
+"""
+
+def __init__(self,
+ sagemaker_conn_id=None,
+ use_db_config=False,
+ region_name=None,
+ *args, **kwargs):
+self.sagemaker_conn_id = sagemaker_conn_id
+self.use_db_config = use_db_config
+self.region_name = region_name
+super(SageMakerHook, self).__init__(*args, **kwargs)
 
 Review comment:
   You are right, Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562269#comment-16562269
 ] 

ASF GitHub Bot commented on AIRFLOW-2524:
-

troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r206273081
 
 

 ##
 File path: airflow/contrib/hooks/sagemaker_hook.py
 ##
 @@ -0,0 +1,177 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+"""
+Interact with Amazon SageMaker.
+sagemaker_conn_is is required for using
 
 Review comment:
   Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2826) Add hook for Google Cloud KMS

2018-07-30 Thread Jasper Kahn (JIRA)
Jasper Kahn created AIRFLOW-2826:


 Summary: Add hook for Google Cloud KMS
 Key: AIRFLOW-2826
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2826
 Project: Apache Airflow
  Issue Type: Improvement
  Components: hooks
Reporter: Jasper Kahn
Assignee: Jasper Kahn


Add a hook to support interacting with Google Cloud KMS. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2825) S3ToHiveTransfer operator may not be able to handle GZIP file with uppercase ext in S3

2018-07-30 Thread Xiaodong DENG (JIRA)
Xiaodong DENG created AIRFLOW-2825:
--

 Summary: S3ToHiveTransfer operator may not be able to handle GZIP 
file with uppercase ext in S3
 Key: AIRFLOW-2825
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2825
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Reporter: Xiaodong DENG
Assignee: Xiaodong DENG


Because upper/lower case was not considered in the extension check, the 
S3ToHiveTransfer operator may conclude that a GZIP file with the uppercase 
extension `.GZ` is not a GZIP file and raise an exception.
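A case-insensitive extension check along these lines avoids the problem (an illustrative sketch with a hypothetical function name, not the operator's actual code):

```python
def is_gzip_key(s3_key):
    # Lower-case the key before comparing so '.gz' and '.GZ' both match.
    return s3_key.lower().endswith('.gz')

print(is_gzip_key('logs/2018-07-30/events.GZ'))   # True
print(is_gzip_key('logs/2018-07-30/events.csv'))  # False
```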



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2824) Disable loading of default connections via airflow config

2018-07-30 Thread Felix Uellendall (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Uellendall updated AIRFLOW-2824:
--
Issue Type: Wish  (was: New Feature)

> Disable loading of default connections via airflow config
> -
>
> Key: AIRFLOW-2824
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2824
> Project: Apache Airflow
>  Issue Type: Wish
>Reporter: Felix Uellendall
>Priority: Major
>
> I would love to have a variable I can set in the airflow.cfg, like the DAG 
> examples have, to not load the default connections.
> Either by using {{load_examples}} that is already 
> [there|https://github.com/apache/incubator-airflow/blob/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6/airflow/config_templates/default_airflow.cfg#L128]
>  for loading dag examples or by a new one like {{load_default_connections}} 
> to check if the user wants to have it or not.
> The implementation of the default connections starts 
> [here|https://github.com/apache/incubator-airflow/blob/9e1d8ee837ea2c23e828d070b6a72a6331d98602/airflow/utils/db.py#L94]
> Let me know what you guys think of it, pls. :)
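Sketched as an airflow.cfg fragment, where {{load_default_connections}} is the name being proposed here, not an implemented setting:

```ini
[core]
# existing option controlling the example DAGs
load_examples = False
# proposed option (hypothetical; mirrors load_examples)
load_default_connections = False
```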



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2824) Disable loading of default connections via airflow config

2018-07-30 Thread Felix Uellendall (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Uellendall updated AIRFLOW-2824:
--
Description: 
I would love to have a variable I can set in the airflow.cfg, like the DAG 
examples have, to not load the default connections.

Either by using {{load_examples}} that is already 
[there|https://github.com/apache/incubator-airflow/blob/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6/airflow/config_templates/default_airflow.cfg#L128]
 for loading dag examples or by a new one like {{load_default_connections}} to 
check if the user wants to have it or not.

The implementation of the default connections starts 
[here|https://github.com/apache/incubator-airflow/blob/9e1d8ee837ea2c23e828d070b6a72a6331d98602/airflow/utils/db.py#L94]

Let me know what you guys think of it, pls. :)

  was:
I would love to have a variable I can set in the airflow.cfg, like the DAG 
examples have, to not load the default connections.

Either by using {{load_examples}} that is already there for loading of dag 
examples or by a new one like {{load_default_connections}} to check if the user 
wants to have it or not.

The implementation of the default connections starts 
[here|https://github.com/apache/incubator-airflow/blob/9e1d8ee837ea2c23e828d070b6a72a6331d98602/airflow/utils/db.py#L94]

Let me know what you guys think of it, pls. :)


> Disable loading of default connections via airflow config
> -
>
> Key: AIRFLOW-2824
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2824
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Felix Uellendall
>Priority: Major
>
> I would love to have a variable I can set in the airflow.cfg, like the DAG 
> examples have, to not load the default connections.
> Either by using {{load_examples}} that is already 
> [there|https://github.com/apache/incubator-airflow/blob/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6/airflow/config_templates/default_airflow.cfg#L128]
>  for loading dag examples or by a new one like {{load_default_connections}} 
> to check if the user wants to have it or not.
> The implementation of the default connections starts 
> [here|https://github.com/apache/incubator-airflow/blob/9e1d8ee837ea2c23e828d070b6a72a6331d98602/airflow/utils/db.py#L94]
> Let me know what you guys think of it, pls. :)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2824) Disable loading of default connections via airflow config

2018-07-30 Thread Felix Uellendall (JIRA)
Felix Uellendall created AIRFLOW-2824:
-

 Summary: Disable loading of default connections via airflow config
 Key: AIRFLOW-2824
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2824
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Felix Uellendall


I would love to have a variable I can set in the airflow.cfg, like the DAG 
examples have, to not load the default connections.

Either by using {{load_examples}} that is already there for loading of dag 
examples or by a new one like {{load_default_connections}} to check if the user 
wants to have it or not.

The implementation of the default connections starts 
[here|https://github.com/apache/incubator-airflow/blob/9e1d8ee837ea2c23e828d070b6a72a6331d98602/airflow/utils/db.py#L94]

Let me know what you guys think of it, pls. :)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2803) Fix all ESLint issues

2018-07-30 Thread Verdan Mahmood (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561591#comment-16561591
 ] 

Verdan Mahmood commented on AIRFLOW-2803:
-

[~tedmiston] thanks for picking this up. 
I'm using Node 8.11.3 and npm 6.1.0

> Fix all ESLint issues
> -
>
> Key: AIRFLOW-2803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2803
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Taylor Edmiston
>Priority: Major
>
> Most of the JS code in Apache Airflow has linting issues which are 
> highlighted after the integration of ESLint. 
> Once AIRFLOW-2783 merged in master branch, please fix all the javascript 
> styling issues that we have in .js and .html files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2691) Make Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-30 Thread Verdan Mahmood (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561590#comment-16561590
 ] 

Verdan Mahmood commented on AIRFLOW-2691:
-

nice catch [~kevcampb], I've added a new JIRA issue for this documentation, and 
will make sure to pick this up ASAP. 

ref: https://issues.apache.org/jira/browse/AIRFLOW-2823

> Make Airflow's JS code (and dependencies) manageable via npm and webpack
> 
>
> Key: AIRFLOW-2691
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2691
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Major
> Fix For: 2.0.0
>
>
> Airflow's JS code is hard to maintain and upgrade. The dependencies are 
> locally existing files making it hard to upgrade versions. 
> Make sure Airflow uses *npm* and *webpack* for the dependencies management. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2823) Improve documentation of the new JS package manager installation

2018-07-30 Thread Verdan Mahmood (JIRA)
Verdan Mahmood created AIRFLOW-2823:
---

 Summary: Improve documentation of the new JS package manager 
installation
 Key: AIRFLOW-2823
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2823
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Verdan Mahmood
Assignee: Verdan Mahmood


With the implementation of npm and webpack for JS packages and dependencies 
(AIRFLOW-2691), please make sure to document that change for users who don't 
simply use the distributed version of Apache Airflow but instead install it 
directly from the source code. 
The details about how to install and compile the JS packages can be added in 
INSTALL file on the root so that users can understand quickly how to install 
and compile the JS packages. 
Note: The information is already available in CONTRIBUTING.md file. 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2804) Extract inline JS from html files and move them in separate .js files

2018-07-30 Thread Verdan Mahmood (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Verdan Mahmood reassigned AIRFLOW-2804:
---

Assignee: Verdan Mahmood

> Extract inline JS from html files and move them in separate .js files
> -
>
> Key: AIRFLOW-2804
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2804
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Major
>
> Apache Airflow uses inline JS code on most/all of the pages, making it 
> difficult to optimize/minimfy them properly. 
> Please make full use of npm and webpack by extracting all inline JS from html 
> files, moving them in separate .js files and make use of `require` and 
> `import` to optimize the dependencies. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2805) Display user's local timezone and DAG's timezone on UI

2018-07-30 Thread Verdan Mahmood (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Verdan Mahmood reassigned AIRFLOW-2805:
---

Assignee: Verdan Mahmood

> Display user's local timezone and DAG's timezone on UI
> --
>
> Key: AIRFLOW-2805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2805
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Major
>
> The UI currently only displays the UTC timezone which is also not in human 
> readable forms on all places. 
> Make all the date times in human readable forms. 
> Also, we need to display user's local timezone and DAG's timezone along with 
> UTC. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-2822) PendingDeprecationWarning Invalid arguments: HipChatAPISendRoomNotificationOperator

2018-07-30 Thread Leo Gallucci (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2822 started by Leo Gallucci.
-
> PendingDeprecationWarning Invalid arguments: 
> HipChatAPISendRoomNotificationOperator
> ---
>
> Key: AIRFLOW-2822
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2822
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, operators
>Affects Versions: Airflow 2.0
>Reporter: Leo Gallucci
>Assignee: Leo Gallucci
>Priority: Trivial
>  Labels: easyfix
>
> Using `HipChatAPISendRoomNotificationOperator` on Airflow master branch (2.0) 
> gives:
> {code:python}
> airflow/models.py:2390: PendingDeprecationWarning:
> Invalid arguments were passed to HipChatAPISendRoomNotificationOperator.
> Support for passing such arguments will be dropped in Airflow 2.0.
> Invalid arguments were:
> *args: ()
> **kwargs: {'color': 'green'}
> category=PendingDeprecationWarning
> {code}
> I've fixed this in my fork:
> https://github.com/elgalu/apache-airflow/commit/83fc940f54e5d6531f66bff256f66765899dc055
> I will send a PR



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2822) PendingDeprecationWarning Invalid arguments: HipChatAPISendRoomNotificationOperator

2018-07-30 Thread Leo Gallucci (JIRA)
Leo Gallucci created AIRFLOW-2822:
-

 Summary: PendingDeprecationWarning Invalid arguments: 
HipChatAPISendRoomNotificationOperator
 Key: AIRFLOW-2822
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2822
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib, operators
Affects Versions: Airflow 2.0
Reporter: Leo Gallucci
Assignee: Leo Gallucci


Using `HipChatAPISendRoomNotificationOperator` on Airflow master branch (2.0) 
gives:

{code:python}
airflow/models.py:2390: PendingDeprecationWarning:
Invalid arguments were passed to HipChatAPISendRoomNotificationOperator.
Support for passing such arguments will be dropped in Airflow 2.0.
Invalid arguments were:
*args: ()
**kwargs: {'color': 'green'}
category=PendingDeprecationWarning
{code}

I've fixed this in my fork:
https://github.com/elgalu/apache-airflow/commit/83fc940f54e5d6531f66bff256f66765899dc055

I will send a PR
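The warning arises when constructor arguments are left unconsumed by the operator's {{__init__}} chain. The mechanism can be sketched like this (illustrative only, not Airflow's actual BaseOperator code):

```python
import warnings

class OperatorSketch(object):
    def __init__(self, task_id, *args, **kwargs):
        # Any leftover positional or keyword arguments are reported;
        # Airflow says support for passing them will be dropped in 2.0.
        if args or kwargs:
            warnings.warn(
                'Invalid arguments were passed to {}. Invalid arguments were: '
                '*args: {} **kwargs: {}'.format(type(self).__name__, args, kwargs),
                PendingDeprecationWarning)
        self.task_id = task_id

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    OperatorSketch('notify_room', color='green')  # 'color' is unexpected here
print(caught[0].category.__name__)  # PendingDeprecationWarning
```

A fix along these lines typically has the operator's own {{__init__}} accept and consume the argument (here, {{color}}) before delegating the remainder to the base class.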



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)