[jira] [Assigned] (AIRFLOW-2299) Add S3 Select functionality to S3FileTransformOperator

2018-04-15 Thread Kengo Seki (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kengo Seki reassigned AIRFLOW-2299:
---

Assignee: Kengo Seki

> Add S3 Select functionality to S3FileTransformOperator
> --
>
> Key: AIRFLOW-2299
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2299
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, operators
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
>
> S3FileTransformOperator downloads the whole file from S3 before transforming 
> and uploading it, which is inefficient when the original file is large but 
> only a small part of it is needed.
> S3 Select, [which became generally available 
> recently|https://aws.amazon.com/about-aws/whats-new/2018/04/amazon-s3-select-is-now-generally-available/],
>  can improve the operator's efficiency and usability.
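
For illustration, a minimal sketch of the kind of S3 Select call the operator 
could issue through boto3's select_object_content API. The bucket, key and SQL 
expression are placeholders, and the eventual operator interface is not defined 
by this ticket.

{code:python}
# Hedged sketch: bucket, key and query are placeholders.
import boto3

s3 = boto3.client("s3")

# Ask S3 to evaluate the SQL expression server-side and return only the
# matching rows, instead of downloading the whole object first.
response = s3.select_object_content(
    Bucket="my-bucket",
    Key="data/large_input.csv",
    ExpressionType="SQL",
    Expression="SELECT s.* FROM s3object s WHERE s._1 = 'needed'",
    InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; concatenating the Records events
# yields the selected subset of the file.
selected = b"".join(
    event["Records"]["Payload"]
    for event in response["Payload"]
    if "Records" in event
)
{code}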





[jira] [Closed] (AIRFLOW-2255) Add alembic migration script for AIRFLOW-2059

2018-04-15 Thread Tao Feng (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Feng closed AIRFLOW-2255.
-
Resolution: Duplicate

> Add alembic migration script for AIRFLOW-2059
> -
>
> Key: AIRFLOW-2255
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2255
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Tao Feng
>Assignee: Tao Feng
>Priority: Minor
>






[jira] [Assigned] (AIRFLOW-390) [AIRFLOW-Don't load example dags by default]

2018-04-15 Thread Tao Feng (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Feng reassigned AIRFLOW-390:


Assignee: (was: Tao Feng)

> [AIRFLOW-Don't load example dags by default]
> 
>
> Key: AIRFLOW-390
> URL: https://issues.apache.org/jira/browse/AIRFLOW-390
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Sunny Sun
>Priority: Trivial
>  Labels: easyfix
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The load_examples setting should default to False, so the example DAGs are 
> not automatically deployed into production environments. This is especially 
> problematic because the Twitter example DAG requires Hive, which users may or 
> may not use in their own deployments.
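
For reference, the existing switch lives in airflow.cfg; a minimal sketch of 
the proposed default (section and value only, other settings omitted):

{code}
[core]
# Proposed default: do not load the bundled example DAGs.
load_examples = False
{code}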





[jira] [Updated] (AIRFLOW-2324) View SubDags in Home Page

2018-04-15 Thread vishnu srivastava (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vishnu srivastava updated AIRFLOW-2324:
---
Description: View SubDag links in the Home page as a collapsible list. This 
needs to be set via airflow.cfg  (was: View SubDag links in the Home page as a 
collapsible list. This needs to be set vi airflow.cfg)

> View SubDags in Home Page
> -
>
> Key: AIRFLOW-2324
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2324
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: ui
>Affects Versions: Airflow 2.0
>Reporter: vishnu srivastava
>Assignee: vishnu srivastava
>Priority: Major
> Fix For: Airflow 2.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> View SubDag links in the Home page as a collapsible list. This needs to be 
> set via airflow.cfg





[jira] [Created] (AIRFLOW-2324) View SubDags in Home Page

2018-04-15 Thread vishnu srivastava (JIRA)
vishnu srivastava created AIRFLOW-2324:
--

 Summary: View SubDags in Home Page
 Key: AIRFLOW-2324
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2324
 Project: Apache Airflow
  Issue Type: New Feature
  Components: ui
Affects Versions: Airflow 2.0
Reporter: vishnu srivastava
Assignee: vishnu srivastava
 Fix For: Airflow 2.0


View SubDag links in the Home page as a collapsible list. This needs to be set 
vi airflow.cfg





[jira] [Assigned] (AIRFLOW-1834) Unfold SubDAGs in Graph- and Tree-View

2018-04-15 Thread vishnu srivastava (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vishnu srivastava reassigned AIRFLOW-1834:
--

Assignee: vishnu srivastava

> Unfold SubDAGs in Graph- and Tree-View
> --
>
> Key: AIRFLOW-1834
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1834
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: Airflow 1.8
>Reporter: Christoph Hösler
>Assignee: vishnu srivastava
>Priority: Major
>
> If one has a DAG with multiple nested SubDAGs, it is cumbersome with the 
> current UI to zoom into each SubDAG to view its tasks. It would be helpful to 
> "unfold" a SubDAG Operator and show its subtasks as part of the current DAG.





[jira] [Updated] (AIRFLOW-2323) Should we replace librabbitmq with another library in setup.py for Apache Airflow 1.9+?

2018-04-15 Thread A.Quasimodo (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A.Quasimodo updated AIRFLOW-2323:
-
Description: 
As we know, the latest librabbitmq still does not support Python 3, so when I 
executed the command *pip install apache-airflow[rabbitmq]*, some errors 
occurred.

Should we replace librabbitmq with another library such as amqplib or py-amqp?

Thank you
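
As a rough illustration (the existing version pin is approximate and the 
replacement package is only a suggestion), the change to the rabbitmq extra in 
setup.py could look like this:

{code:python}
# setup.py (sketch): the current extra is roughly the commented line;
# 'amqp' (py-amqp, pure Python and Python 3 compatible) is one possible swap.
# rabbitmq = ['librabbitmq>=1.6.1']
rabbitmq = ['amqp']
{code}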

> Should we replace librabbitmq with another library in setup.py for Apache 
> Airflow 1.9+?
> -
>
> Key: AIRFLOW-2323
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2323
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: A.Quasimodo
>Priority: Major
>
> As we know, the latest librabbitmq still does not support Python 3, so when I 
> executed the command *pip install apache-airflow[rabbitmq]*, some errors 
> occurred.
> Should we replace librabbitmq with another library such as amqplib or 
> py-amqp?
> Thank you





[jira] [Created] (AIRFLOW-2325) Task logging with AWS CloudWatch

2018-04-15 Thread Fang-Pen Lin (JIRA)
Fang-Pen Lin created AIRFLOW-2325:
-

 Summary: Task logging with AWS CloudWatch
 Key: AIRFLOW-2325
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2325
 Project: Apache Airflow
  Issue Type: New Feature
  Components: logging
Reporter: Fang-Pen Lin


In many cases it's ideal to use remote logging while running Airflow in 
production, as workers can easily be scaled down or up, or may be running in 
containers where local storage is not meant to persist. In that case, the S3 
task logging handler can be used:

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback: the S3 logging handler only uploads the log 
when the task completes or fails. For long-running tasks, it's hard to know 
what's going on with the process until it finishes.

To make logging more real-time, I built a logging handler based on AWS 
CloudWatch. It uses the third-party Python package `watchtower`:

[https://github.com/kislyuk/watchtower/tree/master/watchtower]
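
As a rough sketch of the idea (not the actual handler implementation): a 
handler built on watchtower can stream records to CloudWatch while the task is 
still running. The log group, stream name and flush interval below are 
placeholders, and the constructor arguments assume a contemporary watchtower 
release.

{code:python}
import logging

import watchtower  # third-party: pip install watchtower

# Placeholder names; a real task handler would derive these from the task
# instance (dag_id, task_id, execution_date).
handler = watchtower.CloudWatchLogHandler(
    log_group="airflow-task-logs",
    stream_name="example_dag.example_task.2018-04-15",
    send_interval=5,  # flush every few seconds for near-real-time visibility
)

logger = logging.getLogger("airflow.task")
logger.addHandler(handler)
logger.info("this line appears in CloudWatch shortly after it is emitted")

# At task completion the handler must be closed so the background queue is
# drained and no buffered records are lost.
handler.close()
{code}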





[jira] [Updated] (AIRFLOW-2325) Task logging with AWS CloudWatch

2018-04-15 Thread Fang-Pen Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Pen Lin updated AIRFLOW-2325:
--
Description: 
In many cases it's ideal to use remote logging while running Airflow in 
production, as workers can easily be scaled down or up, or may be running in 
containers where local storage is not meant to persist. In that case, the S3 
task logging handler can be used:

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback: the S3 logging handler only uploads the log 
when the task completes or fails. For long-running tasks, it's hard to know 
what's going on with the process until it finishes.

To make logging more real-time, I built a logging handler based on AWS 
CloudWatch. It uses the third-party Python package `watchtower`:

[https://github.com/kislyuk/watchtower/tree/master/watchtower]

I created a PR here [https://github.com/apache/incubator-airflow/pull/3229]. 
Basically I just copy-pasted the code I wrote for my own project; it works fine 
with the 1.9 release, but has never been tested against the master branch. 
Also, there is a bug in watchtower that causes the task runner to hang forever 
when the task completes. I created an issue in their repo

[https://github.com/kislyuk/watchtower/issues/57]

and a PR addressing that issue:

[https://github.com/kislyuk/watchtower/pull/58]

The PR is still far from ready to be reviewed, but I just want to get some 
feedback before I spend more time on it. I would like to know whether you want 
this CloudWatch handler to go into the main repo, or whether you would prefer 
it to be a standalone third-party module. If it's the latter, I can close this 
ticket and create a standalone repo on my own. If the PR is welcome, then I can 
spend more time on polishing it based on your feedback, adding tests, 
documentation and other things.

 

  was:
In many cases it's ideal to use remote logging while running Airflow in 
production, as workers can easily be scaled down or up, or may be running in 
containers where local storage is not meant to persist. In that case, the S3 
task logging handler can be used:

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback: the S3 logging handler only uploads the log 
when the task completes or fails. For long-running tasks, it's hard to know 
what's going on with the process until it finishes.

To make logging more real-time, I built a logging handler based on AWS 
CloudWatch. It uses the third-party Python package `watchtower`:

[https://github.com/kislyuk/watchtower/tree/master/watchtower]

I created a PR here [https://github.com/apache/incubator-airflow/pull/3229]. 
Basically I just copy-pasted the code I wrote for my own project; it works fine 
with the 1.9 release, but has never been tested against the master branch. 
Also, there is a bug in watchtower that causes the task runner to hang forever 
when the task completes. I created an issue in their repo

[https://github.com/kislyuk/watchtower/issues/57]

and a PR addressing that issue:

[https://github.com/kislyuk/watchtower/pull/58]

The PR is still far from ready to be reviewed, but I just want to get some 
feedback before I spend more time on it. I would like to know whether you want 
this CloudWatch handler to go into the main repo, or whether you would prefer 
it to be a standalone third-party module. If it's the latter, I can close this 
ticket and create a standalone repo on my own.

 


> Task logging with AWS CloudWatch
> -
>
> Key: AIRFLOW-2325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2325
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: logging
>Reporter: Fang-Pen Lin
>Priority: Minor
>
> In many cases it's ideal to use remote logging while running Airflow in 
> production, as workers can easily be scaled down or up, or may be running in 
> containers where local storage is not meant to persist. In that case, the S3 
> task logging handler can be used:
> [https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]
> However, it comes with a drawback: the S3 logging handler only uploads the 
> log when the task completes or fails. For long-running tasks, it's hard to 
> know what's going on with the process until it finishes.
> To make logging more real-time, I built a logging handler based on AWS 
> CloudWatch. It uses the third-party Python package `watchtower`
> [https://github.com/kislyuk/watchtower/tree/master/watchtower]
> I created a PR here [https://github.com/apache/incubator-airflow/pull/3229]. 
> Basically I just copy-pasted the code I wrote for my own project; it works 
> fine with the 1.9 release, but 

[jira] [Updated] (AIRFLOW-2325) Task logging with AWS CloudWatch

2018-04-15 Thread Fang-Pen Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Pen Lin updated AIRFLOW-2325:
--
Description: 
In many cases it's ideal to use remote logging while running Airflow in 
production, as workers can easily be scaled down or up, or may be running in 
containers where local storage is not meant to persist. In that case, the S3 
task logging handler can be used:

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback: the S3 logging handler only uploads the log 
when the task completes or fails. For long-running tasks, it's hard to know 
what's going on with the process until it finishes.

To make logging more real-time, I built a logging handler based on AWS 
CloudWatch. It uses the third-party Python package `watchtower`:

[https://github.com/kislyuk/watchtower/tree/master/watchtower]

I created a PR here [https://github.com/apache/incubator-airflow/pull/3229]. 
Basically I just copy-pasted the code I wrote for my own project; it works fine 
with the 1.9 release, but has never been tested against the master branch. 
Also, there is a bug in watchtower that causes the task runner to hang forever 
when the task completes. I created an issue in their repo

[https://github.com/kislyuk/watchtower/issues/57]

and a PR addressing that issue:

[https://github.com/kislyuk/watchtower/pull/58]

The PR is still far from ready to be reviewed, but I just want to get some 
feedback before I spend more time on it. I would like to know whether you want 
this CloudWatch handler to go into the main repo, or whether you would prefer 
it to be a standalone third-party module. If it's the latter, I can close this 
ticket and create a standalone repo on my own.

 

  was:
In many cases it's ideal to use remote logging while running Airflow in 
production, as workers can easily be scaled down or up, or may be running in 
containers where local storage is not meant to persist. In that case, the S3 
task logging handler can be used:

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback: the S3 logging handler only uploads the log 
when the task completes or fails. For long-running tasks, it's hard to know 
what's going on with the process until it finishes.

To make logging more real-time, I built a logging handler based on AWS 
CloudWatch. It uses the third-party Python package `watchtower`:

[https://github.com/kislyuk/watchtower/tree/master/watchtower]


> Task logging with AWS CloudWatch
> -
>
> Key: AIRFLOW-2325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2325
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: logging
>Reporter: Fang-Pen Lin
>Priority: Minor
>
> In many cases it's ideal to use remote logging while running Airflow in 
> production, as workers can easily be scaled down or up, or may be running in 
> containers where local storage is not meant to persist. In that case, the S3 
> task logging handler can be used:
> [https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]
> However, it comes with a drawback: the S3 logging handler only uploads the 
> log when the task completes or fails. For long-running tasks, it's hard to 
> know what's going on with the process until it finishes.
> To make logging more real-time, I built a logging handler based on AWS 
> CloudWatch. It uses the third-party Python package `watchtower`
> [https://github.com/kislyuk/watchtower/tree/master/watchtower]
> I created a PR here [https://github.com/apache/incubator-airflow/pull/3229]. 
> Basically I just copy-pasted the code I wrote for my own project; it works 
> fine with the 1.9 release, but has never been tested against the master 
> branch. Also, there is a bug in watchtower that causes the task runner to 
> hang forever when the task completes. I created an issue in their repo
> [https://github.com/kislyuk/watchtower/issues/57]
> and a PR addressing that issue:
> [https://github.com/kislyuk/watchtower/pull/58]
> The PR is still far from ready to be reviewed, but I just want to get some 
> feedback before I spend more time on it. I would like to know whether you 
> want this CloudWatch handler to go into the main repo, or whether you would 
> prefer it to be a standalone third-party module. If it's the latter, I can 
> close this ticket and create a standalone repo on my own.


