[jira] [Assigned] (AIRFLOW-2299) Add S3 Select functionality to S3FileTransformOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kengo Seki reassigned AIRFLOW-2299:
-----------------------------------

    Assignee: Kengo Seki

> Add S3 Select functionality to S3FileTransformOperator
> ------------------------------------------------------
>
> Key: AIRFLOW-2299
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2299
> Project: Apache Airflow
> Issue Type: Improvement
> Components: aws, operators
> Reporter: Kengo Seki
> Assignee: Kengo Seki
> Priority: Major
>
> S3FileTransformOperator downloads the whole file from S3 before transforming
> and uploading it, which is inefficient when the original file is large but
> the necessary part is small.
> S3 Select, [which became GA
> recently|https://aws.amazon.com/about-aws/whats-new/2018/04/amazon-s3-select-is-now-generally-available/],
> can improve its efficiency and usability.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
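To illustrate the idea in the ticket above, here is a minimal sketch of fetching only the needed rows with S3 Select instead of downloading the whole object. It is not the operator change itself. The helper name `select_from_s3`, the CSV-with-header input format, and the bucket and key names in the usage note are assumptions made for illustration; `client` is expected to behave like a boto3 S3 client.

```python
def select_from_s3(client, bucket, key, expression):
    """Run an S3 Select query and return the matching rows as text.

    `client` is assumed to expose `select_object_content` the way a boto3
    S3 client does. The CSV input/output settings are illustrative only.
    """
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join the raw bytes first so multi-byte characters split across
    # events are decoded correctly.
    return b"".join(chunks).decode("utf-8")
```

With boto3 installed, a call might look like `select_from_s3(boto3.client("s3"), "my-bucket", "data.csv", "SELECT s.id FROM S3Object s")`, where the bucket, key, and column are hypothetical.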
[jira] [Closed] (AIRFLOW-2255) Add alembic migration script for AIRFLOW-2059
[ https://issues.apache.org/jira/browse/AIRFLOW-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Feng closed AIRFLOW-2255.
-----------------------------
    Resolution: Duplicate

> Add alembic migration script for AIRFLOW-2059
> ---------------------------------------------
>
> Key: AIRFLOW-2255
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2255
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Tao Feng
> Assignee: Tao Feng
> Priority: Minor
[jira] [Assigned] (AIRFLOW-390) [AIRFLOW-Don't load example dags by default]
[ https://issues.apache.org/jira/browse/AIRFLOW-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Feng reassigned AIRFLOW-390:
--------------------------------

    Assignee: (was: Tao Feng)

> [AIRFLOW-Don't load example dags by default]
> --------------------------------------------
>
> Key: AIRFLOW-390
> URL: https://issues.apache.org/jira/browse/AIRFLOW-390
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Sunny Sun
> Priority: Trivial
> Labels: easyfix
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Load examples should by default be set to False, so they are not
> automatically deployed into production environments. This is especially heavy
> because the twitter example dag requires Hive, which users may or may not use
> in their own deployments.
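As a small illustration of the setting discussed in the ticket above, the sketch below reads a `load_examples` flag from the `[core]` section of an airflow.cfg-style file and falls back to False when the option is absent, which is the default the ticket asks for. The helper name `examples_enabled` and the inline config text are hypothetical; this is not Airflow's own configuration loader.

```python
import configparser

# Hypothetical airflow.cfg fragment with example DAGs turned off.
AIRFLOW_CFG = """
[core]
load_examples = False
"""


def examples_enabled(cfg_text):
    """Return True only if the config text explicitly enables example DAGs."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback=False mirrors the default proposed in the ticket.
    return parser.getboolean("core", "load_examples", fallback=False)


print(examples_enabled(AIRFLOW_CFG))
```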
[jira] [Updated] (AIRFLOW-2324) View SubDags in Home Page
[ https://issues.apache.org/jira/browse/AIRFLOW-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vishnu srivastava updated AIRFLOW-2324:
---------------------------------------
    Description: View SubDag links in the Home page as a collapsible list. This needs to be set via airflow.cfg  (was: View SubDag links in the Home page as a collapsible list. This needs to be set vi airflow.cfg)

> View SubDags in Home Page
> -------------------------
>
> Key: AIRFLOW-2324
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2324
> Project: Apache Airflow
> Issue Type: New Feature
> Components: ui
> Affects Versions: Airflow 2.0
> Reporter: vishnu srivastava
> Assignee: vishnu srivastava
> Priority: Major
> Fix For: Airflow 2.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> View SubDag links in the Home page as a collapsible list. This needs to be
> set via airflow.cfg
[jira] [Created] (AIRFLOW-2324) View SubDags in Home Page
vishnu srivastava created AIRFLOW-2324:
----------------------------------------

    Summary: View SubDags in Home Page
    Key: AIRFLOW-2324
    URL: https://issues.apache.org/jira/browse/AIRFLOW-2324
    Project: Apache Airflow
    Issue Type: New Feature
    Components: ui
    Affects Versions: Airflow 2.0
    Reporter: vishnu srivastava
    Assignee: vishnu srivastava
    Fix For: Airflow 2.0

View SubDag links in the Home page as a collapsible list. This needs to be set vi airflow.cfg
[jira] [Assigned] (AIRFLOW-1834) Unfold SubDAGs in Graph- and Tree-View
[ https://issues.apache.org/jira/browse/AIRFLOW-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vishnu srivastava reassigned AIRFLOW-1834:
------------------------------------------

    Assignee: vishnu srivastava

> Unfold SubDAGs in Graph- and Tree-View
> --------------------------------------
>
> Key: AIRFLOW-1834
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1834
> Project: Apache Airflow
> Issue Type: Improvement
> Components: ui
> Affects Versions: Airflow 1.8
> Reporter: Christoph Hösler
> Assignee: vishnu srivastava
> Priority: Major
>
> If one has a DAG with multiple nested SubDAGs, it is cumbersome with the
> current UI to zoom into each SubDAG to view its tasks. It would be helpful to
> "unfold" a SubDAG Operator and show its subtasks as part of the current DAG.
[jira] [Updated] (AIRFLOW-2323) Should we replace librabbitmq with another library in setup.py for Apache Airflow 1.9+?
[ https://issues.apache.org/jira/browse/AIRFLOW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

A.Quasimodo updated AIRFLOW-2323:
---------------------------------
    Description:
As we know, the latest librabbitmq still doesn't support Python 3, so when I executed the command *pip install apache-airflow[rabbitmq]*, some errors happened.

So, should we replace librabbitmq with other libraries like amqplib, py-amqp, etc.?

Thank you

> Should we replace librabbitmq with another library in setup.py for Apache
> Airflow 1.9+?
> -------------------------------------------------------------------------
>
> Key: AIRFLOW-2323
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2323
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: A.Quasimodo
> Priority: Major
>
> As we know, the latest librabbitmq still doesn't support Python 3, so when I
> executed the command *pip install apache-airflow[rabbitmq]*, some errors
> happened.
> So, should we replace librabbitmq with other libraries like
> amqplib, py-amqp, etc.?
> Thank you
[jira] [Created] (AIRFLOW-2325) Task logging with AWS Cloud watch
Fang-Pen Lin created AIRFLOW-2325:
-----------------------------------

    Summary: Task logging with AWS Cloud watch
    Key: AIRFLOW-2325
    URL: https://issues.apache.org/jira/browse/AIRFLOW-2325
    Project: Apache Airflow
    Issue Type: New Feature
    Components: logging
    Reporter: Fang-Pen Lin

In many cases, it's ideal to use remote logging while running Airflow in production, as the worker could easily be scaled down or up. Or the worker may be running in containers, where the local storage is not meant to be there forever. In that case, the S3 task logging handler could be used

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback. The S3 logging handler only uploads the log when the task completes or fails. For long-running tasks, it's hard to know what's going on with the process until it finishes. To make logging more real-time, I built a logging handler based on AWS CloudWatch. It uses a third-party Python package, `watchtower`

[https://github.com/kislyuk/watchtower/tree/master/watchtower]
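The drawback described above (logs uploaded only once the task ends) can be contrasted with a handler that ships records in small batches while the task is still running. The sketch below is a generic standard-library illustration of that pattern, not the code from the ticket and not watchtower's implementation. `BatchingHandler`, `send_batch`, and `batch_size` are hypothetical names; in a real setup `send_batch` would wrap whatever call delivers the lines to the remote log service.

```python
import logging


class BatchingHandler(logging.Handler):
    """Ship formatted records in small batches while the task is running."""

    def __init__(self, send_batch, batch_size=10):
        super().__init__()
        self.send_batch = send_batch  # hypothetical callable taking a list of lines
        self.batch_size = batch_size
        self._buffer = []

    def emit(self, record):
        # Called under the handler lock by logging.Handler.handle().
        self._buffer.append(self.format(record))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Swap the buffer out under the (re-entrant) handler lock,
        # then send outside of it.
        self.acquire()
        try:
            batch, self._buffer = self._buffer, []
        finally:
            self.release()
        if batch:
            self.send_batch(batch)

    def close(self):
        # Deliver whatever is left when the task finishes.
        self.flush()
        super().close()
```

A smaller `batch_size` gives more timely output at the cost of more remote calls. An upload-on-close handler corresponds to the extreme case where nothing is sent until `close()`.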
[jira] [Updated] (AIRFLOW-2325) Task logging with AWS Cloud watch
[ https://issues.apache.org/jira/browse/AIRFLOW-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Pen Lin updated AIRFLOW-2325:
----------------------------------
    Description:
In many cases, it's ideal to use remote logging while running Airflow in production, as the worker could easily be scaled down or up. Or the worker may be running in containers, where the local storage is not meant to be there forever. In that case, the S3 task logging handler could be used

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback. The S3 logging handler only uploads the log when the task completes or fails. For long-running tasks, it's hard to know what's going on with the process until it finishes. To make logging more real-time, I built a logging handler based on AWS CloudWatch. It uses a third-party Python package, `watchtower`

[https://github.com/kislyuk/watchtower/tree/master/watchtower]

I created a PR here [https://github.com/apache/incubator-airflow/pull/3229]. Basically, I just copy-pasted the code I wrote for my own project; it works fine with the 1.9 release, but was never tested with the master branch. Also, there is a bug in watchtower causing the task runner to hang forever when it completes. I created an issue in their repo

[https://github.com/kislyuk/watchtower/issues/57]

And a PR for addressing that issue [https://github.com/kislyuk/watchtower/pull/58]

The PR is still far from ready to be reviewed, but I just want to get some feedback before I spend more time on it. I would like to see if you guys want this CloudWatch handler to go into the main repo, or if you guys prefer it to be a standalone third-party module. If that's the case, I can close this ticket and create a standalone repo on my own. If the PR is welcome, then I can spend more time on polishing it based on your feedback, and add tests, documentation, and other stuff.

  was:
In many cases, it's ideal to use remote logging while running Airflow in production, as the worker could easily be scaled down or up. Or the worker may be running in containers, where the local storage is not meant to be there forever. In that case, the S3 task logging handler could be used

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback. The S3 logging handler only uploads the log when the task completes or fails. For long-running tasks, it's hard to know what's going on with the process until it finishes. To make logging more real-time, I built a logging handler based on AWS CloudWatch. It uses a third-party Python package, `watchtower`

[https://github.com/kislyuk/watchtower/tree/master/watchtower]

I created a PR here [https://github.com/apache/incubator-airflow/pull/3229]. Basically, I just copy-pasted the code I wrote for my own project; it works fine with the 1.9 release, but was never tested with the master branch. Also, there is a bug in watchtower causing the task runner to hang forever when it completes. I created an issue in their repo

[https://github.com/kislyuk/watchtower/issues/57]

And a PR for addressing that issue [https://github.com/kislyuk/watchtower/pull/58]

The PR is still far from ready to be reviewed, but I just want to get some feedback before I spend more time on it. I would like to see if you guys want this CloudWatch handler to go into the main repo, or if you guys prefer it to be a standalone third-party module. If that's the case, I can close this ticket and create a standalone repo on my own.

> Task logging with AWS Cloud watch
> ---------------------------------
>
> Key: AIRFLOW-2325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2325
> Project: Apache Airflow
> Issue Type: New Feature
> Components: logging
> Reporter: Fang-Pen Lin
> Priority: Minor
>
> In many cases, it's ideal to use remote logging while running Airflow in
> production, as the worker could easily be scaled down or up. Or the
> worker may be running in containers, where the local storage is not meant to
> be there forever. In that case, the S3 task logging handler could be used
> [https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]
> However, it comes with a drawback. The S3 logging handler only uploads the
> log when the task completes or fails. For long-running tasks, it's hard to
> know what's going on with the process until it finishes.
> To make logging more real-time, I built a logging handler based on AWS
> CloudWatch. It uses a third-party Python package, `watchtower`
>
> [https://github.com/kislyuk/watchtower/tree/master/watchtower]
>
> I created a PR here [https://github.com/apache/incubator-airflow/pull/3229].
> Basically, I just copy-pasted the code I wrote for my own project; it works
> fine with the 1.9 release, but
[jira] [Updated] (AIRFLOW-2325) Task logging with AWS Cloud watch
[ https://issues.apache.org/jira/browse/AIRFLOW-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Pen Lin updated AIRFLOW-2325:
----------------------------------
    Description:
In many cases, it's ideal to use remote logging while running Airflow in production, as the worker could easily be scaled down or up. Or the worker may be running in containers, where the local storage is not meant to be there forever. In that case, the S3 task logging handler could be used

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback. The S3 logging handler only uploads the log when the task completes or fails. For long-running tasks, it's hard to know what's going on with the process until it finishes. To make logging more real-time, I built a logging handler based on AWS CloudWatch. It uses a third-party Python package, `watchtower`

[https://github.com/kislyuk/watchtower/tree/master/watchtower]

I created a PR here [https://github.com/apache/incubator-airflow/pull/3229]. Basically, I just copy-pasted the code I wrote for my own project; it works fine with the 1.9 release, but was never tested with the master branch. Also, there is a bug in watchtower causing the task runner to hang forever when it completes. I created an issue in their repo

[https://github.com/kislyuk/watchtower/issues/57]

And a PR for addressing that issue [https://github.com/kislyuk/watchtower/pull/58]

The PR is still far from ready to be reviewed, but I just want to get some feedback before I spend more time on it. I would like to see if you guys want this CloudWatch handler to go into the main repo, or if you guys prefer it to be a standalone third-party module. If that's the case, I can close this ticket and create a standalone repo on my own.

  was:
In many cases, it's ideal to use remote logging while running Airflow in production, as the worker could easily be scaled down or up. Or the worker may be running in containers, where the local storage is not meant to be there forever. In that case, the S3 task logging handler could be used

[https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]

However, it comes with a drawback. The S3 logging handler only uploads the log when the task completes or fails. For long-running tasks, it's hard to know what's going on with the process until it finishes. To make logging more real-time, I built a logging handler based on AWS CloudWatch. It uses a third-party Python package, `watchtower`

[https://github.com/kislyuk/watchtower/tree/master/watchtower]

> Task logging with AWS Cloud watch
> ---------------------------------
>
> Key: AIRFLOW-2325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2325
> Project: Apache Airflow
> Issue Type: New Feature
> Components: logging
> Reporter: Fang-Pen Lin
> Priority: Minor
>
> In many cases, it's ideal to use remote logging while running Airflow in
> production, as the worker could easily be scaled down or up. Or the
> worker may be running in containers, where the local storage is not meant to
> be there forever. In that case, the S3 task logging handler could be used
> [https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]
> However, it comes with a drawback. The S3 logging handler only uploads the
> log when the task completes or fails. For long-running tasks, it's hard to
> know what's going on with the process until it finishes.
> To make logging more real-time, I built a logging handler based on AWS
> CloudWatch. It uses a third-party Python package, `watchtower`
>
> [https://github.com/kislyuk/watchtower/tree/master/watchtower]
>
> I created a PR here [https://github.com/apache/incubator-airflow/pull/3229].
> Basically, I just copy-pasted the code I wrote for my own project; it works
> fine with the 1.9 release, but was never tested with the master branch.
> Also, there is a bug in watchtower causing the task runner to hang forever
> when it completes. I created an issue in their repo
> [https://github.com/kislyuk/watchtower/issues/57]
> And a PR for addressing that issue
> [https://github.com/kislyuk/watchtower/pull/58]
>
> The PR is still far from ready to be reviewed, but I just want to get some
> feedback before I spend more time on it. I would like to see if you guys want
> this CloudWatch handler to go into the main repo, or if you guys prefer it to
> be a standalone third-party module. If that's the case, I can close this
> ticket and create a standalone repo on my own.