[jira] [Commented] (AIRFLOW-6440) AWS Fargate Executor (AIP-29) (WIP)

2020-03-03 Thread Ahmed Elzeiny (Jira)


[ https://issues.apache.org/jira/browse/AIRFLOW-6440?focusedCommentId=17050328&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17050328 ]

Ahmed Elzeiny commented on AIRFLOW-6440:


Update 03/03/2020:

Since AWS ECS/Fargate is a proprietary technology, it's hard to maintain the 
Breeze environment and develop integration tests. There are very valid concerns 
about maintenance going forward if we lack an AWS ECS cluster to test on. After 
2 months of hard work, this executor is a hard pass.

It's also true that you can scale the Celery Executor on ECS or Fargate, and 
you would have a better time doing so. As an added bonus, there would be zero 
code involved. As of December, this is possible because Fargate added EFS 
support. I'm currently working on the CloudFormation stack that would spin this 
up. Effectively, the Airflow Scheduler would put messages in an SQS queue, 
which is monitored through CloudWatch, which in turn drives an Application Auto 
Scaling policy on the ECS service, which triggers a Capacity Provider. AWS 
technically gives you the tools; it's just hard to string it all together.

 

> AWS Fargate Executor (AIP-29) (WIP)
> ---
>
> Key: AIRFLOW-6440
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6440
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, executors
>Affects Versions: 1.10.8
> Environment: AWS Cloud
>Reporter: Ahmed Elzeiny
>Assignee: Ahmed Elzeiny
>Priority: Minor
>  Labels: AWS, Executor, autoscaling
>   Original Estimate: 336h
>  Remaining Estimate: 336h

[jira] [Commented] (AIRFLOW-6440) AWS Fargate Executor (AIP-29) (WIP)

2020-03-03 Thread Ahmed Elzeiny (Jira)


[ https://issues.apache.org/jira/browse/AIRFLOW-6440?focusedCommentId=17050320&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17050320 ]

Ahmed Elzeiny commented on AIRFLOW-6440:


Hey Andrea,

You're not wrong. I've made the suggested change.

[jira] [Updated] (AIRFLOW-6440) AWS Fargate Executor (AIP-29) (WIP)

2020-01-03 Thread Ahmed Elzeiny (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Elzeiny updated AIRFLOW-6440:
---
Description: 
h1. Links

AIP - 
[https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-29%3A+AWS+Fargate+Executor]

PR - [https://github.com/apache/airflow/pull/7030]
h1. Airflow on AWS Fargate

We propose the creation of a new Airflow Executor, called the FargateExecutor, 
that runs tasks asynchronously on AWS Fargate. The Airflow Scheduler comes up 
with a command that needs to be executed in some shell. A Docker container 
parameterized with the command is passed in as an ARG, and AWS Fargate 
provisions a new instance to run it. The container then completes or fails the 
job, causing the container to die along with the Fargate instance. The executor 
is responsible for keeping track of what happened to the task, keyed by the 
Airflow task id and the AWS task ARN, and based on the instance exit code we 
mark the task as either succeeded or failed.
h1. Proposed Implementation

As you could probably deduce, the underlying mechanism to launch, track, and 
stop Fargate instances is AWS' Boto3 Library.

To accomplish this we create a FargateExecutor under the "airflow.executors" 
module. This class will extend from BaseExecutor and override 5 methods: 
{{start()}}, {{sync()}}, {{execute_async()}}, {{end()}}, and {{terminate()}}. 
Internally, the FargateExecutor uses boto3 for monitoring and deployment 
purposes.

The three major Boto3 API calls are:
 * The {{execute_async()}} function calls boto3's {{run_task()}} function.
 * The {{sync()}} function calls boto3's {{describe_tasks()}} function.
 * The {{terminate()}} function calls boto3's {{stop_task()}} function.

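To make the shape concrete, a stripped-down sketch of the class follows. This 
is an illustration only: the cluster, task definition, and container names are 
hardcoded from the proposed configuration below, and the bookkeeping is heavily 
simplified relative to the real PR.

{code:python}
import time

import boto3
from airflow.executors.base_executor import BaseExecutor


class FargateExecutor(BaseExecutor):
    """Sketch: run each queued Airflow command as a one-off Fargate task."""

    def start(self):
        self.ecs = boto3.client("ecs", region_name="us-west-2")
        self.active_arns = {}  # AWS task ARN -> airflow task key

    def execute_async(self, key, command, queue=None, executor_config=None):
        # The container's entrypoint receives the airflow CLI command to run.
        response = self.ecs.run_task(
            cluster="test-airflow",
            taskDefinition="test-airflow-worker",
            launchType="FARGATE",
            overrides={"containerOverrides": [
                {"name": "airflow-worker", "command": command},
            ]},
        )
        self.active_arns[response["tasks"][0]["taskArn"]] = key

    def sync(self):
        if not self.active_arns:
            return
        described = self.ecs.describe_tasks(
            cluster="test-airflow", tasks=list(self.active_arns),
        )
        for task in described["tasks"]:
            if task["lastStatus"] != "STOPPED":
                continue
            # The container died along with its Fargate instance; the exit
            # code decides whether the Airflow task succeeded or failed.
            key = self.active_arns.pop(task["taskArn"])
            if task["containers"][0].get("exitCode") == 0:
                self.success(key)
            else:
                self.fail(key)

    def end(self, heartbeat_interval=10):
        # Block until every in-flight task has stopped.
        while self.active_arns:
            self.sync()
            time.sleep(heartbeat_interval)

    def terminate(self):
        for arn in list(self.active_arns):
            self.ecs.stop_task(cluster="test-airflow", taskArn=arn)
{code}
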
h1. Maintenance

The executor itself is nothing special, since it mostly relies on overriding 
the proper methods from BaseExecutor.

In general, AWS is fairly committed to keeping their APIs in service. Fargate 
is rather new, and I've personally seen a lot more features added as optional 
parameters over the course of the past year. However, the required parameters 
for the three Boto3 calls that are used have remained the same. I've also 
written test cases that ensure the Boto3 calls made are compliant with the most 
current version of their APIs.
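
A sketch of what such a compliance test can look like, using botocore's 
Stubber (the expected parameters below mirror the proposed configuration; 
calls through a stubbed client are still run through botocore's parameter 
validation against the current ECS API model, so a drifted parameter shape 
fails the test):

{code:python}
import boto3
from botocore.stub import Stubber

ecs = boto3.client("ecs", region_name="us-west-2")
stubber = Stubber(ecs)

expected_params = {
    "cluster": "test-airflow",
    "taskDefinition": "test-airflow-worker",
    "launchType": "FARGATE",
    "overrides": {"containerOverrides": [
        {"name": "airflow-worker", "command": ["airflow", "run", "..."]},
    ]},
}
stubber.add_response(
    "run_task",
    {"tasks": [{"taskArn": "arn:aws:ecs:us-west-2:123456789012:task/abc123"}]},
    expected_params,
)

with stubber:
    # Raises if the call's parameters don't match expected_params or fail
    # validation for the ECS run_task operation.
    ecs.run_task(**expected_params)
{code}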

We've also introduced a callback hook (very similar to the Celery Executor) 
that allows users to launch tasks with their own parameters. Therefore, if a 
user doesn't like the default parameter options used in Boto3's 
{{run_task()}}, then they can call it themselves with whatever parameters they 
want. This means that Airflow doesn't have to add a new configuration every 
time AWS makes an addition to AWS Fargate. It's just one configuration to cover 
them all.
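
As an illustration (the outer function name here is a placeholder; only the 
"function returning a function" contract comes from the configuration below), 
a user-supplied callback might look like:

{code:python}
# Hypothetical user callback: the outer function takes no arguments and
# returns the inner function, which maps an airflow CLI command to the
# keyword arguments for boto3's ECS run_task() call.
def my_task_id_to_fargate_options_function():
    def fargate_options(command):
        return {
            "cluster": "test-airflow",
            "taskDefinition": "test-airflow-worker",
            "launchType": "FARGATE",
            "platformVersion": "LATEST",
            "overrides": {"containerOverrides": [
                {"name": "airflow-worker", "command": command},
            ]},
        }
    return fargate_options
{code}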
h1. Proposed Configuration

 
{code:java}
[fargate]
# For more information on any of these execution parameters, see the link below:
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.run_task
# For boto3 credential management, see
# https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html

### MANDATORY CONFIGS:
# Name of region
region = us-west-2
# Name of cluster
cluster = test-airflow

### EITHER POPULATE THESE:
# Name of task definition with a bootable container. Note that this container
# will receive an airflow CLI command as an additional parameter to its
# entrypoint. Its job is to boot up and run this command.
task_definition = test-airflow-worker
# Name of registered container within your AWS cluster
container_name = airflow-worker
# Security group IDs for the task to run in (comma-separated)
security_groups = sg-xx
# Subnets for the task to run in (comma-separated)
subnets = subnet-yy,subnet-z
# Fargate platform version. Defaults to LATEST.
platform_version = LATEST
# Launch type can either be 'FARGATE' or 'ECS'. Defaults to FARGATE.
launch_type = FARGATE
# Assign public IP can either be 'ENABLED' or 'DISABLED'. Defaults to 'ENABLED'.
assign_public_ip = DISABLED

### OR POPULATE THIS:
# This is a function which returns a function. The outer function takes no
# arguments and returns the inner function. The inner function takes in an
# airflow CLI command and outputs a JSON payload compatible with the boto3
# run_task API linked above. In other words, if you don't like the way I call
# the Fargate API, then call it yourself.
execution_config_function = airflow.executors.fargate_executor.default_task_id_to_fargate_options_function
{code}
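
Note that the two halves above are alternatives: if {{execution_config_function}} 
is supplied, it presumably takes precedence, since its inner function returns 
the complete {{run_task()}} payload and therefore subsumes the task definition, 
container, and networking fields.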

[jira] [Updated] (AIRFLOW-6440) AWS Fargate Executor (AIP-29) (WIP)

2020-01-03 Thread Ahmed Elzeiny (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Elzeiny updated AIRFLOW-6440:
---
Description: 

[jira] [Updated] (AIRFLOW-6440) AWS Fargate Executor (AIP-29) (WIP)

2020-01-03 Thread Ahmed Elzeiny (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Elzeiny updated AIRFLOW-6440:
---
Description: 

[jira] [Updated] (AIRFLOW-6440) AWS Fargate Executor (AIP-29) (WIP)

2020-01-03 Thread Ahmed Elzeiny (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Elzeiny updated AIRFLOW-6440:
---
Description: 

[jira] [Updated] (AIRFLOW-6440) AWS Fargate Executor (AIP-29) (WIP)

2020-01-03 Thread Ahmed Elzeiny (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Elzeiny updated AIRFLOW-6440:
---
Summary: AWS Fargate Executor (AIP-29) (WIP)  (was: [WIP] AWS Fargate 
Executor (AIP-29))


[jira] [Work started] (AIRFLOW-6440) AWS Fargate Executor (AIP-29)

2020-01-03 Thread Ahmed Elzeiny (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-6440 started by Ahmed Elzeiny.
--

[jira] [Updated] (AIRFLOW-6440) [WIP] AWS Fargate Executor (AIP-29)

2020-01-03 Thread Ahmed Elzeiny (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Elzeiny updated AIRFLOW-6440:
---
Summary: [WIP] AWS Fargate Executor (AIP-29)  (was: AWS Fargate Executor 
(AIP-29))


[jira] [Updated] (AIRFLOW-6440) AWS Fargate Executor (AIP-29)

2020-01-03 Thread Ahmed Elzeiny (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Elzeiny updated AIRFLOW-6440:
---
Description: 

[jira] [Created] (AIRFLOW-6440) AWS Fargate Executor (AIP-29)

2020-01-03 Thread Ahmed Elzeiny (Jira)
Ahmed Elzeiny created AIRFLOW-6440:
--

 Summary: AWS Fargate Executor (AIP-29)
 Key: AIRFLOW-6440
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6440
 Project: Apache Airflow
  Issue Type: Improvement
  Components: aws, executors
Affects Versions: 1.10.8
 Environment: AWS Cloud
Reporter: Ahmed Elzeiny
Assignee: Ahmed Elzeiny

