[jira] [Comment Edited] (AIRFLOW-247) EMR Hook, Operators, Sensor

2017-05-07 Thread Al Johri (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998681#comment-15998681
 ] 

Al Johri edited comment on AIRFLOW-247 at 5/7/17 11:42 PM:
---

I'm searching for documentation related to how Airflow works with EMR. I'm 
struggling to find anything here: 
https://airflow.incubator.apache.org/integration.html#aws

My main question is, can Airflow create an EMR cluster and bring it back down 
like AWS Data Pipeline?

Thanks!

EDIT: Found some information here: 

Spark, EMR:
- (uses emr hooks, operators) 
https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYdOhq_np90VlbVRc/edit#slide=id.gd4067_1_0
- (uses shell scripts to launch and terminate emr clusters) 
https://www.agari.com/automated-model-building-emr-spark-airflow/
- (uses a shell script to spark-submit on a local spark installation) 
https://blog.insightdatascience.com/scheduling-spark-jobs-with-airflow-4c66f3144660
- (installs spark on each airflow worker node and runs local spark jobs without 
using spark-submit) 
https://medium.com/@calvertmg/airflow-integrating-with-apache-spark-50a7704dcebd
- (alternative mozilla implementation for emr spark job) 
https://github.com/mozilla/telemetry-airflow/blob/master/dags/operators/emr_spark_operator.py

EMR: 
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/emr_hook.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_create_job_flow_operator.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_add_steps_operator.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_terminate_job_flow_operator.py

Spark:
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py


> EMR Hook, Operators, Sensor
> ---
>
> Key: AIRFLOW-247
> URL: https://issues.apache.org/jira/browse/AIRFLOW-247
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Rob Froetscher
> Assignee: Rob Froetscher
> Priority: Minor
>
> Substory of https://issues.apache.org/jira/browse/AIRFLOW-115. It would be 
> nice to have an EMR hook and operators.
> Hook to generally interact with EMR.
> Operators to:
> * set up and start a job flow
> * add steps to an existing job flow
> A sensor to:
> * monitor completion and status of EMR jobs
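For anyone landing here with the same question, the create / add steps / wait / terminate flow described above can be sketched roughly as follows. This is an untested sketch, not the project's documented recipe. The operator class names are assumed from the contrib modules linked in this thread, the sensor is assumed to live at airflow.contrib.sensors.emr_step_sensor, and the cluster settings, S3 path, release label, and task ids are all made-up placeholders. The default AWS/EMR connections are assumed to be configured.

```python
# Sketch only: cluster settings, S3 path, and task ids are hypothetical placeholders.
# The dicts follow the request shape of the EMR RunJobFlow / AddJobFlowSteps APIs.
JOB_FLOW_OVERRIDES = {
    "Name": "example-transient-cluster",
    "ReleaseLabel": "emr-5.5.0",  # assumed release label
    "Applications": [{"Name": "Spark"}],
    "Instances": {
        "InstanceGroups": [
            {
                "Name": "Master node",
                "Market": "ON_DEMAND",
                "InstanceRole": "MASTER",
                "InstanceType": "m4.large",
                "InstanceCount": 1,
            }
        ],
        # Keep the cluster up between steps; the DAG terminates it explicitly.
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

SPARK_STEPS = [
    {
        "Name": "example-spark-step",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://example-bucket/jobs/job.py"],
        },
    }
]


def build_emr_tasks(dag):
    """Attach create -> add steps -> wait -> terminate tasks to an existing DAG."""
    # Imported lazily so the payloads above can be inspected without Airflow.
    # Module paths follow the contrib layout linked in this thread and may
    # differ in other Airflow versions.
    from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
    from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
    from airflow.contrib.operators.emr_terminate_job_flow_operator import EmrTerminateJobFlowOperator
    from airflow.contrib.sensors.emr_step_sensor import EmrStepSensor

    # The create task returns the new job flow id, which Airflow stores in XCom.
    job_flow_id = "{{ task_instance.xcom_pull('create_job_flow', key='return_value') }}"

    create = EmrCreateJobFlowOperator(
        task_id="create_job_flow", job_flow_overrides=JOB_FLOW_OVERRIDES, dag=dag)
    add_steps = EmrAddStepsOperator(
        task_id="add_steps", job_flow_id=job_flow_id, steps=SPARK_STEPS, dag=dag)
    watch = EmrStepSensor(
        task_id="watch_step", job_flow_id=job_flow_id,
        step_id="{{ task_instance.xcom_pull('add_steps', key='return_value')[0] }}",
        dag=dag)
    # "all_done" lets the cluster come down even if the step or sensor failed.
    terminate = EmrTerminateJobFlowOperator(
        task_id="terminate_job_flow", job_flow_id=job_flow_id,
        trigger_rule="all_done", dag=dag)

    create >> add_steps >> watch >> terminate
    return create, add_steps, watch, terminate
```

Calling build_emr_tasks(dag) from a DAG file would wire the four tasks in sequence. The trigger rule on the last task is a design choice so that a failed step does not leave a cluster running.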



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (AIRFLOW-247) EMR Hook, Operators, Sensor

2017-05-07 Thread Al Johri (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998681#comment-15998681
 ] 

Al Johri edited comment on AIRFLOW-247 at 5/7/17 11:10 PM:
---

I'm searching for documentation related to how Airflow works with EMR. I'm 
struggling to find anything here: 
https://airflow.incubator.apache.org/integration.html#aws

My main question is, can Airflow create an EMR cluster and bring it back down 
like AWS Data Pipeline?

Thanks!

EDIT: Found some information here: 

Spark, EMR:
- (uses emr hooks, operators) 
https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYdOhq_np90VlbVRc/edit#slide=id.gd4067_1_0
- (uses shell scripts to launch and terminate emr clusters) 
https://www.agari.com/automated-model-building-emr-spark-airflow/
- (uses a shell script to spark-submit on a local spark installation) 
https://blog.insightdatascience.com/scheduling-spark-jobs-with-airflow-4c66f3144660
- (installs spark on each airflow worker node and runs local spark jobs without 
using spark-submit) 
https://medium.com/@calvertmg/airflow-integrating-with-apache-spark-50a7704dcebd

EMR: 
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/emr_hook.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_create_job_flow_operator.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_add_steps_operator.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_terminate_job_flow_operator.py

Spark:
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py




[jira] [Comment Edited] (AIRFLOW-247) EMR Hook, Operators, Sensor

2017-05-07 Thread Al Johri (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998681#comment-15998681
 ] 

Al Johri edited comment on AIRFLOW-247 at 5/7/17 11:08 PM:
---

I'm searching for documentation related to how Airflow works with EMR. I'm 
struggling to find anything here: 
https://airflow.incubator.apache.org/integration.html#aws

My main question is, can Airflow create an EMR cluster and bring it back down 
like AWS Data Pipeline?

Thanks!

EDIT: Found some information here: 

Spark, EMR:
- (uses emr hooks, operators) 
https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYdOhq_np90VlbVRc/edit#slide=id.gd4067_1_0
- (uses shell scripts to launch and terminate emr clusters) 
https://www.agari.com/automated-model-building-emr-spark-airflow/
- (uses a shell script to spark-submit on a local spark installation) 
https://blog.insightdatascience.com/scheduling-spark-jobs-with-airflow-4c66f3144660

EMR: 
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/emr_hook.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_create_job_flow_operator.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_add_steps_operator.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_terminate_job_flow_operator.py

Spark:
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py




[jira] [Comment Edited] (AIRFLOW-247) EMR Hook, Operators, Sensor

2017-05-07 Thread Al Johri (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998681#comment-15998681
 ] 

Al Johri edited comment on AIRFLOW-247 at 5/7/17 11:07 PM:
---

I'm searching for documentation related to how Airflow works with EMR. I'm 
struggling to find anything here: 
https://airflow.incubator.apache.org/integration.html#aws

My main question is, can Airflow create an EMR cluster and bring it back down 
like AWS Data Pipeline?

Thanks!

EDIT: Found some information here: 

Spark, EMR:
- (uses emr hooks, operators) 
https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYdOhq_np90VlbVRc/edit#slide=id.gd4067_1_0
- (uses shell scripts to launch and terminate emr clusters) 
https://www.agari.com/automated-model-building-emr-spark-airflow/

EMR: 
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/emr_hook.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_create_job_flow_operator.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_add_steps_operator.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_terminate_job_flow_operator.py

Spark:
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
- 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py




[jira] [Comment Edited] (AIRFLOW-247) EMR Hook, Operators, Sensor

2017-05-07 Thread Al Johri (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998681#comment-15998681
 ] 

Al Johri edited comment on AIRFLOW-247 at 5/7/17 11:06 PM:
---

I'm searching for documentation related to how Airflow works with EMR. I'm 
struggling to find anything here: 
https://airflow.incubator.apache.org/integration.html#aws

My main question is, can Airflow create an EMR cluster and bring it back down 
like AWS Data Pipeline?

Thanks!

EDIT: Found some information here: 

Spark, EMR:
https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYdOhq_np90VlbVRc/edit#slide=id.gd4067_1_0

EMR: 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/emr_hook.py
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_create_job_flow_operator.py
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_add_steps_operator.py
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_terminate_job_flow_operator.py

Spark:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py




[jira] [Comment Edited] (AIRFLOW-247) EMR Hook, Operators, Sensor

2017-05-05 Thread Al Johri (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998681#comment-15998681
 ] 

Al Johri edited comment on AIRFLOW-247 at 5/5/17 6:01 PM:
--

I'm searching for documentation related to how Airflow works with EMR. I'm 
struggling to find anything here: 
https://airflow.incubator.apache.org/integration.html#aws

My main question is, can Airflow create an EMR cluster and bring it back down 
like AWS Data Pipeline?

Thanks!

EDIT: Found some information here: 
https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYdOhq_np90VlbVRc/edit#slide=id.gd4067_1_0




[jira] [Commented] (AIRFLOW-247) EMR Hook, Operators, Sensor

2017-05-05 Thread Al Johri (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998681#comment-15998681
 ] 

Al Johri commented on AIRFLOW-247:
--

I'm searching for documentation related to how Airflow works with EMR. I'm 
struggling to find anything here: 
https://airflow.incubator.apache.org/integration.html#aws

My main question is, can Airflow create an EMR cluster and bring it back down 
like AWS Data Pipeline?

Thanks!
