Hey team,

My Beam pipeline seems to be executing twice. The business logic of the
pipeline is to create one Elasticsearch index, but since it is executed twice,
the "spark-submit" command always fails, which breaks my automation.

The logs are attached.
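In case it's useful, this is how I checked whether YARN made more than one attempt for the application (a sketch, assuming the application id from the attached log's filename):

```shell
# List the attempts YARN made for this application; two entries here
# would mean the ApplicationMaster (and the driver's main()) ran twice.
yarn applicationattempt -list application_1581290593006_0004
```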

I am running spark-submit on AWS EMR like this:

spark-submit \
  --deploy-mode cluster \
  --conf spark.executor.extraJavaOptions=-DCLOUD_PLATFORM=AWS \
  --conf spark.driver.extraJavaOptions=-DCLOUD_PLATFORM=AWS \
  --conf spark.yarn.am.waitTime=300s \
  --conf spark.executor.extraClassPath=__app__.jar \
  --driver-memory 8G \
  --num-executors 5 \
  --executor-memory 20G \
  --executor-cores 6 \
  --jars s3://vivek-tests/cloud-dataflow-1.0.jar \
  --name new_user_index_mappings_create_dev \
  --class com.noka.beam.common.pipeline.EMRSparkStartPipeline \
  s3://vivek-tests/cloud-dataflow-1.0.jar \
  --job=new-user-index-mappings-create \
  --dateTime=2020-02-04T00:00:00 \
  --isDev=True \
  --incrementalExport=False
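One workaround I'm considering (a sketch, assuming the cause is a YARN retry): in cluster deploy mode, YARN can retry a failed ApplicationMaster, which re-runs the driver's main() and therefore the pipeline. Capping the attempts at one should at least prevent a second run:

```shell
# Hypothetical workaround: limit YARN to a single application attempt
# so a retried ApplicationMaster can't re-run the pipeline's main().
spark-submit \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=1 \
  ... # rest of the command as above
```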

Note: the code has been working as expected (i.e. a single run of
create-index) on AWS EMR 5.17, but we recently upgraded to AWS EMR 5.29.

Does anyone know if something changed in the framework, or am I doing
something wrong? Please help!

Thanks
Vivek

Attachment: application_1581290593006_0004.log
