I want to run different jobs on demand using the same Spark context, but I
don't know exactly how I can do this.

I tried to get the current context, but it seems to create a new Spark
context (with new executors).

I call spark-submit to add new jobs.

I run the code on Amazon EMR (3 instances, 4 cores & 16 GB RAM per instance),
with YARN as the resource manager.

My code:

import org.apache.spark.SparkContext

val sparkContext = SparkContext.getOrCreate()
val content = 1 to 40000

// Busy loop just to keep each task occupied for a while.
def loop(x: String): Unit = {
  for (a <- 1 to 30000000) {}
}

// Spread the range over 5 partitions and run the loop on every element.
val result = sparkContext.parallelize(content, 5)
result.map(value => value.toString).foreach(loop)
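What I am aiming for, roughly, is a single long-running driver that owns one SparkContext and runs incoming jobs on demand. A rough sketch of that idea (the OnDemandJobs and runJob names are just placeholders, not something I have working):

import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object OnDemandJobs {
  def main(args: Array[String]): Unit = {
    // One context for the whole application; every job below reuses it.
    val sc = SparkContext.getOrCreate(new SparkConf().setAppName("on-demand-jobs"))

    // A "job" here is just an action submitted on the shared context.
    def runJob(size: Int): Future[Long] = Future {
      sc.parallelize(1 to size, 5).map(_.toString).count()
    }

    // Two jobs running concurrently inside the same Spark application.
    val jobs = Seq(runJob(40000), runJob(40000))
    Await.result(Future.sequence(jobs), Duration.Inf)

    sc.stop()
  }
}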

spark-submit:

spark-submit --executor-cores 1 \
             --executor-memory 1g \
             --driver-memory 1g \
             --master yarn \
             --deploy-mode cluster \
             --conf spark.dynamicAllocation.enabled=true \
             --conf spark.shuffle.service.enabled=true \
             --conf spark.dynamicAllocation.minExecutors=1 \
             --conf spark.dynamicAllocation.maxExecutors=3 \
             --conf spark.dynamicAllocation.initialExecutors=3 \
             --conf spark.executor.instances=3 \

If I run spark-submit twice, it creates 6 executors, but I want to run all of
these jobs in the same Spark application.

How can I add jobs to an existing Spark application?

I don't understand why SparkContext.getOrCreate() doesn't get the existing
Spark context.
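From what I can tell, within a single driver JVM getOrCreate() does return the same instance; it just doesn't seem to reach across separate spark-submit invocations. Roughly what I mean (untested sketch):

import org.apache.spark.SparkContext

val sc1 = SparkContext.getOrCreate()
val sc2 = SparkContext.getOrCreate()
// Inside one driver JVM these are the same object...
assert(sc1 eq sc2)
// ...but each spark-submit starts its own driver JVM, so a second submission
// gets its own context and its own executors.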


Thanks,

Cosmin P.
