Hi,
This is probably not a Spark issue but rather a configuration detail I am missing. Any help would be appreciated. I am running Spark from a Docker Compose template with the following configuration:

version: '2'

services:
  master:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.master.Master -h master
    hostname: master
    environment:
      MASTER: spark://master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: localhost
    expose:
      - 7001
      - 7002
      - 7003
      - 7004
      - 7005
      - 7006
      - 7077
      - 6066
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8080:8080
  worker:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 1g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 7016
      - 8881
    ports:
      - 8081:8081

And I have the following simple Java program:

SparkConf conf = new SparkConf()
        .setMaster("spark://localhost:7077")
        .setAppName("Word Count Sample App");
conf.set("spark.dynamicAllocation.enabled", "false");

String file = "test.txt";
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> textFile = sc.textFile("src/main/resources/" + file);

JavaPairRDD<String, Integer> counts = textFile
        .flatMap(s -> Arrays.asList(s.split("[ ,]")).iterator())
        .mapToPair(word -> new Tuple2<>(word, 1))
        .reduceByKey((a, b) -> a + b);

counts.foreach(p -> System.out.println(p));
System.out.println("Total words: " + counts.count());
counts.saveAsTextFile(file + "out.txt");

The problem I am having is that at runtime the worker launches the executor with the following command:

Spark Executor Command: "/usr/jdk1.8.0_131/bin/java" "-cp"
"/conf:/usr/spark-2.3.0/jars/*:/usr/hadoop-2.8.3/etc/hadoop/:/usr/hadoop-2.8.3/etc/hadoop/*:/usr/hadoop-2.8.3/share/hadoop/common/lib/*:/usr/hadoop-2.8.3/share/hadoop/common/*:/usr/hadoop-2.8.3/share/hadoop/hdfs/*:/usr/hadoop-2.8.3/share/hadoop/hdfs/lib/*:/usr/hadoop-2.8.3/share/hadoop/yarn/lib/*:/usr/hadoop-2.8.3/share/hadoop/yarn/*:/usr/hadoop-2.8.3/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.8.3/share/hadoop/mapreduce/*:/usr/hadoop-2.8.3/share/hadoop/tools/lib/*"
"-Xmx1024M" "-Dspark.driver.port=59906"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"--driver-url" "spark://CoarseGrainedScheduler@yeikel-pc:59906"
"--executor-id" "6" "--hostname" "172.19.0.3" "--cores" "2"
"--app-id" "app-20180401005243-0000"
"--worker-url" "spark://Worker@172.19.0.3:8881"

which fails with:

Caused by: java.io.IOException: Failed to connect to yeikel-pc:59906
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: yeikel-pc

As far as I can tell, the worker container cannot resolve my host machine's name ("yeikel-pc"), so the executor fails to connect back to the driver.

Can I override the "--driver-url" from Java? Or, alternatively, how can I disable CoarseGrainedScheduler? I tried setting spark.dynamicAllocation.enabled to false, but that did not help.
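In case it helps clarify what I mean by overriding the driver URL: below is a minimal sketch of what I was thinking of trying, on the assumption that spark.driver.host (the address the driver advertises to executors), spark.driver.bindAddress, and spark.driver.port are the relevant properties. The 192.168.1.10 address and the port are placeholders for an IP of my machine that the worker container can actually reach; I have not confirmed this is the right approach.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCountDriverHost {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://localhost:7077")
                .setAppName("Word Count Sample App")
                // Address the driver advertises to executors instead of "yeikel-pc".
                // Placeholder: would need to be reachable from the worker container.
                .set("spark.driver.host", "192.168.1.10")
                // Interface the driver binds to locally.
                .set("spark.driver.bindAddress", "0.0.0.0")
                // Fixed port instead of a random one, so it could also be
                // published in the Docker Compose file if necessary.
                .set("spark.driver.port", "51000");

        // JavaSparkContext implements Closeable, so try-with-resources works.
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> textFile = sc.textFile("src/main/resources/test.txt");
            JavaPairRDD<String, Integer> counts = textFile
                    .flatMap(s -> Arrays.asList(s.split("[ ,]")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey((a, b) -> a + b);
            System.out.println("Total words: " + counts.count());
        }
    }
}

If this is not how the driver URL is meant to be controlled, any pointers would be appreciated.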