Can't connect to remote spark standalone cluster: getting WARN TaskSchedulerImpl: Initial job has not accepted any resources

Andrew Vykhodtsev Tue, 16 Aug 2016 16:24:50 -0700

Dear all,

I am trying to connect a remote Windows machine to a standalone Spark
cluster (a single VM running Ubuntu Server with 8 cores and 64GB RAM).
Both client and server have Spark 2.0 prebuilt for Hadoop 2.6, and
Hadoop 2.7.


I have the following settings on the cluster:

export SPARK_WORKER_MEMORY=32G
export SPARK_WORKER_CORES=8

and the following settings on the client (spark-defaults.conf):

spark.driver.memory              4g
spark.executor.memory            8g
spark.executor.cores              2


When I start pyspark, everything works smoothly. In the Spark UI, I see that
my app is running and has 4 executors attached to it, each with 2 cores and
8g of memory.
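
As a sanity check on the numbers above, here is a minimal sketch of the
arithmetic behind the 4 executors. It assumes the standalone master places as
many executors on the worker as both its cores and its memory allow; the
helper name executors_per_worker is hypothetical and not part of Spark.

```python
def executors_per_worker(worker_cores, worker_mem_gb,
                         executor_cores, executor_mem_gb):
    """Upper bound on executors one worker can host.

    Hypothetical helper for illustration only, not a Spark API.
    """
    by_cores = worker_cores // executor_cores      # limit from SPARK_WORKER_CORES
    by_memory = worker_mem_gb // executor_mem_gb   # limit from SPARK_WORKER_MEMORY
    return min(by_cores, by_memory)


# Values taken from the settings above: 8 cores / 32G on the worker,
# 2 cores / 8g per executor.
print(executors_per_worker(8, 32, 2, 8))
```

With these settings both limits come out the same, so the worker is fully
booked by the first set of 4 executors.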

However, when I try to read some HDFS files, it hangs and prints the
following message in a loop:

>>> df = sqlContext.read.parquet('/projects/kaggle-bimbo/dataset_full.pqt')

16/08/17 01:04:34 WARN DomainSocketFactory: The short-circuit local reads
feature cannot be used because UNIX Domain sockets are not available on
Windows.
16/08/17 01:04:52 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient resources

I go to the Spark UI again and see that it actually tries to start another set
of 4 executors. These executors hang for some time, fail, and start again.
So when this app is left alone, it generates many executors with the status
"EXITED". Nothing really happens. If I go to the application UI, it just shows
me
Stages: 0/1 (1 failed) and
Tasks: No tasks have started yet

Is it a bug or am I doing something wrong? It looks like a recurrence of
https://issues.apache.org/jira/browse/SPARK-2260
