pyspark --master yarn-client --conf spark.shuffle.spill=true \
  --conf spark.yarn.executor.memoryOverhead=512M
Additionally, driver and executor memory have dedicated options:
pyspark --master yarn-client --conf spark.shuffle.spill=true \
  --conf spark.yarn.executor.memoryOverhead=512M \
  --driver-memory 3G --executor-memory 5G
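For reference, the same settings can also live in conf/spark-defaults.conf so they don't have to be repeated on every invocation. A minimal sketch, with illustrative values, assuming this Spark version reads the overhead value as plain megabytes:

# conf/spark-defaults.conf (illustrative values only)
spark.shuffle.spill                    true
spark.yarn.executor.memoryOverhead     512
spark.driver.memory                    3g
spark.executor.memory                  5g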
-Sandy
On Tue, Sep 16,
Hello friends:
Yesterday I compiled Spark 1.1.0 against CDH5's Hadoop/YARN
distribution. Everything went fine and everything seems
to work, except for the following.
Below are two invocations of the 'pyspark' script, one with
enclosing quotes around the options passed to
'--driver-java-options' and one without.
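(The original invocations are cut off in this digest, but the difference the question is pointing at is shell quoting. A minimal sketch, using made-up JVM properties -Dfoo=1 and -Dbar=2 purely for illustration:

pyspark --master yarn-client --driver-java-options "-Dfoo=1 -Dbar=2"
    # quoted: both -D flags are passed as a single value to --driver-java-options
pyspark --master yarn-client --driver-java-options -Dfoo=1 -Dbar=2
    # unquoted: the shell splits on the space, so only -Dfoo=1 is bound to the option
)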
Hello friends:
It was mentioned in another (YARN-centric) email thread that
'SPARK_JAR' was deprecated and that the 'spark.yarn.jar' property
should be used instead for YARN submission.
For example:
user$ pyspark [some-options] --driver-java-options spark.yarn.jar=hdfs://namenode:8020/path/to/spa
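Since 'spark.yarn.jar' is a Spark configuration property rather than a JVM option, the more usual way to pass it would be --conf (or an entry in conf/spark-defaults.conf). A minimal sketch, with a placeholder assembly-jar path:

user$ pyspark [some-options] --conf spark.yarn.jar=hdfs://namenode:8020/path/to/assembly.jar
    # the HDFS path above is a placeholder; point it at the actual Spark assembly jar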
Hi:
Curious... is there any reason not to use one of the below pyspark options
(in red)? Assuming each file is, say, 10k in size, are 50 files too many?
Does that run into some practical limitation?
Usage: ./bin/pyspark [options]
Options:
--master MASTER_URL spark://host:port, mesos://host:port
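(The usage text is cut off here, but if the options in red were --py-files or --files, a common pattern for many small dependencies is either a comma-separated list or a single zip. A minimal sketch with made-up file names:

pyspark --master yarn-client --py-files util_a.py,util_b.py,util_c.py
    # comma-separated list of small modules shipped to the executors' PYTHONPATH
zip -r deps.zip mypkg/
pyspark --master yarn-client --py-files deps.zip
    # or bundle them into one archive so only a single file is distributed
)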
Hello friends:
I have a follow-up to Andrew's well-articulated answer below (thank you
for that).
(1) I've seen both of these invocations in various places:
(a) '--master yarn'
(b) '--master yarn-client'
the latter of which doesn't appear in
'pyspark | spark-submit | spark-
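(For what it's worth, the two spellings in (a) and (b) can be reconciled roughly as follows; a sketch, since exact behavior depends on the Spark version:

pyspark --master yarn-client
    # master URL that implies the client deploy mode
pyspark --master yarn --deploy-mode client
    # generic 'yarn' master with the deploy mode spelled out explicitly
)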