Hi,

I've had a bit of trouble getting Spark on YARN to work. When executing in
this mode and submitting from outside the cluster, one must set
HADOOP_CONF_DIR or YARN_CONF_DIR
<https://spark.apache.org/docs/latest/running-on-yarn.html>, so that
spark-submit can find the parameters it needs to locate and talk to the
YARN ResourceManager.
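
For concreteness, this is the kind of setup I mean (the config path,
class name, and jar are just placeholders):

    # point spark-submit at a client-side copy of the cluster config
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # submit against YARN; the RM address comes from the config above
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyApp \
      myapp.jar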

However, Spark also packages up all the Hadoop+YARN config files, ships
them to the cluster, and then uses them there.

Does it use those shipped files only to override individual settings on
top of the config the cluster already has, or does it use them wholesale
in its place?

My impression is that it currently replaces rather than overrides, which
means you can't construct a minimal client-side Hadoop/YARN config with
only the properties necessary to find the cluster. Is that right?
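
To illustrate what I mean by a minimal client-side config (just enough
for spark-submit to find the cluster; the hostnames below are
placeholders):

    <!-- yarn-site.xml: where the ResourceManager lives -->
    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>rm.example.com</value>
      </property>
    </configuration>

    <!-- core-site.xml: where the default filesystem lives -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nn.example.com:8020</value>
      </property>
    </configuration>

If the shipped files replace the cluster-side config wholesale, the
containers would presumably end up running with only these few
properties, which is what worries me.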
