Hello,

I want to run Spark or Flink jobs from a client (a remote desktop) onto a
YARN cluster. Another example would be: if I am running a YARN cluster on
VMs, I would like to use the host OS as the client to submit Spark jobs
to the YARN cluster running in the VMs.

What are the easiest ways to set the YARN_CONF_DIR environment variable on
the client machine so that it can submit Spark jobs to the YARN cluster?

From reading online documents, I believe I am supposed to set the client's
YARN_CONF_DIR environment variable to $HADOOP_HOME/etc/hadoop or
$HADOOP_HOME/etc/hadoop/conf. However, I do not understand how to get the
value of HADOOP_HOME, whether I need to set it on every machine in the
cluster, or how my client machine will know how to locate the NameNode in
the cluster.

Also, does $HADOOP_HOME/etc/hadoop have to be identical on every node in
the cluster, or does it only exist on a special node, like the NameNode or
ResourceManager?

I have also read that there is an easier way: copy the contents of
/etc/hadoop from the cluster to the client machine, and then set the
client's YARN_CONF_DIR to that location. Can someone please explain how to
do this? Which node in my cluster should I copy the /etc/hadoop contents
from? Would this also work if my client can only reach the cluster via
ssh? A sketch of what I have in mind follows.
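
For example, is it as simple as something like this? (The user, hostname,
and destination directory are made-up placeholders, and I am guessing that
a ResourceManager node is the right one to copy from.)

    # copy the Hadoop/YARN config from a cluster node over ssh
    scp -r admin@resourcemanager-host:/etc/hadoop ~/yarn-conf
    # point the client at the copied configuration
    export YARN_CONF_DIR=~/yarn-conf

After that, I would expect spark-submit --master yarn to work as above.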

Thanks!

Piper
