Hi Piper,

Just setting HADOOP_CONF_DIR should work. Did you try that?

BR,
Zhankun
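For reference, a minimal sketch of what that usually looks like on the client side — host names and paths here are placeholders, not from your cluster:

```shell
# Hypothetical sketch; adjust host names and paths to your setup.
# 1. Copy the Hadoop/YARN config from any cluster node (the files are
#    normally identical on every node), e.g. over ssh:
#      scp -r user@<resourcemanager-host>:/etc/hadoop/conf/. ~/hadoop-conf
mkdir -p ~/hadoop-conf

# 2. Point the client at the copied config. Spark looks at either
#    variable, so setting both is harmless:
export HADOOP_CONF_DIR=~/hadoop-conf
export YARN_CONF_DIR=~/hadoop-conf

# 3. Then submit as usual, e.g.:
#      spark-submit --master yarn --deploy-mode cluster app.jar
```

The config only needs yarn-site.xml / core-site.xml to name the ResourceManager and NameNode addresses; that is how the client locates the cluster.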
On Fri, 6 Dec 2019 at 00:43, Piper Piper <piperfl...@gmail.com> wrote:
> Hello,
>
> I want to run Spark or Flink jobs from a client (remote desktop) on a
> YARN cluster. Another example would be running a YARN cluster in VMs and
> using the host OS as the client to submit Spark jobs to the VM YARN
> cluster.
>
> What is the easiest way to set the YARN_CONF_DIR environment variable on
> the client machine so that it can submit Spark jobs to the YARN cluster?
>
> From reading the online documentation, I believe I am supposed to set the
> client's YARN_CONF_DIR environment variable to $HADOOP_HOME/etc/hadoop or
> $HADOOP_HOME/etc/hadoop/conf. However, I do not understand how to get the
> value of HADOOP_HOME, whether I need to set this value on every machine
> in the cluster, or how my client machine will know how to locate the
> NameNode in the cluster.
>
> Also, does $HADOOP_HOME/etc/hadoop have to be the same on every node in
> the cluster, or is it on a special node, like the NameNode or
> ResourceManager?
>
> I have read there is an easier way: copying the /etc/hadoop contents to
> the client machine and then setting the client's YARN_CONF_DIR to that
> location. Can someone please explain how to do this? Which node in my
> cluster should I copy the /etc/hadoop contents from? Would this also
> work if my client can only contact the cluster via ssh?
>
> Thanks!
>
> Piper