OK, rebuilding the assembly jar against CDH5 works now... Thanks.
-Simon
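For readers landing on this thread later: "rebuilding against CDH5" above means building the Spark assembly against the cluster's own Hadoop version instead of using the stock hadoop2 prebuilt package. A minimal sketch of such a build, assuming a CDH 5.0.0 cluster - the `2.3.0-cdh5.0.0` version string below is an illustration, not confirmed from the thread; substitute whatever version your cluster actually runs:

```sh
# Sketch only: build the Spark 1.0 assembly against a CDH5 Hadoop release.
# The Hadoop version string is an assumption -- use your cluster's version.
cd spark-1.0.0
SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt assembly

# Roughly equivalent Maven invocation:
# mvn -Pyarn -Dhadoop.version=2.3.0-cdh5.0.0 -DskipTests clean package
```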
On Sun, Jun 1, 2014 at 9:37 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:
> That helped a bit... Now I have a different failure: the startup process
> is stuck in an infinite loop, outputting the following message:
>
> 14/06/02 01:34:56 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>     appMasterRpcPort: -1
>     appStartTime: 1401672868277
>     yarnAppState: ACCEPTED
>
> I am using the prebuilt hadoop 2 package. It probably doesn't have the
> latest YARN client.
>
> -Simon
>
> On Sun, Jun 1, 2014 at 9:03 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>> As a debugging step, does it work if you use a single resource manager
>> with the key "yarn.resourcemanager.address" instead of two named
>> resource managers? I wonder if somehow the YARN client can't detect
>> this multi-master setup.
>>
>> On Sun, Jun 1, 2014 at 12:49 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:
>>> Note that everything works fine in Spark 0.9, which is packaged in
>>> CDH5: I can launch a spark-shell and interact with workers spawned on
>>> my YARN cluster.
>>>
>>> So in my /opt/hadoop/conf/yarn-site.xml, I have:
>>> ...
>>> <property>
>>>   <name>yarn.resourcemanager.address.rm1</name>
>>>   <value>controller-1.mycomp.com:23140</value>
>>> </property>
>>> ...
>>> <property>
>>>   <name>yarn.resourcemanager.address.rm2</name>
>>>   <value>controller-2.mycomp.com:23140</value>
>>> </property>
>>> ...
>>>
>>> And the other usual stuff.
>>>
>>> Spark 1.0 is launched like this:
>>> Spark Command: java -cp
>>> ::/home/chenxu/spark-1.0.0-bin-hadoop2/conf:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/opt/hadoop/conf
>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>>> org.apache.spark.deploy.SparkSubmit spark-shell --master yarn-client
>>> --class org.apache.spark.repl.Main
>>>
>>> I do see "/opt/hadoop/conf" included, but I'm not sure it's the right
>>> place.
>>>
>>> Thanks..
>>> -Simon
>>>
>>> On Sun, Jun 1, 2014 at 1:57 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>> I would agree with your guess: it looks like the YARN library isn't
>>>> correctly finding your yarn-site.xml file. If you look in
>>>> yarn-site.xml, do you definitely see the resource manager
>>>> address/addresses?
>>>>
>>>> Also, you can try running this command with
>>>> SPARK_PRINT_LAUNCH_COMMAND=1 to make sure the classpath is being
>>>> set up correctly.
>>>>
>>>> - Patrick
>>>>
>>>> On Sat, May 31, 2014 at 5:51 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> I tried a couple of ways, but couldn't get it to work...
>>>>>
>>>>> The following seems to be what the online documentation
>>>>> (http://spark.apache.org/docs/latest/running-on-yarn.html) suggests:
>>>>>
>>>>> SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
>>>>> YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client
>>>>>
>>>>> The help output of spark-shell seems to suggest "--master yarn
>>>>> --deploy-mode cluster".
>>>>>
>>>>> But either way, I am seeing the following messages:
>>>>> 14/06/01 00:33:20 INFO client.RMProxy: Connecting to ResourceManager
>>>>> at /0.0.0.0:8032
>>>>> 14/06/01 00:33:21 INFO ipc.Client: Retrying connect to server:
>>>>> 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
>>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>> 14/06/01 00:33:22 INFO ipc.Client: Retrying connect to server:
>>>>> 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
>>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>>
>>>>> My guess is that spark-shell is trying to talk to the resource
>>>>> manager to set up Spark master/worker nodes - I am not sure where
>>>>> 0.0.0.0:8032 came from, though. I am running CDH5 with two resource
>>>>> managers in HA mode. Their IP/port should be in
>>>>> /opt/hadoop/conf/yarn-site.xml. I tried both HADOOP_CONF_DIR and
>>>>> YARN_CONF_DIR, but that info isn't picked up.
>>>>>
>>>>> Any ideas? Thanks.
>>>>> -Simon
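A note for anyone hitting the same fallback to 0.0.0.0:8032: the per-RM `yarn.resourcemanager.address.rm1`/`.rm2` keys quoted earlier in the thread are only consulted when the HA switches are also set in yarn-site.xml. A minimal sketch of the relevant section, reusing the hostnames quoted above - the `rm1`/`rm2` ids follow the usual convention, and everything beyond the addresses shown in the thread is illustrative, not taken from Simon's actual config:

```xml
<!-- Sketch of YARN RM HA settings; only the two address values below
     appear in the thread, the rest is an assumed minimal HA setup. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>controller-1.mycomp.com:23140</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>controller-2.mycomp.com:23140</value>
</property>
```

With `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`) pointing at the directory containing this file, a YARN client built with HA support should resolve the named resource managers instead of falling back to the 0.0.0.0:8032 default.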