OK, rebuilding the assembly jar against CDH5 works now...
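
For reference, a build command along these lines did the trick (the exact
CDH version string below is a guess -- substitute whatever your cluster
runs):

    SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt assembly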
Thanks..

-Simon


On Sun, Jun 1, 2014 at 9:37 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:

> That helped a bit... Now I have a different failure: the startup process
> is stuck in an infinite loop, outputting the following message:
>
> 14/06/02 01:34:56 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>  appMasterRpcPort: -1
>  appStartTime: 1401672868277
>  yarnAppState: ACCEPTED
>
> I am using the prebuilt Hadoop 2 package. It probably doesn't have the
> latest YARN client.
>
> -Simon
>
>
>
>
> On Sun, Jun 1, 2014 at 9:03 PM, Patrick Wendell <pwend...@gmail.com>
> wrote:
>
>> As a debugging step, does it work if you use a single resource manager
>> with the key "yarn.resourcemanager.address" instead of using two named
>> resource managers? I wonder if somehow the YARN client can't detect
>> this multi-master set-up.
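>>
>> For the test, a minimal sketch of what that key would look like in
>> yarn-site.xml (reusing one of the addresses from your HA setup):
>>
>>     <property>
>>         <name>yarn.resourcemanager.address</name>
>>         <value>controller-1.mycomp.com:23140</value>
>>     </property>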
>>
>> On Sun, Jun 1, 2014 at 12:49 PM, Xu (Simon) Chen <xche...@gmail.com>
>> wrote:
>> > Note that everything works fine in Spark 0.9, which is packaged in
>> > CDH5: I can launch a spark-shell and interact with workers spawned on
>> > my YARN cluster.
>> >
>> > So in my /opt/hadoop/conf/yarn-site.xml, I have:
>> >     ...
>> >     <property>
>> >         <name>yarn.resourcemanager.address.rm1</name>
>> >         <value>controller-1.mycomp.com:23140</value>
>> >     </property>
>> >     ...
>> >     <property>
>> >         <name>yarn.resourcemanager.address.rm2</name>
>> >         <value>controller-2.mycomp.com:23140</value>
>> >     </property>
>> >     ...
>> >
>> > And the other usual stuff.
>> >
>> > So Spark 1.0 is launched like this:
>> > Spark Command: java -cp
>> > ::/home/chenxu/spark-1.0.0-bin-hadoop2/conf:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/opt/hadoop/conf
>> > -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>> > org.apache.spark.deploy.SparkSubmit spark-shell --master yarn-client
>> > --class org.apache.spark.repl.Main
>> >
>> > I do see "/opt/hadoop/conf" included, but I'm not sure it's in the right place.
>> >
>> > Thanks..
>> > -Simon
>> >
>> >
>> >
>> > On Sun, Jun 1, 2014 at 1:57 PM, Patrick Wendell <pwend...@gmail.com>
>> > wrote:
>> >>
>> >> I would agree with your guess; it looks like the YARN library isn't
>> >> correctly finding your yarn-site.xml file. If you look in
>> >> yarn-site.xml, do you definitely see the resource manager
>> >> address/addresses?
>> >>
>> >> Also, you can try running this command with
>> >> SPARK_PRINT_LAUNCH_COMMAND=1 to make sure the classpath is being
>> >> set-up correctly.
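>> >>
>> >> For example, something like this (adjust for how you normally launch):
>> >>
>> >>     SPARK_PRINT_LAUNCH_COMMAND=1 YARN_CONF_DIR=/opt/hadoop/conf \
>> >>       ./spark-shell --master yarn-client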
>> >>
>> >> - Patrick
>> >>
>> >> On Sat, May 31, 2014 at 5:51 PM, Xu (Simon) Chen <xche...@gmail.com>
>> >> wrote:
>> >> > Hi all,
>> >> >
>> >> > I tried a couple of ways, but couldn't get it to work.
>> >> >
>> >> > The following seems to be what the online document
>> >> > (http://spark.apache.org/docs/latest/running-on-yarn.html) is
>> >> > suggesting:
>> >> >
>> >> >
>> >> > SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
>> >> > YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client
>> >> >
>> >> > The help output of spark-shell seems to suggest "--master yarn
>> >> > --deploy-mode cluster".
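>> >> >
>> >> > If I read that right, the equivalent invocation would be something
>> >> > like this ("yarn-client" presumably being shorthand for "yarn" plus
>> >> > the client deploy mode):
>> >> >
>> >> >     ./spark-shell --master yarn --deploy-mode client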
>> >> >
>> >> > But either way, I am seeing the following messages:
>> >> > 14/06/01 00:33:20 INFO client.RMProxy: Connecting to ResourceManager
>> >> > at /0.0.0.0:8032
>> >> > 14/06/01 00:33:21 INFO ipc.Client: Retrying connect to server:
>> >> > 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
>> >> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>> >> > 14/06/01 00:33:22 INFO ipc.Client: Retrying connect to server:
>> >> > 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
>> >> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>> >> >
>> >> > My guess is that spark-shell is trying to talk to the resource
>> >> > manager to set up Spark master/worker nodes - I am not sure where
>> >> > 0.0.0.0:8032 came from, though. I am running CDH5 with two resource
>> >> > managers in HA mode. Their IP/port should be in
>> >> > /opt/hadoop/conf/yarn-site.xml. I tried both HADOOP_CONF_DIR and
>> >> > YARN_CONF_DIR, but that info isn't picked up.
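>> >> >
>> >> > For what it's worth, 0.0.0.0:8032 is just the yarn-default.xml
>> >> > default for yarn.resourcemanager.address, so the client seems to be
>> >> > falling back to defaults rather than reading my config. The HA part
>> >> > of my yarn-site.xml looks roughly like this (paraphrased, trimmed):
>> >> >
>> >> >     <property>
>> >> >         <name>yarn.resourcemanager.ha.enabled</name>
>> >> >         <value>true</value>
>> >> >     </property>
>> >> >     <property>
>> >> >         <name>yarn.resourcemanager.ha.rm-ids</name>
>> >> >         <value>rm1,rm2</value>
>> >> >     </property>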
>> >> >
>> >> > Any ideas? Thanks.
>> >> > -Simon
>> >
>> >
>>
>
>
