Re: App master failed to find application jar in the master branch on YARN

2013-11-19 Thread guojc
Hi Tom,
   Thank you for your response. I have double-checked that I uploaded
both jars to the same folder on HDFS. I think the
<name>fs.default.name</name> you pointed out is the old, deprecated name
for the fs.defaultFS config, according to the deprecated properties list:
http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
Anyway, we have tried setting both fs.default.name and fs.defaultFS to the
HDFS namenode, and the situation remained the same. We have also removed
the SPARK_HOME env variable on the worker nodes. One additional piece of
information that might be related: job submission is done on the same
machine as the HDFS namenode. But I'm not sure whether this would cause
the problem.

Thanks,
Jiacheng Guo


On Tue, Nov 19, 2013 at 11:50 AM, Tom Graves tgraves...@yahoo.com wrote:

 Sorry for the delay. What is the default filesystem on your HDFS setup?
 It looks like it's set to file: rather than hdfs://. That is the only
 reason I can think of for it listing the directory as
 file:/home/work/.sparkStaging/application_1384588058297_0056.
 It's basically just copying it locally rather than uploading to HDFS, and
 it's just trying to use the local
 file:/home/work/guojiacheng/spark/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar.
 It would generally create that in HDFS so it's accessible on all the nodes.
 Is your /home/work NFS-mounted on all the nodes?

 You can find the default fs by looking at the Hadoop config files,
 generally in core-site.xml. It's specified by:
 <name>fs.default.name</name>
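
A rough sketch of the check described above: the shell functions below pull a property value out of a core-site.xml-style file. The function names hadoop_conf_value and default_fs are hypothetical, and the sketch assumes one <name>/<value> pair per <property> with no commented-out properties. Dots in the property name are treated as regex wildcards, which is harmless for these keys.

```shell
# Hypothetical helper (not part of Hadoop or Spark): print the value of
# one property from a Hadoop *-site.xml file.
# Usage: hadoop_conf_value FILE PROPERTY
hadoop_conf_value() {
  # Join the file into one line so <name> and <value> may sit on
  # separate lines, then scan each <property> chunk for the key.
  tr '\n' ' ' < "$1" | awk -v key="$2" '{
    n = split($0, props, "</property>")
    for (i = 1; i <= n; i++) {
      if (props[i] ~ "<name>[ \t]*" key "[ \t]*</name>") {
        v = props[i]
        sub(/.*<value>[ \t]*/, "", v)    # drop everything up to the value
        sub(/[ \t]*<\/value>.*/, "", v)  # drop everything after it
        print v
        exit
      }
    }
  }'
}

# Print the default filesystem, trying the current property name first
# and the deprecated one second. Prints an empty line if neither is set.
default_fs() {
  v=$(hadoop_conf_value "$1" fs.defaultFS)
  [ -n "$v" ] || v=$(hadoop_conf_value "$1" fs.default.name)
  printf '%s\n' "$v"
}
```

Calling default_fs on the client's core-site.xml should print an hdfs:// URI; an empty line or a file: URI would be consistent with the symptom in this thread.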

 It's pretty odd that it's erroring with file:// when you specified
 hdfs://.
 When you tried hdfs://, did you upload both the Spark jar and your
 client jar (SparkAUC-assembly-0.1.jar)? If not, try that, and make sure to
 put hdfs:// on them when you export SPARK_JAR and specify the --jar option.


 I'll try to reproduce the error tomorrow to see if a bug was introduced
 when I added the feature to run spark from HDFS.

 Tom


Re: App master failed to find application jar in the master branch on YARN

2013-11-19 Thread Tom Graves
The property is deprecated but will still work. Either one is fine.

Launching the job from the namenode is fine.

I brought up a cluster with 2.0.5-alpha and built the latest Spark master
branch, and it runs fine for me. It looks like the 2.0.5-alpha namenode won't
even start with a defaultFS of file:///. Please make sure your namenode is
actually up and running and that you are pointing to it. You can run some
jobs successfully without it on a single-node cluster, but not on a multinode
cluster. Here is the error I get when I run without a namenode up; it looks
very similar to your error message:

        appDiagnostics: Application application_1384876319080_0001 failed 1 
times due to AM Container for appattempt_1384876319080_0001_01 exited with  
exitCode: -1000 due to: java.io.FileNotFoundException: File 
file:/home/tgravescs/spark-master/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar
 does not exist


When you changed the default fs config did you restart the cluster?


Can you try just running the examples jar:

SPARK_JAR=assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar \
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar examples/target/scala-2.9.3/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar \
  --class org.apache.spark.examples.SparkPi --args yarn-standalone \
  --num-workers 2 --master-memory 2g --worker-memory 2g --worker-cores 1

On the client side you should see messages like this:

13/11/19 15:41:30 INFO yarn.Client: Uploading file:/home/tgravescs/spark-master/examples/target/scala-2.9.3/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar to hdfs://namenode.host.com:9000/user/tgravescs/.sparkStaging/application_1384874528558_0003/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar
13/11/19 15:41:31 INFO yarn.Client: Uploading file:/home/tgravescs/spark-master/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar to hdfs://namenode.host.com:9000/user/tgravescs/.sparkStaging/application_1384874528558_0003/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar
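
One way to check the client output for this automatically is sketched below. The function name check_upload_targets is hypothetical, and it assumes each "Uploading ... to ..." message arrives on a single line with the destination as the last field, as in the sample messages above.

```shell
# Hypothetical helper: read Spark YARN client output on stdin and print
# every "Uploading ... to DEST" destination that is not on HDFS.
# Returns non-zero if any such destination is found.
check_upload_targets() {
  awk '
    /yarn\.Client: Uploading / {
      dest = $NF                      # last field is the destination URI
      if (dest !~ /^hdfs:\/\//) { print dest; bad = 1 }
    }
    END { exit bad + 0 }
  '
}
```

It could be fed from the launch command, for example by appending `2>&1 | check_upload_targets` to it. Whether the client logs to stdout or stderr depends on the log4j setup, so the redirect is a guess.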

Tom



Re: App master failed to find application jar in the master branch on YARN

2013-11-19 Thread guojc
Hi Tom,
 Thank you for your help. I finally found the problem. It was a silly
mistake on my part. After checking out the git repository, I forgot to
change spark-env.sh under the conf folder to add the YARN config folder. I
guess it might be helpful to display a warning message about that. Anyway,
thank you for your kindness in helping me rule out the problem.
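
A minimal sketch of the kind of warning suggested above, which could be run before submitting a job. The function name check_yarn_conf_dir is hypothetical. The variable names YARN_CONF_DIR and HADOOP_CONF_DIR are an assumption about what spark-env.sh is expected to export, since the thread does not name them.

```shell
# Hypothetical pre-flight check: make sure a Hadoop/YARN config folder is
# configured and actually contains core-site.xml. On success, prints the
# folder and returns 0; otherwise warns on stderr and returns 1.
check_yarn_conf_dir() {
  # Assumed variable names; prefer YARN_CONF_DIR, fall back to HADOOP_CONF_DIR.
  dir="${YARN_CONF_DIR:-${HADOOP_CONF_DIR:-}}"
  if [ -z "$dir" ]; then
    echo "WARNING: neither YARN_CONF_DIR nor HADOOP_CONF_DIR is set" >&2
    return 1
  fi
  if [ ! -f "$dir/core-site.xml" ]; then
    echo "WARNING: no core-site.xml found in $dir" >&2
    return 1
  fi
  echo "$dir"
}
```

Without a config folder the client would fall back to Hadoop's built-in defaults, which would explain the file: paths in the error messages.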

Best Regards,
Jiacheng Guo


Re: App master failed to find application jar in the master branch on YARN

2013-11-18 Thread Tom Graves
Hey Jiacheng Guo,

Do you have the SPARK_EXAMPLES_JAR env variable set? If you do, you have to
add the --addJars parameter to the yarn client and point it to the Spark
examples jar. Or just unset the SPARK_EXAMPLES_JAR env variable.

You should only have to set the SPARK_JAR env variable.
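
A small sketch of that check, assuming a POSIX shell on the submitting machine. The function name warn_if_examples_jar_set is hypothetical.

```shell
# Hypothetical helper: warn when SPARK_EXAMPLES_JAR is set in the
# environment used to launch the YARN client.
# Returns 0 when the variable is unset or empty, 1 otherwise.
warn_if_examples_jar_set() {
  if [ -n "${SPARK_EXAMPLES_JAR:-}" ]; then
    echo "SPARK_EXAMPLES_JAR is set to $SPARK_EXAMPLES_JAR;" \
         "unset it or pass the jar with --addJars" >&2
    return 1
  fi
  return 0
}
```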

If that isn't the issue, let me know the build command you used, your Hadoop
version, and your defaultFS for Hadoop.

Tom



Re: App master failed to find application jar in the master branch on YARN

2013-11-18 Thread guojc
Hi Tom,
   I'm on Hadoop 2.0.5. I can launch applications normally with the Spark
0.8 release. However, when I switch to the git master branch version, with
the application built against it, I get the jar-not-found exception, and the
same happens with the example application. I have tried both the file://
protocol and the hdfs:// protocol, with the jars in the local file system
and HDFS respectively, and even tried passing the jar list parameter when
creating the new SparkContext. The exception is slightly different for the
hdfs protocol and the local file path. My application launch command is:

 
SPARK_JAR=/home/work/guojiacheng/spark/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar \
/home/work/guojiacheng/spark/spark-class org.apache.spark.deploy.yarn.Client \
  --jar /home/work/guojiacheng/spark-auc/target/scala-2.9.3/SparkAUC-assembly-0.1.jar \
  --class myClass.SparkAUC \
  --args -c --args yarn-standalone \
  --args -i --args hdfs://{hdfs_host}:9000/user/work/guojiacheng/data \
  --args -m --args hdfs://{hdfs_host}:9000/user/work/guojiacheng/model_large \
  --args -o --args hdfs://{hdfs_host}:9000/user/work/guojiacheng/score \
  --num-workers 60 --master-memory 6g --worker-memory 7g --worker-cores 1

My build command is:

SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly

The only thing I can think of that might be related is that each cluster
node has a SPARK_HOME env variable pointing to a copy of the 0.8 version,
and its bin folder is in the PATH environment variable. The 0.9 version is
not there. It was something left over from when the cluster was set up. But
I don't know whether it is related, as my understanding is that the YARN
version tries to distribute Spark through YARN.

hdfs version error message:

appDiagnostics: Application application_1384588058297_0056 failed 1 times
due to AM Container for appattempt_1384588058297_0056_01 exited with
exitCode: -1000 due to: RemoteTrace: java.io.FileNotFoundException: File
file:/home/work/.sparkStaging/application_1384588058297_0056/SparkAUC-assembly-0.1.jar
does not exist

Local version error message:

appDiagnostics: Application application_1384588058297_0066 failed 1 times
due to AM Container for appattempt_1384588058297_0066_01 exited with
exitCode: -1000 due to: java.io.FileNotFoundException: File
file:/home/work/guojiacheng/spark/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar
does not exist
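
Both messages report a file: path, even in the hdfs:// run. As a rough illustration, the sketch below extracts the URI scheme of the missing file from an appDiagnostics message, so that a file: result can be spotted quickly. The function name missing_file_scheme is hypothetical, and the sketch assumes the "File <uri> does not exist" wording shown above.

```shell
# Hypothetical helper: read an appDiagnostics message on stdin (it may be
# wrapped across lines) and print the URI scheme of the missing file.
# Prints nothing if the path carries no scheme or the wording differs.
missing_file_scheme() {
  # Join wrapped lines, then capture the scheme that follows "File ".
  tr '\n' ' ' |
    sed -n 's/.*FileNotFoundException: File  *\([A-Za-z][A-Za-z0-9+.-]*\):.* does not exist.*/\1/p'
}
```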

Best Regards,
Jiacheng Guo




App master failed to find application jar in the master branch on YARN

2013-11-16 Thread guojc
Hi,
   After reading about the exciting progress in consolidating shuffle, I'm
eager to try out the latest master branch. However, when launching the
example application, the job failed with a message that the app master
failed to find the target jar:

appDiagnostics: Application application_1384588058297_0017 failed 1 times
due to AM Container for appattempt_1384588058297_0017_01 exited with
exitCode: -1000 due to: java.io.FileNotFoundException: File
file:/${my_work_dir}/spark/examples/target/scala-2.9.3/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar
does not exist.

  Has anything changed in how to launch a YARN job?

Best Regards,
Jiacheng Guo