Re: How to run spark streaming application on YARN?

2015-06-05 Thread Saiph Kappa
I was able to run my application using a Hadoop/YARN cluster with a single
machine. Today I tried to extend the cluster with one more machine, but I
ran into problems on the YARN NodeManager of the newly added machine:

Node Manager Log:
«
2015-06-06 01:41:33,379 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Initializing user myuser
2015-06-06 01:41:33,382 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying
from
/tmp/hadoop-myuser/nm-local-dir/nmPrivate/container_1433549642381_0004_01_03.tokens
to
/tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1433549642381_0004/container_1433549642381_0004_01_03.tokens
2015-06-06 01:41:33,382 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Localizer CWD set to
/tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1433549642381_0004
=
file:/tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1433549642381_0004
2015-06-06 01:41:33,405 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
{
file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar,
1433441011000, FILE, null } failed: Resource
file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar
changed on src filesystem (expected 1433441011000, was 1433531913000
java.io.IOException: Resource
file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar
changed on src filesystem (expected 1433441011000, was 1433531913000
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:255)
at
org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2015-06-06 01:41:33,405 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource
file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar(-/tmp/hadoop-myuser/nm-local-dir/usercache/myuser/filecache/15/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar)
transitioned from DOWNLOADING to FAILED
2015-06-06 01:41:33,406 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1433549642381_0004_01_03 transitioned from
LOCALIZING to LOCALIZATION_FAILED
2015-06-06 01:41:33,406 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
Container container_1433549642381_0004_01_03 sent RELEASE event on a
resource request {
file:/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar,
1433441011000, FILE, null } not present in cache.
2015-06-06 01:41:33,406 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send rpc request to server
»

I have this jar on both machines:
/home/myuser/my-spark/assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar

However, I simply copied the my-spark folder from machine1 to machine2 so
that YARN could find the jar.

Any ideas of what might be wrong? Isn't this the correct way to share the
Spark jars across the YARN cluster?

Thanks.
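
The "Resource ... changed on src filesystem" error above usually means that
the timestamp YARN recorded for the file: resource (taken from the copy on
the machine where the job was submitted) does not match the modification
time of the copy on the new node, which was written later when the folder
was copied over. One way to avoid keeping per-machine copies in sync is to
put the assembly on HDFS and point Spark at that single copy through the
spark.yarn.jar property (documented for Spark 1.x on YARN). A minimal
sketch, assuming the jar has already been uploaded to a hypothetical /spark
directory on HDFS:

«
// Hypothetical HDFS location; upload once beforehand, e.g.:
//   hadoop fs -mkdir -p /spark
//   hadoop fs -put assembly/target/scala-2.10/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar /spark/
sparkConf.set("spark.yarn.jar",
  "hdfs:///spark/spark-assembly-1.3.2-SNAPSHOT-hadoop2.7.0.jar")
»

With a single HDFS copy, every NodeManager localizes the same file with the
same timestamp, so the mismatch cannot occur.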

On Thu, Jun 4, 2015 at 7:20 PM, Saiph Kappa saiph.ka...@gmail.com wrote:

 Additionally, I think this document (
 https://spark.apache.org/docs/latest/building-spark.html ) should mention
 that protobuf.version might need to be changed to match the version used by
 the chosen Hadoop version. For instance, with Hadoop 2.7.0 I had to change
 protobuf.version to 2.5.0 to be able to run my application.

 On Thu, Jun 4, 2015 at 7:14 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:

 That might work, but there might also be other steps that are required.

 -Sandy

 On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa saiph.ka...@gmail.com
 wrote:

 Thanks! It is working fine now with spark-submit. Just out of curiosity,
 how would you use org.apache.spark.deploy.yarn.Client? Adding that
 spark_yarn jar to the configuration inside the application?

Re: How to run spark streaming application on YARN?

2015-06-04 Thread Saiph Kappa
Thanks! It is working fine now with spark-submit. Just out of curiosity,
how would you use org.apache.spark.deploy.yarn.Client? Adding that
spark_yarn jar to the configuration inside the application?
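
For reference, a spark-submit launch matching the configuration quoted
further down would look roughly like the following (the main class name is
assumed from setAppName("Benchmark") and may differ; treat this as a sketch
rather than a verified command):

«
./bin/spark-submit \
  --master yarn-client \
  --class Benchmark \
  --num-executors 2 \
  --executor-memory 4g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
»

spark-submit also takes care of shipping the Spark assembly and the
application jar to the cluster. Invoking org.apache.spark.deploy.yarn.Client
directly is possible, but its command-line interface is effectively internal
and changes between releases, so spark-submit is the supported entry point.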

On Thu, Jun 4, 2015 at 6:37 PM, Vova Shelgunov vvs...@gmail.com wrote:

 You should run it with spark-submit or using
 org.apache.spark.deploy.yarn.Client.

 2015-06-04 20:30 GMT+03:00 Saiph Kappa saiph.ka...@gmail.com:

 No, I am not. I run it with sbt «sbt run-main Branchmark». I thought it
 was the same thing since I am passing all the configurations through the
 application code. Is that the problem?

 On Thu, Jun 4, 2015 at 6:26 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:

 Hi Saiph,

 Are you launching using spark-submit?

 -Sandy

 On Thu, Jun 4, 2015 at 10:20 AM, Saiph Kappa saiph.ka...@gmail.com
 wrote:

 Hi,

 I've been running my Spark Streaming application in standalone mode
 without any problems. Now I'm trying to run it on YARN (Hadoop 2.7.0),
 but I am running into some issues.

 Here are the config parameters of my application:
 «
 val sparkConf = new SparkConf()

 sparkConf.setMaster("yarn-client")
 sparkConf.set("spark.yarn.am.memory", "2g")
 sparkConf.set("spark.executor.instances", "2")

 sparkConf.setAppName("Benchmark")

 sparkConf.setJars(Array("target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar"))
 sparkConf.set("spark.executor.memory", "4g")
 sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
 sparkConf.set("spark.executor.extraJavaOptions",
   "-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC " +
   "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300")
 if (sparkConf.getOption("spark.master") == None) {
   sparkConf.setMaster("local[*]")
 }
 »

 The jar I'm including there only contains the application classes.
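
 For context, here is a minimal sketch of how a conf like this typically
 drives a Spark Streaming job; the object name, batch interval, and socket
 source below are hypothetical, and only the conf settings come from this
 thread:

 «
 import org.apache.spark.SparkConf
 import org.apache.spark.streaming.{Seconds, StreamingContext}

 object Benchmark {
   def main(args: Array[String]): Unit = {
     val sparkConf = new SparkConf()
     // ... settings as shown above ...

     // Hypothetical 1-second batches reading from a local socket source.
     val ssc = new StreamingContext(sparkConf, Seconds(1))
     val lines = ssc.socketTextStream("localhost", 9999)
     lines.count().print()

     ssc.start()
     ssc.awaitTermination()
   }
 }
 »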


 Here is the log of the application: http://pastebin.com/7RSktezA

 Here is the userlog on hadoop/YARN:
 «
 Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/spark/Logging
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at
 org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:596)
 at
 org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
 Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 14 more
 »

 I tried to add the spark core jar to ${HADOOP_HOME}/lib but the error
 persists. Am I doing something wrong?

 Thanks.







Re: How to run spark streaming application on YARN?

2015-06-04 Thread Sandy Ryza
That might work, but there might also be other steps that are required.

-Sandy

On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa saiph.ka...@gmail.com wrote:

 Thanks! It is working fine now with spark-submit. Just out of curiosity,
 how would you use org.apache.spark.deploy.yarn.Client? Adding that
 spark_yarn jar to the configuration inside the application?

 On Thu, Jun 4, 2015 at 6:37 PM, Vova Shelgunov vvs...@gmail.com wrote:

 You should run it with spark-submit or using
 org.apache.spark.deploy.yarn.Client.

 2015-06-04 20:30 GMT+03:00 Saiph Kappa saiph.ka...@gmail.com:

 No, I am not. I run it with sbt «sbt run-main Branchmark». I thought
 it was the same thing since I am passing all the configurations through the
 application code. Is that the problem?

 On Thu, Jun 4, 2015 at 6:26 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:

 Hi Saiph,

 Are you launching using spark-submit?

 -Sandy

 On Thu, Jun 4, 2015 at 10:20 AM, Saiph Kappa saiph.ka...@gmail.com
 wrote:

 Hi,

 I've been running my Spark Streaming application in standalone mode
 without any problems. Now I'm trying to run it on YARN (Hadoop 2.7.0),
 but I am running into some issues.

 Here are the config parameters of my application:
 «
 val sparkConf = new SparkConf()

 sparkConf.setMaster("yarn-client")
 sparkConf.set("spark.yarn.am.memory", "2g")
 sparkConf.set("spark.executor.instances", "2")

 sparkConf.setAppName("Benchmark")

 sparkConf.setJars(Array("target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar"))
 sparkConf.set("spark.executor.memory", "4g")
 sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
 sparkConf.set("spark.executor.extraJavaOptions",
   "-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC " +
   "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300")
 if (sparkConf.getOption("spark.master") == None) {
   sparkConf.setMaster("local[*]")
 }
 »

 The jar I'm including there only contains the application classes.


 Here is the log of the application: http://pastebin.com/7RSktezA

 Here is the userlog on hadoop/YARN:
 «
 Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/spark/Logging
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at
 org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:596)
 at
 org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
 Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 14 more
 »

 I tried to add the spark core jar to ${HADOOP_HOME}/lib but the error
 persists. Am I doing something wrong?

 Thanks.








Re: How to run spark streaming application on YARN?

2015-06-04 Thread Saiph Kappa
Additionally, I think this document (
https://spark.apache.org/docs/latest/building-spark.html ) should mention
that protobuf.version might need to be changed to match the version used by
the chosen Hadoop version. For instance, with Hadoop 2.7.0 I had to change
protobuf.version to 2.5.0 to be able to run my application.
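
For reference, a build invocation along these lines (the profile and flag
names follow the building-spark page for this Spark generation; treat them
as an assumption that may need adjusting for other releases) would be:

«
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Dprotobuf.version=2.5.0 \
    -DskipTests clean package
»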

On Thu, Jun 4, 2015 at 7:14 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 That might work, but there might also be other steps that are required.

 -Sandy

 On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa saiph.ka...@gmail.com
 wrote:

 Thanks! It is working fine now with spark-submit. Just out of curiosity,
 how would you use org.apache.spark.deploy.yarn.Client? Adding that
 spark_yarn jar to the configuration inside the application?

 On Thu, Jun 4, 2015 at 6:37 PM, Vova Shelgunov vvs...@gmail.com wrote:

 You should run it with spark-submit or using
 org.apache.spark.deploy.yarn.Client.

 2015-06-04 20:30 GMT+03:00 Saiph Kappa saiph.ka...@gmail.com:

 No, I am not. I run it with sbt «sbt run-main Branchmark». I thought
 it was the same thing since I am passing all the configurations through the
 application code. Is that the problem?

 On Thu, Jun 4, 2015 at 6:26 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:

 Hi Saiph,

 Are you launching using spark-submit?

 -Sandy

 On Thu, Jun 4, 2015 at 10:20 AM, Saiph Kappa saiph.ka...@gmail.com
 wrote:

 Hi,

 I've been running my Spark Streaming application in standalone mode
 without any problems. Now I'm trying to run it on YARN (Hadoop 2.7.0),
 but I am running into some issues.

 Here are the config parameters of my application:
 «
 val sparkConf = new SparkConf()

 sparkConf.setMaster("yarn-client")
 sparkConf.set("spark.yarn.am.memory", "2g")
 sparkConf.set("spark.executor.instances", "2")

 sparkConf.setAppName("Benchmark")

 sparkConf.setJars(Array("target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar"))
 sparkConf.set("spark.executor.memory", "4g")
 sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
 sparkConf.set("spark.executor.extraJavaOptions",
   "-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC " +
   "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300")
 if (sparkConf.getOption("spark.master") == None) {
   sparkConf.setMaster("local[*]")
 }
 »

 The jar I'm including there only contains the application classes.


 Here is the log of the application: http://pastebin.com/7RSktezA

 Here is the userlog on hadoop/YARN:
 «
 Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/spark/Logging
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at
 org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:596)
 at
 org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
 Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 14 more
 »

 I tried to add the spark core jar to ${HADOOP_HOME}/lib but the error
 persists. Am I doing something wrong?

 Thanks.