Thanks Akhil. Both ways work for me, but I'd like to know why that
exception was thrown. The class HBaseApp and its related classes were all
contained in my application jar, so why was
*com.xt.scala.HBaseApp$$anonfun$testHBase$1* not found?
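
For reference, here is a minimal Scala sketch of the two ways, written
against the Spark 0.9 API used in this thread. The object name is
hypothetical and the jar path is just the one from my submit script; as far
as I understand, setJars() sets the same spark.jars property that
-Dspark.jars=$APP_JAR sets on the command line.

  import org.apache.spark.{SparkConf, SparkContext}

  object JarShippingSketch {   // hypothetical name, for illustration only
    def main(args: Array[String]) {
      // Jar path copied from the submit script below; adjust as needed.
      val appJar = "/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar"

      // Way 1: list the application jar in the SparkConf, the programmatic
      // equivalent of passing -Dspark.jars=$APP_JAR (spark.master is still
      // supplied via -Dspark.master, as in the submit script below).
      val conf = new SparkConf()
        .setAppName("-- Test HBase --")
        .setJars(Seq(appJar))
      val sc = new SparkContext(conf)

      // Way 2: register the jar on an already-created SparkContext,
      // as Akhil suggested.
      sc.addJar(appJar)

      sc.stop()
    }
  }

Both forms point the executors at the same application jar; the sketch is
only meant to summarize them side by side.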

2014-10-13 14:53 GMT+08:00 Akhil Das <ak...@sigmoidanalytics.com>:

> Adding your application jar to the sparkContext will resolve this issue.
>
> Eg:
> sparkContext.addJar("./target/scala-2.10/myTestApp_2.10-1.0.jar")
>
> Thanks
> Best Regards
>
> On Mon, Oct 13, 2014 at 8:42 AM, Tao Xiao <xiaotao.cs....@gmail.com>
> wrote:
>
>> In the beginning I tried to read HBase and found that the exception was
>> thrown, so I started to debug the app. I removed the code that reads
>> HBase and tried to save an RDD containing a list, and the exception was
>> still thrown. So I'm sure the exception was not caused by reading HBase.
>>
>> While debugging I did not change the object name or the file name.
>>
>>
>>
>> 2014-10-13 0:00 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>
>>> Your app is named scala.HBaseApp.
>>> Does it read / write to HBase?
>>>
>>> Just curious.
>>>
>>> On Sun, Oct 12, 2014 at 8:00 AM, Tao Xiao <xiaotao.cs....@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm using CDH 5.0.1 (Spark 0.9) and submitting a job in Spark
>>>> Standalone Cluster mode.
>>>>
>>>> The job is quite simple as follows:
>>>>
>>>>   package com.xt.scala
>>>>
>>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>>
>>>>   object HBaseApp {
>>>>     def main(args:Array[String]) {
>>>>         testHBase("student", "/test/xt/saveRDD")
>>>>     }
>>>>
>>>>
>>>>     def testHBase(tableName: String, outFile:String) {
>>>>       val sparkConf = new SparkConf()
>>>>             .setAppName("-- Test HBase --")
>>>>             .set("spark.executor.memory", "2g")
>>>>             .set("spark.cores.max", "16")
>>>>
>>>>       val sparkContext = new SparkContext(sparkConf)
>>>>
>>>>       val rdd = sparkContext.parallelize(List(1,2,3,4,5,6,7,8,9,10), 3)
>>>>
>>>>       val c = rdd.count     // successful
>>>>       println("\n\n\n"  + c + "\n\n\n")
>>>>
>>>>       rdd.saveAsTextFile(outFile)   // This line will throw "java.lang.ClassNotFoundException:
>>>>                                     //   com.xt.scala.HBaseApp$$anonfun$testHBase$1"
>>>>
>>>>       println("\n  down  \n")
>>>>     }
>>>> }
>>>>
>>>> I submitted this job using the following script:
>>>>
>>>> #!/bin/bash
>>>>
>>>> HBASE_CLASSPATH=$(hbase classpath)
>>>> APP_JAR=/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar
>>>>
>>>> SPARK_ASSEMBLY_JAR=/usr/games/spark/xt/spark-assembly_2.10-0.9.0-cdh5.0.1-hadoop2.3.0-cdh5.0.1.jar
>>>> SPARK_MASTER=spark://b02.jsepc.com:7077
>>>>
>>>> CLASSPATH=$CLASSPATH:$APP_JAR:$SPARK_ASSEMBLY_JAR:$HBASE_CLASSPATH
>>>> export SPARK_CLASSPATH=/usr/lib/hbase/lib/*
>>>>
>>>> CONFIG_OPTS="-Dspark.master=$SPARK_MASTER"
>>>>
>>>> java -cp $CLASSPATH $CONFIG_OPTS com.xt.scala.HBaseApp $@
>>>>
>>>> After I submitted the job, the count of the RDD was computed
>>>> successfully, but the RDD could not be saved to HDFS and the following
>>>> exception was thrown:
>>>>
>>>> 14/10/11 16:09:33 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
>>>> java.lang.ClassNotFoundException: com.xt.scala.HBaseApp$$anonfun$testHBase$1
>>>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>  at java.lang.Class.forName0(Native Method)
>>>>  at java.lang.Class.forName(Class.java:270)
>>>>  at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
>>>>  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>>>  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>  at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>  at java.lang.reflect.Method.invoke(Method.java:606)
>>>>  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>>>>  at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>>>>  at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>>>>  at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>>>>  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>>>>  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
>>>>  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>>>>  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>  at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>>>>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>  at java.lang.Thread.run(Thread.java:744)
>>>>
>>>>
>>>>
>>>> I also noted that if I add "-Dspark.jars=$APP_JAR" to the variable
>>>> *CONFIG_OPTS*, i.e., CONFIG_OPTS="-Dspark.master=$SPARK_MASTER
>>>> -Dspark.jars=$APP_JAR", the job finishes successfully and the RDD is
>>>> written to HDFS.
>>>> So, what does "java.lang.ClassNotFoundException:
>>>> com.xt.scala.HBaseApp$$anonfun$testHBase$1" mean, and why was it
>>>> thrown?
>>>>
>>>> Thanks
>>>>
>>>>
>>>
>>
>
