That sounds like a different issue. What is the type of myrdd (i.e., what does the shell print if you just type myrdd)? It's possible it's defined as an RDD[Nothing], so every operation tries to cast the elements to Nothing, which always fails. Declaring it with an explicit element type up front might help, something like val myrdd: RDD[mypackage.MyClass] = sc.sequenceFile(...).
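For illustration, a minimal sketch of that fix in the shell (where sc is predefined). The stack trace further down in the thread mentions SparkContext.objectFile, so this assumes the RDD is loaded that way; the path is a placeholder:

    // Without an explicit type parameter, Scala infers Nothing for the
    // element type, so every downstream action fails with a cast error.
    val inferred = sc.objectFile("/path/to/data")                 // RDD[Nothing]

    // Supplying the element type, either as a type parameter or as an
    // annotation on the val, keeps mypackage.MyClass in the RDD's type.
    val typed = sc.objectFile[mypackage.MyClass]("/path/to/data") // RDD[mypackage.MyClass]
    val myrdd: org.apache.spark.rdd.RDD[mypackage.MyClass] =
      sc.objectFile("/path/to/data")

Either way the executors still need the class on their classpath (ADD_JARS), but the driver-side type at least no longer collapses to Nothing.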
On Sat, Jan 4, 2014 at 8:29 PM, Aureliano Buendia <[email protected]> wrote:

> While myrdd.count() works, a lot of other actions and transformations
> still do not work in spark-shell. E.g. myrdd.first() gives this error:
>
> java.lang.ClassCastException: mypackage.MyClass cannot be cast to
> scala.runtime.Nothing$
>
> Also, myrdd.map(r => r) returns:
>
> org.apache.spark.rdd.RDD[*Nothing*] = MappedRDD[2]
>
> Basically, type mypackage.MyClass gets converted to Nothing during any
> action/transformation.
>
>
> On Sun, Jan 5, 2014 at 4:06 AM, Aureliano Buendia <[email protected]> wrote:
>
>> Sorry, I had a typo. I can confirm that using ADD_JARS together with
>> SPARK_CLASSPATH works as expected in spark-shell.
>>
>> It'd make sense to have the two combined as one option.
>>
>>
>> On Sun, Jan 5, 2014 at 3:51 AM, Aaron Davidson <[email protected]> wrote:
>>
>>> Cool. To confirm, you said you can access the class and construct new
>>> objects -- did you do this in the shell itself (i.e., on the driver), or
>>> on the executors?
>>>
>>> Specifically, one of the following two should fail in the shell:
>>> > new mypackage.MyClass()
>>> > sc.parallelize(0 until 10, 2).foreach(_ => new mypackage.MyClass())
>>> (or just import it)
>>>
>>> You could also try running MASTER=local-cluster[2,1,512], which launches
>>> 2 executors with 1 core and 512 MB each, in a setup that mimics a real
>>> cluster more closely, in case it's a bug only related to using local mode.
>>>
>>>
>>> On Sat, Jan 4, 2014 at 7:07 PM, Aureliano Buendia <[email protected]> wrote:
>>>
>>>> On Sun, Jan 5, 2014 at 2:28 AM, Aaron Davidson <[email protected]> wrote:
>>>>
>>>>> Additionally, which version of Spark are you running?
>>>>>
>>>>
>>>> 0.8.1.
>>>>
>>>> Unfortunately, this doesn't work either:
>>>>
>>>> MASTER=local[2] ADD_JARS=/path/to/my/jar SPARK_CLASSPATH=/path/to/my/jar ./spark-shell
>>>>
>>>>
>>>>> On Sat, Jan 4, 2014 at 6:27 PM, Aaron Davidson <[email protected]> wrote:
>>>>>
>>>>>> I am not an expert on these classpath issues, but if you're using
>>>>>> local mode, you might also try setting SPARK_CLASSPATH to include the
>>>>>> path to the jar file as well. This should not really help, since
>>>>>> "adding jars" is the right way to get the jars to your executors
>>>>>> (which is where the exception appears to be happening), but it would
>>>>>> sure be interesting if it did.
>>>>>>
>>>>>>
>>>>>> On Sat, Jan 4, 2014 at 4:50 PM, Aureliano Buendia <[email protected]> wrote:
>>>>>>
>>>>>>> I should add that I can see in the log that the jar is being shipped
>>>>>>> to the workers:
>>>>>>>
>>>>>>> 14/01/04 15:34:52 INFO Executor: Fetching
>>>>>>> http://192.168.1.111:51031/jars/my.jar.jar with timestamp 1388881979092
>>>>>>> 14/01/04 15:34:52 INFO Utils: Fetching
>>>>>>> http://192.168.1.111:51031/jars/my.jar.jar to
>>>>>>> /var/folders/3g/jyx81ctj3698wbvphxhm4dw40000gn/T/fetchFileTemp8322008964976744710.tmp
>>>>>>> 14/01/04 15:34:53 INFO Executor: Adding
>>>>>>> file:/var/folders/3g/jyx81ctj3698wbvphxhm4dw40000gn/T/spark-d8ac8f66-fad6-4b3f-8059-73f13b86b070/my.jar.jar
>>>>>>> to class loader
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Jan 5, 2014 at 12:46 AM, Aureliano Buendia <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm trying to access my standalone Spark app from spark-shell. I
>>>>>>>> tried starting the shell with:
>>>>>>>>
>>>>>>>> MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell
>>>>>>>>
>>>>>>>> The log shows that the jar file was loaded. Also, I can access and
>>>>>>>> create a new instance of mypackage.MyClass.
>>>>>>>>
>>>>>>>> The problem is that myRDD.collect() returns RDD[MyClass], and that
>>>>>>>> throws this exception:
>>>>>>>>
>>>>>>>> java.lang.ClassNotFoundException: mypackage.MyClass
>>>>>>>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>>>   at java.lang.Class.forName0(Native Method)
>>>>>>>>   at java.lang.Class.forName(Class.java:264)
>>>>>>>>   at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:622)
>>>>>>>>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)
>>>>>>>>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)
>>>>>>>>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1642)
>>>>>>>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
>>>>>>>>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>>>>>>   at org.apache.spark.util.Utils$.deserialize(Utils.scala:59)
>>>>>>>>   at org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:573)
>>>>>>>>   at org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:573)
>>>>>>>>   at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
>>>>>>>>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:702)
>>>>>>>>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:698)
>>>>>>>>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:872)
>>>>>>>>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:872)
>>>>>>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
>>>>>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:53)
>>>>>>>>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
>>>>>>>>   at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
>>>>>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>>>>   at java.lang.Thread.run(Thread.java:722)
>>>>>>>>
>>>>>>>> Does this mean that my jar was not shipped to the workers? Is this
>>>>>>>> a known issue, or am I doing something wrong here?
