On Sun, Jan 5, 2014 at 6:01 AM, Aaron Davidson <[email protected]> wrote:
> That sounds like a different issue. What is the type of myrdd (i.e., if
> you just type myrdd into the shell)? It's possible it's defined as an
> RDD[Nothing] and thus all operations try to typecast to Nothing, which
> always fails. Perhaps declaring it initially with respect to your class
> would help, something like
> val myrdd: RDD[mypackage.MyClass] = sc.sequenceFile(...)
>
This solved the problem, thanks!
Is it because sc.objectFile() returns RDD[Nothing], or is it a spark-shell
problem?
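
For reference, a minimal sketch (the path here is just a placeholder) of two
ways to pin the element type, since objectFile takes a type parameter and
Scala falls back to Nothing when neither a type argument nor an expected type
is given:

// inferred as RDD[Nothing] -- later casts to the element type fail
val bad = sc.objectFile("/path/to/objects")

// pass the type argument explicitly...
val ok1 = sc.objectFile[mypackage.MyClass]("/path/to/objects")

// ...or annotate the val so inference has an expected type
val ok2: org.apache.spark.rdd.RDD[mypackage.MyClass] = sc.objectFile("/path/to/objects")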
>
>
> On Sat, Jan 4, 2014 at 8:29 PM, Aureliano Buendia <[email protected]> wrote:
>
>> While myrdd.count() works, a lot of other actions and transformations
>> still do not work in spark-shell. E.g., myrdd.first() gives this error:
>>
>> java.lang.ClassCastException: mypackage.MyClass cannot be cast to
>> scala.runtime.Nothing$
>>
>> Also, myrdd.map(r => r) returns:
>>
>> org.apache.spark.rdd.RDD[*Nothing*] = MappedRDD[2]
>>
>> Basically, type mypackage.MyClass gets converted to Nothing during any
>> action/transformation.
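
A rough sketch (placeholder path) of how that inferred Nothing type plays out,
consistent with the behaviour described above:

val myrdd = sc.objectFile("/path/to/objects")  // RDD[Nothing] without an annotation
myrdd.count()      // works: no element is ever ascribed the type Nothing
myrdd.first()      // ClassCastException: MyClass cannot be cast to scala.runtime.Nothing$
myrdd.map(r => r)  // still RDD[Nothing]; the inferred type follows the RDD around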
>>
>>
>>
>> On Sun, Jan 5, 2014 at 4:06 AM, Aureliano Buendia
>> <[email protected]>wrote:
>>
>>> Sorry, I had a typo. I can confirm that using ADD_JARS together with
>>> SPARK_CLASSPATH works as expected in spark-shell.
>>>
>>> It'd make sense to have the two combined as one option.
>>>
>>>
>>> On Sun, Jan 5, 2014 at 3:51 AM, Aaron Davidson <[email protected]> wrote:
>>>
>>>> Cool. To confirm, you said you can access the class and construct new
>>>> objects -- did you do this in the shell itself (i.e., on the driver), or on
>>>> the executors?
>>>>
>>>> Specifically, one of the following two should fail in the shell:
>>>> > new mypackage.MyClass()
>>>> > sc.parallelize(0 until 10, 2).foreach(_ => new mypackage.MyClass())
>>>> (or just import it)
>>>>
>>>> You could also try running MASTER=local-cluster[2,1,512] which launches
>>>> 2 executors, 1 core each, with 512MB in a setup that mimics a real cluster
>>>> more closely, in case it's a bug only related to using local mode.
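
A sketch of what that invocation might look like when combined with the jar
flags used elsewhere in this thread (paths are placeholders):

MASTER=local-cluster[2,1,512] ADD_JARS=/path/to/my/jar SPARK_CLASSPATH=/path/to/my/jar ./spark-shell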
>>>>
>>>>
>>>> On Sat, Jan 4, 2014 at 7:07 PM, Aureliano Buendia <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Jan 5, 2014 at 2:28 AM, Aaron Davidson <[email protected]> wrote:
>>>>>
>>>>>> Additionally, which version of Spark are you running?
>>>>>>
>>>>>
>>>>> 0.8.1.
>>>>>
>>>>> Unfortunately, this doesn't work either:
>>>>>
>>>>> MASTER=local[2] ADD_JARS=/path/to/my/jar
>>>>> SPARK_CLASSPATH=/path/to/my/jar ./spark-shell
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Jan 4, 2014 at 6:27 PM, Aaron Davidson <[email protected]> wrote:
>>>>>>
>>>>>>> I am not an expert on these classpath issues, but if you're using
>>>>>>> local mode, you might also try to set SPARK_CLASSPATH to include the
>>>>>>> path to the jar file as well. This should not really help, since
>>>>>>> "adding jars" is the right way to get the jars to your executors
>>>>>>> (which is where the exception appears to be happening), but it would
>>>>>>> sure be interesting if it did.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jan 4, 2014 at 4:50 PM, Aureliano Buendia <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I should add that I can see in the log that the jar is being shipped
>>>>>>>> to the workers:
>>>>>>>>
>>>>>>>> 14/01/04 15:34:52 INFO Executor: Fetching http://192.168.1.111:51031/jars/my.jar.jar with timestamp 1388881979092
>>>>>>>> 14/01/04 15:34:52 INFO Utils: Fetching http://192.168.1.111:51031/jars/my.jar.jar to /var/folders/3g/jyx81ctj3698wbvphxhm4dw40000gn/T/fetchFileTemp8322008964976744710.tmp
>>>>>>>> 14/01/04 15:34:53 INFO Executor: Adding file:/var/folders/3g/jyx81ctj3698wbvphxhm4dw40000gn/T/spark-d8ac8f66-fad6-4b3f-8059-73f13b86b070/my.jar.jar to class loader
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Jan 5, 2014 at 12:46 AM, Aureliano Buendia <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm trying to access my standalone Spark app from spark-shell. I
>>>>>>>>> tried starting the shell with:
>>>>>>>>>
>>>>>>>>> MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell
>>>>>>>>>
>>>>>>>>> The log shows that the jar file was loaded. Also, I can access and
>>>>>>>>> create a new instance of mypackage.MyClass.
>>>>>>>>>
>>>>>>>>> The problem is that myRDD is an RDD[MyClass], and calling
>>>>>>>>> myRDD.collect() throws this exception:
>>>>>>>>>
>>>>>>>>> java.lang.ClassNotFoundException: mypackage.MyClass
>>>>>>>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>>>> at java.lang.Class.forName0(Native Method)
>>>>>>>>> at java.lang.Class.forName(Class.java:264)
>>>>>>>>> at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:622)
>>>>>>>>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)
>>>>>>>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)
>>>>>>>>> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1642)
>>>>>>>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
>>>>>>>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>>>>>>> at org.apache.spark.util.Utils$.deserialize(Utils.scala:59)
>>>>>>>>> at org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:573)
>>>>>>>>> at org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:573)
>>>>>>>>> at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
>>>>>>>>> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:702)
>>>>>>>>> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:698)
>>>>>>>>> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:872)
>>>>>>>>> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:872)
>>>>>>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
>>>>>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:53)
>>>>>>>>> at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
>>>>>>>>> at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
>>>>>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>>>>> at java.lang.Thread.run(Thread.java:722)
>>>>>>>>>
>>>>>>>>> Does this mean that my jar was not shipped to the workers? Is this
>>>>>>>>> a known issue, or am I doing something wrong here?
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>