I do not think the current solution will work. I tried writing a version of
ChildExecutorURLClassLoader that does have a proper parent and a modified
loadClass that reverses the order of parent and child when finding classes.
That seems to work, but now classes like SparkEnv are loaded by the child,
which seems to mean the companion objects are reset (or something like
that), because I get NPEs.
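
For reference, here is roughly what I tried (a simplified sketch, not my
actual patch; class and member names are approximate):

  import java.net.{URL, URLClassLoader}

  // child-first variant: a proper parent is set, so bootstrap and JVM
  // classes resolve normally, but loadClass checks our own URLs before
  // delegating up
  class ChildFirstURLClassLoader(urls: Array[URL], parent: ClassLoader)
      extends URLClassLoader(urls, parent) {

    override def loadClass(name: String, resolve: Boolean): Class[_] = {
      val alreadyLoaded = findLoadedClass(name)
      val c =
        if (alreadyLoaded != null) alreadyLoaded
        else
          try findClass(name) // child first: look in our own URLs
          catch {
            case _: ClassNotFoundException =>
              super.loadClass(name, resolve) // then normal parent delegation
          }
      if (resolve) resolveClass(c)
      c
    }
  }

  // the problem: any class present in BOTH my jar and the spark assembly
  // (e.g. SparkEnv) now gets defined by this loader, creating a second copy
  // of its companion object that spark never initialized -> NPEs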


On Fri, May 16, 2014 at 3:54 PM, Koert Kuipers <ko...@tresata.com> wrote:

> OK, I think the issue is visibility: a classloader can see all classes
> loaded by its parent classloader, but userClassLoader does not have a
> parent classloader, so it is not able to "see" any classes that
> parentClassLoader is responsible for. In my case userClassLoader is trying
> to load AvroInputFormat, which probably somewhere statically references
> FileInputFormat, and FileInputFormat is invisible to userClassLoader.
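>
> A tiny example of what I mean (jar path hypothetical):
>
>   import java.net.{URL, URLClassLoader}
>
>   // passing null as the parent means only bootstrap classes and the given
>   // URLs are visible to this loader
>   val userClassLoader =
>     new URLClassLoader(Array(new URL("file:/tmp/my.jar")), null)
>
>   // AvroInputFormat itself is found in my.jar, but linking it requires
>   // resolving org.apache.hadoop.mapred.FileInputFormat (its superclass,
>   // I believe), which is neither in my.jar nor visible through a parent
>   // -> NoClassDefFoundError
>   userClassLoader.loadClass("org.apache.avro.mapred.AvroInputFormat")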
>
>
> On Fri, May 16, 2014 at 3:32 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> OK, I put lots of logging statements in ChildExecutorURLClassLoader.
>> This is what I see:
>>
>> * The URLs for userClassLoader are correct and include only my one jar.
>>
>> * For a class that exists only in my jar, I see that it gets loaded
>> correctly by userClassLoader.
>>
>> * For a class that exists in both my jar and the Spark assembly, it tries
>> to use userClassLoader and ends up with a NoClassDefFoundError. The class
>> is org.apache.avro.mapred.AvroInputFormat and the NoClassDefFoundError is
>> for org.apache.hadoop.mapred.FileInputFormat (which parentClassLoader is
>> responsible for, since it is not in my jar). I currently catch this
>> NoClassDefFoundError and call parentClassLoader.loadClass, but that is
>> clearly not a solution since it loads the wrong version.
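>>
>> My current catch looks roughly like this (a simplified sketch of my local
>> change, not the actual Spark source):
>>
>>   import java.net.{URL, URLClassLoader}
>>
>>   class ChildExecutorURLClassLoader(urls: Array[URL], parent: ClassLoader)
>>       extends ClassLoader(null) { // no parent -> no upward visibility
>>     private val userClassLoader = new URLClassLoader(urls, null)
>>     private val parentClassLoader = parent
>>
>>     override def findClass(name: String): Class[_] =
>>       try {
>>         userClassLoader.loadClass(name) // user classes first
>>       } catch {
>>         case _: ClassNotFoundException =>
>>           parentClassLoader.loadClass(name) // class not in my jar at all
>>         case _: NoClassDefFoundError =>
>>           // the class IS in my jar but depends on something only the
>>           // parent can see; falling back here loads the parent's version
>>           // of the class itself, i.e. the wrong one
>>           parentClassLoader.loadClass(name)
>>       }
>>   }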
>>
>>
>>
>> On Fri, May 16, 2014 at 2:25 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> Well, I modified ChildExecutorURLClassLoader to also delegate to
>>> parentClassLoader if a NoClassDefFoundError is thrown... now I get yet
>>> another error. I am clearly missing something with these classloaders;
>>> such nasty stuff. Giving up for now: I will just avoid
>>> spark.files.userClassPathFirst=true until I have more time to look at
>>> this.
>>>
>>> 14/05/16 13:58:59 ERROR Executor: Exception in task ID 3
>>> java.lang.ClassCastException: cannot assign instance of scala.None$ to field org.apache.spark.rdd.RDD.checkpointData of type scala.Option in instance of MyRDD
>>>         at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>>>         at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1995)
>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>         at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>         at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
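>>>
>>> I suspect this is the classic two-classloaders problem: MyRDD is now
>>> defined by the child loader while the scala/spark classes around it come
>>> from the parent, and identically named classes from different loaders
>>> are distinct classes to the JVM. A toy illustration (jar path
>>> hypothetical):
>>>
>>>   import java.net.{URL, URLClassLoader}
>>>
>>>   val urls = Array(new URL("file:/tmp/spark-assembly.jar")) // hypothetical
>>>   val loaderA = new URLClassLoader(urls, null)
>>>   val loaderB = new URLClassLoader(urls, null)
>>>
>>>   // same bytes, different defining loaders -> two distinct classes
>>>   val optA = loaderA.loadClass("scala.Option")
>>>   val optB = loaderB.loadClass("scala.Option")
>>>   println(optA == optB) // false
>>>
>>>   // the ClassCastException above is exactly this mismatch: scala.None$
>>>   // from one loader cannot be assigned to a field of type scala.Option
>>>   // from the other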
>>>
>>>
>>>
>>> On Fri, May 16, 2014 at 1:46 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> After removing all parameters of type Path from my code, I tried again.
>>>> I get a different but related error when I set
>>>> spark.files.userClassPathFirst=true.
>>>>
>>>> Now I do not even use FileInputFormat directly; HadoopRDD does...
>>>>
>>>> 14/05/16 12:17:17 ERROR Executor: Exception in task ID 45
>>>> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileInputFormat
>>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
>>>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>>>>         at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>         at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>         at org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:51)
>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>         at java.lang.Class.forName0(Native Method)
>>>>         at java.lang.Class.forName(Class.java:270)
>>>>         at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:57)
>>>>         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
>>>>         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
>>>>         at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1481)
>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1331)
>>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>         at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>>         at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>
>>>>
>>>>
>>>> On Thu, May 15, 2014 at 3:03 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> When I set spark.files.userClassPathFirst=true, I get Java
>>>>> serialization errors in my tasks; see below. When I set
>>>>> userClassPathFirst back to its default of false, the serialization
>>>>> errors are gone. My spark.serializer is KryoSerializer.
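>>>>>
>>>>> For context, this is roughly how I configure things (app name and jar
>>>>> path are placeholders):
>>>>>
>>>>>   import org.apache.spark.SparkConf
>>>>>
>>>>>   val conf = new SparkConf()
>>>>>     .setAppName("my-app") // placeholder
>>>>>     .set("spark.serializer",
>>>>>       "org.apache.spark.serializer.KryoSerializer")
>>>>>     // toggling this between true and false makes the serialization
>>>>>     // errors below appear and disappear
>>>>>     .set("spark.files.userClassPathFirst", "true")
>>>>>     .setJars(Seq("/path/to/my-task.jar")) // placeholder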
>>>>>
>>>>> The class org.apache.hadoop.fs.Path is in the Spark assembly jar, but
>>>>> not in my task jars (the ones I added to the SparkConf). So it looks
>>>>> like the ClosureSerializer is having trouble with this class once
>>>>> ChildExecutorURLClassLoader is used? That is just me guessing.
>>>>>
>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:5 failed 4 times, most recent failure: Exception failure in TID 31 on host node05.tresata.com: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
>>>>>         java.lang.Class.getDeclaredConstructors0(Native Method)
>>>>>         java.lang.Class.privateGetDeclaredConstructors(Class.java:2398)
>>>>>         java.lang.Class.getDeclaredConstructors(Class.java:1838)
>>>>>         java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1697)
>>>>>         java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:50)
>>>>>         java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:203)
>>>>>         java.security.AccessController.doPrivileged(Native Method)
>>>>>         java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:200)
>>>>>         java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:556)
>>>>>         java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1580)
>>>>>         java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1493)
>>>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1729)
>>>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>>         scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>>>         sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>         sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>         java.lang.reflect.Method.invoke(Method.java:597)
>>>>>         java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
>>>>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
>>>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>>>>         org.apache.spark.scheduler.ShuffleMapTask$.deserializeInfo(ShuffleMapTask.scala:66)
>>>>>         org.apache.spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:139)
>>>>>         java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
>>>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
>>>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>>>>         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:82)
>>>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:190)
>>>>>         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>>>         java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>>
>>>>
>>>
>>
>
