GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/181
Use the Executor's ClassLoader in sc.objectFile().
This makes it possible to read classes from the object file which were
specified in the user-provided jars. (By default ObjectInputStream uses
latestUserDefinedLoader, which may or may not be the right one.)
I created this because I ran into the following problem. I have x: RDD[X],
where X is defined in the jar that I provide to SparkContext. I save it with
x.saveAsObjectFile("x") and try to load it with sc.objectFile[X]("x"). This
fails with a ClassNotFoundException.
After a good while of debugging, I figured out that Utils.deserialize() most
likely resolves classes with the ClassLoader of Utils. That is the ClassLoader
that loaded Spark's own classes, so it is not aware of the dynamically added
jars. This patch fixes the issue by using the Executor's ClassLoader instead.
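The mechanism can be illustrated with a minimal Scala sketch. This is not the
code from this patch, and the names LoaderAwareSerde, serialize and deserialize
are hypothetical. ObjectInputStream looks up every class named in the stream
through its protected resolveClass method; overriding it to call Class.forName
with an explicitly supplied ClassLoader makes deserialization use that loader
instead of latestUserDefinedLoader.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical helper, for illustration only (not Spark's API).
object LoaderAwareSerde {
  // Java-serializes an object into a byte array.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    try oos.writeObject(obj) finally oos.close()
    bos.toByteArray
  }

  // Deserializes bytes, resolving every class in the stream through `loader`
  // instead of ObjectInputStream's default (latestUserDefinedLoader).
  def deserialize[T](bytes: Array[Byte], loader: ClassLoader): T = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        // initialize = false: look the class up without running static initializers
        Class.forName(desc.getName, false, loader)
    }
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }
}
```

In the objectFile() case, the loader passed in would be one that knows about
the user-provided jars (per this PR, the Executor's ClassLoader). A loader that
cannot see those jars makes readObject() throw ClassNotFoundException, which is
the failure described above.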
A more robust fix would be to always default to
Thread.currentThread.getContextClassLoader. This would prevent this problem
from biting anyone in the future. It would be a bit harder to test, though. On
the topic of testing: if you'd like to see tests for this, I will need some
hand-holding. Thanks!
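A sketch of that more robust default, under the same caveats (hypothetical
names, not Spark's API): deserialize() reads the calling thread's context
ClassLoader and falls back to its own loader if that is null. Because the
default comes from the current thread, a test can install a custom loader with
Thread.setContextClassLoader and restore the previous one afterwards, which is
what the hypothetical withContextClassLoader helper does.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical helper, for illustration only (not Spark's API).
object ContextLoaderSerde {
  // Java-serializes an object into a byte array.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    try oos.writeObject(obj) finally oos.close()
    bos.toByteArray
  }

  // Resolves classes through the thread's context ClassLoader,
  // falling back to this object's own loader when none is set.
  def deserialize[T](bytes: Array[Byte]): T = {
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, loader)
    }
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }

  // Runs `body` with `loader` as the context ClassLoader, then restores the old one.
  def withContextClassLoader[A](loader: ClassLoader)(body: => A): A = {
    val thread = Thread.currentThread
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body finally thread.setContextClassLoader(previous)
  }
}
```

In a test, the installed loader can simply delegate to its parent while
recording the class names it is asked for. That shows the default is honored
without having to build a jar at test time.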
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/darabos/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/181.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #181
----
commit 1b5df2cb5a69e357b1eb0b19cc05f4b1aedb5701
Author: Daniel Darabos
Date: 2014-03-19T18:52:12Z
Use the Executor's ClassLoader in sc.objectFile(). This makes it possible
to read classes from the object file which were specified in the user-provided
jars. (By default ObjectInputStream uses latestUserDefinedLoader, which may or
may not be the right one.)
----