Since the additional jars added by sc.addJar are served through the driver's
HTTP server, we still want a better approach even if this works, because of
scalability (imagine thousands of workers downloading jars from the driver).

If we ignore the fundamental scalability issue, this can be fixed by using
the custom ClassLoader to load a wrapper class. Code inside the wrapper class
then inherits the custom ClassLoader, so users don't need to use reflection
inside it. I'm working on this now.
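
As a rough sketch of that idea in plain Java, outside Spark: the wrapper
class is loaded once through the custom loader, and the plain `new` inside
it is then resolved by that same loader. All names here (WrapperDemo, Helper,
Task, runWrapped) are hypothetical, and the sketch assumes the classes were
loaded from a directory or jar on disk. It is an illustration only, not the
actual patch.

```java
import java.lang.reflect.Constructor;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Hypothetical names throughout; this is not Spark code.
class WrapperDemo {

    // Stands in for a class that lives in a dynamically added jar.
    public static class Helper {
        public String parse(String line) { return line.trim(); }
    }

    // The wrapper class. Because it is defined by the custom loader,
    // the plain `new Helper()` below is resolved by that same loader.
    public static class Task implements Supplier<ClassLoader> {
        @Override
        public ClassLoader get() {
            Helper helper = new Helper();   // no reflection needed here
            helper.parse(" a,b,c ");
            return helper.getClass().getClassLoader();
        }
    }

    // Loads Task through a fresh URLClassLoader and runs it.
    // Returns { custom loader, loader that defined Helper inside Task }.
    static ClassLoader[] runWrapped() {
        // Assumption: classes come from a directory or jar, so the
        // code source location is available.
        URL here = WrapperDemo.class.getProtectionDomain()
                .getCodeSource().getLocation();
        // A null parent means only bootstrap classes are shared.
        URLClassLoader custom = new URLClassLoader(new URL[] { here }, null);
        try {
            // Reflection is used once, to cross into the custom loader.
            Class<?> taskClass = custom.loadClass(Task.class.getName());
            Constructor<?> ctor = taskClass.getDeclaredConstructor();
            ctor.setAccessible(true);
            // Supplier is a bootstrap class, so the cast is safe.
            @SuppressWarnings("unchecked")
            Supplier<ClassLoader> task = (Supplier<ClassLoader>) ctor.newInstance();
            return new ClassLoader[] { custom, task.get() };
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader[] r = runWrapped();
        System.out.println("Helper defined by custom loader: " + (r[0] == r[1]));
    }
}
```

The only reflective call is the one that creates the wrapper; everything
the wrapper references afterwards is linked through the custom loader.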

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


---------- Forwarded message ----------
From: Sandy Ryza <sandy.r...@cloudera.com>
Date: Sun, May 18, 2014 at 4:49 PM
Subject: Re: Calling external classes added by sc.addJar needs to be
through reflection
To: "dev@spark.apache.org" <dev@spark.apache.org>


Hey Xiangrui,

If the jars are placed in the distributed cache and loaded statically, as
the primary app jar is in YARN, then it shouldn't be an issue.  Other jars,
however, including additional jars that are sc.addJar'd and jars specified
with the spark-submit --jars argument, are loaded dynamically by executors
with a URLClassLoader.  These jars aren't next to the executors when they
start - the executors fetch them from the driver's HTTP server.
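
To illustrate the difference outside Spark, here is a minimal Java sketch.
It compares the loader used by a plain `new` with the loader reached through
reflection on the context class loader. The names (ContextLoaderDemo, Marker,
compare) are made up for the example, and it assumes the classes were loaded
from a directory or jar on disk; it is not Spark's executor code.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical names; a stand-alone illustration only.
class ContextLoaderDemo {

    // Stands in for a class shipped in a dynamically fetched jar.
    public static class Marker { }

    // Returns { loader seen by `new`, loader seen by reflection }.
    static ClassLoader[] compare() {
        Thread thread = Thread.currentThread();
        ClassLoader saved = thread.getContextClassLoader();
        try {
            // Assumption: classes come from a directory or jar on disk.
            URL here = ContextLoaderDemo.class.getProtectionDomain()
                    .getCodeSource().getLocation();
            URLClassLoader dynamic = new URLClassLoader(new URL[] { here }, null);
            thread.setContextClassLoader(dynamic);

            // `new` is linked through the loader that defined this class;
            // the context class loader is not consulted.
            ClassLoader viaNew = new Marker().getClass().getClassLoader();

            // Reflection can name the loader explicitly.
            ClassLoader viaReflection = Class
                    .forName(Marker.class.getName(), true,
                             thread.getContextClassLoader())
                    .getClassLoader();
            return new ClassLoader[] { viaNew, viaReflection };
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        } finally {
            thread.setContextClassLoader(saved);   // restore the original
        }
    }

    public static void main(String[] args) {
        ClassLoader[] r = compare();
        System.out.println("new:        " + r[0]);
        System.out.println("reflection: " + r[1]);
    }
}
```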


On Sun, May 18, 2014 at 4:05 PM, Xiangrui Meng <men...@gmail.com> wrote:

> Hi Sandy,
>
> It is hard to imagine that a user needs to create an object in that
> way. Since the jars are already in distributed cache before the
> executor starts, is there any reason we cannot add the locally cached
> jars to classpath directly?
>
> Best,
> Xiangrui
>
> On Sun, May 18, 2014 at 4:00 PM, Sandy Ryza <sandy.r...@cloudera.com>
> wrote:
> > I spoke with DB offline about this a little while ago and he confirmed
> > that he was able to access the jar from the driver.
> >
> > The issue appears to be a general Java issue: you can't directly
> > instantiate a class from a dynamically loaded jar.
> >
> > I reproduced it locally outside of Spark with:
> > ---
> >     URLClassLoader urlClassLoader = new URLClassLoader(
> >         new URL[] { new File("myotherjar.jar").toURI().toURL() }, null);
> >     Thread.currentThread().setContextClassLoader(urlClassLoader);
> >     MyClassFromMyOtherJar obj = new MyClassFromMyOtherJar();
> > ---
> >
> > I was able to load the class with reflection.
> >
> >
> >
> > On Sun, May 18, 2014 at 11:58 AM, Patrick Wendell <pwend...@gmail.com>
> > wrote:
> >
> >> @db - it's possible that you aren't including the jar in the classpath
> >> of your driver program (I think this is what mridul was suggesting).
> >> It would be helpful to see the stack trace of the CNFE.
> >>
> >> - Patrick
> >>
> >> On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell <pwend...@gmail.com>
> >> wrote:
> >> > @xiangrui - we don't expect these to be present on the system
> >> > classpath, because they get dynamically added by Spark (e.g. your
> >> > application can call sc.addJar well after the JVMs have started).
> >> >
> >> > @db - I'm pretty surprised to see that behavior. It's definitely not
> >> > intended that users need reflection to instantiate their classes -
> >> > something odd is going on in your case. If you could create an
> >> > isolated example and post it to the JIRA, that would be great.
> >> >
> >> > On Sun, May 18, 2014 at 9:58 AM, Xiangrui Meng <men...@gmail.com>
> >> > wrote:
> >> >> I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870
> >> >>
> >> >> DB, could you add more info to that JIRA? Thanks!
> >> >>
> >> >> -Xiangrui
> >> >>
> >> >> On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <men...@gmail.com>
> >> >> wrote:
> >> >>> Btw, I tried
> >> >>>
> >> >>> rdd.map { i =>
> >> >>>   System.getProperty("java.class.path")
> >> >>> }.collect()
> >> >>>
> >> >>> but didn't see the jars added via "--jars" on the executor
> >> >>> classpath.
> >> >>>
> >> >>> -Xiangrui
> >> >>>
> >> >>> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <men...@gmail.com>
> >> >>> wrote:
> >> >>>> I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
> >> >>>> reflection approach mentioned by DB didn't work either. I checked
> >> >>>> the distributed cache on a worker node and found the jar there. It
> >> >>>> is also in the Environment tab of the WebUI. The workaround is
> >> >>>> making an assembly jar.
> >> >>>>
> >> >>>> DB, could you create a JIRA and describe what you have found so
> >> >>>> far? Thanks!
> >> >>>>
> >> >>>> Best,
> >> >>>> Xiangrui
> >> >>>>
> >> >>>> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan
> >> >>>> <mri...@gmail.com> wrote:
> >> >>>>> Can you try moving your mapPartitions to another class/object
> >> >>>>> which is referenced only after sc.addJar?
> >> >>>>>
> >> >>>>> I would suspect the CNFE is thrown while loading the class
> >> >>>>> containing mapPartitions, before sc.addJar is executed.
> >> >>>>>
> >> >>>>> In general though, dynamic loading of classes means you use
> >> >>>>> reflection to instantiate them, since the expectation is that you
> >> >>>>> don't know which implementation provides the interface. If you
> >> >>>>> know it statically a priori, you bundle it in your classpath.
> >> >>>>>
> >> >>>>> Regards
> >> >>>>> Mridul
> >> >>>>> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
> >> >>>>>
> >> >>>>>> Finally found a way out of the ClassLoader maze! It took me some
> >> >>>>>> time to understand how it works; I think it is worth documenting
> >> >>>>>> in a separate thread.
> >> >>>>>>
> >> >>>>>> We're trying to add an external utility.jar which contains
> >> >>>>>> CSVRecordParser, and we added the jar to the executors through
> >> >>>>>> the sc.addJar API.
> >> >>>>>>
> >> >>>>>> If the instance of CSVRecordParser is created without reflection,
> >> >>>>>> it raises *ClassNotFoundException*.
> >> >>>>>>
> >> >>>>>> data.mapPartitions(lines => {
> >> >>>>>>     val csvParser = new CSVRecordParser(delimiter.charAt(0))
> >> >>>>>>     lines.foreach(line => {
> >> >>>>>>       val lineElems = csvParser.parseLine(line)
> >> >>>>>>     })
> >> >>>>>>     ...
> >> >>>>>>     ...
> >> >>>>>>  )
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> If the instance of CSVRecordParser is created through reflection,
> >> >>>>>> it works.
> >> >>>>>>
> >> >>>>>> data.mapPartitions(lines => {
> >> >>>>>>     val loader = Thread.currentThread.getContextClassLoader
> >> >>>>>>     val CSVRecordParser =
> >> >>>>>>         loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
> >> >>>>>>
> >> >>>>>>     val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
> >> >>>>>>         .newInstance(delimiter.charAt(0).asInstanceOf[Character])
> >> >>>>>>
> >> >>>>>>     val parseLine = CSVRecordParser
> >> >>>>>>         .getDeclaredMethod("parseLine", classOf[String])
> >> >>>>>>
> >> >>>>>>     lines.foreach(line => {
> >> >>>>>>        val lineElems = parseLine.invoke(csvParser, line)
> >> >>>>>>            .asInstanceOf[Array[String]]
> >> >>>>>>     })
> >> >>>>>>     ...
> >> >>>>>>     ...
> >> >>>>>>  )
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> This is identical to this question:
> >> >>>>>>
> >> >>>>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
> >> >>>>>>
> >> >>>>>> It's not intuitive for users to load external classes through
> >> >>>>>> reflection, but the couple of available alternatives are very
> >> >>>>>> tricky: 1) messing around with the systemClassLoader by calling
> >> >>>>>> systemClassLoader.addURL through reflection, or 2) forking
> >> >>>>>> another JVM to add jars into the classpath before the bootstrap
> >> >>>>>> loader.
> >> >>>>>>
> >> >>>>>> Any thoughts on fixing it properly?
> >> >>>>>>
> >> >>>>>> @Xiangrui,
> >> >>>>>> netlib-java jniloader is loaded from netlib-java through
> >> >>>>>> reflection, so this problem will not be seen.
> >> >>>>>>
> >> >>>>>> Sincerely,
> >> >>>>>>
> >> >>>>>> DB Tsai
> >> >>>>>> -------------------------------------------------------
> >> >>>>>> My Blog: https://www.dbtsai.com
> >> >>>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
> >> >>>>>>