Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-22 Thread Xiangrui Meng
Hi DB, I found it is a little hard to implement the solution I mentioned: > Do not send the primary jar and secondary jars to executors' > distributed cache. Instead, add them to "spark.jars" in SparkSubmit > and serve them via HTTP by calling sc.addJar in SparkContext. If you look at Application

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
@Xiangrui How about we send the primary jar and secondary jars into the distributed cache without adding them to the system classloader of executors? Then we add them using a custom classloader so we don't need to call secondary jars through reflection in the primary jar. This will be consistent with what we

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Patrick Wendell
Hey I just looked at the fix here: https://github.com/apache/spark/pull/848 Given that this is quite simple, maybe it's best to just go with this and just explain that we don't support adding jars dynamically in YARN in Spark 1.0. That seems like a reasonable thing to do. On Wed, May 21, 2014 at

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Patrick Wendell
Of these two solutions I'd definitely prefer 2 in the short term. I'd imagine the fix is very straightforward (it would mostly just be removing code), and we'd be making this more consistent with the standalone mode, which makes things way easier to reason about. In the long term we'll definitely wan

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Koert Kuipers
db tsai, i do not think userClassPathFirst is working, unless the classes you load dont reference any classes already loaded by the parent classloader (a mostly hypothetical situation)... i filed a jira for this here: https://issues.apache.org/jira/browse/SPARK-1863 On Tue, May 20, 2014 at 1:04

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Xiangrui Meng
That's a good example. If we really want to cover that case, there are two solutions: 1. Follow DB's patch, adding jars to the system classloader. Then we cannot put a user class in front of an existing class. 2. Do not send the primary jar and secondary jars to executors' distributed cache. Inste

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Sandy Ryza
Is that an assumption we can make? I think we'd run into an issue in this situation:

*In primary jar:*
    def makeDynamicObject(clazz: String) = Class.forName(clazz).newInstance()

*In app code:*
    sc.addJar("dynamicjar.jar")
    ...
    rdd.map(x => makeDynamicObject("some.class.from.DynamicJar"))

It might

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
How about the jars added dynamically? Those will be in customLoader's classpath but not in the system one. Unfortunately, when we reference those dynamically added jars in the primary jar, the default classloader will be the system one, not the custom one. It works in standalone mode since the prima

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Xiangrui Meng
I think adding jars dynamically should work as long as the primary jar and the secondary jars do not depend on dynamically added jars, which should be the correct logic. -Xiangrui On Wed, May 21, 2014 at 1:40 PM, DB Tsai wrote: > This will be another separate story. > > Since in the yarn deployme

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
This will be another separate story. Since in the YARN deployment, as Sandy said, the app.jar will always be in the system classloader, any object instantiated in app.jar will have the system classloader as its parent loader instead of the custom one. As a result, the custom class loader in yarn will

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Sandy Ryza
This will solve the issue for jars added upon application submission, but, on top of this, we need to make sure that anything dynamically added through sc.addJar works as well. To do so, we need to make sure that any jars retrieved via the driver's HTTP server are loaded by the same classloader th

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-20 Thread Xiangrui Meng
Talked with Sandy and DB offline. I think the best solution is sending the secondary jars to the distributed cache of all containers rather than just the master, and set the classpath to include spark jar, primary app jar, and secondary jars before executor starts. In this way, user only needs to s

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
In 1.0, there is a new option for users to choose which classloader has higher priority via spark.files.userClassPathFirst, I decided to submit the PR for 0.9 first. We use this patch in our lab and we can use those jars added by sc.addJar without reflection. https://github.com/apache/spark/pull/8

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
Good summary! We fixed it in branch 0.9 since our production is still in 0.9. I'm porting it to 1.0 now, and hopefully will submit PR for 1.0 tonight.

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Sandy Ryza
It just hit me why this problem is showing up on YARN and not on standalone. The relevant difference between YARN and standalone is that, on YARN, the app jar is loaded by the system classloader instead of Spark's custom URL classloader. On YARN, the system classloader knows about [the classes in

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Patrick Wendell
Having a user define a custom class inside of an added jar and instantiate it directly inside of an executor is definitely supported in Spark and has been for a really long time (several years). This is something we do all the time in Spark. DB - I'd hold off on a re-architecting of this until

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Sean Owen
I don't think a custom classloader is necessary. Well, it occurs to me that this is no new problem. Hadoop, Tomcat, etc. all run custom user code that creates new user objects without reflection. I should go see how that's done. Maybe it's totally valid to set the thread's context classloader for
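
For illustration, a minimal Java sketch of the idea raised here: install a loader as the thread's context classloader for the duration of a task and restore the previous one afterwards. The class and method names (`ContextLoaderScope`, `callWith`) are hypothetical and are not Spark, Hadoop, or Tomcat APIs. Note that the context classloader only affects code that explicitly consults it; a plain `new Foo` inside the task still resolves against the loader that defined the calling class.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of any library discussed in this thread.
class ContextLoaderScope {
    // Runs the task with the given loader installed as the current thread's
    // context classloader, then restores whatever loader was there before.
    static <T> T callWith(ClassLoader loader, Callable<T> task) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            // Always restore, even if the task throws.
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        // An empty URLClassLoader stands in for a loader that knows about user jars.
        try (URLClassLoader custom = new URLClassLoader(new URL[0], before)) {
            ClassLoader seenInside = callWith(custom, () -> Thread.currentThread().getContextClassLoader());
            System.out.println("custom loader visible inside task: " + (seenInside == custom));
            System.out.println("restored afterwards: " + (Thread.currentThread().getContextClassLoader() == before));
        }
    }
}
```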

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Andrew Ash
Sounds like the problem is that classloaders always look in their parents before themselves, and Spark users want executors to pick up classes from their custom code before the ones in Spark plus its dependencies. Would a custom classloader that delegates to the parent after first checking itself
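
A rough Java sketch of the kind of loader being asked about: one that checks its own URLs first and only then delegates to the parent. `ChildFirstClassLoader` is a hypothetical name, and this is not Spark's implementation. As noted elsewhere in the thread, child-first loading runs into trouble once the classes it loads reference classes the parent has already loaded, so treat this as an illustration of the delegation order only.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical class for illustration; not taken from the Spark code base.
class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already been asked for.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Look in this loader's own URLs before asking the parent.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in our URLs: fall back to the standard parent-first lookup.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no URLs of its own, every lookup falls through to the parent.
        try (ChildFirstClassLoader loader =
                 new ChildFirstClassLoader(new URL[0], ChildFirstClassLoader.class.getClassLoader())) {
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}
```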

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
Hi Sean, It's true that the issue here is the classloader, and due to the classloader delegation model, users have to use reflection in the executors to pick up the classloader in order to use those classes added by the sc.addJar API. However, it's very inconvenient for users, and not documented in spa

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Sean Owen
I might be stating the obvious for everyone, but the issue here is not reflection or the source of the JAR, but the ClassLoader. The basic rule is this: "new Foo" will use the ClassLoader that defines Foo. This is usually the ClassLoader that loaded whatever it is that first referenced Foo and c
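
The delegation rule can be observed with a small Java sketch (the class name `DelegationDemo` is made up for this example). A child loader with no URLs of its own asks its parent first, so it returns the very `Class` object the parent already defined; the defining loader stays the parent, not the child.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class; only standard JDK classloading APIs are used.
class DelegationDemo {
    // Returns true when a child loader hands back exactly the Class object
    // that its parent defined, which is what parent-first delegation implies.
    static boolean childDelegatesToParent() throws Exception {
        ClassLoader parent = DelegationDemo.class.getClassLoader();
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            Class<?> viaChild = child.loadClass(DelegationDemo.class.getName());
            // Same Class object, and its defining loader is the parent.
            return viaChild == DelegationDemo.class && viaChild.getClassLoader() == parent;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("child delegates to parent: " + childDelegatesToParent());
        // Core classes are defined by the bootstrap loader, which is reported as null.
        System.out.println("loader of String: " + String.class.getClassLoader());
    }
}
```

The reverse direction is the one that bites here: code defined by the parent cannot see classes that only the child knows about.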

Fwd: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
-- Forwarded message -- From: Sandy Ryza Date: Sun, May 18, 2014 at 4:49 PM Subject: Re: Calling external classes added by sc.addJar needs to be through reflection To: "dev@spark.apache.org" Hey Xi

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
The jars are included in my driver, and I can successfully use them in the driver. I'm working on a patch, and it's almost working. Will submit a PR soon.

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
The reflection actually works. But you need to get the loader by `val loader = Thread.currentThread.getContextClassLoader` which is set by Spark executor. Our team verified this, and uses it as a workaround.
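
A minimal Java sketch of the workaround described here, assuming the class to load has a public no-argument constructor. The helper name `newInstanceViaContextLoader` is hypothetical; in the thread's scenario the class name would be something like the CSVRecordParser shipped in the added jar, while the example below uses a JDK class so that it runs anywhere.

```java
// Hypothetical helper illustrating reflection through the context classloader.
class ContextLoaderReflection {
    // Loads and instantiates className through the calling thread's context
    // classloader instead of the loader that defined this helper class.
    static Object newInstanceViaContextLoader(String className) throws ReflectiveOperationException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        Class<?> clazz = Class.forName(className, true, loader);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // java.util.ArrayList stands in for a class from a dynamically added jar.
        Object instance = newInstanceViaContextLoader("java.util.ArrayList");
        System.out.println(instance.getClass().getName());
    }
}
```

The result comes back typed as `Object`, which is the inconvenience the thread is trying to remove: the caller must cast to an interface or superclass that its own loader can see.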

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Sandy Ryza
Hey Xiangrui, If the jars are placed in the distributed cache and loaded statically, as the primary app jar is in YARN, then it shouldn't be an issue. Other jars, however, including additional jars that are sc.addJar'd and jars specified with the spark-submit --jars argument, are loaded dynamical

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
Hi Sandy, It is hard to imagine that a user needs to create an object in that way. Since the jars are already in distributed cache before the executor starts, is there any reason we cannot add the locally cached jars to classpath directly? Best, Xiangrui On Sun, May 18, 2014 at 4:00 PM, Sandy Ry

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
Hi Patrick, If spark-submit works correctly, the user only needs to specify runtime jars via `--jars` instead of using `sc.addJar`. Is that correct? I checked SparkSubmit and yarn.Client but didn't find any code to handle `args.jars` for YARN mode. So I don't know where in the code the jars in the distr

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Sandy Ryza
I spoke with DB offline about this a little while ago and he confirmed that he was able to access the jar from the driver. The issue appears to be a general Java issue: you can't directly instantiate a class from a dynamically loaded jar. I reproduced it locally outside of Spark with: --- URL
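
The reproduction code in this message is cut off in the archive. As an independent sketch of the same effect (not the code from the message), the following Java program compiles a throwaway class into a temporary directory that is not on the application classpath, then shows that it is invisible to the program's own loader but loadable through a `URLClassLoader`. The names `DynamicLoadRepro` and `Dyn` are made up, and the sketch assumes it runs on a JDK, since it needs the system Java compiler.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo; "Dyn" plays the role of a class in a dynamically added jar.
class DynamicLoadRepro {
    // Compiles a tiny class into a fresh temp directory and returns that directory.
    static Path compileDynClass() throws Exception {
        Path dir = Files.createTempDirectory("dynjar");
        Path src = dir.resolve("Dyn.java");
        String code = "public class Dyn { public String toString() { return \"dyn\"; } }";
        Files.write(src, code.getBytes(StandardCharsets.UTF_8));
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null when no compiler is available
        if (javac == null) {
            throw new IllegalStateException("a JDK with a system Java compiler is required");
        }
        int exitCode = javac.run(null, null, null, "-d", dir.toString(), src.toString());
        if (exitCode != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    // True if the class cannot be found by the loader that defined this class.
    static boolean hiddenFromDefaultLoader(String name) {
        try {
            Class.forName(name); // resolves against DynamicLoadRepro's own loader
            return false;
        } catch (ClassNotFoundException e) {
            return true;
        }
    }

    // Loads the class through the given loader and returns its toString().
    static String loadVia(ClassLoader loader, String name) throws Exception {
        return Class.forName(name, true, loader).getDeclaredConstructor().newInstance().toString();
    }

    public static void main(String[] args) throws Exception {
        Path dir = compileDynClass();
        URL[] urls = { dir.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls, DynamicLoadRepro.class.getClassLoader())) {
            System.out.println("hidden from default loader: " + hiddenFromDefaultLoader("Dyn"));
            System.out.println("via URLClassLoader: " + loadVia(loader, "Dyn"));
        }
    }
}
```

Writing `new Dyn()` directly in this program would not even compile, and code compiled separately against `Dyn` would fail at run time with a class-not-found error, because such a reference is resolved by the loader that defined the referencing class.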

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Patrick Wendell
@xiangrui - we don't expect these to be present on the system classpath, because they get dynamically added by Spark (e.g. your application can call sc.addJar well after the JVMs have started). @db - I'm pretty surprised to see that behavior. It's definitely not intended that users need reflectio

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Patrick Wendell
@db - it's possible that you aren't including the jar in the classpath of your driver program (I think this is what mridul was suggesting). It would be helpful to see the stack trace of the CNFE. - Patrick On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell wrote: > @xiangrui - we don't expect the

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
Btw, I tried rdd.map { i => System.getProperty("java.class.path") }.collect() but didn't see the jars added via "--jars" on the executor classpath. -Xiangrui On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng wrote: > I can re-produce the error with Spark 1.0-RC and YARN (CDH-5). The > reflecti

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870 DB, could you add more info to that JIRA? Thanks! -Xiangrui On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng wrote: > Btw, I tried > > rdd.map { i => > System.getProperty("java.class.path") > }.collect() > > but didn't see the j

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
I can re-produce the error with Spark 1.0-RC and YARN (CDH-5). The reflection approach mentioned by DB didn't work either. I checked the distributed cache on a worker node and found the jar there. It is also in the Environment tab of the WebUI. The workaround is making an assembly jar. DB, could y

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-17 Thread Mridul Muralidharan
Can you try moving your mapPartitions to another class/object which is referenced only after sc.addJar ? I would suspect CNFEx is coming while loading the class containing mapPartitions before addJars is executed. In general though, dynamic loading of classes means you use reflection to instantia

Calling external classes added by sc.addJar needs to be through reflection

2014-05-16 Thread DB Tsai
Finally found a way out of the ClassLoader maze! It took me some time to understand how it works; I think it's worth documenting in a separate thread. We're trying to add an external utility.jar which contains CSVRecordParser, and we added the jar to executors through the sc.addJar API. If the insta