Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Sean Owen
I might be stating the obvious for everyone, but the issue here is not reflection or the source of the JAR, but the ClassLoader. The basic rule is this: new Foo will use the ClassLoader that defines Foo. This is usually the ClassLoader that loaded whatever it is that first referenced Foo and
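
A minimal sketch (not part of Sean's message; the ClassLoaderProbe object is hypothetical) of how to see the two loaders in play -- the loader that defined a class, which is what new consults, versus the thread's context loader that frameworks often use for lookups:

    object ClassLoaderProbe {
      def main(args: Array[String]): Unit = {
        // Loader that defined this class -- the one `new` would go through
        // for classes referenced from here.
        println("defining loader: " + getClass.getClassLoader)
        // Loader attached to the current thread; frameworks often use this
        // one for Class.forName-style lookups instead.
        println("context loader:  " + Thread.currentThread().getContextClassLoader)
      }
    }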

TorrentBroadcast aka Cornet?

2014-05-19 Thread Andrew Ash
Hi Spark devs, Is the algorithm for TorrentBroadcast (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala) the same as Cornet from the below paper? http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf If so it would be nice

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
Hi Sean, It's true that the issue here is the classloader, and due to the classloader delegation model, users have to use reflection in the executors to pick up the classloader in order to use those classes added by the sc.addJars APIs. However, it's very inconvenient for users, and not documented in
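
A hedged sketch of the reflection workaround DB describes. The class name com.example.MyUdf and its process method are hypothetical stand-ins for user code shipped to executors via sc.addJar:

    import org.apache.spark.rdd.RDD

    object ReflectiveUdf {
      def applyAddedClass(input: RDD[String]): RDD[String] =
        input.map { line =>
          // Go through the executor thread's context ClassLoader, which knows
          // about jars added via sc.addJar, instead of instantiating with new.
          val loader = Thread.currentThread().getContextClassLoader
          val clazz  = Class.forName("com.example.MyUdf", true, loader)
          val udf    = clazz.newInstance()
          clazz.getMethod("process", classOf[String])
               .invoke(udf, line)
               .asInstanceOf[String]
        }
    }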

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Andrew Ash
Sounds like the problem is that classloaders always look in their parents before themselves, and Spark users want executors to pick up classes from their custom code before the ones in Spark plus its dependencies. Would a custom classloader that delegates to the parent after first checking itself
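
A rough sketch of the child-first loader Andrew is proposing -- this is an assumption about the idea, not an existing Spark class; production code would also synchronize on getClassLoadingLock:

    import java.net.{URL, URLClassLoader}

    class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
        extends URLClassLoader(urls, parent) {

      override def loadClass(name: String, resolve: Boolean): Class[_] = {
        // Reuse a class this loader has already defined, if any.
        var c = findLoadedClass(name)
        if (c == null) {
          c = try {
            findClass(name)                  // check our own jars first...
          } catch {
            case _: ClassNotFoundException =>
              super.loadClass(name, resolve) // ...then fall back to the usual parent-first path
          }
        }
        if (resolve) resolveClass(c)
        c
      }
    }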

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Sean Owen
I don't think a custom classloader is necessary. Well, it occurs to me that this is no new problem. Hadoop, Tomcat, etc. all run custom user code that creates new user objects without reflection. I should go see how that's done. Maybe it's totally valid to set the thread's context classloader
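
A small sketch of the Hadoop/Tomcat-style approach Sean alludes to (an assumption about the intent, not code from the thread): swap the thread's context classloader in around the user code and restore it afterwards:

    def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
      val previous = Thread.currentThread().getContextClassLoader
      Thread.currentThread().setContextClassLoader(loader)
      try body
      finally Thread.currentThread().setContextClassLoader(previous)
    }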

question about Spark repositories in GitHub

2014-05-19 Thread Gil Vernik
Hello, I am new to the Spark community, so I apologize if I ask something obvious. I followed the document about contributing to Spark, which says I need to fork the https://github.com/apache/spark repository. I got a little bit confused since the repository

Re: question about Spark repositories in GitHub

2014-05-19 Thread Matei Zaharia
“master” is where development happens, while branch-1.0, branch-0.9, etc are for maintenance releases in those versions. Most likely if you want to contribute you should use master. Some of the other named branches were for big features in the past, but none are actively used now. Matei On

BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
graph.triplets does not work -- it returns incorrect results I have a graph with the following edges: orig_graph.edges.collect = Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1), Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1), Edge(5,2,1), Edge(5,3,1),

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
This was an optimization that reuses a triplet object in GraphX, and when you do a collect directly on triplets, the same object is returned. It has been fixed in Spark 1.0 here: https://issues.apache.org/jira/browse/SPARK-1188 To work around it in older versions of Spark, you can add a copy step to
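
A sketch of the copy step for pre-1.0 GraphX, assuming orig_graph is the graph from Glenn's message; materializing each reused triplet into a fresh tuple before collect() avoids seeing the same object over and over:

    val trips = orig_graph.triplets
      .map(t => (t.srcId, t.srcAttr, t.dstId, t.dstAttr, t.attr))
      .collect()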

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
Thanks, rxin, this worked! I am having a similar problem with .reduce... do I need to insert .copy() functions in that statement as well? This part works: orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge)).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr),

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
I tried adding .copy() everywhere, but still only get one element returned, not even an RDD object. orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge)).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce((A, B) => { if (A._1.copy().dstId == B._1.copy().srcId)

spark 1.0 standalone application

2014-05-19 Thread nit
I am not very comfortable with sbt. I want to build a standalone application using Spark 1.0 RC9. I can build an sbt assembly for my application with Spark 0.9.1, and I think in that case Spark is pulled from the Akka repository? Now if I want to use 1.0 RC9 for my application, what is the process?

spark and impala, which is a better fit for MPP

2014-05-19 Thread liuguodong
Hi all, my question is which of Spark and Impala is a better fit for MPP. The motivation is the following case: 1. three big tables need a join operation (about 100 fields per table, more than 1TB per table); 2. besides the above tables, it is very possible that they

Re: spark 1.0 standalone application

2014-05-19 Thread Nan Zhu
Well, you have to put spark-assembly-*.jar into the lib directory of your application. Best, -- Nan Zhu On Monday, May 19, 2014 at 9:48 PM, nit wrote: I am not very comfortable with sbt. I want to build a standalone application using Spark 1.0 RC9. I can build an sbt assembly for my application

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-19 Thread Nan Zhu
Just reran my test on rc5; everything works. Built applications with sbt and the spark-*.jar which is compiled with Hadoop 2.3. +1 -- Nan Zhu On Sunday, May 18, 2014 at 11:07 PM, witgo wrote: How to reproduce this bug? -- Original -- From: Patrick

Re: spark 1.0 standalone application

2014-05-19 Thread Mark Hamstra
That's the crude way to do it. If you run `sbt/sbt publishLocal`, then you can resolve the artifact from your local cache in the same way that you would resolve it if it were deployed to a remote cache. That's just the build step. Actually running the application will require the necessary jars
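
A sketch of Mark's suggestion; the application name and Scala version below are assumptions. After running sbt/sbt publishLocal from the Spark RC checkout, an application's build.sbt can resolve the artifact from the local ivy cache like any other dependency:

    name := "my-spark-app"

    scalaVersion := "2.10.4"

    // The RC artifacts are versioned 1.0.0, so the dependency looks like a final release.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"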

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Patrick Wendell
Having a user define a custom class inside of an added jar and instantiate it directly inside of an executor is definitely supported in Spark and has been for a really long time (several years). This is something we do all the time in Spark. DB - I'd hold off on a re-architecting of this

Re: spark 1.0 standalone application

2014-05-19 Thread Patrick Wendell
Whenever we publish a release candidate, we create a temporary maven repository that hosts the artifacts. We do this precisely for the case you are running into (where a user wants to build an application against it to test). You can build against the release candidate by just adding that
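
Roughly what Patrick's suggestion looks like in build.sbt; the repository URL below is a placeholder (the real URL is posted with each release candidate's vote thread):

    // Placeholder URL -- substitute the staging repository announced for the RC.
    resolvers += "Spark RC staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-XXXX/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"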

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
reduce always returns a single element - maybe you are misunderstanding what the reduce function in collections does. On Mon, May 19, 2014 at 3:32 PM, GlennStrycker glenn.stryc...@gmail.com wrote: I tried adding .copy() everywhere, but still only get one element returned, not even an RDD
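
A tiny illustration of Reynold's point (not from the thread; assumes an existing SparkContext named sc, e.g. in spark-shell): reduce collapses the whole RDD to one value on the driver, whereas reduceByKey keeps a distributed per-key result:

    val nums  = sc.parallelize(Seq(1, 2, 3, 4))
    val total = nums.reduce(_ + _)            // 10 -- a single Int, not an RDD

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
    val byKey = pairs.reduceByKey(_ + _)      // still an RDD[(String, Int)]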

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Sandy Ryza
It just hit me why this problem is showing up on YARN and not on standalone. The relevant difference between YARN and standalone is that, on YARN, the app jar is loaded by the system classloader instead of Spark's custom URL classloader. On YARN, the system classloader knows about [the classes
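
A quick diagnostic sketch in the spirit of Sandy's observation (an assumption, not from his message; com.example.MyUdf stands in for a class shipped in the app jar): print which loader defines the class on the executors -- on YARN it shows up under the system classloader, on standalone under Spark's custom URL classloader:

    import org.apache.spark.SparkContext

    def whoLoadsMyClass(sc: SparkContext): Unit =
      sc.parallelize(1 to 1).foreach { _ =>
        val loader = Class.forName("com.example.MyUdf").getClassLoader
        println("com.example.MyUdf defined by: " + loader)
      }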

Re: spark 1.0 standalone application

2014-05-19 Thread nit
Thanks everyone. I followed Patrick's suggestion and it worked like a charm. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/spark-1-0-standalone-application-tp6698p6710.html Sent from the Apache Spark Developers List mailing list archive at

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-19 Thread Patrick Wendell
We're cancelling this RC in favor of rc10. There were two blockers: an issue with Windows run scripts and an issue with the packaging for Hadoop 1 when hive support is bundled. https://issues.apache.org/jira/browse/SPARK-1875 https://issues.apache.org/jira/browse/SPARK-1876 Thanks everyone for

Re: spark 1.0 standalone application

2014-05-19 Thread Shivaram Venkataraman
On a related note, there is also a staging Apache repository where the latest rc gets pushed: https://repository.apache.org/content/repositories/staging/org/apache/spark/spark-core_2.10/ -- The artifact here is just named 1.0.0 (similar to the rc-specific repository that Patrick mentioned). So if
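
If the staging repository is used instead of the RC-specific one, the sbt resolver would look roughly like this (the repository root is inferred from the URL above):

    resolvers += "Apache staging" at
      "https://repository.apache.org/content/repositories/staging/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"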