I might be stating the obvious for everyone, but the issue here is not
reflection or the source of the JAR, but the ClassLoader. The basic
rules are this.
new Foo will use the ClassLoader that defines Foo. This is usually
the ClassLoader that loaded whatever it is that first referenced Foo
and
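A minimal sketch of the two lookup paths, using a hypothetical class name that is not from this thread:

// 'new' resolves the class through the loader tied to the code doing the 'new',
// so it can only see what that loader (and its parents) can see.
val viaNew = new com.example.Foo()
// Class.forName with an explicit loader lets you choose which loader does the
// lookup, e.g. the thread's context classloader.
val loader: ClassLoader = Thread.currentThread().getContextClassLoader
val viaLookup = Class.forName("com.example.Foo", true, loader).getConstructor().newInstance()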
Hi Spark devs,
Is the algorithm for TorrentBroadcast
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala)
the same as Cornet from the below paper?
http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf
If so it would be nice
Hi Sean,
It's true that the issue here is the classloader, and due to the classloader
delegation model, users have to use reflection in the executors to pick up
the right classloader in order to use the classes added by the sc.addJar API.
However, it's very inconvenient for users, and not documented in
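A sketch of the reflection workaround being described; the class name and jar path are hypothetical, assuming the class was shipped with sc.addJar:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("addJar-reflection")) // master/deploy settings omitted
sc.addJar("/path/to/user-code.jar") // hypothetical jar containing com.example.MyUdf

val results = sc.parallelize(1 to 10).map { i =>
  // On the executor, the added jar is visible through the classloader Spark
  // installs as the task thread's context classloader, so load through it.
  val loader = Thread.currentThread().getContextClassLoader
  val cls = Class.forName("com.example.MyUdf", true, loader)
  cls.getConstructor().newInstance().toString
}.collect()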
Sounds like the problem is that classloaders always look in their parents
before themselves, and Spark users want executors to pick up classes from
their custom code before the ones in Spark plus its dependencies.
Would a custom classloader that delegates to the parent after first
checking itself
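Something along those lines, as a rough sketch (not code from Spark):

import java.net.{URL, URLClassLoader}

// Checks its own URLs before delegating to the parent, inverting the usual
// parent-first delegation (core java.* classes are still left to the parent).
class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
    extends URLClassLoader(urls, parent) {

  override def loadClass(name: String, resolve: Boolean): Class[_] = synchronized {
    val alreadyLoaded = findLoadedClass(name)
    val cls =
      if (alreadyLoaded != null) alreadyLoaded
      else if (name.startsWith("java.")) super.loadClass(name, false)
      else {
        try findClass(name) // look in our own jars first
        catch { case _: ClassNotFoundException => super.loadClass(name, false) }
      }
    if (resolve) resolveClass(cls)
    cls
  }
}

For what it's worth, if I remember right there is also an experimental spark.files.userClassPathFirst setting aimed at the same ordering problem.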
I don't think a custom classloader is necessary.
Well, it occurs to me that this is not a new problem. Hadoop, Tomcat, etc.
all run custom user code that creates new user objects without
reflection. I should go see how that's done. Maybe it's totally valid
to set the thread's context classloader
Hello,
I am new to the Spark community, so I apologize if I am asking something
obvious.
I am following the documentation about contributing to Spark, where it's written
that I need to fork the https://github.com/apache/spark repository.
I got a little bit confused since the repository
“master” is where development happens, while branch-1.0, branch-0.9, etc are
for maintenance releases in those versions. Most likely if you want to
contribute you should use master. Some of the other named branches were for big
features in the past, but none are actively used now.
Matei
On
graph.triplets does not work -- it returns incorrect results
I have a graph with the following edges:
orig_graph.edges.collect
= Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1),
Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1),
Edge(5,2,1), Edge(5,3,1),
This is caused by an optimization in GraphX that reuses a triplet object, so when
you call collect directly on triplets, the same reused object is returned for each element.
It has been fixed in Spark 1.0 here:
https://issues.apache.org/jira/browse/SPARK-1188
To work around it in older versions of Spark, you can add a copy step to
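For example, something like this sketch (using orig_graph from the post) copies the fields into fresh tuples so each collected element is independent:

// Each reused triplet's fields go into a new tuple before collect,
// so the shared mutable object is never exposed.
val trips = orig_graph.triplets
  .map(t => (t.srcId, t.srcAttr, t.dstId, t.dstAttr, t.attr))
  .collect()

For plain edges, mapping with _.copy() also works, since Edge is a case class; that is what the follow-up below does.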
Thanks, rxin, this worked!
I am having a similar problem with .reduce... do I need to insert .copy()
functions in that statement as well?
This part works:
orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge)).map(edge =>
(Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr),
I tried adding .copy() everywhere, but still only get one element returned,
not even an RDD object.
orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge)).map(edge =>
(Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce(
(A,B) => { if (A._1.copy().dstId == B._1.copy().srcId)
I am not very comfortable with sbt. I want to build a standalone application
using Spark 1.0 RC9. I can build an sbt assembly for my application with Spark
0.9.1, and I think in that case Spark is pulled from the Akka repository?
Now if I want to use 1.0 RC9 for my application, what is the process?
Hi, all,
My question is: which is a better fit for MPP, Spark or Impala?
The motivation is the following case:
1. Three big tables need a join operation (about 100 fields per
table, more than 1 TB per table).
2. Besides the above tables, it is very possible that they
Well, you have to put spark-assembly-*.jar into the lib directory of your
application.
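For reference, sbt treats jars under lib/ at the project root as unmanaged dependencies, so a layout like this sketch (file names illustrative) is enough:

my-app/
  build.sbt
  lib/
    spark-assembly-1.0.0-hadoop2.2.0.jar   <- copied from your Spark build
  src/
    main/scala/MyApp.scala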
Best,
--
Nan Zhu
On Monday, May 19, 2014 at 9:48 PM, nit wrote:
I am not very comfortable with sbt. I want to build a standalone application
using Spark 1.0 RC9. I can build an sbt assembly for my application
Just reran my test on rc5;
everything works.
Built applications with sbt and the spark-*.jar which is compiled with Hadoop
2.3.
+1
--
Nan Zhu
On Sunday, May 18, 2014 at 11:07 PM, witgo wrote:
How to reproduce this bug?
-- Original --
From: Patrick
That's the crude way to do it. If you run `sbt/sbt publishLocal`, then you
can resolve the artifact from your local cache in the same way that you
would resolve it if it were deployed to a remote cache. That's just the
build step. Actually running the application will require the necessary
jars
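Roughly, after running sbt/sbt publishLocal in the Spark checkout, the application's build.sbt just declares the dependency; the version string below is an assumption and must match whatever the RC build actually published:

// build.sbt (sketch)
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
// sbt's default "local" resolver finds the artifacts publishLocal put in ~/.ivy2/local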
Having a user define a custom class inside of an added jar and
instantiate it directly inside of an executor is definitely supported
in Spark and has been for a really long time (several years). This is
something we do all the time in Spark.
DB - I'd hold off on a re-architecting of this
Whenever we publish a release candidate, we create a temporary maven
repository that hosts the artifacts. We do this precisely for the case
you are running into (where a user wants to build an application
against it to test).
You can build against the release candidate by just adding that
reduce always returns a single element - maybe you are misunderstanding what
the reduce function in collections does.
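A quick illustration of that point in the plain Scala REPL (the same semantics apply to RDD.reduce):

scala> List(1, 2, 3, 4).reduce(_ + _)
res0: Int = 10

The result is a single value, not a collection or an RDD, so chaining further RDD operations after reduce won't work.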
On Mon, May 19, 2014 at 3:32 PM, GlennStrycker glenn.stryc...@gmail.comwrote:
I tried adding .copy() everywhere, but still only get one element returned,
not even an RDD
It just hit me why this problem is showing up on YARN and not on standalone.
The relevant difference between YARN and standalone is that, on YARN, the
app jar is loaded by the system classloader instead of Spark's custom URL
classloader.
On YARN, the system classloader knows about [the classes
Thanks everyone. I followed Patrick's suggestion and it worked like a charm.
We're cancelling this RC in favor of rc10. There were two blockers: an
issue with the Windows run scripts and an issue with the packaging for
Hadoop 1 when Hive support is bundled.
https://issues.apache.org/jira/browse/SPARK-1875
https://issues.apache.org/jira/browse/SPARK-1876
Thanks everyone for
On a related note, there is also a staging Apache repository where the
latest rc gets pushed to:
https://repository.apache.org/content/repositories/staging/org/apache/spark/spark-core_2.10/
The artifact here is just named 1.0.0 (similar to the rc specific
repository that Patrick mentioned). So if
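For completeness, pointing an sbt build at such a repository is just an extra resolver; a sketch, with the staging base URL taken from the link above:

resolvers += "Apache Staging" at "https://repository.apache.org/content/repositories/staging/"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"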