This problem occurs because graph.triplets generates an iterator that reuses
the same EdgeTriplet object for every triplet in the partition. The
workaround is to force a copy using graph.triplets.map(_.copy()).
The solution in the AMPCamp tutorial is mistaken -- I'm not sure if that
ever worked.
The workaround is to force a copy using graph.triplets.map(_.copy()).
Sorry, this actually won't copy the entire triplet, only the attributes
defined in Edge. The right workaround is to copy the EdgeTriplet explicitly:
graph.triplets.map { et =>
val et2 = new EdgeTriplet[VD, ED] // Replace
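For reference, the explicit copy described above could look roughly like the helper below. This is only a sketch: the object name TripletCopy and the method copyTriplet are made up for illustration, and it assumes the mutable fields srcId, dstId, attr, srcAttr and dstAttr that EdgeTriplet exposes in GraphX.

```scala
import org.apache.spark.graphx.EdgeTriplet

// Hypothetical helper (not part of GraphX): builds a fresh EdgeTriplet so that
// the result no longer aliases the object reused by the triplets iterator.
object TripletCopy {
  def copyTriplet[VD, ED](et: EdgeTriplet[VD, ED]): EdgeTriplet[VD, ED] = {
    val et2 = new EdgeTriplet[VD, ED]
    // Fields inherited from Edge
    et2.srcId = et.srcId
    et2.dstId = et.dstId
    et2.attr = et.attr
    // Vertex attributes, which are specific to EdgeTriplet
    et2.srcAttr = et.srcAttr
    et2.dstAttr = et.dstAttr
    et2
  }
}
```

It would then be used as graph.triplets.map(et => TripletCopy.copyTriplet(et)) before collecting or caching the triplets.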
The examples in graphx/data are meant to show the input data format, but if
you want to play around with larger and more interesting datasets, we've
been using the following ones, among others:
- SNAP's web-Google dataset (5M edges):
https://snap.stanford.edu/data/web-Google.html
- SNAP's
I haven't been getting mail either. This was the last message I received:
http://apache-spark-user-list.1001560.n3.nabble.com/master-attempted-to-re-register-the-worker-and-then-took-all-workers-as-unregistered-tp553p5491.html
Unfortunately it's very difficult to get uncaching right with GraphX due to
the complicated internal dependency structure that it creates. It's
necessary to know exactly what operations you're doing on the graph in order
to unpersist correctly (i.e., in a way that avoids recomputation).
I have a
In general, you can find out exactly what's not serializable by adding
-Dsun.io.serialization.extendedDebugInfo=true to SPARK_JAVA_OPTS.
Since a "this" reference to the enclosing class is often what's causing the
problem, a general workaround is to move the mapPartitions call to a static
method
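To illustrate the idea, here is a minimal sketch of moving the per-partition logic into a Scala object, which is the closest thing Scala has to static methods. The names PartitionOps, scalePartition and scale are hypothetical, and the RDD[Int] element type is just an assumption for the example. The closure passed to mapPartitions then only captures the local parameter factor, not an enclosing class instance.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical example object; methods on an object have no enclosing
// instance, so closures created here do not drag in a "this" reference.
object PartitionOps {
  // Per-partition logic that depends only on its arguments.
  def scalePartition(factor: Int)(iter: Iterator[Int]): Iterator[Int] =
    iter.map(_ * factor)

  // The closure refers to the local `factor` and to the object's method,
  // so nothing non-serializable from a surrounding class is captured.
  def scale(rdd: RDD[Int], factor: Int): RDD[Int] =
    rdd.mapPartitions(iter => scalePartition(factor)(iter))
}
```

A class that previously called rdd.mapPartitions(...) inline would call PartitionOps.scale(rdd, factor) instead, passing in whatever values the closure needs as plain arguments.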
Sorry, I missed vertex 6 in that example. It should be [{1}, {1}, {1}, {1},
{1, 6}, {6}, {7}, {7}].
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/counting-degrees-graphx-tp6370p6378.html
Sent from the Apache Spark User List mailing list archive at
You should be able to construct the edges in a single map() call without
using collect():
val edges: RDD[Edge[String]] = sc.textFile(...).map { line =>
  val row = line.split(",")
  // Vertex IDs must be Longs; the third column is kept as the edge attribute
  Edge(row(0).toLong, row(1).toLong, row(2))
}
val graph: Graph[Int, String] = Graph.fromEdges(edges, defaultValue = 1)
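If the input might contain malformed lines, one possible variation on the map() call above is to pull the parsing into a small function that returns an Option. This is just a sketch: EdgeParsing and parseEdge are made-up names, and the "srcId,dstId,attr" line layout is assumed from the snippet above.

```scala
import org.apache.spark.graphx.Edge
import scala.util.Try

// Hypothetical helper: parses a "srcId,dstId,attr" line into an Edge[String],
// returning None when a column is missing or an ID is not a valid Long.
object EdgeParsing {
  def parseEdge(line: String): Option[Edge[String]] = {
    // Limit of 3 keeps any further commas inside the attribute column.
    val row = line.split(",", 3).map(_.trim)
    if (row.length != 3) None
    else Try(Edge(row(0).toLong, row(1).toLong, row(2))).toOption
  }
}
```

The edges RDD could then be built with sc.textFile(...).flatMap(line => EdgeParsing.parseEdge(line).toSeq), which silently drops the lines that fail to parse.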