At 2014-11-11 01:51:43 +, Buttler, David <buttl...@llnl.gov> wrote:
> I am building a graph from a large CSV file. Each record contains a couple
> of nodes and about 10 edges. When I try to load a large portion of the
> graph, using multiple partitions, I get inconsistent results in the number of
> edges between different runs. However, if I use a single partition, or a
> small portion of the CSV file (say 1000 rows), then I get a consistent number
> of edges. Is there anything I should be aware of as to why this could be
> happening in GraphX?
Is it possible there's some nondeterminism in the way you're reading the file?
It would be helpful if you could post the code you're using to load the graph.
Ankur