Re: inconsistent edge counts in GraphX

2014-11-18 Thread Ankur Dave
At 2014-11-11 01:51:43, Buttler, David <buttl...@llnl.gov> wrote:
> I am building a graph from a large CSV file.  Each record contains a couple
> of nodes and about 10 edges.  When I try to load a large portion of the
> graph, using multiple partitions, I get inconsistent results in the number of
> edges between different runs.  However, if I use a single partition, or a
> small portion of the CSV file (say 1000 rows), then I get a consistent number
> of edges.  Is there anything I should be aware of as to why this could be
> happening in GraphX?

Is it possible there's some nondeterminism in the way you're reading the file? 
It would be helpful if you could post the code you're using to load the graph.

Ankur


inconsistent edge counts in GraphX

2014-11-10 Thread Buttler, David
Hi,
I am building a graph from a large CSV file.  Each record contains a couple of 
nodes and about 10 edges.  When I try to load a large portion of the graph, 
using multiple partitions, I get inconsistent results in the number of edges 
between different runs.  However, if I use a single partition, or a small 
portion of the CSV file (say 1000 rows), then I get a consistent number of 
edges.  Is there anything I should be aware of as to why this could be 
happening in GraphX?

Thanks,
Dave