On Tue, Apr 21, 2015 at 10:39 AM, mas <mas.ha...@gmail.com> wrote:

> How does GraphX store the routing table? Is it stored on the master node,
> or are chunks of the routing table sent to each partition that maintains
> the record of vertices and edges at that node?

The latter: the routing table is stored alongside the vertices, and for each vertex it stores the set of edge partitions that reference that vertex.

> If only customized edge partitioning is performed, will the corresponding
> vertices be sent to the same partition or not?

If I understand correctly, you're asking whether it's possible to colocate the vertices with the edges so they don't have to move during replication. It's possible to do this in some cases by partitioning each edge using a hash partitioner on its source or destination vertex. GraphX will still do replication using a shuffle, but most of the shuffle files should be local in this case. I tried this a while ago but didn't find a very big improvement for PageRank.

Ultimately, a more general solution would be to unify the vertex and edge RDDs by designating one replica for each vertex as the master. This would also reduce the storage cost by a factor of (average degree - 1)/(average degree).

Ankur <http://www.ankurdave.com/>
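
To make the routing table and the colocation idea above a bit more concrete, here is a toy model in plain Python. It is not GraphX code: the function names (`vertex_partition`, `edge_partition_by_source`, `build_routing_tables`, `local_fraction`) are made up for illustration, and `vid % num_parts` is only an assumed stand-in for whatever hash partitioner is actually used. It assumes vertices and edges are split into the same number of partitions.

```python
from collections import defaultdict


def vertex_partition(vid, num_parts):
    # Assumed stand-in for hash-partitioning vertices by ID.
    return vid % num_parts


def edge_partition_by_source(src, dst, num_parts):
    # Custom edge partitioning: place each edge by hashing its source
    # vertex with the same function used for the vertices.
    return vertex_partition(src, num_parts)


def build_routing_tables(edges, num_parts, edge_partitioner):
    """Build one routing table per vertex partition.

    tables[p][vid] is the set of edge partitions that reference vertex
    vid, kept on the vertex partition p that owns vid.
    """
    tables = [defaultdict(set) for _ in range(num_parts)]
    for src, dst in edges:
        ep = edge_partitioner(src, dst, num_parts)
        for vid in (src, dst):
            tables[vertex_partition(vid, num_parts)][vid].add(ep)
    return tables


def local_fraction(tables):
    # Fraction of (vertex, edge partition) routing entries where the
    # edge partition has the same index as the owning vertex partition,
    # i.e. the replica would not need to leave its partition index.
    total = 0
    local = 0
    for p, table in enumerate(tables):
        for eps in table.values():
            for ep in eps:
                total += 1
                local += int(ep == p)
    return local / total


# Hypothetical example graph: (source, destination) pairs.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
tables = build_routing_tables(edges, 2, edge_partition_by_source)
```

In this model, every vertex that has outgoing edges finds its own partition index in its routing entry, which is the sense in which source-based partitioning keeps part of the replication local. Replicas needed for destination vertices can still land in other partitions, so the shuffle does not go away entirely.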