On Tue, Apr 21, 2015 at 10:39 AM, mas <mas.ha...@gmail.com> wrote:

> How does GraphX store the routing table? Is it stored on the master node,
> or are chunks of the routing table sent to each partition that maintains
> the record of vertices and edges at that node?
>

The latter: the routing table is stored alongside the vertices, and for
each vertex it stores the set of edge partitions that reference that
vertex.
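
As a rough illustration of that data structure, here is a toy model in plain
Scala collections. It is only a sketch and not GraphX's internal
representation: the object name, the sample edges, and the single in-memory
map are all assumptions made for the example. In GraphX each vertex partition
would hold only the slice of the table for its own vertices.

```scala
// Toy model only: plain Scala collections, not GraphX internals.
object RoutingTableSketch {
  type VertexId = Long
  type PartitionId = Int

  // Hypothetical sample data: each edge partition holds (src, dst) pairs.
  val edgePartitions: Map[PartitionId, Seq[(VertexId, VertexId)]] = Map(
    0 -> Seq((1L, 2L), (1L, 3L)),
    1 -> Seq((2L, 3L)),
    2 -> Seq((3L, 1L))
  )

  // For each vertex, collect the set of edge partitions that reference it,
  // either as a source or as a destination.
  val routingTable: Map[VertexId, Set[PartitionId]] =
    edgePartitions.toSeq
      .flatMap { case (pid, edges) =>
        edges.flatMap { case (src, dst) => Seq(src -> pid, dst -> pid) }
      }
      .groupBy(_._1)
      .map { case (vid, pairs) => vid -> pairs.map(_._2).toSet }

  def main(args: Array[String]): Unit =
    routingTable.toSeq.sortBy(_._1).foreach { case (vid, pids) =>
      println(s"vertex $vid -> edge partitions ${pids.toSeq.sorted.mkString(", ")}")
    }
}
```

In this model a vertex's entry tells you exactly which edge partitions need a
copy of its attribute, which is what replication uses.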

> If only customized edge partitioning is performed, will the corresponding
> vertices be sent to the same partition or not?


If I understand correctly, you're asking whether it's possible to colocate
the vertices with the edges so they don't have to move during replication.
It's possible to do this in some cases by assigning each edge to a partition
using a hash partitioner on its source or destination vertex ID. GraphX will
still do replication using a shuffle, but most of the shuffle files should be
local in this case.
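
A minimal sketch of the idea in plain Scala, without any Spark dependency.
The names `hashPartition` and `edgePartition` are hypothetical, and the
non-negative modulo of the ID's hash code is an assumed stand-in for the
partitioner applied to the vertices. In GraphX itself, a custom placement
rule is written by implementing `PartitionStrategy.getPartition(src, dst,
numParts)` and passing the strategy to `Graph.partitionBy`.

```scala
// Toy model only: shows edges placed by the hash of their source vertex ID.
object ColocationSketch {
  type VertexId = Long

  // Assumed vertex placement: non-negative modulo of the ID's hash code.
  def hashPartition(id: VertexId, numParts: Int): Int = {
    val mod = id.hashCode % numParts
    if (mod < 0) mod + numParts else mod
  }

  // Place each edge in the partition of its source vertex.
  // The destination is deliberately ignored.
  def edgePartition(src: VertexId, dst: VertexId, numParts: Int): Int =
    hashPartition(src, numParts)

  def main(args: Array[String]): Unit = {
    val numParts = 4
    // Hypothetical sample edges.
    val edges = Seq((1L, 2L), (1L, 3L), (2L, 3L), (7L, 1L))
    edges.foreach { case (src, dst) =>
      val ep = edgePartition(src, dst, numParts)
      val srcHome = hashPartition(src, numParts)
      val dstHome = hashPartition(dst, numParts)
      println(s"edge ($src,$dst): edge partition $ep, src home $srcHome, dst home $dstHome")
    }
  }
}
```

In this model the source vertex always has the same partition index as its
edge, while the destination vertex may not. Whether equal partition indices
actually end up on the same executor depends on scheduling, which the sketch
does not model.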

I tried this a while ago but didn't find a very big improvement for
PageRank. Ultimately a more general solution would be to unify the vertex
and edge RDDs by designating one replica for each vertex as the master.
This would also reduce the storage cost by a factor of (average degree -
1)/(average degree).

Ankur <http://www.ankurdave.com/>
