Graph partitioning and data locality

Martin Junghanns Mon, 03 Nov 2014 23:37:34 -0800

Hi group,

I have a question about the graph partitioning step. If I understood the code correctly, the graph is distributed to n partitions by using vertexID.hashCode() % n. Two things about that step are unclear to me.
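To make the scheme I have in mind concrete, here is a minimal sketch of hash-based assignment. The class and method names are made up for illustration and are not Giraph's actual partitioner API. It uses Math.floorMod so that a negative hashCode() still yields a valid partition index.

```java
// Hypothetical sketch of hash partitioning; not taken from the Giraph code base.
final class HashPartitionSketch {

    /** Maps a vertex ID to a partition index in [0, partitionCount). */
    static int partitionFor(Object vertexId, int partitionCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(vertexId.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitionCount = 4;
        for (long id : new long[] {0L, 1L, 5L, 42L}) {
            System.out.println(id + " -> partition "
                    + partitionFor(Long.valueOf(id), partitionCount));
        }
    }
}
```

With this kind of assignment, neighbouring vertices usually end up in different partitions, which is why I am asking about data locality.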

1) Is the whole graph loaded and partitioned only by the Master? This would mean that the whole data set has to be moved to the Master map job first and then moved again to the physical node on which the worker for each partition runs. As this sounds like a huge overhead, I inspected the code further: I saw that there is also a WorkerGraphPartitioner, and I assume each worker calls the partitioning method on its local data (let's say its local HDFS blocks). If the resulting partition for a vertex does not belong to that worker, the data gets moved to the owning worker, which reduces the overhead. Is this assumption correct?
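The worker-side behaviour I am assuming could be sketched as follows. This is only my mental model, not Giraph code: the class, the method names and the one-partition-per-worker simplification are all hypothetical. Each worker reads its local input, computes the owner of every vertex, keeps what it owns and would send the rest to the owning worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of worker-local partitioning; names are illustrative only.
final class LocalSplitRoutingSketch {

    /** Owner of a vertex, assuming one partition per worker and hash assignment. */
    static int ownerOf(long vertexId, int workerCount) {
        return Math.floorMod(Long.hashCode(vertexId), workerCount);
    }

    /** Groups the vertex IDs one worker has read locally by their owning worker. */
    static Map<Integer, List<Long>> route(List<Long> localVertexIds, int workerCount) {
        Map<Integer, List<Long>> byOwner = new TreeMap<>();
        for (long id : localVertexIds) {
            byOwner.computeIfAbsent(ownerOf(id, workerCount), k -> new ArrayList<>())
                   .add(id);
        }
        return byOwner;
    }
}
```

A worker with index w would keep the list stored under key w and ship every other list over the network, so only vertices that hash to a different worker are moved.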

2) Let's say the graph is already partitioned in the file system, e.g. the blocks on each physical node contain logically connected graph nodes. Is it possible to just read the data as it is and skip the partitioning step? For that case I currently assume that the vertexID should contain the partitionID, and that the custom partitioning would be an identity function (instead of hashing or range partitioning).
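As a sketch of what I mean by an identity partitioning: assume (purely for illustration) that vertex IDs are longs whose upper 16 bits carry the partitionID and whose lower 48 bits carry a partition-local ID. The bit layout, class and method names are my own assumptions, not anything prescribed by Giraph.

```java
// Hypothetical ID layout: [16 bits partitionID | 48 bits local vertex ID].
final class EncodedPartitionIdSketch {

    static final int LOCAL_BITS = 48;
    static final long LOCAL_MASK = (1L << LOCAL_BITS) - 1;

    /** Builds a vertex ID from a partition ID (0..65535) and a local ID. */
    static long encode(int partitionId, long localId) {
        return ((long) partitionId << LOCAL_BITS) | (localId & LOCAL_MASK);
    }

    /** "Identity" partitioning: the partition is read straight from the ID. */
    static int partitionFor(long vertexId) {
        // Unsigned shift so a set top bit does not produce a negative index.
        return (int) (vertexId >>> LOCAL_BITS);
    }

    /** Recovers the partition-local part of the ID. */
    static long localIdOf(long vertexId) {
        return vertexId & LOCAL_MASK;
    }
}
```

A custom partitioner would then only need to return partitionFor(vertexID) and no hashing or range lookup would be involved.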


Thanks for your time and help!

Cheers,
Martin
