Re: Graph partitioning and data locality

Martin Junghanns Mon, 03 Nov 2014 23:41:44 -0800

Sorry for the typo (no coffee yet): it should be vertexID.hashCode() *%* n, not & n.
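To make the difference concrete, here is a minimal sketch of hash-modulo partitioning. It is not taken from the framework's code: the class and method names are made up for illustration, and using Math.floorMod to keep the result non-negative is my own assumption about how one would guard against negative hash codes. With n = 4, a hash of 6 gives 6 % 4 = 2, whereas 6 & 4 gives 4, which is not a valid index into partitions 0..3.

```java
// Hypothetical sketch of hash-modulo partitioning; names are illustrative only.
final class HashPartitionSketch {

    /** Maps a vertex ID to a partition index in the range [0, n). */
    static int partitionFor(Object vertexId, int n) {
        // hashCode() may be negative and % keeps the sign of the dividend,
        // so floorMod is used here to stay inside [0, n).
        return Math.floorMod(vertexId.hashCode(), n);
    }

    public static void main(String[] args) {
        int n = 4;
        for (int id : new int[] {0, 6, 7, -3}) {
            int modulo = partitionFor(Integer.valueOf(id), n);
            int bitwiseAnd = Integer.valueOf(id).hashCode() & n; // the typo
            System.out.println(id + ": % gives " + modulo + ", & gives " + bitwiseAnd);
        }
    }
}
```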


On 04.11.2014 08:36, Martin Junghanns wrote:

Hi group,
I have a question concerning the graph partitioning step. If I understood the code correctly, the graph is distributed to n partitions by using vertexID.hashCode() & n. I have two questions concerning that step.
1) Is the whole graph loaded and partitioned only by the Master? This would mean the whole data set has to be moved to that Master map job and then moved to the physical node on which the worker for the given partition runs. As this sounds like a huge overhead, I inspected the code further: I saw that there is also a WorkerGraphPartitioner, and I assume it calls the partitioning method on its local data (let's say its local HDFS blocks); if the resulting partition for a vertex does not belong to that worker, the data gets moved to the owning worker, which reduces the overhead. Is this assumption correct?
2) Let's say the graph is already partitioned in the file system, e.g. blocks on physical nodes contain logically connected graph nodes. Is it possible to just read the data as it is and skip the partitioning step? In that case I currently assume that the vertexID should contain the partitionID, and the custom partitioning would be an identity function (instead of hashing or range partitioning).
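To illustrate what I mean in 2), here is a rough sketch of a vertex ID that carries its partition ID, so that the partitioning function only reads it back out instead of hashing. Everything in it is an assumption for the sake of the example: the 64-bit ID layout (upper 16 bits for the partition, lower 48 bits for a local ID) and all class and method names are hypothetical, and it says nothing about whether the framework would then actually avoid moving the data.

```java
// Hypothetical sketch: a 64-bit vertex ID whose upper 16 bits hold the partition ID.
final class EncodedPartitionSketch {
    static final int LOCAL_BITS = 48;                      // assumed layout
    static final long LOCAL_MASK = (1L << LOCAL_BITS) - 1; // lower 48 bits set
    static final int MAX_PARTITIONS = 1 << 16;

    /** Packs a partition ID and a partition-local ID into one vertex ID. */
    static long encode(int partitionId, long localId) {
        if (partitionId < 0 || partitionId >= MAX_PARTITIONS) {
            throw new IllegalArgumentException("partitionId out of range: " + partitionId);
        }
        if (localId < 0 || localId > LOCAL_MASK) {
            throw new IllegalArgumentException("localId out of range: " + localId);
        }
        return ((long) partitionId << LOCAL_BITS) | localId;
    }

    /** "Identity" partitioning: the partition is simply read from the ID. */
    static int partitionOf(long vertexId) {
        // Unsigned shift so that partition IDs with the top bit set stay positive.
        return (int) (vertexId >>> LOCAL_BITS);
    }

    /** Recovers the partition-local part of the ID. */
    static long localIdOf(long vertexId) {
        return vertexId & LOCAL_MASK;
    }

    public static void main(String[] args) {
        long id = encode(3, 42L);
        System.out.println("partition " + partitionOf(id) + ", local id " + localIdOf(id));
    }
}
```

The point of the design is that partitionOf never looks at anything but the ID itself, so a vertex read from a block that already belongs to partition 3 would be assigned to partition 3 again.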
Thanks for your time and help!

Cheers,
Martin
