Re: Question about range partitioner and data locality

Avery Ching Thu, 24 May 2012 23:59:38 -0700

You are definitely right that the old version of Giraph supported ranges pretty well for loading, but could not support hash-based distribution (much better for memory distribution across workers). It also made a lot of assumptions (the data within each split was in a unique range and sorted).

Unless we make these kinds of assumptions, it would be pretty hard to do. One way might be to have all the workers examine each input split, and each input split would provide information as to its range. If the worker matches that range, it would attempt to load some or all of the vertices in that split. Otherwise, it would try the next split.
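
A rough sketch of that matching step, in Java. This is not Giraph code: IdRange, RangedSplit, and splitsForWorker are hypothetical names made up for illustration, and the sketch assumes each split can report an inclusive vertex-id range and each worker owns one contiguous range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RangeSplitMatching {
    /** Inclusive range of vertex ids [start, end]. Hypothetical helper type. */
    static final class IdRange {
        final long start;
        final long end;

        IdRange(long start, long end) {
            if (start > end) {
                throw new IllegalArgumentException("start must be <= end");
            }
            this.start = start;
            this.end = end;
        }

        /** True if the two inclusive ranges share at least one id. */
        boolean overlaps(IdRange other) {
            return start <= other.end && other.start <= end;
        }

        /** True if the given vertex id falls inside this range. */
        boolean contains(long vertexId) {
            return start <= vertexId && vertexId <= end;
        }
    }

    /** Hypothetical input split that advertises the id range it holds. */
    static final class RangedSplit {
        final String name;
        final IdRange range;

        RangedSplit(String name, IdRange range) {
            this.name = name;
            this.range = range;
        }
    }

    /**
     * Each worker walks every split and keeps those whose advertised range
     * overlaps its own; all other splits are skipped.
     */
    static List<RangedSplit> splitsForWorker(IdRange workerRange,
                                             List<RangedSplit> allSplits) {
        List<RangedSplit> matched = new ArrayList<>();
        for (RangedSplit split : allSplits) {
            if (workerRange.overlaps(split.range)) {
                matched.add(split);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<RangedSplit> splits = Arrays.asList(
            new RangedSplit("s0", new IdRange(0, 99)),
            new RangedSplit("s1", new IdRange(100, 149)),
            new RangedSplit("s2", new IdRange(150, 249)),
            new RangedSplit("s3", new IdRange(250, 300)));
        IdRange worker = new IdRange(100, 199);
        for (RangedSplit split : splitsForWorker(worker, splits)) {
            System.out.println(split.name);
        }
    }
}
```

A split that only partly overlaps the worker's range (s2 in the example) is still matched; the worker would then use contains() to load only the vertices it owns from that split, which is the "some or all" case.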


Any other ideas?

Avery

On 5/23/12 5:36 PM, Yuanyuan Tian wrote:

Hi,
I want to use better partitions of the input graph for my algorithm running on Giraph. So, I partitioned my input graph and re-labeled the vertex ids so that the vertex ids of the same partition are in a consecutive range. I also reorganized the input file so that the vertices in the same range are together. I used the range partitioner for the Giraph job to utilize the better partitions. However, the vertex loader still looks up the partition id of each vertex and ships it to the worker that owns the partition. Since I have already prepared my data in a nice way, in the ideal case I could just keep all the vertices of an input split local to the corresponding worker. Is there an easy way to do this? I know that the very old version of Giraph didn't have a partitioner; users had to prepare the partitions themselves. I essentially want to do a similar thing in the current version of Giraph. Please give me a pointer or two on how to do this.
Thanks,
Yuanyuan
