Re: Question about range partitioner and data locality

Avery Ching Fri, 25 May 2012 11:07:29 -0700

Writing a range-based partitioner is for potentially reducing the number of messages between workers (e.g. reverse lexical ordering of URLs for PageRank). Without changes to the input split loading, the average number of vertices shipped during the input superstep will be the same as when using the hash partitioner. Is this what you are trying to achieve?
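
A minimal sketch of the point about shipping, in plain Python rather than Giraph code. All names here (range_owner, hash_owner, average_shipped) are hypothetical, and the model rests on assumptions: partition i is owned by worker i, the hash partitioner is approximated by id modulo the worker count, and a split is equally likely to be read by any worker because split loading ignores ranges.

```python
from bisect import bisect_right
from fractions import Fraction


def range_owner(vertex_id, upper_bounds):
    """Owner under a range partitioner: index of the first range whose
    exclusive upper bound is greater than vertex_id (assumed layout)."""
    return bisect_right(upper_bounds, vertex_id)


def hash_owner(vertex_id, num_workers):
    """Owner under a simplified hash partitioner (id modulo worker count)."""
    return vertex_id % num_workers


def average_shipped(split, owner_of, num_workers):
    """Average, over every possible reading worker, of the number of
    vertices in the split that must be sent to a different worker."""
    total = 0
    for reader in range(num_workers):
        total += sum(1 for v in split if owner_of(v) != reader)
    return Fraction(total, num_workers)


# One split holding the consecutive ids 100..199, three workers.
bounds = [100, 200, 300]
split = range(100, 200)
by_range = average_shipped(split, lambda v: range_owner(v, bounds), 3)
by_hash = average_shipped(split, lambda v: hash_owner(v, 3), 3)
```

Under these assumptions both averages come out equal: with ranges, the split is either entirely local or entirely shipped, while with hashing a fixed share of it is always shipped. The range layout only helps during loading if the worker that owns a range is also the one that reads the matching split.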


Avery


On 5/25/12 10:57 AM, Yuanyuan Tian wrote:

I am not suggesting changing the current range partitioner, as it is designed for the general case. I want to write a special partitioner based on the existing range partitioner to achieve what I want to do in this special situation, but I don't know how.
Yuanyuan

-----Avery Ching <[email protected]> wrote: -----
To: [email protected]
From: Avery Ching <[email protected]>
Date: 05/24/2012 11:59PM
Subject: Re: Question about range partitioner and data locality
You are definitely right that the old version of Giraph supported ranges pretty well for loading, but it could not support hash-based distribution (much better for memory distribution across workers). It also made a lot of assumptions (the data within each split was in a unique range and sorted).
Unless we make these types of assumptions, it would be pretty hard to do. One way might be to have all the workers examine each input split, and each input split would provide information about its range. If the worker matches that range, it would attempt to load some or all of the vertices in that split. Otherwise, it would try the next split.
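
The idea above could be sketched roughly as follows. This is plain Python, not the Giraph API: the Split class, its min_id/max_id fields, and load_for_worker are hypothetical names, and the sketch assumes each split can report the inclusive range of vertex ids it contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical input split that advertises its vertex id range."""
    min_id: int          # smallest vertex id in the split (inclusive)
    max_id: int          # largest vertex id in the split (inclusive)
    vertex_ids: tuple    # stand-in for the vertex records in the split


def load_for_worker(worker_range, splits):
    """Examine every split; load only the vertices that fall inside
    this worker's inclusive (lo, hi) id range."""
    lo, hi = worker_range
    loaded = []
    for split in splits:
        # No overlap with this worker's range: try the next split.
        if split.max_id < lo or split.min_id > hi:
            continue
        # Overlap: load some or all of the vertices in the split.
        loaded.extend(v for v in split.vertex_ids if lo <= v <= hi)
    return loaded


splits = [
    Split(0, 99, tuple(range(0, 100))),
    Split(100, 249, tuple(range(100, 250))),
    Split(250, 299, tuple(range(250, 300))),
]
worker_b = load_for_worker((100, 199), splits)   # part of the second split
worker_c = load_for_worker((200, 299), splits)   # rest of second, all of third
```

A split that straddles two worker ranges is read by both workers here, each keeping only its own vertices. That trades some duplicate reading for not having to ship vertices after loading.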
Any other ideas?

Avery

On 5/23/12 5:36 PM, Yuanyuan Tian wrote:
Hi,
I want to use better partitions of the input graph for my algorithm running on Giraph. So, I partitioned my input graph and relabeled the vertex ids so that the vertex ids of the same partition are in a consecutive range. I also reorganized the input file so that the vertices in the same range are together. I used the range partitioner for the Giraph job to utilize the better partitions. However, the vertex loader still looks up the partition id of each vertex and ships it to the worker that owns the partition. Since I have already prepared my data in a nice way, in the ideal case I could just keep all the vertices of an input split local to the corresponding worker. Is there an easy way to do this? I know that a very old version of Giraph didn't have a partitioner; the users had to prepare the partitions. I essentially want to do a similar thing in the current version of Giraph. Please give me a pointer or two on how to do this.
Thanks,
Yuanyuan
