I agree. Typically, anything other than hash partitioning results in hot keys and OOM errors on the nodes hosting the oversized partitions.

On Sep 4, 2015 10:32 AM, "Flavian Jacquot" <[email protected]> wrote:
> Hi,
> I did an internship in that field. I used Spinner to compute a better
> partitioning. Here is the paper: http://arxiv.org/pdf/1404.3861v1.pdf
> and the website: http://grafos.ml/okapi.html
>
> I'm not sure whether you want to use Giraph to compute a partitioning,
> to use one, or both.
> Anyway, look at
> https://github.com/grafos-ml/okapi/tree/master/src/main/java/ml/grafos/okapi/spinner
> for an example.
> In short, you need to extend HashWorkerPartitioner and override
> getPartitionOwner.
>
> You also need to change the vertex id type in your compute class to
> PartitionedLongWritable so that each vertex id is mapped to a partition id.
> There is an option, giraph.graphPartitionerFactoryClass, that you need to
> set in order to use your own partitioner class:
> http://giraph.apache.org/options.html
>
> Sorry if this is not very clear; it has been six months since I last
> looked at this.
>
> If your goal is to improve performance through partitioning, you may be
> disappointed, because a non-random partitioning will overload one of the
> machines. The hack to maintain performance is to assign more partitions
> per worker (128 in the Spinner paper).
>
> Regards,
> Flavian
>
>
> 2015-09-03 12:21 GMT+02:00 prateeksha varshney <[email protected]>:
>
>> Hi
>>
>> I am new to Giraph and want to know how one can change the partitioning
>> algorithm of Giraph.
>> In other words, I want to repartition the graph using a different
>> algorithm.
>> Any suggestions and solutions for this will be highly appreciated.
>>
>> Thanks
>> Prateeksha
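
To make the load-balance point in this thread concrete, here is a minimal toy sketch in plain Java. It is not Giraph code and does not use the Giraph API: the class PartitionSketch, its methods, the round-robin partition-to-worker rule, and all partition sizes are assumptions made up for illustration. It only compares per-worker load for a hash partitioning, a skewed custom partitioning, and the same skewed graph cut into more partitions per worker.

```java
import java.util.Arrays;

/** Toy model of vertex -> partition -> worker mapping; this is not Giraph code. */
final class PartitionSketch {

    /** Hash-style partitioning: vertex id modulo the number of partitions. */
    static int hashPartition(long vertexId, int partitionCount) {
        return (int) Math.floorMod(vertexId, (long) partitionCount);
    }

    /** Hypothetical rule: partitions are dealt to workers round-robin. */
    static int workerOf(int partition, int workerCount) {
        return partition % workerCount;
    }

    /** Adds up the vertex count of every partition a worker owns. */
    static long[] workerLoads(long[] partitionSizes, int workerCount) {
        long[] loads = new long[workerCount];
        for (int p = 0; p < partitionSizes.length; p++) {
            loads[workerOf(p, workerCount)] += partitionSizes[p];
        }
        return loads;
    }

    /** Counts how many of the ids 0..vertexCount-1 land in each hash partition. */
    static long[] hashPartitionSizes(long vertexCount, int partitionCount) {
        long[] sizes = new long[partitionCount];
        for (long id = 0; id < vertexCount; id++) {
            sizes[hashPartition(id, partitionCount)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int workers = 4;
        // Hash partitioning of 1000 consecutive ids: every worker gets the same load.
        System.out.println(Arrays.toString(workerLoads(hashPartitionSizes(1000, 4), workers)));
        // Made-up sizes for a custom partitioning with one oversized partition.
        long[] skewed = {700, 100, 100, 100};
        System.out.println(Arrays.toString(workerLoads(skewed, workers)));
        // Same 1000 vertices, made-up split into 8 finer partitions (2 per worker).
        long[] finer = {350, 350, 50, 50, 50, 50, 50, 50};
        System.out.println(Arrays.toString(workerLoads(finer, workers)));
    }
}
```

In this toy model the heaviest worker holds 250 vertices under hash partitioning, 700 under the skewed partitioning, and 400 once the same graph is cut into two partitions per worker. How much real jobs benefit depends on the graph and on how Giraph actually assigns partitions to workers, which this sketch does not model.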
