Hi Ravikant,

Thanks for responding to my query. This definitely helped me validate the steps I followed for creating a custom partitioner.

I am currently looking for a mechanism in Giraph that lets me hold a global value to be used by a Giraph application across the distributed system, like a Hadoop counter for Hadoop jobs.
I read about Giraph Aggregators, which do the same thing for Giraph jobs, but I am still figuring out a way to invoke an aggregator from my custom partitioner class. The examples I have seen normally call aggregators from Computation classes only.

Any pointers here would be helpful! If there is an alternative way of maintaining a global variable across the workers in Giraph, please do let me know.

Best Regards,
Neha Raj

On Tue, Jul 10, 2018 at 2:22 PM, Neha Raj <neharaj...@gmail.com> wrote:

> Hi,
>
> I am working on graph partitioning algorithms and have chosen Giraph as
> the graph processing system to run graph problems. I am very new to both. I
> would like to provide external partitioning information (in the form of a
> txt file) to Giraph. For this I have created a custom partitioner (something
> like HashPartitionerFactory), which reads the external file for the graph
> partition ID.
>
> While debugging I realized that this partition logic is invoked several
> times (during the Giraph supersteps), and reading the same external file
> multiple times is not time efficient. To handle this I wish to create a
> global (across the distributed system) Map variable which holds
> {vertex ID, partition ID} as key-value pairs, and I want to populate this
> variable from the external file once during a Giraph job run. I have tried
> several ways to create and initialize such a global variable, but whether
> the variable actually gets populated for a Giraph job is non-deterministic
> (i.e. sometimes the map is populated with values, sometimes not).
>
> I think there might be some issue in how I am creating the Map variable
> and initializing it so that it is ready before my custom partitioning logic
> uses it. Can somebody please guide me to the correct place to plug this
> piece of information into a Giraph job, and possibly a correct way of
> creating a global variable with respect to Giraph distributed processing?
>
> Thanks & Regards,
> Neha
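
For reference, the "populate the map once" idea from the quoted message could be sketched roughly as below. This is only a sketch under assumptions, not a Giraph API: the class name PartitionMapHolder is hypothetical, the file is assumed to contain one whitespace-separated "vertexId partitionId" pair per line, and the file is assumed to be readable from a local path on every worker. A static holder like this is shared only within one JVM, so each worker would still read the file once; it is not a cluster-wide variable. The synchronized lazy initialization is meant to avoid the "sometimes populated, sometimes not" behaviour that can occur when several threads race to fill a plain static map.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Giraph): loads a vertex-to-partition
 * map at most once per JVM and hands out the same read-only instance.
 */
final class PartitionMapHolder {
    // volatile so a fully built map is visible to all threads once published
    private static volatile Map<Long, Integer> partitionByVertex;

    private PartitionMapHolder() {}

    /** Returns the cached map, reading the file only on the first call. */
    static Map<Long, Integer> get(Path file) {
        Map<Long, Integer> local = partitionByVertex;
        if (local == null) {
            synchronized (PartitionMapHolder.class) {
                local = partitionByVertex;
                if (local == null) {
                    local = Collections.unmodifiableMap(load(file));
                    partitionByVertex = local;
                }
            }
        }
        return local;
    }

    /** Parses lines of the assumed form "vertexId partitionId". */
    private static Map<Long, Integer> load(Path file) {
        Map<Long, Integer> map = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split("\\s+");
                map.put(Long.parseLong(parts[0]), Integer.parseInt(parts[1]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }
}
```

A partitioner could then call PartitionMapHolder.get(path) each time it needs a lookup, and only the first call in that JVM would touch the file. If the file lives on HDFS rather than on a local disk, the reader in load would need to be replaced accordingly.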