Hi,
I did the same thing in two M/R jobs during preprocessing - it worked well for web graphs but was a bit slow.

A solution in Giraph:
1. Implement your own partition that iterates vertices in order, and use an appropriate partitioner.
2. During the first iteration, rename the vertices in each partition without holes; holes will remain only between partitions. At the end, collect the min and max vertex index of each partition, send them to the master via an aggregator, and compute the mapping needed to remove the holes.
3. During the second iteration, iterate over all vertices and remove the holes by shifting the vertex indexes.
4. ... rename the edges (two more iterations) ...
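The master-side computation in step 2 can be sketched in plain Java (no Giraph dependencies; the class and method names here are illustrative, not part of any Giraph API): each partition reports its contiguous [min, max] index range, and the master derives the per-partition shift that step 3 applies to close the gaps between partitions.

```java
import java.util.Arrays;

public class HoleRemoval {
    // ranges: per-partition {min, max} index ranges, sorted by min
    // (each partition is internally contiguous after step 1).
    // Returns the shift for each partition; newIndex = oldIndex - shift[p].
    static long[] computeShifts(long[][] ranges) {
        long[] shifts = new long[ranges.length];
        long nextFree = 0; // first unused index in the compacted space
        for (int p = 0; p < ranges.length; p++) {
            shifts[p] = ranges[p][0] - nextFree;         // total gap before this partition
            nextFree += ranges[p][1] - ranges[p][0] + 1; // advance by partition size
        }
        return shifts;
    }

    public static void main(String[] args) {
        // Three partitions with holes between them: 0..4, 8..9, 15..20
        long[][] ranges = {{0, 4}, {8, 9}, {15, 20}};
        long[] shifts = computeShifts(ranges);
        System.out.println(Arrays.toString(shifts)); // [0, 3, 8]
        // e.g. vertex 15 in the third partition becomes 15 - 8 = 7,
        // so the compacted index space is 0..12 with no holes.
    }
}
```

In the actual job, the [min, max] pairs would be gathered through an aggregator and the resulting shift table broadcast back to the workers for the second iteration.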

Btw: why do you need such indexes? For HLL?

Lukas

On 15.4.2014 15:33, Martin Neumann wrote:
Hej,

I have a huge edge list (several billion edges) where node IDs are URLs.
The algorithm I want to run needs the IDs to be longs, and there must be no holes in the ID space (so I can't simply hash the URLs).

Is anyone aware of a simple solution that does not require an impractically huge hash map?

My current idea is to load the graph into another Giraph job and assign a number to each node; that way the mapping from number to URL would be stored in the node. The problem is that I have to assign the numbers sequentially to ensure they are unique and leave no holes. No idea if this is even possible in Giraph.

Any input is welcome

cheers Martin
