Hi,

I have a huge edge list (several billion edges) where the node IDs are URLs.
The algorithm I want to run needs the IDs to be of type long, and there
should be no holes in the ID space (so I can't simply hash the URLs).

Is anyone aware of a simple solution that does not require an impractically
large hash map?

My current idea is to load the graph into another Giraph job and then
assign a number to each node. This way the mapping from number to URL
would be stored in the node.
The problem is that I have to assign the numbers sequentially to ensure
that there are no holes and that the numbers are unique. I have no idea
whether this is even possible in Giraph.
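
To make the numbering requirement concrete, here is a minimal sketch in plain Python (not Giraph code) of one common way to get dense, unique IDs without a global hash map: count the nodes in each partition, turn the counts into starting offsets with a prefix sum, and let each partition number its own nodes from its offset. The function names and the example URLs are hypothetical, and the sketch assumes every URL already appears in exactly one partition (for example, after deduplicating and partitioning by URL).

```python
def partition_offsets(sizes):
    """Exclusive prefix sum: the first ID each partition may hand out."""
    offsets = []
    total = 0
    for size in sizes:
        offsets.append(total)
        total += size
    return offsets


def assign_ids_in_partition(urls, offset):
    """Yield (url, id) pairs; IDs are offset, offset + 1, ... within one partition."""
    for local_index, url in enumerate(urls):
        yield url, offset + local_index


def assign_ids(partitions):
    """Number all nodes densely from 0 using only per-partition counts.

    Assumption: each URL occurs in exactly one partition.
    """
    sizes = [len(urls) for urls in partitions]
    for urls, offset in zip(partitions, partition_offsets(sizes)):
        # Each partition only needs its own offset, not a shared lookup table.
        yield from assign_ids_in_partition(urls, offset)


if __name__ == "__main__":
    # Hypothetical example data: three partitions, one of them empty.
    example = [
        ["http://a.example/", "http://b.example/"],
        [],
        ["http://c.example/"],
    ]
    for url, node_id in assign_ids(example):
        print(node_id, url)
```

Only the list of partition sizes has to be shared between workers, so the coordination cost grows with the number of partitions rather than the number of nodes. Whether and how this maps onto a Giraph job is left open here.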

Any input is welcome.

Cheers,
Martin
