Hi, I have a huge edge list (several billion edges) where the node IDs are URLs. The algorithm I want to run needs the IDs to be of type long, and there must be no holes in the ID space (so I can't simply hash the URLs).
Is anyone aware of a simple solution that does not require an impractically huge hash map? My current idea is to load the graph into another Giraph job and then assign a number to each node. This way the mapping from number to URL would be stored in the node itself. The problem is that I have to assign the numbers sequentially to ensure that they are unique and that there are no holes. I have no idea whether this is even possible in Giraph. Any input is welcome.
Cheers,
Martin
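
PS: To make the numbering I am after concrete, here is a rough sketch in plain Python (not Giraph code, and I have not tried to map it onto the Giraph API). It assumes the nodes are already split into partitions and that each URL lives in exactly one partition; the function name assign_dense_ids and the example URLs are made up for illustration. Each partition reports how many nodes it holds, a running sum over those counts gives each partition a start offset, and each node then gets offset plus its local position. That would make the IDs unique and gap-free without one global hash map.

```python
from itertools import accumulate


def assign_dense_ids(partitions):
    """Map every URL to a unique ID in 0..N-1 with no gaps.

    `partitions` is a list of lists of distinct URLs (a hypothetical input
    layout; each URL is assumed to appear in exactly one partition).
    Returns one {url: id} dict per partition.
    """
    # Pass 1: each partition only reports how many nodes it holds.
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: the first ID each partition may hand out.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: each partition numbers its own nodes independently.
    return [
        {url: offset + i for i, url in enumerate(part)}
        for part, offset in zip(partitions, offsets)
    ]


# Made-up example data, including one empty partition.
partitions = [
    ["http://a.example/", "http://b.example/"],
    ["http://c.example/"],
    [],
    ["http://d.example/", "http://e.example/"],
]
mappings = assign_dense_ids(partitions)
all_ids = sorted(i for m in mappings for i in m.values())
```

The only globally shared data in this sketch is the list of per-partition counts, which stays small even with billions of nodes. Whether the two passes can be expressed as Giraph supersteps is exactly the part I am unsure about.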
