You can almost certainly take half of the UUID safely, assuming you're using random (version 4) UUIDs. You could work out the math if you're really concerned, but the probability of a collision in 64 bits stays low even with a very large data set: by the birthday bound, it takes roughly 5 billion random 64-bit values to reach a 50% chance of any collision. If your UUIDs aren't version 4, you probably just need to select a good subset of the bits (e.g. avoid the MAC address in version 1 UUIDs).
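
To make the "take half" idea and the collision math concrete, here is a minimal Java sketch (Java because java.util.UUID comes up in this thread; it can be called from Scala as-is). The class and method names are made up for illustration. It assumes version 4 UUIDs: it keeps the least significant 64 bits, of which 62 are random, and estimates the chance of any collision with the usual birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)).

```java
import java.util.UUID;

// Hypothetical helper, not part of GraphX or Spark.
final class UuidVertexId {

    // Keep the least significant 64 bits of a version 4 UUID.
    // In that half, 2 bits are the fixed variant and 62 are random.
    // Other versions are rejected: in version 1, for example, this half
    // holds the MAC address, which is a poor choice of bits.
    static long toVertexId(UUID id) {
        if (id.version() != 4) {
            throw new IllegalArgumentException("expected a version 4 UUID: " + id);
        }
        return id.getLeastSignificantBits();
    }

    // Birthday approximation: probability that at least two of n
    // uniformly random values of `randomBits` bits are equal.
    static double collisionProbability(long n, int randomBits) {
        double pairs = (double) n * (n - 1) / 2.0;   // number of distinct pairs
        double space = Math.pow(2.0, randomBits);    // number of possible values
        return -Math.expm1(-pairs / space);          // 1 - exp(-pairs / space)
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();                 // randomUUID() yields version 4
        System.out.println(id + " -> " + toVertexId(id));
        System.out.println("p(collision), 1e6 vertices: "
                + collisionProbability(1_000_000L, 62));
    }
}
```

With 62 random bits, a million vertices gives a collision probability on the order of 1e-7. It is worth rerunning the estimate with your own vertex count before relying on it.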
February 24, 2014 at 10:53 AM
Hi Evan,

Thanks for the quick response. The only mapping between UUIDs and Longs that
I can think of is one where I sequentially assign Longs as I load the UUIDs
from the DB. But this results in having to centralize this mapping. I am
guessing that centralizing this is not a good idea for a distributed graph
processing engine.

Also, I will be running Spark on the same nodes as my distributed DB
(Cassandra) and I am hoping that the Spark worker on each node will load the
data from the local Cassandra node. I am not sure if this is possible with
GraphX, but I am hoping it is, and therefore my concern with centralizing
the UUID<->Long mapping.

Thanks.

-deepak




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-with-UUID-vertex-IDs-instead-of-Long-tp1953p1982.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
February 23, 2014 at 7:14 PM
How about generating a Map[UUID,Long] and one in reverse, then using those maps to replace your user IDs with the vertex IDs that GraphX expects, and reversing the process when presenting results? You could probably even do this with implicits and keep the overhead fairly low from a code perspective.
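
A rough sketch of that two-way mapping, in plain Java with no Spark dependency. The class name and methods are hypothetical, and it assumes the whole mapping fits in a single JVM, so it is a centralized lookup rather than something distributed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;

// Hypothetical two-way index between UUIDs and sequentially assigned Longs.
final class VertexIdIndex {
    private final Map<UUID, Long> forward = new HashMap<>();
    private final Map<Long, UUID> reverse = new HashMap<>();
    private long next = 0L;

    // Return the Long already assigned to this UUID, or assign the next one.
    long idFor(UUID uuid) {
        Long existing = forward.get(uuid);
        if (existing != null) {
            return existing;
        }
        long id = next++;
        forward.put(uuid, id);
        reverse.put(id, uuid);
        return id;
    }

    // Translate a vertex ID back to its UUID when presenting results.
    UUID uuidFor(long id) {
        UUID uuid = reverse.get(id);
        if (uuid == null) {
            throw new NoSuchElementException("no UUID assigned to vertex ID " + id);
        }
        return uuid;
    }

    public static void main(String[] args) {
        VertexIdIndex index = new VertexIdIndex();
        UUID user = UUID.randomUUID();
        long vertexId = index.idFor(user);           // use this as the vertex ID
        System.out.println(vertexId + " <-> " + index.uuidFor(vertexId));
    }
}
```

The vertex IDs would be swapped in before building the graph and swapped back when reading results. Because IDs are handed out sequentially from one place, there are no collisions, at the cost of keeping the mapping in one process.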

February 23, 2014 at 6:39 PM
Hi,

I am new to Spark and GraphX (I have read the documentation and tried out
basic Spark examples).

I am interested in using GraphX to process some data in my DB. I use UUIDs
to identify my data, but I see that GraphX uses Long to identify the
vertices (VertexId is defined to be of type Long).

I can redefine VertexId to be java.util.UUID and see if it compiles and
continues to work, but I am concerned that this may not work in future
releases even if it works now.

I did not want to log an enhancement ticket for this without first asking
about this on the mailing list. Also, I was not sure if this should be on
the developer list (most of the posts I saw on that list were related to
pull-requests).

I skipped the Bagel documentation since it said it was being replaced with
GraphX. I could be wrong, but a brief scan of Bagel documentation gave me
the impression that I might be able to use Bagel with UUID identifiers. So
if GraphX is considered a replacement for Bagel, I am hoping UUIDs will be
supported in GraphX. Maybe the Graph can be parameterized with the type of
the vertex-ID?

Any thoughts on how I should proceed? Is this of interest to anyone else?

Thanks.

-deepak




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-with-UUID-vertex-IDs-instead-of-Long-tp1953.html
