I’ve found some references online to various implementations (such as Dendrite) 
leveraging HDFS via TitanDB + HBase for graph processing.  GraphLab also uses 
HDFS/Hadoop.  I am wondering if (and how) one might use TitanDB + Cassandra as 
the data source for Spark GraphX?  The Gremlin language seems more targeted 
towards basic traversals than analytics, and I’m unsure how well it would 
perform if used to load sub-graphs into GraphX for analysis.  
For example, if I have a large property graph and wish to run algorithms to 
find similar sub-graphs within it, would TitanDB/Gremlin even be a consideration?  
The underlying data model that Titan uses in Cassandra does not seem accessible 
for direct querying via CQL/Thrift.

Any guidance around this nebulous subject is much appreciated!

Joe Bako
Software Architect
Gracenote, Inc.
Mobile: 925.818.2230
http://www.gracenote.com/
