I’ve found some references online to various implementations (such as Dendrite) that leverage HDFS via TitanDB + HBase for graph processing. GraphLab also uses HDFS/Hadoop. I am wondering whether (and how) one might use TitanDB + Cassandra as the data source for Spark GraphX. The Gremlin language seems targeted more at basic traversals than at analytics, and I’m unsure how well it would perform to use Gremlin to load sub-graphs into GraphX for analysis. For example, if I have a large property graph and wish to run algorithms to find similar sub-graphs within it, would TitanDB/Gremlin even be a consideration? The underlying data model that Titan uses in Cassandra does not seem to be accessible for direct querying via CQL/Thrift.
Any guidance around this nebulous subject is much appreciated!

Joe Bako
Software Architect
Gracenote, Inc.
Mobile: 925.818.2230
http://www.gracenote.com/