Hi Joe

A while ago I ran a Titan + HBase datastore to store graph data. I then
used Spark (via TitanHBaseInputFormat; you could use the Cassandra version
instead) to load the graph as an RDD[Vertex], on which I ran analytics and
machine learning. That RDD could form the basis for getting the data into a
form usable in GraphX.
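
A rough sketch of that last step, in case it helps. This is not the code
from my old setup: VertexRecord is a made-up stand-in for whatever vertex
type the Titan input format hands you, and the sample data is invented. It
only shows how vertex-centric records (a vertex plus its outgoing edges)
can be split into the vertex and edge RDDs that GraphX expects.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for a vertex read out of Titan: an id, a property
// map, and its outgoing edges as (target vertex id, edge label) pairs.
case class VertexRecord(id: Long, props: Map[String, String], out: Seq[(Long, String)])

object TitanToGraphX {
  // Split vertex-centric records into the two RDDs GraphX is built from.
  def toGraph(records: RDD[VertexRecord]): Graph[Map[String, String], String] = {
    val vertices: RDD[(VertexId, Map[String, String])] =
      records.map(r => (r.id, r.props))
    val edges: RDD[Edge[String]] =
      records.flatMap(r => r.out.map { case (dst, label) => Edge(r.id, dst, label) })
    // The default attribute is used for vertices that only appear as edge targets.
    Graph(vertices, edges, Map.empty[String, String])
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("titan-to-graphx").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Invented sample data; in practice this RDD would be derived from the
    // RDD produced by the Titan input format.
    val records = sc.parallelize(Seq(
      VertexRecord(1L, Map("name" -> "a"), Seq((2L, "knows"))),
      VertexRecord(2L, Map("name" -> "b"), Seq((3L, "knows"))),
      VertexRecord(3L, Map("name" -> "c"), Seq.empty)
    ))
    val graph = toGraph(records)
    println(s"vertices=${graph.vertices.count()}, edges=${graph.edges.count()}")
    sc.stop()
  }
}
```

Once you have a Graph you can call the built-in GraphX algorithms on it
(pageRank, connectedComponents and so on). The real work is in mapping
Titan's vertex and edge objects to something like VertexRecord.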

The talk here gives a bit of info on this including a little code snippet:
https://spark-summit.org/2014/using-spark-and-shark-to-power-a-real-time-recommendation-and-customer-intelligence-platform

Titan also provides Faunus (I think it is now called Gremlin-Hadoop),
though that is Hadoop-only at the moment.

On Tue, Jan 26, 2016 at 10:19 PM, Joe Bako <jb...@gracenote.com> wrote:

> I’ve found some references online to various implementations (such as
> Dendrite) leveraging HDFS via TitanDB + HBase for graph processing.
> GraphLab also uses HDFS/Hadoop.  I am wondering if (and how) one might use
> TitanDB + Cassandra as the data source for Spark GraphX?  The Gremlin
> language seems more targeted towards basic traversals rather than
> analytics, and I’m unsure about the performance of attempting to use Gremlin to
> load sub-graphs up into GraphX for analysis.  For example, if I have a
> large property graph and wish to run algorithms to find similar sub-graphs
> within, would TitanDB/Gremlin even be a consideration?  The underlying data
> model that Titan uses in Cassandra does not seem accessible for direct
> querying via CQL/Thrift.
>
> Any guidance around this nebulous subject is much appreciated!
>
> Joe Bako
> Software Architect
> Gracenote, Inc.
> Mobile: 925.818.2230
> http://www.gracenote.com/
