bq. Is Apache Spark good as a general database

I don't think Spark itself is a general database, though there are connectors to various NoSQL databases, including HBase.

bq. using their graph database features?

Sure. Take a look at http://spark.apache.org/graphx/

Cheers

On Tue, Jan 20, 2015 at 9:02 PM, Alec Taylor <[email protected]> wrote:

> Small amounts in a one node cluster (at first).
>
> As it scales I'll be looking at running various O(nk) algorithms,
> where n is the number of distinct users and k are the overlapping
> features I want to consider.
>
> Is Apache Spark good as a general database as well as its more fancy
> features? - E.g.: considering I'm building a network, maybe using
> their graph database features?
>
> On Wed, Jan 21, 2015 at 2:27 AM, Ted Yu <[email protected]> wrote:
> > Apache Spark supports integration with HBase (which has a REST API).
> >
> > What's the amount of data you want to store in this system?
> >
> > Cheers
> >
> > On Tue, Jan 20, 2015 at 3:40 AM, Alec Taylor <[email protected]> wrote:
> >>
> >> I am architecting a platform incorporating: recommender systems,
> >> information retrieval (ML), sequence mining, and Natural Language
> >> Processing.
> >>
> >> Additionally I have the generic CRUD and authentication components,
> >> with everything exposed RESTfully.
> >>
> >> For the storage layer(s), there are a few options which immediately
> >> present themselves:
> >>
> >> Generic CRUD layer (high speed needed here, though I suppose I could use
> >> Redis…)
> >>
> >> - Hadoop with HBase, perhaps with Phoenix for an elastic loose-schema
> >> SQL layer atop
> >> - Apache Spark (perhaps piping to HDFS)… ¿maybe?
> >> - MongoDB (or a similar document-store), a graph-database, or even
> >> something like Postgres
> >>
> >> Analytics layer (to enable Big Data / Data-intensive computing features)
> >>
> >> - Apache Spark
> >> - Hadoop with MapReduce and/or utilising some other Apache /
> >> non-Apache project with integration
> >> - Disco (from Nokia)
> >>
> >> ________________________________
> >>
> >> Should I prefer one layer (e.g. on HDFS) over multiple disparate
> >> layers?
> >> - The advantage here is obvious, but I am certain there are
> >> disadvantages. (and yes, I know there are various ways, automated and
> >> manual, to push data from non HDFS-backed stores to HDFS)
> >>
> >> Also, as a bonus answer, which stack would you recommend for this
> >> user-network I'm building?
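
For reference, the thread never says which O(nk) algorithms are meant. Below is only a rough sketch of one computation with that cost: a single pass over n users with at most k features each, building a feature-to-users index. The names `feature_index` and `shared_features` and the sample data are hypothetical, and this is plain Python rather than Spark or GraphX code.

```python
from collections import defaultdict


def feature_index(user_features):
    """Map each feature to the set of users that have it.

    One pass over n users with at most k features each, so O(n*k) work.
    """
    index = defaultdict(set)
    for user, features in user_features.items():  # n users
        for feature in features:                  # at most k features per user
            index[feature].add(user)
    return dict(index)


def shared_features(index, min_users=2):
    """Keep only the features held by at least `min_users` users."""
    return {f: users for f, users in index.items() if len(users) >= min_users}


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the thread.
    users = {
        "alice": {"jazz", "go"},
        "bob": {"jazz"},
        "carol": {"go", "chess"},
    }
    for feature, members in sorted(shared_features(feature_index(users)).items()):
        print(feature, sorted(members))
```

On a single node with small amounts of data this fits in memory. The same per-user, per-feature pass is the kind of work that could later be moved to a distributed engine, though how it maps onto Spark is not covered in the thread.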
