Apache Spark supports integration with HBase (which also has a REST API). How much data do you want to store in this system?
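
In case it helps, here is a minimal sketch of reading an HBase table into a Spark RDD through TableInputFormat. The table name "users" and the app name are placeholders, not anything from your setup:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HBaseScan {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-scan"))

    // Point the HBase input format at the table to scan ("users" is a placeholder).
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "users")

    // Each element is a (row key, Result) pair from the scanned table.
    val rows = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(s"Rows scanned: ${rows.count()}")
    sc.stop()
  }
}

From there you can map the Result objects into whatever case classes your analytics layer needs.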
Cheers

On Tue, Jan 20, 2015 at 3:40 AM, Alec Taylor <[email protected]> wrote:

> I am architecting a platform incorporating: recommender systems,
> information retrieval (ML), sequence mining, and Natural Language
> Processing.
>
> Additionally I have the generic CRUD and authentication components,
> with everything exposed RESTfully.
>
> For the storage layer(s), there are a few options which immediately
> present themselves:
>
> Generic CRUD layer (high speed needed here, though I suppose I could use
> Redis…)
>
> - Hadoop with HBase, perhaps with Phoenix for an elastic loose-schema
>   SQL layer atop
> - Apache Spark (perhaps piping to HDFS)… maybe?
> - MongoDB (or a similar document store), a graph database, or even
>   something like Postgres
>
> Analytics layer (to enable Big Data / data-intensive computing features)
>
> - Apache Spark
> - Hadoop with MapReduce, and/or utilising some other Apache /
>   non-Apache project with integration
> - Disco (from Nokia)
>
> ________________________________
>
> Should I prefer one layer (e.g. on HDFS) over multiple disparate
> layers? The advantage here is obvious, but I am certain there are
> disadvantages. (And yes, I know there are various ways, automated and
> manual, to push data from non-HDFS-backed stores to HDFS.)
>
> Also, as a bonus answer, which stack would you recommend for this
> user-network I'm building?
>
