Small amounts, in a one-node cluster (at first). As it scales, I'll be looking at running various O(nk) algorithms, where n is the number of distinct users and k is the number of overlapping features I want to consider.
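
To make that concrete, here's a minimal sketch (Scala, Spark 1.x RDD API) of the shape of pass I have in mind; the user ids and feature names are made-up placeholders, not real data:

import org.apache.spark.{SparkConf, SparkContext}

object OverlapSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("overlap-sketch").setMaster("local[*]"))

    // n users, each with a small set of feature ids (placeholder data).
    val users = sc.parallelize(Seq(
      ("alice", Set("ml", "nlp", "music")),
      ("bob",   Set("nlp", "cycling")),
      ("carol", Set("ml", "music", "films"))
    ))

    // The k overlapping features I care about for one query user.
    val query = Set("ml", "music")

    // One pass over all n users, intersecting each feature set with the
    // query set: O(n * k) work in total.
    val scored = users
      .map { case (id, feats) => (id, (feats & query).size) }
      .filter { case (_, overlap) => overlap > 0 }
      .sortBy({ case (_, overlap) => overlap }, ascending = false)

    scored.collect().foreach(println)
    sc.stop()
  }
}

So: one linear scan per query over all users, with constant-ish work per user while the per-user feature sets stay small.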
Is Apache Spark good as a general database, as well as for its fancier features? E.g.: considering I'm building a network, maybe using its graph features? (I've pasted a rough sketch of what I mean below the quoted thread.)

On Wed, Jan 21, 2015 at 2:27 AM, Ted Yu <[email protected]> wrote:
> Apache Spark supports integration with HBase (which has a REST API).
>
> What's the amount of data you want to store in this system?
>
> Cheers
>
> On Tue, Jan 20, 2015 at 3:40 AM, Alec Taylor <[email protected]> wrote:
>>
>> I am architecting a platform incorporating: recommender systems,
>> information retrieval (ML), sequence mining, and Natural Language
>> Processing.
>>
>> Additionally, I have the generic CRUD and authentication components,
>> with everything exposed RESTfully.
>>
>> For the storage layer(s), a few options immediately present
>> themselves:
>>
>> Generic CRUD layer (high speed needed here, though I suppose I could
>> use Redis…)
>>
>> - Hadoop with HBase, perhaps with Phoenix for an elastic loose-schema
>>   SQL layer on top
>> - Apache Spark (perhaps piping to HDFS)… ¿maybe?
>> - MongoDB (or a similar document store), a graph database, or even
>>   something like Postgres
>>
>> Analytics layer (to enable Big Data / data-intensive computing features)
>>
>> - Apache Spark
>> - Hadoop with MapReduce, and/or utilising some other Apache or
>>   non-Apache project with integration
>> - Disco (from Nokia)
>>
>> ________________________________
>>
>> Should I prefer one layer (e.g. on HDFS) over multiple disparate
>> layers? The advantage here is obvious, but I am certain there are
>> disadvantages. (And yes, I know there are various ways, automated and
>> manual, to push data from non-HDFS-backed stores into HDFS.)
>>
>> Also, as a bonus answer: which stack would you recommend for this
>> user network I'm building?
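
PS: by "graph features" above I mean something along the lines of GraphX. A minimal sketch of the kind of thing I'd try, with placeholder users and a stock PageRank call (my understanding is GraphX is a graph-processing library on Spark rather than a true graph database, which is part of what I'm asking about):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.graphx.{Edge, Graph}

object FollowSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("follow-sketch").setMaster("local[*]"))

    // Vertices: (numeric id, username) -- placeholder users.
    val users = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))

    // Edges: who follows whom in the user network.
    val follows = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 3L, "follows"),
      Edge(3L, 1L, "follows")
    ))

    val graph = Graph(users, follows)

    // e.g. rank users by influence within the follow network.
    val ranks = graph.pageRank(tol = 0.001).vertices

    ranks.join(users).collect().foreach { case (_, (rank, name)) =>
      println(s"$name: $rank")
    }
    sc.stop()
  }
}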
