Re: Spark for core business-logic? - Replacing: MongoDB?

Simon Chan Mon, 05 Jan 2015 22:57:07 -0800

PredictionIO comes with an event server that handles data collection:
http://docs.prediction.io/datacollection/overview/
It's based on HBase, which works fine with Spark as the data store of the
event/training data.


You will probably need a separate database that supports CRUD for your
application. Your application can then communicate with PredictionIO once
authentication is done, for example.

I hope it helps.

Regards,
Simon

On Mon, Jan 5, 2015 at 6:22 PM, Alec Taylor <[email protected]> wrote:

> Thanks Simon, that's a good way to train on incoming events (and
> related problems / and result computations).
>
> However, does it handle the actual data storage, e.g. CRUD on documents?
>
> On Tue, Jan 6, 2015 at 1:18 PM, Simon Chan <[email protected]> wrote:
> > Alec,
> >
> > If you are looking for a Machine Learning stack that supports
> > business logic, you may want to take a look at PredictionIO:
> > http://prediction.io/
> >
> > It's based on Spark and HBase.
> >
> > Simon
> >
> >
> > On Mon, Jan 5, 2015 at 6:14 PM, Alec Taylor <[email protected]> wrote:
> >>
> >> Thanks all. To answer your clarification questions:
> >>
> >> - I'm writing this in Python
> >> - A problem similar to my actual one is to find the 30-minute slots
> >> (over the next 12 months) [r] that k users have in common. Total
> >> users: n. Given n=10000 and r=17472, the [naïve] time complexity
> >> is $\mathcal{O}(nr)$, with n*r=174,720,000. I may be able to get
> >> $\mathcal{O}(n \log r)$ if not $\log \log$ from reading the literature
> >> on sequence matching; however, this is uncertain.
> >>
> >> So assuming all the other business-logic which needs to be built in,
> >> such as authentication and various other CRUD operations, as well as
> >> this more intensive sequence searching operation, what stack would be
> >> best for me?
> >>
> >> Thanks for all suggestions
> >>
> >> On Mon, Jan 5, 2015 at 4:24 PM, Jörn Franke <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > It really depends on your requirements: what kind of machine learning
> >> > algorithm you need, your budget, whether you are building something
> >> > really new or integrating it with an existing application, etc. You
> >> > can run MongoDB as a cluster as well. I don't think this question can
> >> > be answered in general; it depends on the details of your case.
> >> >
> >> > Best regards
> >> >
> >> > On 4 Jan 2015 01:44, "Alec Taylor" <[email protected]> wrote:
> >> >>
> >> >> I'm in the middle of designing the architecture for a new project,
> >> >> which has various machine learning and related components, including
> >> >> recommender systems, search engines and sequence [common
> >> >> intersection] matching.
> >> >>
> >> >> Usually I use: MongoDB (as db), Redis (as cache) and Celery (as
> >> >> queue, backed by Redis).
> >> >>
> >> >> Though I don't have experience with Hadoop, I was thinking of using
> >> >> Hadoop for the machine-learning (as this will become a Big Data
> >> >> problem quite quickly). To push the data into Hadoop, I would use a
> >> >> connector of some description, or push the MongoDB backups into HDFS
> >> >> at set intervals.
> >> >>
> >> >> However I was thinking that it might be better to put the whole thing
> >> >> in Hadoop, store all persistent data in Hadoop, and maybe do all the
> >> >> layers in Apache Spark (with caching remaining in Redis).
> >> >>
> >> >> Is that a viable option? - Most of what I see discusses Spark (and
> >> >> Hadoop in general) for analytics only. Apache Phoenix exposes a nice
> >> >> interface for read/write over HBase, so I might use that if Spark
> >> >> ends up being the wrong solution.
> >> >>
> >> >> Thanks for all suggestions,
> >> >>
> >> >> Alec Taylor
> >> >>
> >> >> PS: I need this for both "Big" and "Small" data. Note that I am using
> >> >> the Cloudera definition of "Big Data" referring to processing/storage
> >> >> across more than 1 machine.
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [email protected]
> >> >> For additional commands, e-mail: [email protected]
> >> >>
> >> >
> >>
> >
>
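
For reference, the naive slot-matching approach described in the quoted
message can be sketched in Python (the language the poster mentions). This is
only a minimal sketch under assumptions not stated in the thread: each user's
availability is a set of integer slot indices, "k users have in common" is read
as "at least k users are free", and r=17472 is taken to be 48 slots per day
over 364 days. The names common_slots, SLOTS_PER_DAY and DAYS are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

# Assumption: r = 17472 in the thread corresponds to 48 half-hour slots
# per day over 364 days (52 weeks).
SLOTS_PER_DAY = 48
DAYS = 364
R = SLOTS_PER_DAY * DAYS


def common_slots(availability: Iterable[Set[int]], k: int) -> List[int]:
    """Return the sorted slot indices that at least k users have free.

    Every free slot of every user is visited once, so the counting pass
    does at most n * r increments for n users and r slots per user.
    """
    counts: Counter = Counter()
    for free_slots in availability:
        counts.update(free_slots)  # one increment per free slot
    return sorted(slot for slot, c in counts.items() if c >= k)


# Tiny illustrative input: three users, slots numbered from 0.
users = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
print(common_slots(users, 2))
print(common_slots(users, 3))
```

With the tiny input above, slots 1, 2 and 3 are shared by at least two users
and only slot 2 is shared by all three. Nothing here depends on Spark or
PredictionIO; at the sizes quoted in the thread, a single counting pass like
this may already be fast enough on one machine, but that would need measuring.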
