PredictionIO comes with an event server that handles data collection:
http://docs.prediction.io/datacollection/overview/

It's based on HBase, which works fine with Spark as the data store for the event/training data.
You probably need a separate database with CRUD support for your application. Your application would then communicate with PredictionIO once authentication is done, for example.

I hope it helps.

Regards,
Simon

On Mon, Jan 5, 2015 at 6:22 PM, Alec Taylor <[email protected]> wrote:
> Thanks Simon, that's a good way to train on incoming events (and
> related problems / and result computations).
>
> However, does it handle the actual data storage? - E.g.: CRUD documents
>
> On Tue, Jan 6, 2015 at 1:18 PM, Simon Chan <[email protected]> wrote:
> > Alec,
> >
> > If you are looking for a Machine Learning stack that supports
> > business logic, you may take a look at PredictionIO:
> > http://prediction.io/
> >
> > It's based on Spark and HBase.
> >
> > Simon
> >
> >
> > On Mon, Jan 5, 2015 at 6:14 PM, Alec Taylor <[email protected]> wrote:
> >>
> >> Thanks all. To answer your clarification questions:
> >>
> >> - I'm writing this in Python
> >> - A similar problem to my actual one is to find common 30 minute slots
> >> (over the next 12 months) [r] that k users have in common. Total
> >> users: n. Given n=10000 and r=17472 then the [naïve] time-complexity
> >> is $\mathcal{O}(nr)$. n*r=174,720,000. I may be able to get
> >> $\mathcal{O}(n \log r)$ if not $\log \log$ from reading the literature
> >> on sequence matching, however this is uncertain.
> >>
> >> So assuming all the other business-logic which needs to be built in,
> >> such as authentication and various other CRUD operations, as well as
> >> this more intensive sequence searching operation, what stack would be
> >> best for me?
> >>
> >> Thanks for all suggestions
> >>
> >> On Mon, Jan 5, 2015 at 4:24 PM, Jörn Franke <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > It really depends on your requirements: what kind of machine learning
> >> > algorithm, your budget, whether you are currently doing something
> >> > really new or integrating it with an existing application, etc.
> >> > You can also run MongoDB as a cluster. I don't think this question
> >> > can be answered generally; it depends on the details of your case.
> >> >
> >> > Best regards
> >> >
> >> > On 4 Jan 2015 01:44, "Alec Taylor" <[email protected]> wrote:
> >> >>
> >> >> In the middle of doing the architecture for a new project, which has
> >> >> various machine learning and related components, including:
> >> >> recommender systems, search engines and sequence [common intersection]
> >> >> matching.
> >> >>
> >> >> Usually I use: MongoDB (as db), Redis (as cache) and celery (as queue,
> >> >> backed by Redis).
> >> >>
> >> >> Though I don't have experience with Hadoop, I was thinking of using
> >> >> Hadoop for the machine-learning (as this will become a Big Data
> >> >> problem quite quickly). To push the data into Hadoop, I would use a
> >> >> connector of some description, or push the MongoDB backups into HDFS
> >> >> at set intervals.
> >> >>
> >> >> However I was thinking that it might be better to put the whole thing
> >> >> in Hadoop, store all persistent data in Hadoop, and maybe do all the
> >> >> layers in Apache Spark (with caching remaining in Redis).
> >> >>
> >> >> Is that a viable option? - Most of what I see discusses Spark (and
> >> >> Hadoop in general) for analytics only. Apache Phoenix exposes a nice
> >> >> interface for read/write over HBase, so I might use that if Spark ends
> >> >> up being the wrong solution.
> >> >>
> >> >> Thanks for all suggestions,
> >> >>
> >> >> Alec Taylor
> >> >>
> >> >> PS: I need this for both "Big" and "Small" data. Note that I am using
> >> >> the Cloudera definition of "Big Data" referring to processing/storage
> >> >> across more than 1 machine.
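For reference, the naive slot-matching approach mentioned in the thread (30-minute slots shared by k of n users) could be sketched roughly as below. This is only an illustration under assumptions that are not in the thread: each user's availability is a set of integer slot indices, "k users have in common" is read as "at least k users", and the function name common_slots is made up for the example. It touches every (user, slot) pair once, so it matches the O(nr) bound quoted above rather than improving on it.

```python
from collections import Counter
from typing import Iterable, List, Set


def common_slots(availability: Iterable[Set[int]], k: int) -> List[int]:
    """Return the sorted slot indices that at least k users share.

    Assumption: each user is represented by a set of integer slot
    indices (e.g. 0 = first 30-minute slot of the period). This
    representation is hypothetical, not taken from the thread.
    """
    counts: Counter = Counter()
    for slots in availability:
        # Each user contributes at most once per slot because slots is a set.
        counts.update(slots)
    return sorted(slot for slot, seen in counts.items() if seen >= k)


if __name__ == "__main__":
    # Three users with a handful of free slots each.
    users = [{0, 1, 2, 5}, {1, 2, 3}, {2, 5, 7}]
    print(common_slots(users, k=2))  # [1, 2, 5]
```

At n=10000 and r=17472 this is at most about 175 million counter updates, which is feasible on a single machine, so the choice of storage layer may matter more here than the matching step itself.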
