Hi Marcin,

Thanks for the reply. I've just spent a couple of days looking into prediction.io tutorials, so forgive me if I missed something. :)

IMO there are several pieces that need to work together to make streaming work really well:

1. Online learning algorithm - MLlib provides 2 streaming algorithms (linear regression and k-means). Arguably more algorithms would be needed. This falls outside the remit of prediction.io, but it could mean that prediction.io needs to integrate with other platforms that provide online learning. I've been looking at other online learning projects but haven't got a good grasp of the landscape in this area yet.

2. Model training - it seems the current prediction.io framework could be tweaked a little to make this work with the MLlib streaming algorithms? Basically, move model training to the 'pio deploy' step, where the model is trained on a DStream.

3. Data ingestion - there are probably going to be 2 modes for online training: store no data, or store data under a retention policy. We would also need a real-time ingestion mechanism other than REST (e.g. Kafka).

4. Prediction - the existing prediction API is still relevant, but we should also consider proactive predictions (like flagging anomalies in data) and a feedback mechanism. Perhaps we need a "data sink" concept which can proactively generate notifications.

Please let me know what you think.

Thanks!
James

On Fri, Sep 30, 2016 at 4:01 AM, Marcin Ziemiński <[email protected]> wrote:
> Hi James,
>
> Incorporating Spark Streaming or Structured Streaming in PredictionIO will
> probably involve significant changes in the architecture. We are currently
> in the process of rethinking the design, so that it could enable different
> approaches to processing data. Future releases (after 0.10) should bring
> some changes, and I hope that introducing streaming will be one of them.
> Do you have any thoughts on how you imagine putting stream processing into
> PredictionIO and using it this way? Any input on this matter would be of
> great value.
>
> Thanks,
> Marcin
>
> On Fri, 30 Sep 2016 at 01:02, James Wu <[email protected]> wrote:
>>
>> Hi,
>>
>> Are there any plans for prediction.io to integrate with Spark
>> Streaming and support the streaming algorithms in MLlib?
>>
>> Thanks! James
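
P.S. To make points 1 and 2 of my reply a bit more concrete, here is a rough, dependency-free sketch of the "train while deployed" idea: a model that takes one gradient step per incoming micro-batch and can serve predictions between updates. It deliberately uses neither Spark nor PredictionIO. `OnlineLinearRegression`, `simulated_stream` and all parameters are hypothetical names I made up for illustration, and the simulated stream stands in for whatever real-time source (e.g. Kafka) would feed the batches.

```python
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, label)


class OnlineLinearRegression:
    """Hypothetical stand-in for a streaming model (not MLlib or PredictionIO API)."""

    def __init__(self, num_features: int, step_size: float = 0.5) -> None:
        self.weights: List[float] = [0.0] * num_features
        self.step_size = step_size

    def predict(self, features: Sequence[float]) -> float:
        # Can be called at any time, including between two updates.
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, batch: Iterable[Example]) -> None:
        """Take one averaged gradient step (squared loss) on a single micro-batch."""
        examples = list(batch)
        if not examples:
            return  # empty micro-batch: keep the current weights
        grad = [0.0] * len(self.weights)
        for features, label in examples:
            error = self.predict(features) - label
            for i, x in enumerate(features):
                grad[i] += error * x
        for i in range(len(self.weights)):
            self.weights[i] -= self.step_size * grad[i] / len(examples)


def simulated_stream(num_batches: int, batch_size: int, seed: int = 0) -> Iterator[List[Example]]:
    """Yield micro-batches drawn from the made-up target y = 2*x1 - 1*x2."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch: List[Example] = []
        for _ in range(batch_size):
            x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
            batch.append((x, 2.0 * x[0] - 1.0 * x[1]))
        yield batch


# "Deploy" the model, then keep training it as micro-batches arrive.
model = OnlineLinearRegression(num_features=2)
for micro_batch in simulated_stream(num_batches=200, batch_size=20):
    model.update(micro_batch)

print("learned weights:", model.weights)
print("prediction for [1.0, 1.0]:", model.predict([1.0, 1.0]))
```

Nothing is stored here, which corresponds to the "store no data" mode in point 3. A retention policy would mean keeping a bounded window of recent batches so the model can be rebuilt from them.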
