Thx Adam, having a separate topology for training is actually a very nice idea. I will give it a try!
Cheers,

Klaus

On Wed, Feb 26, 2014 at 3:28 PM, Adam Lewis <[email protected]> wrote:

> Hi Klaus,
>
> I've been dealing with similar use cases. I do a couple of things (which
> may not be a final solution, but it is interesting to discuss alternate
> approaches): I have passed trained models in the 200 MB range through Storm,
> but I try to avoid it. The model gets dropped into persistence and then
> only the ID of the model is passed through the topology. So my training bolt
> passes the whole model blob to the persistence bolt and that's it... In the
> future I may even remove that step so that the model blob never gets
> transferred by Storm. Also, I use separate topologies for training, and
> those tend to have much higher timeouts because the "train" aggregator can
> take quite a while. Traditionally this would probably happen in Hadoop or
> some other batch system, but I'm too busy to do the setup and Storm is
> handling it fine anyway.
>
> I don't have to do any polling because I have model selection running as a
> logically different step, i.e. a tuple shows up for prediction, we run a
> selection step which finds the model ID for scoring that tuple, then it
> flows on to an actual scoring bolt which retrieves the model based on the ID
> and applies it to the tuple. If the creation of a new model leads you to
> re-score "old" tuples, you could use the model write to trigger those
> tuples to be replayed from some source of state such that they will pick up
> the new model ID and proceed as normal.
>
> Best,
>
> Adam
>
>
> On Wed, Feb 26, 2014 at 7:54 AM, Klausen Schaefersinho <[email protected]> wrote:
>
>> Thx,
>>
>> the idea is good, I will keep that in mind. The only drawback is that it
>> relies on polling, which I do not like too much in the PredictionBolt. Of
>> course I could also pass S3 or file references around in the messages to
>> trigger an update.
>> But for the sake of simplicity I was thinking of keeping
>> everything in Storm and not relying on other systems if possible.
>>
>> Cheers,
>>
>> Klaus
>>
>>
>> On Wed, Feb 26, 2014 at 12:22 PM, Enno Shioji <[email protected]> wrote:
>>
>>> I can't comment on how large tuples fare, but about the synchronization,
>>> would this not make more sense?
>>>
>>>   InputSpout -> AggregationBolt -> PredictionBolt -> OutputBolt
>>>                       |                  |
>>>                       \/                 |
>>>                   Agg. State             |
>>>                       /\                 |
>>>                       |                  V
>>>                  TrainingBolt -----> Model State
>>>
>>> I.e. AggregationBolt writes to AggregationState, which is polled by
>>> TrainingBolt, which writes to ModelState. ModelState is then polled by
>>> PredictionBolt.
>>>
>>> This way, you can get rid of the large tuples as well and use
>>> something like S3 for these large states instead.
>>>
>>>
>>> On Wed, Feb 26, 2014 at 11:02 AM, Klausen Schaefersinho <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a topology which processes events, aggregates them in some form,
>>>> and performs some prediction based on a machine learning (ML) model. Every
>>>> x events, one of the bolts involved in the normal processing emits a
>>>> "trainModel" event, which is routed to a bolt that is dedicated to
>>>> the training. Once the training is done, the new model should be sent back
>>>> to the prediction bolt. The topology looks like:
>>>>
>>>>   InputSpout -> AggregationBolt -> PredictionBolt -> OutputBolt
>>>>                       |                 /\
>>>>                       \/                 |
>>>>                  TrainingBolt -----------+
>>>>
>>>> The model can get quite large (> 100 MB), so I am not sure how this
>>>> would impact the performance of my cluster. Does anybody have experience
>>>> with transmitting large messages?
>>>>
>>>> Also, the training might take a while, so the aggregation bolt should
>>>> not trigger the training bolt if it is busy. Is there an established
>>>> pattern for how to achieve this kind of synchronization?
>>>> I could have some
>>>> streams to send states, but then I would mix the data stream with a control
>>>> stream, which I would really like to avoid. An alternative would be to use
>>>> ZooKeeper and perform the synchronization there. Last but not least, I could
>>>> also have the aggregation bolt write into a database and have the training
>>>> bolt periodically wake up and read the database. Does anybody have
>>>> experience with such a setup?
>>>>
>>>> Kind Regards,
>>>>
>>>> Klaus
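
For readers following the thread: Adam's "persist the blob, pass only the ID" approach can be sketched independently of Storm. Everything here is a hypothetical stand-in, not Adam's actual code: `ModelStore` plays the role of the persistence layer (S3, HBase, a database), and `train`/`score` stand in for the bolts' `execute` logic.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of the ID-passing pattern: the training step writes the full model
 * blob to persistence and only the small model ID travels through the
 * topology; the scoring step resolves the ID back to the blob on demand.
 */
public class ModelIdPassing {

    /** Hypothetical persistence layer; a real one would be S3, HBase, etc. */
    static class ModelStore {
        private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

        String save(byte[] modelBlob) {
            String id = UUID.randomUUID().toString();
            blobs.put(id, modelBlob);
            return id;            // only this ID would be emitted as a tuple field
        }

        byte[] load(String id) {
            return blobs.get(id); // scoring side fetches the blob out of band
        }
    }

    /** "Training bolt": persists the blob, returns only the ID for downstream. */
    static String train(ModelStore store, byte[] largeModelBlob) {
        return store.save(largeModelBlob);
    }

    /** "Scoring bolt": receives the ID in the tuple, loads the model, scores. */
    static int score(ModelStore store, String modelId, int feature) {
        byte[] model = store.load(modelId);
        // stand-in for real inference: weight the feature by the blob length
        return model.length * feature;
    }
}
```

In an actual topology these two methods would live in the training and scoring bolts' `execute` methods, and the 100 MB+ blob would never appear in a tuple.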
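
Enno's polling layout can be sketched the same way. The version counter is an assumption about how the PredictionBolt could detect that a new model exists without re-downloading the blob on every poll; `ModelState` stands in for the external store (e.g. S3 plus a small version marker).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Sketch of the polled-state pattern: TrainingBolt publishes to a shared
 * ModelState; PredictionBolt polls it and reloads only when the version
 * has moved, so the large blob never travels as a tuple.
 */
public class PolledModelState {

    /** Shared state the training side writes to; stands in for S3 etc. */
    static class ModelState {
        private final AtomicReference<byte[]> model =
                new AtomicReference<>(new byte[0]);
        private final AtomicLong version = new AtomicLong(0);

        void publish(byte[] newModel) {   // TrainingBolt side
            model.set(newModel);
            version.incrementAndGet();    // cheap marker the poller checks
        }

        long version()   { return version.get(); }
        byte[] current() { return model.get(); }
    }

    /** PredictionBolt side: cached copy, reloaded only when version moved. */
    static class PredictionSide {
        private final ModelState state;
        private byte[] cachedModel = new byte[0];
        private long seenVersion = 0;

        PredictionSide(ModelState state) { this.state = state; }

        /** Called periodically (e.g. on a tick tuple); true if reloaded. */
        boolean maybeRefresh() {
            long v = state.version();
            if (v == seenVersion) return false;
            cachedModel = state.current();  // fetch the large blob out of band
            seenVersion = v;
            return true;
        }

        int modelSize() { return cachedModel.length; }
    }
}
```

The poll only compares a counter, so it is cheap even with a 100 MB model; the expensive fetch happens exactly once per new model.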
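
For the "do not trigger training while it is busy" question, one minimal sketch is a try-lock gate: the aggregation side only emits the "trainModel" event when it can flip the flag, and the training side clears it when done. Note this in-process `AtomicBoolean` is an assumption for illustration; across multiple Storm workers the flag would have to live in something shared, such as ZooKeeper, as Klaus suggests.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of a training gate: at most one training run at a time, and
 * triggers that arrive while training is busy are simply skipped.
 */
public class TrainingGate {
    private final AtomicBoolean busy = new AtomicBoolean(false);

    /** Aggregation side: emit a "trainModel" event only if this returns true. */
    boolean tryStartTraining() {
        // compareAndSet is atomic, so two concurrent triggers cannot both win
        return busy.compareAndSet(false, true);
    }

    /** Training side: mark training finished so the next trigger goes through. */
    void finishTraining() {
        busy.set(false);
    }
}
```

Skipping (rather than queueing) the trigger matches the thread's intent: the next "every x events" trigger will fire training again once it is idle, with fresher aggregate data anyway.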
