I can't comment on how large tuples fare, but as for the synchronization,
would this not make more sense?

InputSpout -> AggregationBolt -> PredictionBolt -> OutputBolt
                     |                  |
                     \/                 |
                 Agg. State             |
                     /\                 |
                     |                  V
               TrainingBolt -----> Model State

I.e. AggregationBolt writes to AggregationState, which is polled by
TrainingBolt, which writes to ModelState. ModelState is then polled by
PredictionBolt.
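
To make the polling idea concrete, here is a rough sketch in plain Python
rather than actual Storm code. Everything in it is made up for illustration:
StateStore is an in-memory stand-in for whatever external store you pick,
Trainer and Predictor only play the roles of TrainingBolt and PredictionBolt,
and the "training" is just a mean over the aggregates.

```python
import threading


class StateStore:
    """In-memory stand-in for an external store (a database, S3, ...)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._value = None

    def write(self, value):
        # Replace the state and bump the version so pollers notice the change.
        with self._lock:
            self._version += 1
            self._value = value
            return self._version

    def poll(self, last_seen):
        # Return (version, value) if something newer than last_seen exists,
        # otherwise None.
        with self._lock:
            if self._version > last_seen:
                return self._version, self._value
            return None


class Trainer:
    """Plays the role of TrainingBolt: polls Agg. State, writes Model State."""

    def __init__(self, agg_state, model_state):
        self.agg_state = agg_state
        self.model_state = model_state
        self.last_seen = 0

    def tick(self):
        update = self.agg_state.poll(self.last_seen)
        if update is None:
            return False  # nothing new, skip training
        self.last_seen, aggregates = update
        # Placeholder "training": the model is just the mean of the aggregates.
        model = sum(aggregates) / len(aggregates)
        self.model_state.write(model)
        return True


class Predictor:
    """Plays the role of PredictionBolt: refreshes its model before predicting."""

    def __init__(self, model_state):
        self.model_state = model_state
        self.last_seen = 0
        self.model = None

    def predict(self, x):
        update = self.model_state.poll(self.last_seen)
        if update is not None:
            self.last_seen, self.model = update
        # Placeholder "prediction": distance of x from the model value.
        return None if self.model is None else x - self.model
```

Since the trainer only looks at the store when it calls tick(), a trainer
that is still busy simply skips the intermediate versions and picks up the
latest one afterwards, so the aggregation side never needs to know whether
training is in progress.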

This way you also get rid of the large tuples and can use something like
S3 for these large states instead.
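
One way to keep readers from downloading a 100 MB model on every poll is to
store each model under its own key and publish a tiny "latest" pointer next
to it. The sketch below is only an illustration under my own assumptions: a
local directory stands in for the object store, and publish_model,
fetch_if_changed and the key naming are hypothetical names, not part of any
library.

```python
import os


def publish_model(store_dir, version, model_bytes):
    """Write the model under a versioned key, then repoint 'latest' at it."""
    key = "model-%06d.bin" % version
    with open(os.path.join(store_dir, key), "wb") as f:
        f.write(model_bytes)
    # Write the pointer to a temp file and rename it, so that readers on a
    # local filesystem never see a half-written pointer.
    tmp = os.path.join(store_dir, "latest.tmp")
    with open(tmp, "w") as f:
        f.write(key)
    os.replace(tmp, os.path.join(store_dir, "latest"))
    return key


def fetch_if_changed(store_dir, current_key):
    """Return (key, model_bytes) if 'latest' moved, else (current_key, None)."""
    pointer = os.path.join(store_dir, "latest")
    if not os.path.exists(pointer):
        return current_key, None  # nothing published yet
    with open(pointer) as f:
        key = f.read().strip()
    if key == current_key:
        return current_key, None  # unchanged: skip the expensive read
    with open(os.path.join(store_dir, key), "rb") as f:
        return key, f.read()
```

The rename trick is specific to the local-directory stand-in. With a real
object store the pointer would presumably be a small object of its own, or
it could live in ZooKeeper, while the large model stays in the store.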





On Wed, Feb 26, 2014 at 11:02 AM, Klausen Schaefersinho <
[email protected]> wrote:

> Hi,
>
> I have a topology which processes events, aggregates them in some form,
> and performs some prediction based on a machine learning (ML) model. Every
> x events, one of the bolts involved in the normal processing emits a
> "trainModel" event, which is routed to a bolt dedicated solely to
> the training. Once the training is done, the new model should be sent back
> to the prediction bolt. The topology looks like:
>
>
> InputSpout -> AggregationBolt -> PredictionBolt -> OutputBolt
>                      |                  /\
>                      \/                 |
>                TrainingBolt ------------+
>
>
> The model can get quite large (> 100 MB), so I am not sure how this would
> impact the performance of my cluster. Does anybody have experience with
> transmitting large messages?
>
> Also, the training might take a while, so the aggregation bolt should not
> trigger the training bolt if it is busy. Is there an established pattern
> for achieving this kind of synchronization? I could use some streams to
> send states, but then I would mix the data stream with a control stream,
> which I would really like to avoid. An alternative would be to use
> ZooKeeper and perform the synchronization there. Last but not least, I
> could also have the aggregation bolt write into a database and have the
> training bolt periodically wake up and read the database. Does anybody
> have experience with such a setup?
>
> Kind Regards,
>
> Klaus
>
>
