Hi Daniela,

> Okay, could I do the grouping already in Kafka? For example would it be
> possible to use one topic per region or to use one topic with a partition for
> every region? Then the messages would already be grouped when they arrive at
> Storm. Is this correct?

You would need a Kafka spout instance per topic and a separate windowed bolt instance that receives from the corresponding Kafka spout. But such a topology would be difficult to manage as the number of topics increases. The other option is to do the grouping within the windowed bolt, as I mentioned in the last mail.

> Would the windowing and the aggregation for each time window be separated in
> two bolts or is both done in one bolt?

Separate bolts are not needed for aggregation; it can be done inside the windowed bolt.

Thanks,
Arun

On 3/31/16, 1:23 AM, "Maria Musterfrau" <[email protected]> wrote:

>Hi Arun
>
>Sorry, I did not see your reply in the dev mailing list. Thank you very much!
>
>Okay, could I do the grouping already in Kafka? For example would it be
>possible to use one topic per region or to use one topic with a partition for
>every region? Then the messages would already be grouped when they arrive at
>Storm. Is this correct?
>
>Would the windowing and the aggregation for each time window be separated in
>two bolts or is both done in one bolt?
>
>Thank you in advance.
>
>Regards,
>Daniela
>
>
>
>Sent: Wednesday, 30 March 2016, 20:15
>From: "Arun Iyer" <[email protected]>
>To: "[email protected]" <[email protected]>, "[email protected]"
><[email protected]>
>Subject: Re: Combining group by and time window
>
>Reposting the reply that was posted to dev mailing list :-
>
>
>For storm core, windowed bolts would give you the tuples in the last minute
>but you would have to do the grouping yourself. You could of course use a
>fields grouping to split the load across the windowed bolts. For trident you
>might want to take a look at the windowing apis that were added recently and
>see if it fits your need. You have to choose between trident and core based on
>your use cases, the guarantee you need and if you need batching vs per tuple
>processing etc.
>
>- Arun
>
>
>
>From: Maria Musterfrau
>Reply-To: "[email protected]"
>Date: Wednesday, March 30, 2016 at 10:56 PM
>To: "[email protected]"
>Subject: Fw: Combining group by and time window
>
>
>Does anyone have an idea?
>
>Thank you in advance.
>
>Regards,
>Daniela
>
>
>Sent: Monday, 28 March 2016, 21:06
>From: "Maria Musterfrau" <[email protected]>
>To: [email protected]
>Subject: Combining group by and time window
>
>Hi,
>
>I have a stream with time series data from different regions. I would like to
>group the stream by the different regions and to add up the values of the last
>minute (time window) per region. The sums should be persisted to Redis or
>something like this.
>
>I already found out that Storm Trident provides a group by function to split
>the stream. I think this could be useful.
>Storm core provides time windows, so I could use it for the aggregation.
>
>But how can I combine these two components? Or is this not possible?
>
>Would it be useful to do the grouping already in Kafka (with different topics)
>or is it better to do it in Storm?
>
>Thank you in advance.
>
>Regards,
>Daniela
>
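
For illustration, here is a minimal sketch of the grouping step discussed in this thread: summing the values per region over the tuples of one time window. It is plain Java with no Storm dependency so that it can be run on its own. The class RegionWindowSum, the Reading type and its field names are hypothetical and do not come from the thread. In a Storm core topology this logic would presumably live in the execute(TupleWindow) method of a windowed bolt, looping over the tuples of the current window, and the resulting sums would then be written to Redis or a similar store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; in Storm the input would be the tuples
// of the current window rather than a plain list.
class RegionWindowSum {

    /** One time-series reading; the field names are assumptions. */
    static final class Reading {
        final String region;
        final double value;

        Reading(String region, double value) {
            this.region = region;
            this.value = value;
        }
    }

    /** Groups the readings of one window by region and sums their values. */
    static Map<String, Double> sumByRegion(List<Reading> window) {
        // TreeMap keeps the regions sorted, which makes the output stable.
        Map<String, Double> sums = new TreeMap<>();
        for (Reading r : window) {
            // Add the value to the running sum for this region.
            sums.merge(r.region, r.value, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Pretend these are the readings that arrived in the last minute.
        List<Reading> window = new ArrayList<>();
        window.add(new Reading("eu", 1.5));
        window.add(new Reading("us", 2.0));
        window.add(new Reading("eu", 2.5));

        System.out.println(sumByRegion(window)); // prints {eu=4.0, us=2.0}
    }
}
```

Because the grouping happens per window inside one method, no second bolt is needed for the aggregation. If several windowed bolt instances run in parallel, a fields grouping on the region field would route all readings of a region to the same instance, so each instance sees complete per-region data for its window.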
