I mean that some processing relies on a fields grouping, or something similar, to get the correct result.
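To make the grouping point concrete, here is a rough sketch in plain Python (not Storm code). It assumes a hypothetical `NUM_TASKS` of 3 and uses `zlib.crc32` as a stand-in for whatever hash Storm applies internally; the helper names are made up for illustration. It compares routing by key, in the spirit of a fields grouping, with round-robin routing, in the spirit of a shuffle grouping.

```python
from collections import Counter
from itertools import cycle
import zlib

NUM_TASKS = 3  # hypothetical number of counting tasks


def fields_grouping(word, num_tasks=NUM_TASKS):
    """Route by a stable hash of the key so equal words always reach the same task.

    crc32 is only a stand-in here; it is not claimed to be Storm's hash.
    """
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def count_per_task(words, route):
    """Return one Counter per task after routing each word with `route`."""
    tasks = [Counter() for _ in range(NUM_TASKS)]
    for word in words:
        tasks[route(word)][word] += 1
    return tasks


def tasks_holding(word, tasks):
    """Number of tasks that have seen `word` at least once."""
    return sum(1 for task in tasks if word in task)


words = ["storm", "kafka", "storm", "bolt", "storm", "kafka"]

# Key-based routing: every occurrence of a word is counted by a single task.
by_field = count_per_task(words, fields_grouping)

# Round-robin routing: occurrences of one word are spread over several tasks,
# so no single task holds the full count.
round_robin = cycle(range(NUM_TASKS))
by_shuffle = count_per_task(words, lambda _word: next(round_robin))

print("fields grouping, tasks holding 'storm':", tasks_holding("storm", by_field))
print("round robin, tasks holding 'storm':", tasks_holding("storm", by_shuffle))
```

With key-based routing one task ends up with the complete count for a given word, which is what a word count needs. With round-robin routing each task only has a partial count.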
For example, word count does not work out of the box when it is split up into multiple different topologies. Each topology would only see a subset of all of the data. But you can work around this in most cases. If you need the total count for a word, you can ask each of the topologies separately and sum the results. Or, if you are doing a streaming join, you can have an external database that holds the data you are joining on, and possibly a local write-through cache if you are OK with some staleness in the data from other topologies. It is not a perfect solution for all situations, but it can be helpful.

Another thing that people do is split up a larger topology into smaller ones with Kafka or some other pub-sub system in between. That gives you more flexibility in adding, removing, or changing processing steps.

- Bobby

On Fri, May 18, 2018 at 10:12 AM Abdul Samad <[email protected]> wrote:

> What do u mean by “global view of all of the data within storm” ?
>
> On Fri, May 18, 2018 at 8:08 PM Bobby Evans <[email protected]> wrote:
>
>> Depending on how you need to process your data it is possible to have
>> multiple copies of a topology running at a time, but they will all need to
>> have different topology names.
>>
>> We do this regularly to help with rolling upgrades and testing new
>> setups. Some of the teams that we work with will have 10 copies of the
>> same topology each of which is part of the same kafka consumer group. The
>> teams will then kill one of the topologies, let it drain out completely,
>> and then launch a new one to take its place.
>>
>> This only works if your processing does not require to have a global view
>> of all of the data within storm.
>>
>> - Bobby
>>
>> On Fri, May 18, 2018 at 8:06 AM Abdul Samad <[email protected]>
>> wrote:
>>
>>> Is it possible to add a new topology on runtime, given the previous
>>> topologies are in place?
>>>
>>> On Fri, May 18, 2018 at 6:01 PM Stig Rohde Døssing <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> That's not possible as far as I know. You can route tuples based on the
>>>> tuple content, but you can't restructure the topology while it is running.
>>>> Consider elaborating on what you're trying to accomplish, that way it's
>>>> easier to suggest a solution.
>>>>
>>>> 2018-05-18 12:47 GMT+02:00 Abdul Samad <[email protected]>:
>>>>
>>>>> Hi,
>>>>> I am in the initial stage of designing a data pipeline for a client
>>>>> and I'd like to know as of now is it possible that I add/delete/update
>>>>> bolts to an existing topology based on a user input?
>>>>> Thanks.
>>>>> Abdul Samad
>>>>>
>>>>
>>>> --
>>> Sent from Gmail Mobile
>>>
>> --
> Sent from Gmail Mobile
>
