If you don't need high volume and CEP capability in your bolts, you may
want to augment the architecture with something like Apache NiFi, which
lets you add new connectors dynamically, be it Kafka or some RDBMS
source.

On Fri, May 18, 2018, 1:58 PM Abdul Samad <[email protected]> wrote:

> Let me try and explain my architecture to the best of my ability.
> We have a Kafka cluster which receives streams of data from different
> sources, and that data is then consumed via a Kafka Spout. The bolts
> subscribe to the Kafka Spout and receive data, which is further processed
> and dumped to external destinations.
> Our problem is that we want to dynamically add bolts at runtime which
> subscribe to our Kafka Spout.
>
> Sources -> Kafka -> Kafka Spout -> Bolt(s)
>
> We need some way to add bolts dynamically. If this is not possible, then
> one solution that comes to mind is to create topologies at runtime.
>
> On Fri, May 18, 2018 at 9:03 PM Bobby Evans <[email protected]> wrote:
>
>> I mean some processing relies on a Fields grouping or something similar
>> to get the correct result.
>>
>> For example, word count out of the box does not work when split up into
>> multiple different topologies.  Each topology would only see a subset of
>> all of the data.
>>
>> But you can work around this in most cases.  To get the total count for
>> a word, you can ask each of the topologies separately and sum the
>> results.  Or if you are doing a streaming join, you can have an
>> external database that holds the data you are joining on, and possibly have
>> a write through cache locally if you are OK with some staleness in the data
>> from other topologies.
>>
>> It is not a perfect solution to all situations, but it can be helpful.
>>
>> Another thing that people do is to split up a larger topology into
>> smaller ones with kafka or some other pub-sub in between.  That gives you
>> more flexibility in adding, removing or changing processing steps.
>>
>> - Bobby
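The word-count workaround Bobby describes can be sketched in plain Java: each topology keeps its own per-word counts (the in-memory maps below are stand-ins for whatever store each topology actually exposes), and the global count is the sum across topologies. Names like `MergeCounts` are illustrative, not from Storm:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeCounts {
    // Sum the per-word counts reported by each topology into a global total.
    static Map<String, Long> merge(List<Map<String, Long>> perTopology) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> counts : perTopology) {
            counts.forEach((word, n) -> total.merge(word, n, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two topologies, each seeing only a subset of the stream.
        Map<String, Long> t1 = Map.of("storm", 3L, "kafka", 1L);
        Map<String, Long> t2 = Map.of("storm", 2L, "bolt", 4L);
        System.out.println(merge(List.of(t1, t2)));
    }
}
```

The same shape works for any commutative, associative aggregation; non-decomposable operations (medians, joins on arbitrary keys) are where the external-database workaround comes in.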
>>
>> On Fri, May 18, 2018 at 10:12 AM Abdul Samad <[email protected]>
>> wrote:
>>
>>> What do you mean by “global view of all of the data within storm”?
>>>
>>> On Fri, May 18, 2018 at 8:08 PM Bobby Evans <[email protected]> wrote:
>>>
>>>> Depending on how you need to process your data, it is possible to have
>>>> multiple copies of a topology running at a time, but they will all need
>>>> to have different topology names.
>>>>
>>>> We do this regularly to help with rolling upgrades and testing new
>>>> setups.  Some of the teams that we work with will have 10 copies of the
>>>> same topology each of which is part of the same kafka consumer group.  The
>>>> teams will then kill one of the topologies, let it drain out completely,
>>>> and then launch a new one to take its place.
>>>>
>>>> This only works if your processing does not require a global view of
>>>> all of the data within Storm.
>>>>
>>>> - Bobby
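As a rough sketch of the pattern Bobby describes, assuming the storm-kafka-client spout: every copy shares one Kafka consumer group (so Kafka divides the partitions among the copies) and is submitted under a unique topology name. `MyBolt`, the broker address, topic, and group id are all placeholders, and this fragment is not self-contained (it needs the storm-client and storm-kafka-client dependencies and a running cluster):

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;

// All copies share one consumer group, so Kafka splits the
// partitions across them.
KafkaSpoutConfig<String, String> spoutConfig =
    KafkaSpoutConfig.builder("broker1:9092", "my-topic")
        .setProp("group.id", "shared-group")
        .build();

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
builder.setBolt("my-bolt", new MyBolt(), 2).shuffleGrouping("kafka-spout");

// Submitting under a fresh name launches another copy alongside
// the ones already running; killing an old name drains that copy.
StormSubmitter.submitTopology("my-topology-v2", new Config(),
        builder.createTopology());
```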
>>>>
>>>> On Fri, May 18, 2018 at 8:06 AM Abdul Samad <[email protected]>
>>>> wrote:
>>>>
>>>>> Is it possible to add a new topology on runtime, given the previous
>>>>> topologies are in place?
>>>>>
>>>>> On Fri, May 18, 2018 at 6:01 PM Stig Rohde Døssing <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> That's not possible as far as I know. You can route tuples based on
>>>>>> the tuple content, but you can't restructure the topology while it is
>>>>>> running. Consider elaborating on what you're trying to accomplish;
>>>>>> that way it's easier to suggest a solution.
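Routing on tuple content, as Stig suggests, is the supported alternative to restructuring: in Storm this is typically done by declaring multiple output streams and choosing the stream from a tuple field at emit time. A minimal stdlib-only sketch of the same decision (the class name, the `type` field, and the stream names are illustrative, not Storm API):

```java
import java.util.HashMap;
import java.util.Map;

public class ContentRouter {
    // Pick a named route from a field of the tuple. In a Storm bolt this
    // maps to declareStream(...) plus emit(streamId, tuple).
    static String route(Map<String, Object> tuple) {
        Object type = tuple.get("type");
        if ("order".equals(type)) {
            return "orders-stream";
        } else if ("click".equals(type)) {
            return "clicks-stream";
        }
        return "default-stream";
    }

    public static void main(String[] args) {
        Map<String, Object> t = new HashMap<>();
        t.put("type", "order");
        System.out.println(route(t));
    }
}
```

Downstream bolts then subscribe only to the streams they care about, which covers many "dynamic bolt" use cases without changing the running topology.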
>>>>>>
>>>>>> 2018-05-18 12:47 GMT+02:00 Abdul Samad <[email protected]>:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am in the initial stage of designing a data pipeline for a client,
>>>>>>> and I'd like to know: is it currently possible to add/delete/update
>>>>>>> bolts in an existing topology based on user input?
>>>>>>> Thanks.
>>>>>>> Abdul Samad
>>>>>>>
>>>>>>
>>>>>> --
>>>>> Sent from Gmail Mobile
>>>>>
>>>
>
