Greetings, I have seen some discussions of this topic online, but nothing conclusive yet. Is there any way to change the wiring of spouts and bolts at runtime? My intention is to build a platform where users can subscribe to different streams of data (Twitter, YouTube, or whatever they want) and then apply filters to them. I thought I could implement it on Apache Storm, but Storm seems to be a framework designed to deploy a single-behaviour topology for each cluster. What I think would work better for me is to implement bolts that each perform a specific filtering task (blocking a specific word, blocking tweets with fewer than N retweets, etc.) and then wire them together according to what each user wants (some users may want a given filter while others don't). In other words, given a configuration supplied by the user, I would like to change the wiring of the topology. The bolts should also be able to read parameters supplied by the user (such as the word to block, or the number N used to block tweets with fewer than N retweets).
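To make the idea concrete, here is a minimal sketch in plain Python (not Storm code) of the behaviour I am after. All names in it (`block_word`, `min_retweets`, `FILTER_FACTORIES`, `build_pipeline`, `run`) and the tweet fields `text` and `retweets` are made up for illustration. In Storm, each filter would be a bolt and the pipeline would be the wiring between them.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Tweet = Dict[str, object]
Filter = Callable[[Tweet], bool]  # returns True when the tweet should be kept


def block_word(word: str) -> Filter:
    """Build a filter that drops tweets containing `word` (case-insensitive)."""
    lowered = word.lower()

    def keep(tweet: Tweet) -> bool:
        return lowered not in str(tweet["text"]).lower()

    return keep


def min_retweets(n: int) -> Filter:
    """Build a filter that drops tweets with fewer than `n` retweets."""

    def keep(tweet: Tweet) -> bool:
        return int(tweet["retweets"]) >= n

    return keep


# Hypothetical registry: maps a name used in the user's config to a filter factory.
FILTER_FACTORIES: Dict[str, Callable[..., Filter]] = {
    "block_word": block_word,
    "min_retweets": min_retweets,
}


def build_pipeline(config: List[dict]) -> List[Filter]:
    """Turn a per-user configuration into an ordered list of filters."""
    return [FILTER_FACTORIES[step["filter"]](**step["params"]) for step in config]


def run(stream: Iterable[Tweet], pipeline: List[Filter]) -> Iterator[Tweet]:
    """Yield only the tweets that pass every filter in the pipeline."""
    for tweet in stream:
        if all(f(tweet) for f in pipeline):
            yield tweet


# Example: one user's configuration and a tiny in-memory "stream".
user_config = [
    {"filter": "block_word", "params": {"word": "spam"}},
    {"filter": "min_retweets", "params": {"n": 10}},
]
tweets = [
    {"text": "Great talk on stream processing", "retweets": 25},
    {"text": "Buy SPAM now", "retweets": 100},
    {"text": "Hello world", "retweets": 3},
]
kept = list(run(tweets, build_pipeline(user_config)))
```

Each user would have their own `user_config`, so two users could run different filter chains over the same stream. What I do not know is how to get this kind of per-user wiring out of a Storm topology.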
I have found a framework built on top of Storm called Flux which, as far as I understand, lets you change the wiring through configuration. My doubt is that, if I use it, I would have to give every user their own cluster/topology that they could configure. That implementation raises some problems:
1) If I wanted to add a new filtering feature by adding a new bolt to my topology, I would have to redeploy every single user's cluster. Would this be a problem?
2) Does it make sense for users not to share topologies? I wonder whether it would be overkill, since most of the clusters would probably not be receiving a big-data-sized stream.
3) Scalability: how could I manage which bolts are deployed where according to the expected performance demand, and how could I move them or give them more resources so they can cope with the stream they are receiving?
I think these problems could perhaps be solved with Docker Swarm, which would give me more flexibility, but I do not know whether there is another real-time stream processing framework that would fit my needs better, another containerization tool, or some other way of approaching the problem that would help me.
I would appreciate any suggestions or help; I am pretty stuck right now.
P.S.: I have looked at distributed RPC but am still in doubt.
Thanks a lot,
Alberto
