My company is using Storm for various stream-processing solutions, mostly 
ingesting data from Kafka topics. We have chosen to implement our topologies in 
Scala, using APIs like Tormenta and Summingbird in the mix as well. We have 
about 9-10 topologies running in production as we speak.

I find tons of useful information about Storm in general, but VERY little about 
how folks are managing the deployment, git repos, etc.

Currently we have all of these topologies in the same Git repo, with a 
main class for each topology, allowing us to run them locally or remotely. Some 
of this code shares common components: we try to reuse some bolts we have 
written, and other dependencies are shared across topologies as well. 

So in our CI environment, we build a single assembly jar with SBT containing all 
topologies and use the storm jar command to deploy that jar N times (N = number 
of topologies). After each topology deployment, Jenkins runs functional tests 
to exercise that topology. Given the number of topologies in our catalog, this 
is becoming cumbersome, with the feedback loop from git push through deployment 
and test getting longer and more unwieldy. The whole thing is starting to remind 
me too much of my Java EE container days with multiple EAR files or WAR files 
deployed in a cluster of WebSphere boxes (UGH!!!).

I say all of that to frame the question of how folks are managing similar 
situations and deployments. There has been some thought around breaking up the 
git repo into multiple repos. Or maybe a single git repo with a parent SBT 
project, with subproject(s) for common components and one subproject per topology.
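
For concreteness, the single-repo option might look roughly like the build.sbt 
below. This is only a sketch of the idea, not something we have running: the 
project names, package names, main classes and versions are all made up, and it 
assumes sbt 1.x with a recent sbt-assembly plugin (older versions would write 
"mainClass in assembly" instead of the slash syntax).

```scala
// build.sbt (sketch only; all names and versions below are hypothetical)
// Assumes sbt-assembly is already added in project/plugins.sbt.

lazy val stormVersion = "1.2.3" // assumed; match whatever the cluster runs

lazy val commonSettings = Seq(
  organization := "com.example",
  scalaVersion := "2.12.18",
  // Storm is supplied by the cluster at runtime, so keep it out of the fat jars.
  libraryDependencies += "org.apache.storm" % "storm-core" % stormVersion % Provided
)

// Shared bolts and helpers; never deployed alone, so no assembly jar for it.
lazy val common = (project in file("common"))
  .disablePlugins(sbtassembly.AssemblyPlugin)
  .settings(commonSettings)

// One subproject per topology, each producing its own assembly jar.
lazy val clickstream = (project in file("topologies/clickstream"))
  .dependsOn(common)
  .settings(commonSettings)
  .settings(
    assembly / mainClass := Some("com.example.clickstream.ClickstreamTopology"),
    assembly / assemblyJarName := "clickstream-topology.jar"
  )

lazy val billing = (project in file("topologies/billing"))
  .dependsOn(common)
  .settings(commonSettings)
  .settings(
    assembly / mainClass := Some("com.example.billing.BillingTopology"),
    assembly / assemblyJarName := "billing-topology.jar"
  )

// Parent project: aggregates everything so a plain "sbt test" still covers the repo.
lazy val root = (project in file("."))
  .disablePlugins(sbtassembly.AssemblyPlugin)
  .aggregate(common, clickstream, billing)
```

With a layout like this, "sbt clickstream/assembly" should build only that 
topology's jar, so Jenkins could in principle deploy and test just the 
topologies whose subproject (or the common one) changed, rather than all N.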

I am interested to hear any thoughts or be pointed to any resources that have 
been helpful to others. 
