My company uses Storm for various stream-processing solutions, mostly ingesting data from Kafka topics. We have chosen to implement our topologies in Scala, with libraries like Tormenta and Summingbird in the mix as well. We currently have about 9-10 topologies running in production.
I find tons of useful information about Storm in general, but VERY little about how folks manage deployment, Git repos, and so on. Currently all of these topologies live in the same Git repo, with a main class for each topology so that we can run them locally or remotely. Some of the code is shared: we reuse bolts we have written, and other dependencies are common across topologies as well.
In our CI environment we build a single assembly jar with SBT containing all topologies and run the storm jar command against it N times (N = number of topologies). After each topology deployment, Jenkins runs functional tests that exercise that topology. Given the number of topologies in our catalog, this is becoming cumbersome: the feedback loop from git push through deployment and test keeps getting longer and more unwieldy. The whole thing is starting to remind me too much of my Java EE container days, with multiple EAR or WAR files deployed to a cluster of WebSphere boxes (UGH!!!).
I say all of that to frame the question: how are folks managing similar situations and deployments? We have considered splitting the Git repo into multiple repos, or keeping one repo with a parent SBT project, a subproject (or several) for common components, and one subproject per topology. I am interested to hear any thoughts or be pointed to any resources that have been helpful to others.
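To make the parent-project idea concrete, here is a rough, untested sketch of what such a build.sbt might look like. It assumes the sbt-assembly plugin is enabled; the project names, directory layout, main classes, and Storm version are all placeholders I made up for illustration, not our real ones.

```scala
// build.sbt: hypothetical layout, all names and versions are placeholders.
// Assumes sbt-assembly is added in project/plugins.sbt.

// Storm is provided by the cluster at runtime, so keep it out of the fat jars.
val stormVersion = "1.2.3" // placeholder; match whatever the cluster runs
val stormCore = "org.apache.storm" % "storm-core" % stormVersion % Provided

// Shared bolts, spouts, and helpers reused across topologies.
lazy val common = (project in file("common"))
  .settings(libraryDependencies += stormCore)

// Helper so that every topology subproject is declared the same way.
def topology(name: String, main: String): Project =
  Project(name, file(s"topologies/$name"))
    .dependsOn(common)
    .settings(
      libraryDependencies += stormCore,
      assembly / mainClass := Some(main),          // entry point for this topology
      assembly / assemblyJarName := s"$name.jar"   // one small jar per topology
    )

// Hypothetical topologies; one subproject each.
lazy val clickstream = topology("clickstream", "com.example.ClickstreamTopology")
lazy val billing     = topology("billing", "com.example.BillingTopology")

// Parent project that only aggregates the others.
lazy val root = (project in file("."))
  .aggregate(common, clickstream, billing)
```

With something like this, `sbt billing/assembly` would build only the billing jar, which could then be deployed on its own with `storm jar <path-to>/billing.jar com.example.BillingTopology`, so CI would only need to rebuild, redeploy, and retest the topologies affected by a change.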
