I'm trying to figure out some CI/CM best practices. I want to be able to design a flow, test the flow on some test data, then distribute that exact same configuration (definitely flows, probably services, and so on) into production. I may have multiple engineers working in this environment, so I want to store my files in a repository and be able to do standard git branching, merging, etc. Of course, a branch shouldn't merge to master if it hasn't passed its tests. I have already scripted some simple Python tests that can start NiFi, start a flow, and verify the output, so I know that CI CAN work. I may choose to go to a clustered solution, too, so I'd want to be able to spin up additional cluster nodes if needed.
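To make the "verify output" step concrete, here is a minimal sketch of the kind of check such a test could run once the flow has finished. It is illustrative only, not my actual scripts: it assumes the flow writes its results as files into an output directory and that known-good "golden" files are kept alongside the tests. The function name and directory layout are hypothetical.

```python
import filecmp
from pathlib import Path


def verify_flow_output(output_dir, expected_dir):
    """Compare files written by a flow against golden files.

    Both directory arguments are hypothetical placeholders. Returns a
    list of problem descriptions; an empty list means every expected
    file exists in the output and matches byte for byte.
    """
    output_dir = Path(output_dir)
    expected_dir = Path(expected_dir)
    problems = []
    for expected in sorted(expected_dir.rglob("*")):
        if not expected.is_file():
            continue
        rel = expected.relative_to(expected_dir)
        actual = output_dir / rel
        if not actual.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif not filecmp.cmp(expected, actual, shallow=False):
            # shallow=False forces a content comparison, not just os.stat()
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

A CI job could fail the branch whenever this returns a non-empty list. Extra files in the output directory are ignored here, which may or may not be what you want.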
So what is the recommended way to do this? Here are some of the options I've come up with:

1) Have a dedicated NiFi instance, and only CM the flows (using the NiFi repository). If I understand this correctly, this means that the configuration of NiFi itself would not be CM'd. I'm not clear on how services would be handled if a new flow requires an internal service. I don't like this much, since it doesn't seem terribly repeatable, but maybe it's how the overall system is designed.

2) Configuration control EVERY file. This means that as the database changes while authoring a flow, new commits from a developer would be required. This seems troublesome, though, as merges would be difficult, and it would be hard to tell what actually changed. Hopefully no flowfiles would go into the repo.

3) Configuration control SOME of the files (though not all of them) in the NiFi directory structure. I'm not clear on which ones, though. Maybe whole directories? A guide would be helpful.

4) Have one git repository housing the NiFi repository (the flows), and another repository that houses the NiFi software. The repo containing the flows would be updated frequently; the one containing the NiFi software would NOT be updated as frequently.

5) Don't do CM at all. It can't be done. Rely on backups only.

I'm also still struggling with how to maintain some of the custom Groovy scripts I've written, which are kept on disk.

In any event, how do others do this? Are there any wikis/articles on this?

Thanks for your thoughts
-geoff
