CM/CI best practices

Greene (US), Geoffrey N Thu, 21 Jan 2021 11:58:17 -0800

I'm trying to figure out some CI/CM best practices.  I want to be able to 
design a flow, test the flow on some test data, then distribute that exact same 
configuration (definitely flows, probably services, and so on)  into 
production.  I may have multiple engineers working in this environment, and I 
want to be able to store my files in a repository, and be able to do standard 
git merge/branches etc.  Of course, you don't want your branch to merge to 
master if it hasn't passed its tests.  I have already scripted some simple 
python tests that can start nifi, start a flow, and verify output, so I know 
that CI CAN work.
I may choose to go to a clustered solution, too, so I'd want to be able to spin 
up additional cluster nodes if needed.
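
For illustration, the "verify output" step of such a test could look roughly 
like the sketch below.  This is not my actual script: it assumes the flow 
writes its results to a directory, and that a directory of known-good files is 
kept alongside the test data.  The names verify_output and main, and the two 
directory arguments, are made up for the example.

```python
from pathlib import Path


def _files_under(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def verify_output(actual_dir, expected_dir):
    """Compare the flow's output directory with a known-good directory.

    Returns a list of problem descriptions (missing files first, then
    unexpected files, then files whose content differs). An empty list
    means the run passed. Both directory layouts are assumptions.
    """
    actual_dir, expected_dir = Path(actual_dir), Path(expected_dir)
    expected = _files_under(expected_dir)
    actual = _files_under(actual_dir)

    problems = []
    for rel in sorted(expected - actual):
        problems.append(f"missing: {rel.as_posix()}")
    for rel in sorted(actual - expected):
        problems.append(f"unexpected: {rel.as_posix()}")
    for rel in sorted(expected & actual):
        # Byte-for-byte comparison; fine for small test data sets.
        if (expected_dir / rel).read_bytes() != (actual_dir / rel).read_bytes():
            problems.append(f"differs: {rel.as_posix()}")
    return problems


def main(argv):
    """Hypothetical entry point: argv is [actual_dir, expected_dir].

    Returns 0 on success and 1 on any mismatch, so a CI job can use the
    exit status to block a merge.
    """
    problems = verify_output(argv[0], argv[1])
    for line in problems:
        print(line)
    return 1 if problems else 0
```

A CI job would call sys.exit(main(sys.argv[1:])) after the flow has finished, 
so that a non-zero exit status keeps the branch from merging to master.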


So what is the recommended way to do this?   Here are some of the options I've 
come up with:


1)      Have a dedicated nifi instance, and only CM the flows (using nifi 
repository).  If I understand this correctly, this means that the configuration 
of nifi itself would not be CM'd.  I'm not clear on how services would be 
handled if a new flow requires an internal service.  I don't like this much, 
since it doesn't seem terribly repeatable, but maybe it's how the overall 
system is designed.

2)      Configuration control EVERY file.  This means that as the database 
changes while authoring a flow, new commits from a developer would be required. 
 This seems troublesome, though, as merges would be difficult, and it would be 
difficult to actually tell what changed.  Hopefully no flowfiles would go into 
the repo.

3)      Configuration control SOME of the files (though not all of them) in the 
nifi directory structure.  I'm not clear on which ones though.  Maybe whole 
directories?  A guide would be helpful.

4)      Have one git repository housing the nifi repository (the flows).  Have 
another repository that houses the nifi software.  The repo containing the 
flows would be updated frequently; the one containing the software would NOT 
be updated as frequently.

5)      Don't do CM at all.  It can't be done.  Rely on backups only.

I'm also still struggling with how to maintain some of the custom groovy 
scripts I've written, which are kept on disk.

In any event, how do others do this?  Are there any wikis/articles on this?

Thanks for your thoughts
-geoff
