Hi,

From what I have learned, this is how I envision releasing and restarting a 
NiFi data flow using a build pipeline such as Jenkins.

1. Use a Git repository to version-control the flow.xml.gz file (or the 
uncompressed flow.xml file).  
2. To change the data flow, a dev checks out the Git repo and sets the 
nifi.flow.configuration.file property in nifi.properties to point to the 
flow.xml.gz file in the local clone.  After modifying the flow, the dev pushes 
the changes to the Git repo.
3. At release time, a Jenkins job remotely stops NiFi, copies the flow.xml.gz 
file from the Git repo to the desired location, and then restarts NiFi.  As far 
as I know, the UUIDs of the various elements (processors, process groups, 
connections) do not change.  The same flow.xml.gz file can also be pushed to 
every node if there is a cluster.
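
To make step 3 concrete, here is a rough sketch of what the Jenkins job could 
run. It is only an illustration under assumptions: the host names, the NiFi 
install path and the use of ssh/scp are all hypothetical, and for a cluster it 
stops every node before copying so that no node restarts with a flow that 
differs from the others.

```python
import shlex
import subprocess


def build_release_phases(hosts, nifi_home, flow_src, flow_dest):
    """Return the commands of a release, grouped as stop / copy / start phases.

    hosts, nifi_home, flow_src and flow_dest are placeholders for whatever
    the real environment uses.
    """
    nifi_sh = shlex.quote(f"{nifi_home}/bin/nifi.sh")
    stop = [["ssh", h, f"{nifi_sh} stop"] for h in hosts]
    copy = [["scp", flow_src, f"{h}:{flow_dest}"] for h in hosts]
    start = [["ssh", h, f"{nifi_sh} start"] for h in hosts]
    return [stop, copy, start]


def release(hosts, nifi_home, flow_src, flow_dest, run=subprocess.run):
    """Run the phases in order; a failing command aborts the release."""
    for phase in build_release_phases(hosts, nifi_home, flow_src, flow_dest):
        for cmd in phase:
            run(cmd, check=True)
```

A job would call something like release(["nifi-1", "nifi-2"], "/opt/nifi", 
"flow.xml.gz", "/opt/nifi/conf/flow.xml.gz"), where all four values are made 
up for the example.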

Can this arrangement work? Any suggestions are welcome.

The questions I have are as follows:

1. Is there a more elegant way of stopping the data flow before a release?  
Presumably, the data flow is running at that point.  Stopping the NiFi server 
is one way.  Stopping the data flow is another.  Is the more elegant way to 
stop the source processor(s), wait until no data remains queued anywhere in 
the flow, and then stop the NiFi server?
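
The "wait until nothing is queued" part could look roughly like the sketch 
below. It assumes the source processors have already been stopped, that the 
NiFi REST API is reachable without authentication, and that the root process 
group status is served at /nifi-api/flow/process-groups/root/status with a 
processGroupStatus.aggregateSnapshot.flowFilesQueued field. Please check 
these against the REST API documentation for your NiFi version.

```python
import json
import time
import urllib.request


def queued_flowfiles(status):
    """Extract the total queued FlowFile count from a status response."""
    return status["processGroupStatus"]["aggregateSnapshot"]["flowFilesQueued"]


def fetch_root_status(base_url):
    """GET the status of the root process group (endpoint is an assumption)."""
    url = f"{base_url}/nifi-api/flow/process-groups/root/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_until_drained(fetch, timeout_s=600, poll_s=5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until no FlowFiles are queued.

    Returns True once the queues are empty, False if timeout_s elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        if queued_flowfiles(fetch()) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

For example, wait_until_drained(lambda: fetch_root_status("http://nifi-1:8080")) 
with a hypothetical host; the release job would stop NiFi only if it returns 
True.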

2. When the new flow.xml.gz file is in place and the NiFi server is restarted, 
how do we start the data flow without doing it manually?  I can see an approach 
like this: unpack flow.xml.gz, make sure the value of every "schedulingState" 
attribute in the XML file is running, and compress it again before releasing 
it.  Is there a better way?
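
The unpack / edit / repack approach could be sketched as follows. The state 
name is a parameter because it may differ in your flow.xml (for example, it 
may be stored as a scheduledState element rather than as an attribute), so 
the sketch handles both forms. It only flips STOPPED to RUNNING, on the 
assumption that components in other states, such as DISABLED, should be left 
alone. Note that ElementTree drops XML comments when it parses the file.

```python
import gzip
import xml.etree.ElementTree as ET


def set_running(xml_bytes, state_name):
    """Return (new_xml_bytes, changed_count) with STOPPED switched to RUNNING.

    state_name is the element or attribute holding the scheduling state;
    its exact name must be checked against the real flow.xml.
    """
    root = ET.fromstring(xml_bytes)
    changed = 0
    for elem in root.iter():
        # Case 1: the state is stored as element text.
        if elem.tag == state_name and (elem.text or "").strip() == "STOPPED":
            elem.text = "RUNNING"
            changed += 1
        # Case 2: the state is stored as an attribute.
        if elem.attrib.get(state_name) == "STOPPED":
            elem.set(state_name, "RUNNING")
            changed += 1
    new_xml = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    return new_xml, changed


def rewrite_flow(src_path, dest_path, state_name):
    """Decompress src_path, mark components RUNNING, compress to dest_path."""
    with gzip.open(src_path, "rb") as f:
        xml_bytes = f.read()
    new_xml, changed = set_running(xml_bytes, state_name)
    with gzip.open(dest_path, "wb") as f:
        f.write(new_xml)
    return changed
```

The Jenkins job could run rewrite_flow on the file from Git just before 
copying it to the NiFi node, so the version in the repo stays untouched.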

3. Is the new data flow able to automatically pick up the data that was left 
unprocessed before the release?  Are there any good practices for maintaining 
the state of the data flow?

Thanks,

Huagen
