Greetings,

Situation: I have a set of operations that I need to perform on my data in a particular order. For every operation, I am creating a processor. I am organising the different functionalities in separate process groups, and connecting the process groups through input/output ports.
Now I need to build a dashboard to display a bunch of information about this 'flow'. The information I need to display on the dashboard:

1. The path of the flow: the 'path' of processors that the data went through in a particular flow (every time I start the respective process group). I also need to record this data, sorted by time, and show a 'flow history' of all the previous flows.
2. Whether a particular flow was successful or not.

This is the design I came up with to get all of this information:

Step 1. Find the first processor for a particular process group. To do this, I will hit the respective process group's API, http://localhost:8080/nifi-api/processors/{ID}, and then check which processor has 0 'readbytes' and some 'writebytes', or check for input ports and find the processor that is connected to the input port.

Step 2. To find the 'processor path' of my flow files, I will check the connections of the process group, and map the source and destination processors for each connection.

I can't completely figure out how to find whether a particular flow succeeded or failed, and how to record the flow history of the previous flows.

To get the success/failure status, I was thinking I could read the flowFilesIn and flowFilesOut properties of every processor and match them to find out how many files went through the success processor and how many went through the failure processor. But this approach can fail, because the flowFilesIn and flowFilesOut numbers from the previous runs would be added up too (unless I can clear that data every time I start my process group?).

The other approach I thought of is to look at the data provenance for the respective processors and, based on the provenance events, figure out whether each processor did what it is supposed to do (I haven't completely figured this out). I would really appreciate some help with this.

My biggest problem right now is how to get the 'flow history' of my process groups.
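
To make Step 2 concrete, here is a rough sketch of the mapping I have in mind. It assumes I have already fetched the process group's connections and that each entry carries component.source and component.destination objects with id and name fields, which is how I understand the REST API response; please correct me if that shape is wrong. The _conn helper and the sample ids and names are made up purely for illustration.

```python
from collections import defaultdict


def build_graph(connections):
    """Return (graph, names); graph maps a source id to its destination ids."""
    graph = defaultdict(list)
    names = {}
    for conn in connections:
        source = conn["component"]["source"]
        destination = conn["component"]["destination"]
        graph[source["id"]].append(destination["id"])
        names[source["id"]] = source["name"]
        names[destination["id"]] = destination["name"]
    return graph, names


def find_entry_points(connections):
    """Ids that act as a source but are never fed by another component."""
    sources, destinations = set(), set()
    for conn in connections:
        src_id = conn["component"]["source"]["id"]
        dst_id = conn["component"]["destination"]["id"]
        sources.add(src_id)
        if dst_id != src_id:  # ignore self-loops such as a retry relationship
            destinations.add(dst_id)
    return sorted(sources - destinations)


def all_paths(graph, start, _seen=()):
    """Depth-first walk listing every path from start, never revisiting a node."""
    seen = _seen + (start,)
    next_ids = [n for n in graph.get(start, []) if n not in seen]
    if not next_ids:
        return [list(seen)]
    paths = []
    for next_id in next_ids:
        paths.extend(all_paths(graph, next_id, seen))
    return paths


def _conn(src_id, src_name, dst_id, dst_name):
    # Hypothetical helper: builds a minimal connection entry for the sample.
    return {"component": {"source": {"id": src_id, "name": src_name},
                          "destination": {"id": dst_id, "name": dst_name}}}


# Made-up sample flow: one input port, a retry loop, success and failure ends.
sample = [
    _conn("in", "Input Port", "a", "FetchData"),
    _conn("a", "FetchData", "b", "Transform"),
    _conn("b", "Transform", "b", "Transform"),
    _conn("b", "Transform", "ok", "PutSuccess"),
    _conn("b", "Transform", "err", "LogFailure"),
]

graph, names = build_graph(sample)
for start in find_entry_points(sample):
    for path in all_paths(graph, start):
        print(" -> ".join(names[i] for i in path))
```

This only gives me the possible paths through the group, not the path that a particular flow file actually took, which is why I think I still need provenance for the history.
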
I need to segregate and record my flows by time, and I am not able to figure out exactly how to accomplish this. One way I thought of is to record the statsLastRefreshed of every processor, and use those timestamps to construct a timeline for each flow. I think data provenance could be used effectively for this, but I am confused about the provenance events' timestamps and how to use them for this purpose.

Any help on these two issues is much appreciated. I would also really appreciate feedback on or improvements to my solution, or pointers to anything I need to correct or change.

Thanks,
Utkarsh
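
P.S. In case it helps to show what I mean by using provenance for the flow history, here is a minimal sketch of the idea. It assumes each provenance event is a dict with eventId, eventTime, flowFileUuid, componentId and componentName fields (my reading of the provenance query results), that event ids increase in the order the events were recorded, and that I supply the ids of my "failure" components myself. All of the sample values are made up, and the sketch ignores flow files that get split, cloned or merged, since those get new UUIDs.

```python
from collections import defaultdict


def build_flow_history(events, failure_component_ids):
    """Group events per flow file and return one summary per flow, oldest first."""
    by_flowfile = defaultdict(list)
    for event in events:
        by_flowfile[event["flowFileUuid"]].append(event)

    history = []
    for uuid, flow_events in by_flowfile.items():
        # Assumption: eventId reflects the order in which events were recorded.
        ordered = sorted(flow_events, key=lambda e: e["eventId"])
        path = []
        for event in ordered:
            # One component can emit several events; keep each step once.
            if not path or path[-1] != event["componentName"]:
                path.append(event["componentName"])
        history.append({
            "flowFileUuid": uuid,
            "firstEventId": ordered[0]["eventId"],
            "startedAt": ordered[0]["eventTime"],
            "path": path,
            "succeeded": not any(
                e["componentId"] in failure_component_ids for e in ordered
            ),
        })
    history.sort(key=lambda flow: flow["firstEventId"])
    return history


def _event(event_id, time, uuid, comp_id, comp_name):
    # Hypothetical helper: builds a minimal provenance event for the sample.
    return {"eventId": event_id, "eventTime": time, "flowFileUuid": uuid,
            "componentId": comp_id, "componentName": comp_name}


# Made-up events for two flow files, deliberately out of order.
sample_events = [
    _event(5, "01/15/2018 10:00:02.100 UTC", "f2", "b", "Transform"),
    _event(1, "01/15/2018 10:00:00.000 UTC", "f1", "a", "FetchData"),
    _event(7, "01/15/2018 10:00:03.000 UTC", "f2", "err", "LogFailure"),
    _event(2, "01/15/2018 10:00:00.500 UTC", "f1", "b", "Transform"),
    _event(3, "01/15/2018 10:00:01.000 UTC", "f2", "a", "FetchData"),
    _event(6, "01/15/2018 10:00:02.200 UTC", "f2", "b", "Transform"),
    _event(4, "01/15/2018 10:00:01.500 UTC", "f1", "ok", "PutSuccess"),
]

history = build_flow_history(sample_events, failure_component_ids={"err"})
for flow in history:
    status = "success" if flow["succeeded"] else "failure"
    print(flow["startedAt"], " -> ".join(flow["path"]), status)
```

If this is roughly the right way to use provenance, then the remaining question is whether ordering by event id (or by eventTime) is reliable enough to separate one run of the process group from the next.
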
