I am relatively new to NiFi. I have written one processor in Java for NiFi, which should give you a sense of how limited my NiFi knowledge is.
I have a scenario with about 100k flow files a day, representing about 100m records, which need to be aggregated across 1m data points and 100 dimensions. Suppose that in my architecture I split each initial flow file into records, write them to Kafka at 1000 records per flow file, and read them back in parallel. How do I do data provenance for the aggregated values?

The use case I am interested in is showing how one data point (out of 1m) arrived at its daily aggregated value, which is built from an average of 100 records coming out of very few of the 100k files. I can't expand the data provenance through the UI (1000 initial records) and then through 1m data points, nor can I traverse 1m data points in the UI as my starting point.

I know the exact reference of the data point (it is a truncated version of the SHA-1 of a complex but unique data point string). Is there a command-line equivalent of the UI that can be targeted more precisely at a single data point?

Thanks,
Milind
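
P.S. For concreteness, a data point reference of the kind described above can be derived roughly as follows. This is only a sketch: the function name `datapoint_ref`, the example string, and the 12-character truncation length are placeholders chosen for illustration and are not the actual format used in the flow.

```python
import hashlib


def datapoint_ref(datapoint: str, length: int = 12) -> str:
    """Return the first `length` hex characters of the SHA-1 of `datapoint`.

    `length` is an assumed truncation size, not the real one.
    """
    digest = hashlib.sha1(datapoint.encode("utf-8")).hexdigest()
    return digest[:length]


# Made-up data point string, only to show the shape of the input.
example = "sensor=42|region=EU|metric=temperature"
print(datapoint_ref(example))
```

The point is that the reference is deterministic, so the same key can be recomputed from the data point string and used as the single search term, instead of browsing through the 1m data points.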
