Milind, I'm not sure if I understand the question correctly, but are you asking how to find a specific provenance event beyond the 1,000 most recent that are displayed when loading the provenance view?
If so, there is a Search button in the top right of the Provenance window that brings up a search window to search on specific fields or time ranges. The fields available to search on can be customized in nifi.properties through the following:

# Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID

# FlowFile Attributes that should be indexed and made searchable
nifi.provenance.repository.indexed.attributes=twitter.msg, language

In the above example, twitter.msg and language are attributes that are being extracted from tweets using EvaluateJsonPath.

Does this help?

-Bryan

On Thu, Jul 7, 2016 at 1:27 AM, milind parikh <[email protected]> wrote:

> I am relatively new to Nifi. I have written a processor in Java for Nifi (
> which gives you an understanding of my knowledge about nifi; which is
> little)
>
> I have a scenario where there are about 100k flow files a day representing
> about 100m records; which needs to be aggregated across 1m data points
> across 100 dimensions.
>
> If in my architecture, I split the initial flow file into records and
> write them into Kafka for 1000 records per flow file and read in parallel,
> how do I do data provenance of the aggregated values.
>
> The use case that I am interested in is showing how one of the data points
> ( out of 1m) arrived at the daily aggregated value for an average of 100
> records coming out of very few of the 100k files.
>
> I can't expand the data provenance through the UI (1000 initial records )
> and THEN through 1m data points OR traverse through 1 m data points in the
> UI as my starting point.
>
> I know the exact reference of the data point ( it's truncated version of
> the sha1 of a complex but unique datapoint string).
>
> Is there a command line equivalent of the UI that can be more precisely
> targeted for one data point?
>
> Thanks
> Milind
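
P.S. As a quick sanity check, something like the following could be used to confirm which fields and attributes are currently marked as indexed (and therefore searchable) in nifi.properties. This is only a rough sketch: the helper names parse_properties and csv_values are made up for illustration, the sample text simply mirrors the two properties from the reply above, and it assumes plain key=value lines with # comments rather than the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments.

    Assumption: no line continuations or escapes, which the full Java
    properties format would allow.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def csv_values(props, key):
    """Return the comma-separated values stored under key as a clean list."""
    return [v.strip() for v in props.get(key, "").split(",") if v.strip()]


# Sample content mirroring the two properties discussed in this thread.
SAMPLE = """\
# Comma-separated list of fields. Fields that are not indexed will not be searchable.
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID

# FlowFile Attributes that should be indexed and made searchable
nifi.provenance.repository.indexed.attributes=twitter.msg, language
"""

props = parse_properties(SAMPLE)
indexed_fields = csv_values(props, "nifi.provenance.repository.indexed.fields")
indexed_attributes = csv_values(props, "nifi.provenance.repository.indexed.attributes")

print("Searchable fields:", indexed_fields)
print("Searchable attributes:", indexed_attributes)
```

To run it against a real installation, the contents of the actual nifi.properties file would be read and passed in place of SAMPLE. In the use case described, the attribute holding the truncated sha1 reference would need to appear in the indexed attributes list before it could be searched on.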
