Hi Bryan,

Thanks. This helps!
Regards
Milind

On Jul 7, 2016 8:13 AM, "Bryan Bende" <[email protected]> wrote:

> Milind,
>
> I'm not sure if I understand the question correctly, but are you asking
> how to find a specific provenance event beyond the 1,000 most recent that
> are displayed when loading the provenance view?
>
> If so, there is a Search button in the top right of the Provenance window
> that brings up a search window to search on specific fields or time ranges.
>
> The fields available to search on can be customized in nifi.properties
> through the following:
>
> # Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
> # EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
>
> # FlowFile Attributes that should be indexed and made searchable
> nifi.provenance.repository.indexed.attributes=twitter.msg, language
>
> In the above example, twitter.msg and language are attributes that are
> being extracted from tweets using EvaluateJsonPath.
>
> Does this help?
>
> -Bryan
>
> On Thu, Jul 7, 2016 at 1:27 AM, milind parikh <[email protected]>
> wrote:
>
>> I am relatively new to NiFi. I have written a processor in Java for NiFi
>> (which gives you an understanding of my knowledge of NiFi, which is
>> little).
>>
>> I have a scenario where there are about 100k flow files a day
>> representing about 100m records, which need to be aggregated across 1m
>> data points across 100 dimensions.
>>
>> If, in my architecture, I split the initial flow file into records,
>> write them into Kafka at 1000 records per flow file, and read them in
>> parallel, how do I do data provenance of the aggregated values?
>>
>> The use case that I am interested in is showing how one of the data
>> points (out of 1m) arrived at the daily aggregated value for an average
>> of 100 records coming out of very few of the 100k files.
>>
>> I can't expand the data provenance through the UI (1000 initial records)
>> and THEN through 1m data points, OR traverse through 1m data points in
>> the UI as my starting point.
>>
>> I know the exact reference of the data point (it's a truncated version
>> of the SHA-1 of a complex but unique datapoint string).
>>
>> Is there a command line equivalent of the UI that can be more precisely
>> targeted for one data point?
>>
>> Thanks
>> Milind
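To tie the two messages together, here is a minimal sketch of how the data point reference described above (a truncated SHA-1 of a unique datapoint string) might be computed and then listed in nifi.provenance.repository.indexed.attributes so the Search window can find it. The attribute name `datapoint.ref`, the truncation length of 12 hex characters, and the sample datapoint string are all assumptions for illustration; nothing in the thread specifies them. It also assumes the custom processor actually writes this value onto each flow file as an attribute, which is not shown here.

```python
import hashlib

# Hypothetical names: NiFi does not define these; they are illustrative only.
ATTRIBUTE_NAME = "datapoint.ref"
REF_LENGTH = 12  # assumed truncation length, in hex characters


def datapoint_ref(datapoint: str, length: int = REF_LENGTH) -> str:
    """Return the first `length` hex characters of the SHA-1 of `datapoint`."""
    if not 1 <= length <= 40:
        raise ValueError("length must be between 1 and 40 (SHA-1 has 40 hex chars)")
    digest = hashlib.sha1(datapoint.encode("utf-8")).hexdigest()
    return digest[:length]


def indexed_attributes_line(attributes):
    """Build the nifi.properties line that lists the attributes to index."""
    return "nifi.provenance.repository.indexed.attributes=" + ", ".join(attributes)


if __name__ == "__main__":
    # Made-up datapoint string; the real format is whatever the flow defines.
    ref = datapoint_ref("region=EU|product=42|metric=avg_latency")
    print(ATTRIBUTE_NAME, "=", ref)
    print(indexed_attributes_line([ATTRIBUTE_NAME]))
```

With the attribute indexed, the reference value could be entered in the Search window instead of paging through the 1,000 most recent events. Note that a shorter truncation raises the chance of two data points sharing a reference, so the length is a trade-off rather than a fixed choice.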
