Re: Data Provenance @scale in Nifi

milind parikh Thu, 07 Jul 2016 08:21:12 -0700

Hi Bryan

Thanks. This helps!


Regards
Milind
On Jul 7, 2016 8:13 AM, "Bryan Bende" <[email protected]> wrote:

> Milind,
>
> I'm not sure if I understand the question correctly, but are you asking
> how to find a specific provenance event beyond the 1,000 most recent that
> are displayed when loading the provenance view?
>
> If so, there is a Search button in the top right of the Provenance window
> that brings up a search window to search on specific fields or time ranges.
>
> The fields available to search on can be customized in nifi.properties
> through the following:
>
> # Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
> # EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
>
> # FlowFile Attributes that should be indexed and made searchable
> nifi.provenance.repository.indexed.attributes=twitter.msg, language
>
> In the above example, twitter.msg and language are attributes that are
> being extracted from tweets using EvaluateJsonPath.
>
> Does this help?
>
> -Bryan
>
> On Thu, Jul 7, 2016 at 1:27 AM, milind parikh <[email protected]>
> wrote:
>
>> I am relatively new to NiFi. I have written a processor in Java for NiFi,
>> which gives you a sense of my knowledge of NiFi (which is little).
>>
>> I have a scenario with about 100k flow files a day, representing about
>> 100m records, which need to be aggregated across 1m data points and
>> 100 dimensions.
>>
>> If, in my architecture, I split the initial flow file into records, write
>> them to Kafka at 1000 records per flow file, and read them in parallel,
>> how do I do data provenance of the aggregated values?
>>
>> The use case that I am interested in is showing how one of the data
>> points (out of 1m) arrived at its daily aggregated value, from an average
>> of 100 records coming out of very few of the 100k files.
>>
>> I can't expand the data provenance through the UI (1000 initial records)
>> and THEN through 1m data points, OR traverse through 1m data points in the
>> UI as my starting point.
>>
>> I know the exact reference of the data point (it's a truncated version of
>> the sha1 of a complex but unique datapoint string).
>>
>> Is there a command line equivalent of the UI that can be more precisely
>> targeted for one data point?
>>
>> Thanks
>> Milind
>>
>
>
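
For reference, the indexing settings quoted above decide which names the provenance Search window can match on. Below is a minimal sketch that reads those two properties from nifi.properties-style text and reports whether a given field or attribute is indexed. The helper names parse_indexed and is_searchable and the attribute name datapoint.sha1 are hypothetical, made up for illustration; they are not part of NiFi.

```python
# Sketch only: parse_indexed and is_searchable are hypothetical helpers,
# not part of NiFi. They inspect the two properties quoted in this thread.

FIELDS_KEY = "nifi.provenance.repository.indexed.fields"
ATTRS_KEY = "nifi.provenance.repository.indexed.attributes"


def parse_indexed(properties_text):
    """Return (fields, attributes) parsed from nifi.properties-style text."""
    values = {FIELDS_KEY: [], ATTRS_KEY: []}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in values:
            # Values are comma-separated; surrounding spaces are dropped.
            values[key] = [v.strip() for v in value.split(",") if v.strip()]
    return values[FIELDS_KEY], values[ATTRS_KEY]


def is_searchable(name, properties_text):
    """True if name is listed as an indexed field or indexed attribute."""
    fields, attributes = parse_indexed(properties_text)
    return name in fields or name in attributes


# Same settings as in the quoted example.
SAMPLE = """\
# Comma-separated list of fields.
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
# FlowFile Attributes that should be indexed and made searchable
nifi.provenance.repository.indexed.attributes=twitter.msg, language
"""

if __name__ == "__main__":
    fields, attributes = parse_indexed(SAMPLE)
    print("indexed fields:", fields)
    print("indexed attributes:", attributes)
    # datapoint.sha1 is a hypothetical attribute name for the data point hash.
    for name in ("language", "datapoint.sha1"):
        print(name, "searchable:", is_searchable(name, SAMPLE))
```

With the sample settings, language is reported as searchable and datapoint.sha1 is not. If a processor wrote the truncated sha1 into a FlowFile attribute, listing that attribute under nifi.provenance.repository.indexed.attributes should, per the explanation above, make it available as a search field.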
