Re: Data Provenance @scale in Nifi

Bryan Bende Thu, 07 Jul 2016 08:14:00 -0700

Milind,

I'm not sure if I understand the question correctly, but are you asking how
to find a specific provenance event beyond the 1,000 most recent that are
displayed when loading the provenance view?

If so, there is a Search button in the top right of the Provenance window
that opens a search dialog where you can filter on specific fields or a time range.

The fields available for searching can be customized in nifi.properties
using the following properties:

# Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID

# FlowFile Attributes that should be indexed and made searchable
nifi.provenance.repository.indexed.attributes=twitter.msg, language

In the above example, twitter.msg and language are attributes extracted
from tweets using EvaluateJsonPath.
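As a rough illustration of how the two properties interact, the sketch below mimics in plain Python what EvaluateJsonPath does in the example (copying JSON values into FlowFile attributes) and then keeps only the attributes listed in nifi.provenance.repository.indexed.attributes. It is not NiFi code: the JSON keys `text` and `lang`, the extra `user.id` attribute, and the helper function names are assumptions made for illustration.

```python
import json

# Value copied from the nifi.properties example above.
INDEXED_ATTRIBUTES = "twitter.msg, language"

# Hypothetical mapping of attribute name -> top-level JSON key, standing in for
# EvaluateJsonPath dynamic properties such as twitter.msg = $.text (assumed paths).
ATTRIBUTE_PATHS = {"twitter.msg": "text", "language": "lang", "user.id": "user_id"}


def parse_csv_property(value):
    """Split a comma-separated property value, trimming whitespace around entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def extract_attributes(tweet_json):
    """Copy selected top-level JSON fields into a dict of string attributes."""
    tweet = json.loads(tweet_json)
    return {attr: str(tweet[key]) for attr, key in ATTRIBUTE_PATHS.items() if key in tweet}


def searchable_attributes(attributes, indexed_value):
    """Keep only the attributes named in the indexed-attributes property value."""
    indexed = set(parse_csv_property(indexed_value))
    return {name: value for name, value in attributes.items() if name in indexed}


if __name__ == "__main__":
    tweet = '{"text": "hello nifi", "lang": "en", "user_id": 42}'
    attrs = extract_attributes(tweet)
    print(searchable_attributes(attrs, INDEXED_ATTRIBUTES))
```

Running it prints {'twitter.msg': 'hello nifi', 'language': 'en'}: the user.id attribute is present on the simulated FlowFile but would not be searchable, because it is not listed in the indexed attributes.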

Does this help?

-Bryan

On Thu, Jul 7, 2016 at 1:27 AM, milind parikh <[email protected]> wrote:

> I am relatively new to NiFi. I have written a processor for NiFi in Java
> (which gives you an idea of how much I know about NiFi: not much).
>
> I have a scenario with about 100k flow files a day, representing about
> 100m records, which need to be aggregated into 1m data points across 100
> dimensions.
>
> If, in my architecture, I split the initial flow file into records, write
> them into Kafka at 1000 records per flow file, and read them in parallel,
> how do I track the data provenance of the aggregated values?
>
> The use case I am interested in is showing how one of the data points
> (out of 1m) arrived at its daily aggregated value from an average of 100
> records that come from a handful of the 100k files.
>
> I can't expand the data provenance through the UI (1000 initial records)
> and THEN through 1m data points, OR traverse 1m data points in the UI as
> my starting point.
>
> I know the exact reference of the data point (it's a truncated version of
> the SHA-1 of a complex but unique data point string).
>
> Is there a command line equivalent of the UI that can be more precisely
> targeted for one data point?
>
> Thanks
> Milind
>
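The truncated SHA-1 key described in the question is the kind of value that could be made searchable through nifi.provenance.repository.indexed.attributes, provided each relevant FlowFile carries it as an attribute. The sketch below only shows deriving such a key and attaching it to a FlowFile-style attribute map; the attribute name `datapoint.id` and the 16-character truncation are hypothetical, since the question gives neither. In a real flow the attribute would have to be set by a processor (for example with ProcessSession.putAttribute in a custom Java processor) and then listed in the indexed attributes.

```python
import hashlib

# Hypothetical truncation length; the question does not say how many hex characters are kept.
KEY_LENGTH = 16
# Hypothetical attribute name; it would need to be listed in
# nifi.provenance.repository.indexed.attributes to be searchable.
KEY_ATTRIBUTE = "datapoint.id"


def datapoint_key(datapoint, length=KEY_LENGTH):
    """Return the first `length` hex characters of the SHA-1 of the data point string."""
    digest = hashlib.sha1(datapoint.encode("utf-8")).hexdigest()
    return digest[:length]


def tag_record(attributes, datapoint):
    """Return a copy of a FlowFile-style attribute map with the data point key added."""
    tagged = dict(attributes)
    tagged[KEY_ATTRIBUTE] = datapoint_key(datapoint)
    return tagged


if __name__ == "__main__":
    # Hypothetical data point string and filename, for illustration only.
    attrs = tag_record({"filename": "batch-0001.json"}, "region=us|sku=123|metric=price")
    print(attrs)
```

Searching on this attribute would only find FlowFiles that actually carry it, so this covers the key derivation rather than a complete provenance design for the aggregation.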
