Yep exactly.

And if you go here [1] you can see this template [2], which takes NiFi's
own provenance stream and then does record conversion on it.  It gets
the stream from a reporting task that sends provenance events out via
site-to-site.  So you can imagine this means you could send those
events anywhere a processor can deliver to.  Or you can implement your
own reporting task.  You can even script a reporting task on the fly.
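
To make the traceability index idea a bit more concrete, here is a rough
sketch of the receiving side. It assumes the events arrive as a JSON
array of objects, and the field names it reads (eventId, entityId,
eventType, timestampMillis) are assumptions about that payload, so check
them against what your reporting task actually emits. SQLite is only a
stand-in for whatever store you choose, and open_index, index_events and
lineage are hypothetical helper names.

```python
import json
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a small provenance index; SQLite is a stand-in store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provenance ("
        " event_id TEXT PRIMARY KEY,"
        " flowfile_uuid TEXT,"
        " event_type TEXT,"
        " timestamp_millis INTEGER,"
        " raw_json TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_flowfile ON provenance (flowfile_uuid)"
    )
    return conn


def index_events(conn, payload):
    """Store one batch of events; payload is a JSON array of event objects."""
    events = json.loads(payload)
    rows = [
        (
            str(e["eventId"]),         # assumed field name
            e.get("entityId"),         # assumed to carry the flowfile UUID
            e.get("eventType"),        # assumed field name
            e.get("timestampMillis"),  # assumed field name
            json.dumps(e),             # keep the full event for later use
        )
        for e in events
    ]
    with conn:  # commit the batch as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO provenance VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)


def lineage(conn, flowfile_uuid):
    """Return the event types recorded for one flowfile, oldest first."""
    cur = conn.execute(
        "SELECT event_type FROM provenance"
        " WHERE flowfile_uuid = ? ORDER BY timestamp_millis, event_id",
        (flowfile_uuid,),
    )
    return [row[0] for row in cur]
```

Keying on the event id means a redelivered batch overwrites rows instead
of duplicating them, and keeping the raw JSON lets you add columns later
without re-collecting anything.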

[1] https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
[2] https://cwiki.apache.org/confluence/download/attachments/57904847/Provenance_Stream_Record_ReadWrite.xml?api=v2

Thanks
Joe

On Wed, Aug 23, 2017 at 4:11 PM, Mike Thomsen <[email protected]> wrote:
> Thanks, Joe. Is this what you were referring to?
> https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#reporting-tasks
>
> I think a coworker took a stab at one already, so I'll have to look into
> that.
>
> On Wed, Aug 23, 2017 at 4:05 PM, Joe Witt <[email protected]> wrote:
>>
>> Mike
>>
>> The lifecycle of provenance data today is independent of the lifecycle
>> of the flowfile.  They are separate repositories.  The built-in repo
>> makes it easy for us to support click to content, replay, and following
>> the detailed lineage of an object through the flow in a nice
>> integrated way.
>>
>> That said, the built-in provenance repository can retain events for
>> days, weeks, maybe months, but you're right that for longer-term
>> retention they should be sent elsewhere.  This is why we offer the
>> ReportingTask API, so that you can grab the events and stream them
>> elsewhere.  Common places I've seen people send this data are HDFS,
>> HBase, Accumulo, etc.
>>
>> Hopefully that gives some ideas/direction to head in.  Definitely want
>> to hear more about what you're thinking and where you're headed.  This
>> data is very useful for sure.
>>
>> Joe
>>
>> On Wed, Aug 23, 2017 at 4:01 PM, Mike Thomsen <[email protected]> wrote:
>> > Does anyone have any experience persisting provenance beyond the
>> > lifecycle of a flowfile? The high-level use case I have in mind is
>> > some sort of traceability database or index where the provenance
>> > events of every datum that comes in get sent.
>> >
>> > Thanks,
>> >
>> > Mike
>
>
