Thanks, Joe. Is this what you were referring to? https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#reporting-tasks
I think a coworker took a stab at one already, so I'll have to look into that.

On Wed, Aug 23, 2017 at 4:05 PM, Joe Witt <[email protected]> wrote:
> Mike
>
> The lifecycle of provenance data today is independent of the lifecycle
> of the flowfile. They are separate repositories. The built-in repo
> makes it easy for us to support click to content, replay, and following
> the detailed lineage of an object through the flow in a nice
> integrated way.
>
> That said, the built-in provenance repository can retain data for
> days, weeks, maybe months, but you're right that for longer-term
> retention it should be sent elsewhere. This is why we offer the
> ReportingTask API, so that you can grab the events and stream them
> elsewhere. Common places I've seen people send this data are HDFS,
> HBase, Accumulo, etc.
>
> Hopefully that gives some ideas/direction to head in. Definitely want
> to hear more about what you're thinking and where you're headed. This
> data is very useful for sure.
>
> Joe
>
> On Wed, Aug 23, 2017 at 4:01 PM, Mike Thomsen <[email protected]>
> wrote:
> > Does anyone have any experience persisting provenance beyond the
> > lifecycle of a flowfile? The high-level use case I have in mind is
> > some sort of traceability database or index where the provenance
> > events of every datum that comes in get sent.
> >
> > Thanks,
> >
> > Mike
