Mike,

The lifecycle of provenance data today is independent of the lifecycle of the flowfile; the two live in separate repositories. The built-in provenance repository makes it easy for us to support click-to-content, replay, and following the detailed lineage of an object through the flow in a nicely integrated way.

That said, the built-in provenance repository can retain events for days, weeks, maybe months, but you're right that for longer-term retention the events should be sent elsewhere. This is why we offer the ReportingTask API, so that you can grab the events and stream them elsewhere. Common places I've seen people send this data are HDFS, HBase, Accumulo, etc.

Hopefully that gives some ideas/direction to head in. Definitely want to hear more about what you're thinking and where you're headed. This data is very useful for sure.

Joe

On Wed, Aug 23, 2017 at 4:01 PM, Mike Thomsen <mikerthom...@gmail.com> wrote:
> Does anyone have any experience persisting provenance beyond the lifecycle
> of a flowfile? The high level use case I have in mind is some sort of
> traceability database or index where the provenance events of every datum
> that comes in gets sent.
>
> Thanks,
>
> Mike
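
For anyone reading this thread later, here is a rough sketch of the pattern described in the reply: a task that runs periodically, remembers the last provenance event it forwarded, pulls the next batch, and writes it to a long-term store. It deliberately does not use NiFi's classes. `Event`, `EventSource`, `EventSink`, and `ProvenanceForwarder` are hypothetical stand-ins so the example compiles on its own; in a real ReportingTask the events would come from NiFi and the sink would be something like HDFS or HBase.

```java
import java.util.ArrayList;
import java.util.List;

public class ProvenanceForwarder {

    /** Hypothetical stand-in for a provenance event (not NiFi's event class). */
    static final class Event {
        final long id;
        final String type;
        final String flowFileUuid;

        Event(long id, String type, String flowFileUuid) {
            this.id = id;
            this.type = type;
            this.flowFileUuid = flowFileUuid;
        }
    }

    /** Hypothetical source: returns up to maxRecords events with id >= firstEventId. */
    interface EventSource {
        List<Event> getEvents(long firstEventId, int maxRecords);
    }

    /** Hypothetical sink standing in for the long-term store. */
    interface EventSink {
        void write(List<Event> batch);
    }

    private final EventSource source;
    private final EventSink sink;
    private final int batchSize;
    private long nextEventId = 0L;

    ProvenanceForwarder(EventSource source, EventSink sink, int batchSize) {
        this.source = source;
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Forwards at most one batch and returns how many events were forwarded. */
    int onTrigger() {
        List<Event> batch = source.getEvents(nextEventId, batchSize);
        if (batch.isEmpty()) {
            return 0;
        }
        sink.write(batch);
        // Advance the bookmark only after the sink has accepted the batch,
        // so a failed write is retried on the next trigger.
        nextEventId = batch.get(batch.size() - 1).id + 1;
        return batch.size();
    }

    long nextEventId() {
        return nextEventId;
    }

    /** In-memory source used only for the demo below. */
    static EventSource inMemorySource(List<Event> repo) {
        return (firstEventId, maxRecords) -> {
            List<Event> out = new ArrayList<>();
            for (Event e : repo) {
                if (e.id >= firstEventId && out.size() < maxRecords) {
                    out.add(e);
                }
            }
            return out;
        };
    }

    public static void main(String[] args) {
        List<Event> repo = new ArrayList<>();
        for (long i = 0; i < 5; i++) {
            repo.add(new Event(i, "RECEIVE", "uuid-" + i));
        }
        List<Event> longTermStore = new ArrayList<>();
        ProvenanceForwarder forwarder =
                new ProvenanceForwarder(inMemorySource(repo), longTermStore::addAll, 2);

        // Keep triggering until the source has nothing new to offer.
        while (forwarder.onTrigger() > 0) {
            // each pass forwards one batch
        }
        System.out.println("forwarded " + longTermStore.size() + " events");
    }
}
```

The bookmark (`nextEventId`) is what makes this safe to run on a schedule: each trigger picks up where the last one stopped. In a real deployment the bookmark would need to be persisted so it survives a restart.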