Mike

The lifecycle of provenance data today is independent of the lifecycle
of the flowfile.  They are kept in separate repositories. The built-in
provenance repo makes it easy for us to support click to content,
replay, and following the detailed lineage of an object through the
flow in a nice integrated way.

That said, the built-in provenance repository can retain events for
days, weeks, maybe months, but you're right that for longer-term
retention the data should be sent elsewhere.  This is why we offer the
ReportingTask API, so that you can grab the events and stream them
elsewhere. Common places I've seen people send this data are HDFS,
HBase, Accumulo, etc.
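
To make the "grab the events and stream them elsewhere" idea a bit more
concrete, here is a rough sketch of the usual pattern: read a batch of
events starting from the last id you exported, hand it to a sink, and
remember where to resume. It is written in Python only to keep it
short. A real reporting task would be Java code against the
ReportingTask API, and every name below (ProvEvent, InMemoryRepo,
export_batch) is made up for illustration rather than taken from NiFi.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProvEvent:
    """Hypothetical, heavily simplified provenance event."""
    event_id: int
    event_type: str
    flowfile_uuid: str


class InMemoryRepo:
    """Hypothetical stand-in for the built-in provenance repository."""

    def __init__(self, events: List[ProvEvent]):
        self._events = sorted(events, key=lambda e: e.event_id)

    def get_events(self, first_id: int, max_records: int) -> List[ProvEvent]:
        # Return up to max_records events whose id is >= first_id.
        matching = [e for e in self._events if e.event_id >= first_id]
        return matching[:max_records]


def export_batch(repo: InMemoryRepo,
                 sink: Callable[[List[ProvEvent]], None],
                 next_id: int,
                 batch_size: int = 100) -> int:
    """Forward one batch starting at next_id; return the id to resume from."""
    batch = repo.get_events(next_id, batch_size)
    if not batch:
        return next_id  # nothing new, keep the same checkpoint
    sink(batch)  # assumed to write to HDFS, HBase, Accumulo, ...
    return batch[-1].event_id + 1


# Usage: five events exported in two runs of at most three events each.
repo = InMemoryRepo([ProvEvent(i, "RECEIVE", f"uuid-{i}") for i in range(5)])
exported: List[ProvEvent] = []
cursor = 0
cursor = export_batch(repo, exported.extend, cursor, batch_size=3)
cursor = export_batch(repo, exported.extend, cursor, batch_size=3)
```

The checkpoint (cursor here) is what you would want to persist between
runs, so that a restart neither drops events nor sends them twice.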

Hopefully that gives some ideas/direction to head in.  Definitely want
to hear more about what you're thinking and where you're headed.  This
data is very useful for sure.

Joe

On Wed, Aug 23, 2017 at 4:01 PM, Mike Thomsen <mikerthom...@gmail.com> wrote:
> Does anyone have any experience persisting provenance beyond the lifecycle
> of a flowfile? The high level use case I have in mind is some sort of
> traceability database or index where the provenance events of every datum
> that comes in gets sent.
>
> Thanks,
>
> Mike
