Re: Provenance Event performance

2017-04-20 Thread simon
Thanks Joe, Juan. Perhaps it would be useful to be able to generate provenance events for a _sample_ of flowfiles? E.g. every Nth flowfile created by a "data ingress" (GET* or LISTEN*) processor gets tracked? Or maybe better: every flowfile gets tracked with a probability of N, to ensure that
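
The two sampling ideas in the message above (track every Nth flowfile, or track each flowfile with a fixed probability) can be illustrated with a minimal sketch. This is not a NiFi feature or API; the class names `EveryNthSampler` and `ProbabilisticSampler` and the method `should_track` are hypothetical and exist only for illustration.

```python
import random


class EveryNthSampler:
    """Hypothetical helper: deterministically selects every Nth item."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self.count = 0

    def should_track(self):
        # Count each item seen; select the Nth, 2Nth, 3Nth, ...
        self.count += 1
        return self.count % self.n == 0


class ProbabilisticSampler:
    """Hypothetical helper: selects each item independently with a fixed probability."""

    def __init__(self, probability, seed=None):
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be in [0, 1]")
        self.probability = probability
        # A private generator keeps the sampler independent of global random state.
        self.rng = random.Random(seed)

    def should_track(self):
        # random() returns a float in [0.0, 1.0), so probability 0.0 never
        # selects and probability 1.0 always selects.
        return self.rng.random() < self.probability


if __name__ == "__main__":
    nth = EveryNthSampler(4)
    print([nth.should_track() for _ in range(8)])

    prob = ProbabilisticSampler(0.25, seed=1)
    tracked = sum(prob.should_track() for _ in range(10_000))
    print(f"tracked {tracked} of 10000")
```

The counter-based variant selects at a fixed interval, so it can line up with periodic patterns in the input; the probabilistic variant selects each item independently of its position in the stream, at the cost of a tracked count that is only approximately the configured fraction.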

Re: Provenance Event performance

2017-04-19 Thread Joe Witt
You're right that the generation and indexing of provenance data creates overhead. We've put considerable effort into minimizing that overhead to a point where you should not have to think about it and still get all the powerful user experience/auditing gains it provides. However, when you're

Re: Provenance Event performance

2017-04-19 Thread Juan Sequeiros
Simon, I feel that "a provenance event is emitted for each flowfile for each processor" is accurate, understanding that "each processor" means the unique processors the flowfile goes through. The provenance database is a Lucene database, and 1 million provenance events is not unreasonable. It would

Provenance Event performance

2017-04-19 Thread simon
Hi All, In some parts of the NiFi documentation, it is stated that a provenance event is emitted for each flowfile for each processor. However, elsewhere it is stated that no provenance event is generated for a flowfile sent to the “success” output of a processor. Which is true? And are