Simon,

I feel that "a provenance event is emitted for each flowfile for each processor" is accurate, with the understanding that "each processor" means the unique processors the flowfile goes through.

The provenance database is a Lucene database, and 1 million provenance events is not unreasonable. How it performs has to do with how you configure your NiFi, and a best practice is to store your provenance repository on its own disk. Many tweakable settings for provenance are in nifi.properties [1].

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html

On Wed, Apr 19, 2017 at 6:50 AM <[email protected]> wrote:

> Hi All,
>
> In some parts of the NiFi documentation, it is stated that a provenance
> event is emitted for each flowfile for each processor. However elsewhere
> it is stated that no provenance-event is generated for a flowfile sent
> to the “success” output of a processor - which is true?
>
> And are there mechanisms for reducing the number of provenance events
> generated by a NiFi flow? When a dataflow is processing large numbers of
> events, it would seem to me that the generation of provenance events
> will be the limiting factor for performance. When processing 1 million
> records per day, generating 1 million provenance events (or worse) is
> not helpful..
>
> Thanks in advance,
>
> Simon
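For reference, here is a rough sketch of the provenance-related entries in nifi.properties that I would look at first. The property names are as I recall them from the 1.x administration guide, and the values are, as far as I remember, the shipped defaults. The directory path is purely hypothetical and only illustrates pointing the repository at a dedicated disk. Please verify all of it against the guide for your version before relying on it.

```properties
# Sketch only: names and values should be checked against your NiFi version.

# Put the provenance repository on its own disk.
# (/mnt/provenance_disk is a hypothetical mount point.)
nifi.provenance.repository.directory.default=/mnt/provenance_disk/provenance_repository

# How long, and how much, provenance data to retain before it is aged off.
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB

# How often the event journals are rolled over and indexed.
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB

# Fields that are indexed and therefore searchable; indexing fewer fields
# means less indexing work per event.
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
```

Lower retention limits and fewer indexed fields reduce the disk and indexing cost of provenance, at the price of less searchable history. They do not change how many events the flow emits.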
