James - I believe the complication for me is both the number of objects and the number of processors the data goes through. I talked with a few people, and it sounds like NiFi writes each event out to disk and then executes a commit, which really does have a major impact on performance. I don't have the liberty of resolving the disk performance, though I think I will try moving the journals directory to /dev/shm. I know I'll lose data on reboot, but that only happens 1-2 times a year, so I think that loss is acceptable. Also, I'm not specifying anything about what data gets indexed, so it's whatever the default is.
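For reference, a rough sketch of what the relevant entries in nifi.properties might look like for the /dev/shm experiment. The property names are the stock provenance settings; the /dev/shm path and the 512 MB cap are assumptions picked for illustration, not recommendations. Note that directory.default relocates the whole provenance repository, not just the journals subdirectory; I'm not aware of a separate property for the journals alone.

```properties
# Keep the default disk-backed implementation, but point it at tmpfs.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository

# Assumed path: anything under /dev/shm is lost on reboot.
nifi.provenance.repository.directory.default=/dev/shm/nifi/provenance_repository

# /dev/shm is backed by RAM, so cap the repository size (value is a guess).
nifi.provenance.repository.max.storage.size=512 MB

# Forces a sync to disk on every update when true; default is false.
nifi.provenance.repository.always.sync=false

# Left empty here, so the defaults apply for what gets indexed.
nifi.provenance.repository.indexed.attributes=
```

Since the repository then lives in memory, whatever it holds counts against available RAM, so the storage cap probably matters more here than it would on disk.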
If I'm producing about 6000 events per second (just a guess, though I think that's pretty large), it would be nice to have an option not to perform a commit on every one of those 6000 items. In reality, I would say a commit should never occur more than once a second, and even that is likely way too often.

Last, is there a way to measure the actual provenance events going through? I'm only guessing at what it's actually doing here.

brett

On Fri, Sep 30, 2016 at 2:16 PM, James Wing <[email protected]> wrote:

> Brett,
>
> The default provenance store, PersistentProvenanceRepository, does
> require I/O in proportion to flowfile events. Flowfiles with many
> attributes, especially large attributes, are a frequent contributor to
> provenance overload because attribute state is tracked in provenance
> events. But this is different from flowfile content reads and writes,
> which use the separate content repository. You might consider moving the
> provenance repository to a separate disk for additional I/O capacity.
>
> Does this sound relevant? Can you share some details of your flow volumes
> and attribute sizes?
>
> nifi.provenance.repository.buffer.size is only used by the
> VolatileProvenanceRepository implementation, an in-memory provenance
> store. The property defines the size of the in-memory store. The volatile
> store can avoid disk I/O issues, but at the expense of reduced provenance
> functionality.
>
> Thanks,
>
> James
>
> On Thu, Sep 29, 2016 at 1:37 PM, Brett Tiplitz <
> [email protected]> wrote:
>
>> I'm having a throughput problem when processing data with Provenance
>> recording enabled. I've pretty much disabled it, so I believe that is the
>> source of my issue. On occasion, I get a message saying the flow is
>> slowing due to provenance recording. I was running the out of the box
>> configuration for provenance.
>>
>> I believe the issue might be related to commit writes, though it's just a
>> theory. There is a variable nifi.provenance.repository.buffer.size,
>> though I don't see anything about what that does.
>>
>> Any suggestions?
>>
>> thanks,
>>
>> brett
>>
>> --
>> Brett Tiplitz
>> Systolic, Inc
>>
>
>

--
Brett Tiplitz
Systolic, Inc
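
To make the "no more than one commit per second" idea above concrete, here is a minimal Python sketch of time-based commit batching. It is not how NiFi's provenance repository is implemented; `BatchedCommitWriter`, `simulate`, and the fake clock are hypothetical names invented for this illustration, and the 6000 events/second figure is just the guess from the message.

```python
import os
import tempfile
import time


class BatchedCommitWriter:
    """Append records to a file, forcing them to disk at most once per interval."""

    def __init__(self, path, min_commit_interval=1.0, clock=time.monotonic):
        self._file = open(path, "ab")
        self._interval = min_commit_interval
        self._clock = clock              # injectable so the behavior can be simulated
        self._last_commit = clock()
        self._dirty = False
        self.commit_count = 0

    def write(self, record: bytes) -> None:
        # Buffer the record; only fsync if the interval has elapsed.
        self._file.write(record + b"\n")
        self._dirty = True
        if self._clock() - self._last_commit >= self._interval:
            self.commit()

    def commit(self) -> None:
        if not self._dirty:
            return
        self._file.flush()
        os.fsync(self._file.fileno())    # the expensive step being rationed
        self._last_commit = self._clock()
        self._dirty = False
        self.commit_count += 1

    def close(self) -> None:
        self.commit()                    # do not lose the final partial batch
        self._file.close()


def simulate(events_per_second=6000, seconds=3, interval=1.0):
    """Write events against a fake clock and return how many commits happened."""
    now = [0.0]
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    writer = BatchedCommitWriter(path, interval, clock=lambda: now[0])
    for i in range(events_per_second * seconds):
        now[0] = i / events_per_second   # advance simulated time per event
        writer.write(b"event %d" % i)
    writer.close()
    return writer.commit_count
```

With the simulated clock, `simulate()` writes 18,000 events and performs 3 commits, whereas an interval of 0 degenerates to one commit per event. The trade-off is the same one accepted with /dev/shm: anything written since the last commit can be lost on a crash.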
