Re: provenance

Brett Tiplitz Wed, 05 Oct 2016 08:39:03 -0700

James -

I believe the complication for me is both the number of objects and the
number of processors the data goes through.  I talked with a few people
and it sounds like NiFi writes each event out to disk and then executes a
commit, which really does have a major impact on performance.  I don't
have the liberty of resolving the disk performance, though I think I will
try moving the journals directory to /dev/shm.  I know I'll lose data on
reboot, but that only happens 1-2 times a year, so I think that loss is
acceptable.  Also, I'm not specifying anything about which data gets
indexed, so it's whatever the default is.


If I'm producing about 6000 events per second (just a guess, though I
think it's pretty large), it would be nice if there was an option not to
perform a commit on every one of the 6000 items.  In reality, I would say a
commit should never occur more than once a second, and even that is likely
way too often.
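
To make the idea concrete, here is a minimal sketch of time-based commit
batching.  It is not NiFi's implementation and does not use any NiFi API;
JournalWriter, commit_interval, and run are hypothetical names made up for
this illustration.  It only shows how the number of fsync calls drops when
a commit is allowed at most once per interval instead of once per event.

```python
import os
import tempfile
import time


class JournalWriter:
    """Hypothetical append-only journal (illustration only, not NiFi code).

    commit_interval=0 forces an fsync after every event; a positive value
    allows at most one fsync per that many seconds while appending.
    """

    def __init__(self, path, commit_interval=0.0, clock=time.monotonic):
        self._file = open(path, "ab")
        self._interval = commit_interval
        self._clock = clock
        self._last_commit = clock()
        self._dirty = False
        self.commits = 0  # number of fsync calls actually performed

    def append(self, event: bytes) -> None:
        self._file.write(event + b"\n")
        self._dirty = True
        elapsed = self._clock() - self._last_commit
        if self._interval == 0 or elapsed >= self._interval:
            self.commit()

    def commit(self) -> None:
        if not self._dirty:
            return  # nothing written since the last commit
        self._file.flush()
        os.fsync(self._file.fileno())  # force the data to stable storage
        self._last_commit = self._clock()
        self._dirty = False
        self.commits += 1

    def close(self) -> None:
        self.commit()  # make sure the tail of the journal is durable
        self._file.close()


def run(n_events, commit_interval):
    """Write n_events to a temporary journal; return (commits, seconds)."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = JournalWriter(os.path.join(tmp, "journal"), commit_interval)
        start = time.monotonic()
        for i in range(n_events):
            writer.append(b"event-%d" % i)
        writer.close()
        return writer.commits, time.monotonic() - start


if __name__ == "__main__":
    for interval in (0.0, 1.0):
        commits, seconds = run(6000, interval)
        print("interval=%.1fs commits=%d elapsed=%.3fs"
              % (interval, commits, seconds))
```

The trade-off is durability: with a one-second interval, up to a second of
events written since the last commit could be lost on a crash, which is the
same kind of loss I said I could accept above.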

Last, is there a way to measure the actual rate of provenance events going
through?  I'm only guessing at what it's actually doing here.

brett

On Fri, Sep 30, 2016 at 2:16 PM, James Wing <[email protected]> wrote:

> Brett,
>
> The default provenance store, PersistentProvenanceRepository, does
> require I/O in proportion to flowfile events.  Flowfiles with many
> attributes, especially large attributes, are a frequent contributor to
> provenance overload because attribute state is tracked in provenance
> events.  But this is different from flowfile content reads and writes,
> which use the separate content repository.  You might consider moving the
> provenance repository to a separate disk for additional I/O capacity.
>
> Does this sound relevant?  Can you share some details of your flow volumes
> and attribute sizes?
>
> nifi.provenance.repository.buffer.size is only used by the
> VolatileProvenanceRepository implementation, an in-memory provenance
> store.  The property defines the size of the in-memory store.  The volatile
> store can avoid disk I/O issues, but at the expense of reduced provenance
> functionality.
>
> Thanks,
>
> James
>
> On Thu, Sep 29, 2016 at 1:37 PM, Brett Tiplitz <
> [email protected]> wrote:
>
>> I'm having a throughput problem when processing data with Provenance
>> recording enabled.  I've pretty much disabled it, so I believe that is the
>> source of my issue.  On occasion, I get a message saying the flow is
>> slowing due to provenance recording.  I was running the out of the box
>> configuration for provenance.
>>
>> I believe the issue might be related to commit writes, though it's just a
>> theory.  There is a variable nifi.provenance.repository.buffer.size,
>> though I don't see anything about what that does.
>>
>> Any suggestions?
>>
>> thanks,
>>
>> brett
>>
>> --
>> Brett Tiplitz
>> Systolic, Inc
>>
>
>


-- 
Brett Tiplitz
Systolic, Inc
