I agree - good point :)
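
To spell out the reasoning behind that expectation, here is a minimal Python sketch. It assumes a simplified model in which each rolled-over provenance file is just a name, an age, and a size. It also assumes that expiration first drops anything older than max.storage.time, then drops the oldest remaining files until the total fits under max.storage.size (1 GB treated as 1024 MB). ProvFile and apply_retention are hypothetical names for illustration, not NiFi code:

```python
from dataclasses import dataclass


@dataclass
class ProvFile:
    # Hypothetical, simplified stand-in for one rolled-over provenance file.
    name: str
    age_hours: float
    size_mb: float


def apply_retention(files, max_age_hours=24.0, max_size_mb=1024.0):
    """Return the files that survive an oldest-first age-off (illustrative only)."""
    # Drop files past the time limit, then sort newest first so the oldest are at the end.
    kept = sorted((f for f in files if f.age_hours <= max_age_hours),
                  key=lambda f: f.age_hours)
    total = sum(f.size_mb for f in kept)
    # Evict from the oldest end until the size cap is satisfied.
    while kept and total > max_size_mb:
        total -= kept.pop().size_mb
    return kept


files = [
    ProvFile("oldest", age_hours=20, size_mb=400),
    ProvFile("middle", age_hours=6, size_mb=400),
    ProvFile("newest", age_hours=1, size_mb=400),
]
print([f.name for f in apply_retention(files)])  # prints ['newest', 'middle']
```

With three 400 MB files under a 1 GB cap, only the oldest file is evicted, so the newest events should still be there. Missing new events therefore suggest a problem elsewhere, such as indexing, rather than retention.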

It is possible that indexing was stuck with the older implementation. If it
gets into that state again, can you run 'bin/nifi.sh dump' and share the
logs/nifi-bootstrap.log file?
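
If you move off the older implementation, the changes suggested further down in this thread (WriteAheadProvenanceRepository plus larger retention values) can be made by hand or with a small script. Below is a minimal Python sketch, assuming nifi.properties uses plain key=value lines as in the default file. update_properties and RECOMMENDED are hypothetical names, and the values are only the examples from this thread:

```python
import re

# Example values from this thread; size them for your own disk and retention needs.
RECOMMENDED = {
    "nifi.provenance.repository.implementation":
        "org.apache.nifi.provenance.WriteAheadProvenanceRepository",
    "nifi.provenance.repository.max.storage.time": "72 hours",
    "nifi.provenance.repository.max.storage.size": "50 GB",
}


def update_properties(text, updates):
    """Return properties text with the given keys set to new values.

    Existing 'key=value' lines for those keys are rewritten in place; comments
    and all other lines are kept as-is. Keys not found are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        # A property line starts with a non-comment, non-blank key followed by '='.
        m = re.match(r"^([^#=\s][^=]*)=", line)
        key = m.group(1).strip() if m else None
        if key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


sample = (
    "# Persistent Provenance Repository Properties\n"
    "nifi.provenance.repository.max.storage.size=1 GB\n"
)
print(update_properties(sample, RECOMMENDED))
```

For a real install you would read conf/nifi.properties, write back the result of update_properties(text, RECOMMENDED), and restart NiFi so the new values are picked up. Rewriting only the matching key lines leaves the comments and every other setting untouched.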

Thanks

On Tue, May 22, 2018 at 1:33 PM, Tim Dean <[email protected]> wrote:
> Thanks Joe - I’ll try those changes and report back with the results.
>
> Just out of curiosity, if my problem is happening because I am generating
> more than 1 GB of provenance data, wouldn’t I expect to see the older
> provenance data being deleted, leaving the newer provenance data intact? It
> seems to me that my old data is still there and my new data is not.
>
> -Tim
>
>
>> On May 22, 2018, at 12:15 PM, Joe Witt <[email protected]> wrote:
>>
>> Tim
>>
>> Got ya. Keep in mind that with that configuration you'll only have at
>> most 1 GB of provenance data, kept for at most 24 hours. Also, as James
>> mentioned, the default provenance search can be too restrictive, and you
>> have to pay close attention to timestamps relative to the system doing
>> the query. In general, though, it should work just fine.
>>
>> 1) Definitely use the newer provenance repository. We need to change the
>> default, as the new one is very fast and very stable.
>>
>> To do this, change
>>
>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>> to
>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
>>
>> 2) Change the retention period and size values, for example:
>>
>> nifi.provenance.repository.max.storage.time=72 hours
>> nifi.provenance.repository.max.storage.size=50 GB
>>
>> There are other tweaks you can make to threads, sharding, etc. that
>> help with performance, but the changes above are worth making now
>> regardless of performance.
>>
>> Thanks
>>
>> On Tue, May 22, 2018 at 10:50 AM, Tim Dean <[email protected]> wrote:
>>> Thanks Joe:
>>>
>>> I have not yet made any changes to the configuration. We are just beginning
>>> the process of running our flow at scale and figuring out how best to
>>> optimize the configuration, and I plan to make changes as needed once we can
>>> get the flow functionally correct. Right now I’m having difficulty doing
>>> that because of the lack of provenance events.
>>>
>>> Here are the provenance-related properties I have in my nifi.properties file:
>>>
>>> # Provenance Repository Properties
>>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>>> nifi.provenance.repository.debug.frequency=1_000_000
>>> nifi.provenance.repository.encryption.key.provider.implementation=
>>> nifi.provenance.repository.encryption.key.provider.location=
>>> nifi.provenance.repository.encryption.key.id=
>>> nifi.provenance.repository.encryption.key=
>>>
>>> # Persistent Provenance Repository Properties
>>> nifi.provenance.repository.directory.default=./provenance_repository
>>> nifi.provenance.repository.max.storage.time=24 hours
>>> nifi.provenance.repository.max.storage.size=1 GB
>>> nifi.provenance.repository.rollover.time=30 secs
>>> nifi.provenance.repository.rollover.size=100 MB
>>> nifi.provenance.repository.query.threads=2
>>> nifi.provenance.repository.index.threads=2
>>> nifi.provenance.repository.compress.on.rollover=true
>>> nifi.provenance.repository.always.sync=false
>>> nifi.provenance.repository.journal.count=16
>>> # Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
>>> # EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
>>> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
>>> # FlowFile Attributes that should be indexed and made searchable.  Some examples to consider are filename, uuid, mime.type
>>> nifi.provenance.repository.indexed.attributes=
>>> # Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
>>> # but should provide better performance
>>> nifi.provenance.repository.index.shard.size=500 MB
>>> # Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from
>>> # the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved.
>>> nifi.provenance.repository.max.attribute.length=65536
>>> nifi.provenance.repository.concurrent.merge.threads=2
>>> nifi.provenance.repository.warm.cache.frequency=1 hour
>>>
>>> # Volatile Provenance Repository Properties
>>> nifi.provenance.repository.buffer.size=100000
>>>
>>>
>>> Thanks for any help you can provide on this
>>>
>>> -Tim
>>>
>>> On May 21, 2018, at 11:23 PM, Joe Witt <[email protected]> wrote:
>>>
>>> Tim,
>>>
>>> The default configuration for provenance event retention is
>>> potentially a factor.
>>>
>>> Did you make any changes to those?  Can you share relevant segments
>>> from the nifi.properties file?
>>>
>>> Thanks
>>>
>>> On Mon, May 21, 2018 at 8:32 PM, Tim Dean <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> I am having a hard time troubleshooting a NiFi flow to see where things are
>>> failing. I am trying to look at the provenance repository for a variety of
>>> processors, but for some reason nothing more recent seems to be appearing
>>> there. For example:
>>>
>>> 1) At approximately 10:30 this morning I started a flow and observed it for a
>>>    couple of hours before disabling it to look into a few unexpected results.
>>> 2) By right-clicking individual processors and selecting “View data provenance”,
>>>    I can see the NiFi Data Provenance view.
>>> 3) For each processor I investigate, I can see anywhere from 10 to 100
>>>    provenance events that came in during the hours I was running my flow.
>>> 4) A few hours later I restart the flow. Data once again flows through, and
>>>    after a while I stop my flow again.
>>> 5) Now I again right-click on the processors and select “View data provenance”.
>>> 6) No new provenance events seem to show up in the NiFi Data Provenance view.
>>>
>>>
>>> I have checked my search filter to make sure I am not accidentally filtering
>>> out events. I have looked at the external systems that this flow touches and
>>> confirmed that data is/was flowing through these processors. But for some
>>> reason I can see no provenance records in the UI.
>>>
>>> I am using NiFi version 1.5.
>>>
>>> I have not (yet) changed any of the default settings for NiFi or how its
>>> provenance repository is configured.
>>>
>>> Any advice on where my provenance events are going or what I might be doing
>>> that causes the provenance system to go silent on me?
>>>
>>> Thanks
>>>
>>> -Tim
>>>
>>>
>