I agree - good point :) It is possible indexing was stuck with the older implementation. If it gets into that state again, can you run 'bin/nifi.sh dump' and share the logs/nifi-bootstrap.log file?
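
For reference, here is a minimal sketch of collecting that dump, assuming a standard install layout. The function name `collect_nifi_dump` and the way the install directory is passed in are hypothetical choices for this example; only `bin/nifi.sh dump` and `logs/nifi-bootstrap.log` come from the request above.

```shell
# Sketch only: collect_nifi_dump is a hypothetical helper, not part of NiFi.
# Argument 1 (or $NIFI_HOME, or the current directory) is assumed to be the
# NiFi install directory containing bin/nifi.sh and logs/.
collect_nifi_dump() {
    nifi_home="${1:-${NIFI_HOME:-.}}"
    if [ ! -x "$nifi_home/bin/nifi.sh" ]; then
        echo "bin/nifi.sh not found under $nifi_home" >&2
        return 1
    fi
    # Request a thread dump; with no file argument it goes to the bootstrap log.
    (cd "$nifi_home" && bin/nifi.sh dump >/dev/null) || return 1
    log="$nifi_home/logs/nifi-bootstrap.log"
    [ -f "$log" ] || { echo "expected log not found: $log" >&2; return 1; }
    # Print the path of the file to attach to the reply.
    echo "$log"
}
```

Running `collect_nifi_dump /path/to/nifi` while the UI is showing the missing events should print the path of the log to share.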
Thanks

On Tue, May 22, 2018 at 1:33 PM, Tim Dean <[email protected]> wrote:
> Thanks Joe - I’ll try those changes and report back with the results.
>
> Just out of curiosity, if my problem is happening because I am generating
> more than 1 GB of provenance data, wouldn’t I expect to see the older
> provenance data being deleted leaving the newer provenance data intact? It
> seems to me that my old data is still there and my new data is not.
>
> -Tim
>
>
>> On May 22, 2018, at 12:15 PM, Joe Witt <[email protected]> wrote:
>>
>> Tim
>>
>> Got ya. So yeah keep in mind you'll only have at most 1GB of prov
>> data and for at most 24 hours with that configuration. Also, as James
>> mentioned the default searching for provenance can be too restrictive
>> and you have to pay close attention to time stamps relative to the
>> system doing the query/etc.. In general though it should work just
>> fine.
>>
>> 1) definitely use the newer provenance. We need to change the default
>> as the new one is very fast and very stable.
>>
>> To do this change
>>
>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>> to
>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
>>
>> 2) Change retention period and size values such as
>>
>> nifi.provenance.repository.max.storage.time=72 hours
>> nifi.provenance.repository.max.storage.size=50 GB
>>
>> There are some other tweaks you can do in terms of
>> threads/sharding/etc.. that help with performance but the above are
>> good to do now regardless of performance.
>>
>> Thanks
>>
>> On Tue, May 22, 2018 at 10:50 AM, Tim Dean <[email protected]> wrote:
>>> Thanks Joe:
>>>
>>> I have not yet made any changes to the configuration. We are just beginning
>>> the process of running our flow at scale and figuring out how to best
>>> optimize the configuration, and I plan to make changes as needed once we can
>>> get the flow functionally correct. Right now I’m having difficulty doing
>>> that because of the lack of provenance events.
>>>
>>> Here are the provenance-related properties I have in my nifi.properties file:
>>>
>>> # Provenance Repository Properties
>>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>>> nifi.provenance.repository.debug.frequency=1_000_000
>>> nifi.provenance.repository.encryption.key.provider.implementation=
>>> nifi.provenance.repository.encryption.key.provider.location=
>>> nifi.provenance.repository.encryption.key.id=
>>> nifi.provenance.repository.encryption.key=
>>>
>>> # Persistent Provenance Repository Properties
>>> nifi.provenance.repository.directory.default=./provenance_repository
>>> nifi.provenance.repository.max.storage.time=24 hours
>>> nifi.provenance.repository.max.storage.size=1 GB
>>> nifi.provenance.repository.rollover.time=30 secs
>>> nifi.provenance.repository.rollover.size=100 MB
>>> nifi.provenance.repository.query.threads=2
>>> nifi.provenance.repository.index.threads=2
>>> nifi.provenance.repository.compress.on.rollover=true
>>> nifi.provenance.repository.always.sync=false
>>> nifi.provenance.repository.journal.count=16
>>> # Comma-separated list of fields. Fields that are not indexed will not be
>>> searchable. Valid fields are:
>>> # EventType, FlowFileUUID, Filename, TransitURI, ProcessorID,
>>> AlternateIdentifierURI, Relationship, Details
>>> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename,
>>> ProcessorID, Relationship
>>> # FlowFile Attributes that should be indexed and made searchable. Some
>>> examples to consider are filename, uuid, mime.type
>>> nifi.provenance.repository.indexed.attributes=
>>> # Large values for the shard size will result in more Java heap usage when
>>> searching the Provenance Repository
>>> # but should provide better performance
>>> nifi.provenance.repository.index.shard.size=500 MB
>>> # Indicates the maximum length that a FlowFile attribute can be when
>>> retrieving a Provenance Event from
>>> # the repository. If the length of any attribute exceeds this value, it will
>>> be truncated when the event is retrieved.
>>> nifi.provenance.repository.max.attribute.length=65536
>>> nifi.provenance.repository.concurrent.merge.threads=2
>>> nifi.provenance.repository.warm.cache.frequency=1 hour
>>>
>>> # Volatile Provenance Respository Properties
>>> nifi.provenance.repository.buffer.size=100000
>>>
>>>
>>> Thanks for any help you can provide on this
>>>
>>> -Tim
>>>
>>> On May 21, 2018, at 11:23 PM, Joe Witt <[email protected]> wrote:
>>>
>>> Tim,
>>>
>>> The default configuration for provenance event retention is
>>> potentially a factor.
>>>
>>> Did you make any changes to those? Can you share relevant segments
>>> from the nifi.properties file?
>>>
>>> Thanks
>>>
>>> On Mon, May 21, 2018 at 8:32 PM, Tim Dean <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> I am having a hard time troubleshooting a NiFi flow to see where things are
>>> failing. I am trying to look at the provenance repository for a variety of
>>> processors, but for some reason nothing more recent seems to be appearing
>>> there. For example:
>>>
>>> At approximately 10:30 this morning I started a flow and observed it for a
>>> couple of hours before disabling it to look into a few unexpected results.
>>> By right-clicking individual processors and selecting “View data provenance”
>>> I can see the NiFi Data Provenance view
>>> For each processor I investigate I can see anywhere from 10 to 100
>>> provenance events that came in during the hours I was running my flow
>>> A few hours later I restart the flow. Data once again flows through and
>>> after a while I stop my flow again
>>> Now I again right-click on the processors and select “View data provenance”.
>>> No new provenance events seem to show up in the NiFi Data Provenance view
>>>
>>>
>>> I have checked my search filter to make sure I am not accidentally filtering
>>> out events. I have looked at the external systems that this flow touches and
>>> confirmed that data is/was flowing through these processors. But for some
>>> reason I can see no provenance records in the UI.
>>>
>>> I am using NiFi version 1.5
>>>
>>> I have not (yet) changed any of the default settings for NiFi and how its
>>> provenance repository is configured
>>>
>>> Any advice on where my provenance events are going or what I might be doing
>>> that causes the provenance system to go silent on me?
>>>
>>> Thanks
>>>
>>> -Tim
>>>
>>>
>
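
For anyone following the thread, the nifi.properties change suggested in the 12:15 PM message could be scripted roughly as below. This is a sketch under assumptions: the function name `update_provenance_settings` and the `.bak` backup suffix are hypothetical, and the property names and values are simply the ones quoted in the thread. NiFi reads nifi.properties at startup, so a restart would be needed afterwards.

```shell
# Sketch only: update_provenance_settings is a hypothetical helper.
# Argument 1 is the path to nifi.properties; a copy is kept as <file>.bak.
update_provenance_settings() {
    props="$1"
    [ -f "$props" ] || { echo "no such file: $props" >&2; return 1; }
    cp "$props" "$props.bak" || return 1
    # Rewrite only the three properties named in the thread; other lines pass through.
    sed \
        -e 's|^nifi\.provenance\.repository\.implementation=.*|nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository|' \
        -e 's|^nifi\.provenance\.repository\.max\.storage\.time=.*|nifi.provenance.repository.max.storage.time=72 hours|' \
        -e 's|^nifi\.provenance\.repository\.max\.storage\.size=.*|nifi.provenance.repository.max.storage.size=50 GB|' \
        "$props.bak" > "$props"
}
```

The 72 hours and 50 GB figures are the example values from the thread, so size them to the disk actually available for the provenance repository.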
