Joe -
I am currently in the state where new provenance events are not showing up in
the UI. (I have not yet made the configuration changes you suggested below.)
When I try to run the dump command you suggested, I get a timeout error:
Java home: /usr/lib/jvm/default-java/jre
NiFi home: /opt/nifi-1.5.0
Bootstrap Config File: /opt/nifi-1.5.0/conf/bootstrap.conf
Exception in thread "main" java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at org.apache.nifi.bootstrap.RunNiFi.dump(RunNiFi.java:707)
at org.apache.nifi.bootstrap.RunNiFi.main(RunNiFi.java:233)
The system does seem to be running, although a little slowly right now. I’m not
sure if the error above is expected given that the system is running a bit
slowly, or if there is something more fundamentally wrong with my system. I did
restart the NiFi service, and that seems to have cleared the problem: provenance
events are once again showing up in the user interface. So even though NiFi
seemed to be running (flow files were being processed, and the user interface
was slow but functioning), it appears that provenance reporting/indexing and
whatever the dump utility relies on were not functioning.
We’re in the process of assessing our memory use and adjusting configuration as
needed, so some of these problems may go away once we’ve tuned the system.
Other than that, are there specific tools I should be using or logging I should
be monitoring to track down problems with provenance reporting?
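As a concrete version of the monitoring question above, here is a minimal sketch of the kind of log check I have in mind. It assumes the application log is logs/nifi-app.log and that each entry follows the pattern "timestamp LEVEL [thread] logger message"; the function name provenance_problems and the simple keyword match on "provenance" are my own placeholders, not anything NiFi provides.

```python
import re

# Assumed layout of an entry in logs/nifi-app.log:
#   2018-05-22 12:38:01,123 ERROR [Thread Name] o.a.n.p.SomeClass message text
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]*)\] (?P<logger>\S+) (?P<message>.*)$"
)


def provenance_problems(lines, levels=("WARN", "ERROR")):
    """Return (timestamp, level, message) for entries at the given levels
    whose logger name or message mentions provenance."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            # Continuation lines (for example stack-trace frames) are skipped.
            continue
        if match["level"] not in levels:
            continue
        text = (match["logger"] + " " + match["message"]).lower()
        if "provenance" in text:
            hits.append((match["timestamp"], match["level"], match["message"]))
    return hits
```

It could be run over the open log file, for example provenance_problems(open("logs/nifi-app.log")), and the result printed or counted. If the log pattern has been customized, the regular expression would need to change to match.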
Thanks for all your help.
-Tim
> On May 22, 2018, at 12:38 PM, Joe Witt <[email protected]> wrote:
>
> I agree - good point :)
>
> It is possible indexing was stuck with the older implementation. Can
> you run 'bin/nifi.sh dump' and share the logs/nifi-bootstrap.log
> file if it is in that state/behavior again?
>
> Thanks
>
> On Tue, May 22, 2018 at 1:33 PM, Tim Dean <[email protected]> wrote:
>> Thanks Joe - I’ll try those changes and report back with the results.
>>
>> Just out of curiosity, if my problem is happening because I am generating
>> more than 1 GB of provenance data, wouldn’t I expect to see the older
>> provenance data being deleted, leaving the newer provenance data intact? It
>> seems to me that my old data is still there and my new data is not.
>>
>> -Tim
>>
>>
>>> On May 22, 2018, at 12:15 PM, Joe Witt <[email protected]> wrote:
>>>
>>> Tim
>>>
>>> Got ya. So yeah keep in mind you'll only have at most 1GB of prov
>>> data and for at most 24 hours with that configuration. Also, as James
>>> mentioned the default searching for provenance can be too restrictive
>>> and you have to pay close attention to time stamps relative to the
>>> system doing the query/etc.. In general though it should work just
>>> fine.
>>>
>>> 1) definitely use the newer provenance. We need to change the default
>>> as the new one is very fast and very stable.
>>>
>>> To do this change
>>>
>>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>>> to
>>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
>>>
>>> 2) Change retention period and size values such as
>>>
>>> nifi.provenance.repository.max.storage.time=72 hours
>>> nifi.provenance.repository.max.storage.size=50 GB
>>>
>>> There are some other tweaks you can do in terms of
>>> threads/sharding/etc.. that help with performance but the above are
>>> good to do now regardless of performance.
>>>
>>> Thanks
>>>
>>> On Tue, May 22, 2018 at 10:50 AM, Tim Dean <[email protected]> wrote:
>>>> Thanks Joe:
>>>>
>>>> I have not yet made any changes to the configuration. We are just beginning
>>>> the process of running our flow at scale and figuring out how best to
>>>> optimize the configuration, and I plan to make changes as needed once we
>>>> can get the flow functionally correct. Right now I’m having difficulty doing
>>>> that because of the lack of provenance events.
>>>>
>>>> Here are the provenance-related properties I have in my nifi.properties
>>>> file:
>>>>
>>>> # Provenance Repository Properties
>>>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
>>>> nifi.provenance.repository.debug.frequency=1_000_000
>>>> nifi.provenance.repository.encryption.key.provider.implementation=
>>>> nifi.provenance.repository.encryption.key.provider.location=
>>>> nifi.provenance.repository.encryption.key.id=
>>>> nifi.provenance.repository.encryption.key=
>>>>
>>>> # Persistent Provenance Repository Properties
>>>> nifi.provenance.repository.directory.default=./provenance_repository
>>>> nifi.provenance.repository.max.storage.time=24 hours
>>>> nifi.provenance.repository.max.storage.size=1 GB
>>>> nifi.provenance.repository.rollover.time=30 secs
>>>> nifi.provenance.repository.rollover.size=100 MB
>>>> nifi.provenance.repository.query.threads=2
>>>> nifi.provenance.repository.index.threads=2
>>>> nifi.provenance.repository.compress.on.rollover=true
>>>> nifi.provenance.repository.always.sync=false
>>>> nifi.provenance.repository.journal.count=16
>>>> # Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
>>>> # EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
>>>> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
>>>> # FlowFile Attributes that should be indexed and made searchable. Some examples to consider are filename, uuid, mime.type
>>>> nifi.provenance.repository.indexed.attributes=
>>>> # Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
>>>> # but should provide better performance
>>>> nifi.provenance.repository.index.shard.size=500 MB
>>>> # Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from
>>>> # the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved.
>>>> nifi.provenance.repository.max.attribute.length=65536
>>>> nifi.provenance.repository.concurrent.merge.threads=2
>>>> nifi.provenance.repository.warm.cache.frequency=1 hour
>>>>
>>>> # Volatile Provenance Respository Properties
>>>> nifi.provenance.repository.buffer.size=100000
>>>>
>>>>
>>>> Thanks for any help you can provide on this
>>>>
>>>> -Tim
>>>>
>>>> On May 21, 2018, at 11:23 PM, Joe Witt <[email protected]> wrote:
>>>>
>>>> Tim,
>>>>
>>>> The default configuration for provenance event retention is
>>>> potentially a factor.
>>>>
>>>> Did you make any changes to those? Can you share relevant segments
>>>> from the nifi.properties file?
>>>>
>>>> Thanks
>>>>
>>>> On Mon, May 21, 2018 at 8:32 PM, Tim Dean <[email protected]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am having a hard time troubleshooting a NiFi flow to see where things are
>>>> failing. I am trying to look at the provenance repository for a variety of
>>>> processors, but for some reason nothing more recent seems to be appearing
>>>> there. For example:
>>>>
>>>> 1) At approximately 10:30 this morning I started a flow and observed it for
>>>>    a couple of hours before disabling it to look into a few unexpected
>>>>    results.
>>>> 2) By right-clicking individual processors and selecting “View data
>>>>    provenance” I can see the NiFi Data Provenance view.
>>>> 3) For each processor I investigate I can see anywhere from 10 to 100
>>>>    provenance events that came in during the hours I was running my flow.
>>>> 4) A few hours later I restart the flow. Data once again flows through and
>>>>    after a while I stop my flow again.
>>>> 5) Now I again right-click on the processors and select “View data
>>>>    provenance”.
>>>> 6) No new provenance events seem to show up in the NiFi Data Provenance
>>>>    view.
>>>>
>>>>
>>>> I have checked my search filter to make sure I am not accidentally filtering
>>>> out events. I have looked at the external systems that this flow touches
>>>> and confirmed that data is/was flowing through these processors. But for
>>>> some reason I can see no new provenance records in the UI.
>>>>
>>>> I am using NiFi version 1.5
>>>>
>>>> I have not (yet) changed any of the default settings for NiFi and how its
>>>> provenance repository is configured
>>>>
>>>> Any advice on where my provenance events are going or what I might be doing
>>>> that causes the provenance system to go silent on me?
>>>>
>>>> Thanks
>>>>
>>>> -Tim
>>>>
>>>>
>>