Hello! I am working on the following use case - I have a flow which is listing the json files in a folder (between 200-300), then splitting each file to high level 'message' (up to 2000 messages/file) and then splitting each high level message into low level messages (up to 60low level messages per high level message). The low level messages are then processed and inserted into a database. The flow then moves to the next folder once all files in the current folder are split into lower level messages and inserted into a database. This flow is quite data intensive and quickly the queues between processors get full although the back pressure threshold is set to 20000 flowfiles and the most computationally intensive processors are scheduled with 2 or more threads.
To do that I have implemented three pairs of Wait-Notify processors. To do that I have followed this great tutorial http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/ The Wait processors are connected to the respective original SplitJSON connection as follows: the first is Wait processor when the list of files is split to files, the second - when each file is split into high level 'messages' and the third - when the high level messages are split into low level messages. The corresponding Notify processors are as follows: - Notify for the low level messages is after they are inserted into the database. - Notify for the high level messages is connected to the third Wait processor (the one waiting for the low level messages to be processed) - The last Notify is connected to the Wait processor which is waiting for the high level messages in each file to be processed. I have also created one Distributed MapCacheServer with the respective Clients for each Wait-Notify pair. This set up worked fine for approximately a day while I was developing it and the flow was moving to the next folders. After that I changed the input source and it stopped releasing files from the second Wait-Notify pair (the only difference between the two data sources is that the second one has larger files). The Wait-Notify pair dealing with the low level messages continued to release flow files but the one which was working on the high level messages did not - the high level messages Wait processor was not sending any files to the following processor. I have read about the wait queue prioritizers and changed the priority to both Wait-Notify pairs to FIFO - this worked, meaning flowfiles were released from the high-level messages Wait-Notify pair, for some time and then it stopped - no more files were released. I have also read about wait penalty duration and experimented with different values but with no success. I have also started looking into the wait_buffer_count and relesable_flow_file_count properties of the Wait processor but can't fully understand how they should be used. I would expect that their values should be chosen depending on the incoming data and have tried values between 20000 and 100000 but it does not seem to help releasing files from the high level messages Wait processor. I have also looked into the Notify signal_buffer_count and thought that it should be connected to the values for Wait wait_buffer_count somehow for the balanced work of the Wait-Notify pair. As it was working with the first data source it feels like property configuration issue but I am stuck now and have no more ideas where to look at. Any suggestions and hints about the interaction of the different properties would be greatly appreciated. Many thanks in advance! Valentina