We are on NiFi 1.7.1.g and have only recently established our first clustered configuration. Using Pierre Villard's article from Feb 2017 ( https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/ ) and a few other related technical articles to flesh out the details, we have gotten a ListFile / FetchFile flow to distribute load through a Remote Process Group (RPG). Almost.
Downstream of the FetchFile, which runs on all nodes, I connect to a MonitorActivity processor simply so that I can examine the flowfiles the fetch produces in the queue between the two. In that queue listing, the info for each flowfile includes a Node Address field, which appears to be the node on which the flowfile was processed.

I have four nodes in my cluster: one primary, three not primary. In the queue listing I can see that flowfiles share Position values in groups of three. Three have Position 1, three have Position 2, and so on, in a pattern that repeats throughout the entire queue. Within each Position group, the flowfiles have been distributed to node2, node3, and node4 - *but none at all to node1*.

What would cause such behavior? How can I get my files to distribute across all four nodes?

I should mention:
1. All four node URLs are in the RPG URL configuration parameter, delimited by commas.
2. node1 is currently assigned by my external ZooKeeper as my Primary, and is where the ListFile processor executes.
3. All four nodes are granted access for "retrieve site-to-site details" under Access Policies in the hamburger menu.
4. All four nodes are granted access for "receive data via site-to-site" in the Access Policies for the RPG Input Port.

My concern is that I am leaving a quarter of my available cluster capacity unused.
