We are on NiFi 1.7.1.g and have only recently set up our first clustered
configuration. Using Pierre Villard's article from Feb 2017 (
https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/
) and a few other related technical articles to flesh out some details, we
have gotten a ListFile / FetchFile flow to distribute load using a Remote
Process Group - almost.

Downstream of the FetchFile, which runs on all nodes, I connect to a
MonitorActivity processor simply so that I can examine the flowfiles
resulting from the fetch in the queue between the two. In that queue
listing, the info for each flowfile includes a Node Address field, which
appears to show the node on which the flowfile was processed.

I have four nodes in my cluster: one primary, three not primary. In the
queue listing, the flowfiles come in groups of three that share a Position
value: three have Position 1, three have Position 2, and so on, in a pattern
that repeats throughout the entire queue. Within each Position group, the
flowfiles have been distributed to node2, node3, and node4 - *but none at
all to node1*.
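
For anyone who wants to check the same thing without reading the queue
listing by eye, here is a minimal sketch of the tally described above. It
assumes the listing has already been exported as a list of records; the
`position` and `node_address` keys, the `tally_by_node` helper, and the
sample data are all hypothetical names made up for illustration, not NiFi
output.

```python
from collections import Counter


def tally_by_node(flowfiles, expected_nodes):
    """Count flowfiles per node and list expected nodes that received none."""
    # "node_address" is a hypothetical key standing in for the Node Address
    # field shown in the queue listing.
    counts = Counter(ff["node_address"] for ff in flowfiles)
    per_node = {node: counts.get(node, 0) for node in expected_nodes}
    idle = [node for node, n in per_node.items() if n == 0]
    return per_node, idle


# Illustrative data only: two Position groups, each spread over node2..node4,
# mirroring the pattern described above.
sample = [
    {"position": pos, "node_address": node}
    for pos in (1, 2)
    for node in ("node2", "node3", "node4")
]

per_node, idle = tally_by_node(sample, ["node1", "node2", "node3", "node4"])
print(per_node)
print(idle)
```

With real data, any node that shows up in `idle` is one that received no
flowfiles from the Remote Process Group.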

What would cause such behavior? How can I get my files to distribute across
all four nodes?

I should mention:
1. all four node URLs are in the RPG URL configuration parameter, delimited
by commas.
2. node1 is currently assigned by my external ZooKeeper as my Primary, and
is where the ListFile processor executes.
3. all four nodes are granted access for "retrieve site-to-site details"
under Access Policies in the hamburger menu.
4. all four nodes are granted access for "receive data via site-to-site" in
the Access Policies for the RPG Input Port.

My concern is that I am leaving nearly 25% of my available cluster capacity
unused.
