I have a three-node cluster and I am trying to rewrite a dataflow that's used in several places so that the common parts distribute the data across the cluster in a more efficient, load-balanced way. This is my first experience with Remote Process Groups (RPGs), so I started from the basics and planned to work my way up, but I am barely out of the gate and already confused.
Here's the setup. I have an input port on my root dataflow which points to a LogMessage processor. In another process group I have an RPG configured with the three endpoints of the cluster, separated by commas. Feeding into that is a GenerateFlowFile processor scheduled to run every 5 ms with 9 concurrent tasks, on the primary node only. Everything else has default values.

When I start the dataflow it more or less works as expected, except that the distribution of FlowFiles looks uneven. That is, if I look at the Status History of the LogMessage processor and select FlowFiles In, the two non-primary nodes have the bulk of the FlowFiles moving through them. I can wrap my head around that.

But then I rewrote it to put a DistributeLoad processor in front of three RPGs, one for each node in the cluster, and left it set to `round robin`. The FlowFiles In on the LogMessage processor looks exactly the same as before: the bulk of the FlowFiles are still on the two non-primary nodes. In 5 minutes about 500K FlowFiles are processed, with the two non-primary nodes handling 234238 and 233089 and the primary node handling 47597.

What am I missing? Why doesn't a round robin distribute them evenly?

Neil
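
For reference, here is a quick sketch of how far the counts above are from an even split. The node labels are just my own shorthand, not real hostnames, and the "even share" is simply what I assumed round robin would give.

```python
# Counts read off the LogMessage "FlowFiles In" Status History over 5 minutes.
# The node labels are illustrative shorthand, not real hostnames.
observed = {
    "non-primary A": 234238,
    "non-primary B": 233089,
    "primary": 47597,
}

total = sum(observed.values())
even_share = total / len(observed)  # what a perfectly even split would give

for node, count in observed.items():
    print(f"{node}: {count} FlowFiles, {count / total:.1%} of total, "
          f"{count - even_share:+.0f} vs. an even split")
```

By this count the primary node ends up with under a tenth of the FlowFiles, where I expected roughly a third.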
