> The last thing I'm looking to understand is what Byran B brought up, do load 
> balanced connections take into consideration the load of each node?

No, load balanced connections do not currently take the load of each
node into account when choosing a destination.

Some future improvement ideas: we could implement another
FlowFilePartitioner that uses QueuePartition.size(), or add a
nifi.properties entry to specify the number of partitions each node
has. The latter may be helpful if the cluster consists of nodes with
different specs.
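As a rough sketch of the first idea (Python pseudocode, not NiFi's
actual Java; `least_loaded_partition` and the `(name, queued_count)`
tuples are hypothetical stand-ins for the real QueuePartition
interface):

```python
# Hypothetical "least loaded" partitioner: pick the partition with the
# fewest queued FlowFiles, analogous to comparing QueuePartition.size().

def least_loaded_partition(partitions):
    """partitions: list of (name, queued_count) tuples; returns the name
    of the partition with the fewest queued FlowFiles."""
    return min(partitions, key=lambda p: p[1])[0]

partitions = [("node1", 120), ("node2", 5), ("node3", 48)]
print(least_loaded_partition(partitions))  # -> node2
```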

The rest of this is a note on some important lines of code for
understanding how load balancing and partitioning work.

None of the FlowFilePartitioner implementations takes the load of
each node into consideration.
- PARTITION_BY_ATTRIBUTE: Calculates a hash from a FlowFile attribute
value, then picks the target partition using consistent hashing. If
the attribute values don't distribute well, some nodes receive a
higher number of FlowFiles.
- ROUND_ROBIN: Cycles through the partitions in order. We could
implement another round-robin strategy that uses QueuePartition.size()
to pick the destination with the fewest queued FlowFiles.
- SINGLE_NODE: Always uses partitions[0], meaning the first node in
node identifier order.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/clustered/partition/FlowFilePartitioner.java
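To illustrate why PARTITION_BY_ATTRIBUTE can skew, here is a minimal
consistent-hashing sketch (this is not NiFi's actual implementation,
which lives in the classes linked above; the ring construction and
hash function are illustrative assumptions). The attribute value is
hashed onto a ring and the first node at or after that point wins, so
a low-cardinality attribute sends many FlowFiles to the same node:

```python
import hashlib
from bisect import bisect

def _h(s):
    # Stable hash onto a 32-bit ring (md5 used only for illustration).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

def build_ring(nodes, vnodes=64):
    # Each node gets several points on the ring ("virtual nodes") so
    # the key space is divided more evenly among nodes.
    return sorted((_h(f"{n}-{i}"), n) for n in nodes for i in range(vnodes))

def node_for(ring, attribute_value):
    # First ring point at or after the key's hash, wrapping around.
    keys = [point for point, _ in ring]
    idx = bisect(keys, _h(attribute_value)) % len(ring)
    return ring[idx][1]

ring = build_ring(["node1", "node2", "node3"])
# Every FlowFile with the same attribute value lands on the same node,
# so a poorly distributed attribute concentrates load on a few nodes.
print(node_for(ring, "customer-42"))
```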

For example, let's use a 5 node cluster.

Partitions are created using sorted node identifiers.
The number of partitions = the number of nodes.
Each node will have 5 partitions: 1 LocalPartition and 4 RemoteQueuePartitions.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/clustered/SocketLoadBalancedFlowFileQueue.java#L140,L162
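Conceptually, that setup looks like this (a simplified sketch, not the
actual SocketLoadBalancedFlowFileQueue code; `build_partitions` is a
hypothetical helper):

```python
def build_partitions(node_ids, local_node_id):
    """Each node ends up with len(node_ids) partitions: one local, the
    rest remote, ordered by sorted node identifier."""
    partitions = []
    for node_id in sorted(node_ids):
        kind = "LocalPartition" if node_id == local_node_id else "RemoteQueuePartition"
        partitions.append((node_id, kind))
    return partitions

parts = build_partitions(["n1", "n2", "n3", "n4", "n5"], "n3")
print(len(parts))  # -> 5 (1 LocalPartition + 4 RemoteQueuePartitions)
```

Note that under SINGLE_NODE, partitions[0] here is always the first
node in sorted identifier order, which matches the behavior above.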

Each RemoteQueuePartition registers itself with the clientRegistry.
In this case, there are 4 clients for this loop.
Each node executes this task periodically.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/clustered/client/async/nio/NioAsyncLoadBalanceClientTask.java#L50,L76

Interestingly, the task is created N times, where N is configured by
nifi.cluster.load.balance.max.thread.count (8 by default).
So, 8 threads loop through 4 clients?
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/FlowController.java#L652
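A sketch of that thread/client relationship (assumption: each client
is guarded so only one thread communicates with it at a time; the
non-blocking lock below is a stand-in for however the real
NioAsyncLoadBalanceClientTask coordinates this). With that guard, 8
worker threads can safely loop over 4 clients, with busy clients
simply skipped:

```python
import threading

class Client:
    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # one thread per client at a time
        self.served = 0

    def try_communicate(self):
        # Skip the client if another thread is already talking to it.
        if self._lock.acquire(blocking=False):
            try:
                self.served += 1  # stand-in for the actual send/receive
                return True
            finally:
                self._lock.release()
        return False

clients = [Client(f"node{i}") for i in range(1, 5)]  # 4 remote peers

def task():
    # Each of the 8 threads loops over all clients, skipping busy ones.
    for c in clients:
        c.try_communicate()

threads = [threading.Thread(target=task) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([c.served for c in clients])  # each client served at least once
```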

Thanks,
Koji

On Thu, Mar 7, 2019 at 9:19 PM Chad Woodhead <[email protected]> wrote:
>
> Thanks all for the input. So from what I'm gathering, storage differences of 
> around 5 GB (125 GB vs 130 GB) should not cause any problems/load impacts. 
> Larger storage differences could have load impacts. Differences in CPU and 
> RAM could definitely have load impacts. Luckily my older nodes have the same 
> CPU and RAM counts/specs as my new nodes.
>
> The last thing I'm looking to understand is what Byran B brought up, do load 
> balanced connections take into consideration the load of each node?
>
> Thanks,
> Chad
>
> On Wed, Mar 6, 2019 at 4:50 PM Bryan Bende <[email protected]> wrote:
>>
>> Yea ListenTCP also doesn't handle the back-pressure with the client
>> the way it really should.
>>
>> Regarding the load balancing, I believe traditional s2s does factor in
>> the load of each node when deciding how to load balance, but I don't
>> know if this is part of load balanced connections or not. Mark P would
>> know for sure.
>>
>> On Wed, Mar 6, 2019 at 4:47 PM James Srinivasan
>> <[email protected]> wrote:
>> >
>> > Yup, but because of the unfortunate way the source (outside NiFi)
>> > works, it doesn't buffer for long when the connection doesn't pull or
>> > drops. It behaves far more like a 5 Mbps UDP stream really :-(
>> >
>> > On Wed, 6 Mar 2019 at 21:44, Bryan Bende <[email protected]> wrote:
>> > >
>> > > James, just curious, what was your source processor in this case? 
>> > > ListenTCP?
>> > >
>> > > On Wed, Mar 6, 2019 at 4:26 PM Jon Logan <[email protected]> wrote:
>> > > >
>> > > > What really would resolve some of these issues is backpressure on CPU 
>> > > > -- ie. let Nifi throttle itself down to not choke the machine until it 
>> > > > dies if constrained on CPU. Easier said than done unfortunately.
>> > > >
>> > > > On Wed, Mar 6, 2019 at 4:23 PM James Srinivasan 
>> > > > <[email protected]> wrote:
>> > > >>
>> > > >> In our case, backpressure applied all the way up to the TCP network
>> > > >> source which meant we lost data. AIUI, current load balancing is round
>> > > >> robin (and two other options prob not relevant). Would actual load
>> > > >> balancing (e.g. send to node with lowest OS load, or number of active
>> > > >> threads) be a reasonable request?
>> > > >>
>> > > >> On Wed, 6 Mar 2019 at 20:51, Joe Witt <[email protected]> wrote:
>> > > >> >
>> > > >> > This is generally workable (heterogenous node capabilities) in NiFi 
>> > > >> > clustering.  But you do want to leverage back-pressure and load 
>> > > >> > balanced connections so that faster nodes will have an opportunity 
>> > > >> > to take on the workload for slower nodes.
>> > > >> >
>> > > >> > Thanks
>> > > >> >
>> > > >> > On Wed, Mar 6, 2019 at 3:48 PM James Srinivasan 
>> > > >> > <[email protected]> wrote:
>> > > >> >>
>> > > >> >> Yes, we hit this with the new load balanced queues (which, to be 
>> > > >> >> fair, we also had with remote process groups previously). Two 
>> > > >> >> "old" nodes got saturated and their queues filled while three 
>> > > >> >> "new" nodes were fine.
>> > > >> >>
>> > > >> >> My "solution" was to move everything to new hardware which we had 
>> > > >> >> inbound anyway.
>> > > >> >>
>> > > >> >> On Wed, 6 Mar 2019, 20:40 Jon Logan, <[email protected]> wrote:
>> > > >> >>>
>> > > >> >>> You may run into issues with different processing power, as some 
>> > > >> >>> machines may be overwhelmed in order to saturate other machines.
>> > > >> >>>
>> > > >> >>> On Wed, Mar 6, 2019 at 3:34 PM Mark Payne <[email protected]> 
>> > > >> >>> wrote:
>> > > >> >>>>
>> > > >> >>>> Chad,
>> > > >> >>>>
>> > > >> >>>> This should not be a problem, given that all nodes have enough 
>> > > >> >>>> storage available to handle the influx of data.
>> > > >> >>>>
>> > > >> >>>> Thanks
>> > > >> >>>> -Mark
>> > > >> >>>>
>> > > >> >>>>
>> > > >> >>>> > On Mar 6, 2019, at 1:44 PM, Chad Woodhead 
>> > > >> >>>> > <[email protected]> wrote:
>> > > >> >>>> >
>> > > >> >>>> > Are there any negative effects of having filesystem mounts 
>> > > >> >>>> > (dedicated mounts for each repo) used by the different NiFi 
>> > > >> >>>> > repositories differ in size on NiFi nodes within the same 
>> > > >> >>>> > cluster? For instance, if some nodes have a content_repo mount 
>> > > >> >>>> > of 130 GB and other nodes have a content_repo mount of 125 GB, 
>> > > >> >>>> > could that cause any problems or cause one node to be used 
>> > > >> >>>> > more since it has more space? What about if the difference was 
>> > > >> >>>> > larger, by say a 100 GB difference?
>> > > >> >>>> >
>> > > >> >>>> > Trying to repurpose old nodes and add them as NiFi nodes, but 
>> > > >> >>>> > their mount sizes are different than my current cluster’s 
>> > > >> >>>> > nodes and I’ve noticed I can’t set the max size limit to use 
>> > > >> >>>> > of a particular mount for a repo.
>> > > >> >>>> >
>> > > >> >>>> > -Chad
>> > > >> >>>>
