Re: Problem with load balancing option

Jean-Sebastien Vachon Mon, 25 Mar 2019 04:31:44 -0700

Hi,

I saw that bug report and will upgrade to the latest version ASAP. However, my 
main problem was that my nifi.properties was missing the section that 
configures the load balancer. Once I added that section and opened the required 
ports in my infrastructure, everything started to work as expected, and it is a 
life changer 😉


The load is now properly balanced across all nodes, and the performance boost 
is outstanding.

One note, however: I checked the migration guide from 1.8 to 1.9 and didn't 
see any mention of this new section in nifi.properties. It might be a good idea 
to add one so that people upgrading their clusters have all the information at 
hand. This could save them some time.
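This is a minimal Java sketch for anyone checking an upgraded cluster. It reports which load-balancing keys are absent from a nifi.properties file. It is not part of NiFi.

The key names are assumed to match the cluster load-balancing properties in the NiFi 1.8+ Administration Guide. Please verify them against the guide for the version you run. The class name and the sample input are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class LoadBalanceConfigCheck {

    // Assumed load-balancing keys (NiFi 1.8+ Administration Guide);
    // verify this list against the guide for the version you run.
    static final String[] LOAD_BALANCE_KEYS = {
        "nifi.cluster.load.balance.host",
        "nifi.cluster.load.balance.port",
        "nifi.cluster.load.balance.connections.per.node",
        "nifi.cluster.load.balance.max.thread.count",
        "nifi.cluster.load.balance.comms.timeout"
    };

    /** Returns the assumed load-balancing keys that are absent from the given properties text. */
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does no real I/O, but Properties.load declares IOException.
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : LOAD_BALANCE_KEYS) {
            // A key with an empty value (e.g. "nifi.cluster.load.balance.host=") still counts as present.
            if (!props.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical nifi.properties fragment carried over without the load-balancing section.
        String carriedOver = String.join("\n",
            "nifi.cluster.is.node=true",
            "nifi.cluster.node.protocol.port=11443");
        System.out.println("Missing: " + missingKeys(carriedOver));
    }
}
```

To check a real file, pass its contents to missingKeys. An empty result only means every assumed key is present. It does not check the values, or whether the load-balance port is open between nodes.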

Thanks, all, for your outstanding work.
________________________________
From: Koji Kawamura <[email protected]>
Sent: Sunday, March 24, 2019 10:39 PM
To: [email protected]
Cc: Jean-Sebastien Vachon
Subject: Re: Problem with load balancing option

Hi,

That looks similar to this one:
Occasionally FlowFiles appear to get "stuck" in a Load-Balanced Connection
https://issues.apache.org/jira/browse/NIFI-5919

If you're using NiFi 1.8.0, I recommend trying the latest release, 1.9.1, which
has the fix for the above issue.

Hope this helps.

Koji

On Sat, Mar 23, 2019 at 12:15 AM Jean-Sebastien Vachon
<[email protected]> wrote:
>
> Hi,
>
> FYI, I managed to get my node back by removing it from the cluster, deleting 
> the local flow, and restarting NiFi.
>
> Hope this helps identify the issue
> ________________________________
> From: Jean-Sebastien Vachon <[email protected]>
> Sent: Friday, March 22, 2019 10:56 AM
> To: [email protected]
> Subject: Re: Problem with load balancing option
>
> Hi again,
>
> I thought everything was fine, but one of my nodes cannot start:
>
> 2019-03-22 14:51:27,811 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog Successfully recovered 10396 records in 367 milliseconds. Now checkpointing to ensure that Write-Ahead Log is in a consistent state
> 2019-03-22 14:51:28,046 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog Checkpointed Write-Ahead Log with 10396 Records and 0 Swap Files in 235 milliseconds (Stop-the-world time = 6 milliseconds), max Transaction ID 24370
> 2019-03-22 14:51:28,065 ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.ArrayIndexOutOfBoundsException: -1
> org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1009)
>         at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:539)
>         at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:939)
>         at org.apache.nifi.NiFi.<init>(NiFi.java:157)
>         at org.apache.nifi.NiFi.<init>(NiFi.java:71)
>         at org.apache.nifi.NiFi.main(NiFi.java:296)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.nifi.controller.queue.clustered.partition.CorrelationAttributePartitioner.getPartition(CorrelationAttributePartitioner.java:44)
>         at org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.getPartition(SocketLoadBalancedFlowFileQueue.java:611)
>         at org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.putAndGetPartition(SocketLoadBalancedFlowFileQueue.java:749)
>         at org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.put(SocketLoadBalancedFlowFileQueue.java:739)
>         at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.loadFlowFiles(WriteAheadFlowFileRepository.java:587)
>         at org.apache.nifi.controller.FlowController.initializeFlow(FlowController.java:818)
>         at org.apache.nifi.controller.StandardFlowService.initializeController(StandardFlowService.java:1019)
>         at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:991)
>         ... 5 common frames omitted
>
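The trace above fails inside CorrelationAttributePartitioner.getPartition with index -1. The thread does not establish the root cause, and this is not NiFi's actual implementation.

It does show one common way a hash-based "partition by attribute" strategy can produce -1. Java's % operator keeps the sign of the dividend, so a negative hash modulo a node count of 2 can yield -1. Math.floorMod always stays in range.

This is a minimal sketch. The class, method, and node names are hypothetical.

```java
import java.util.List;

final class AttributePartitionSketch {

    // Naive bucket selection: in Java, % takes the sign of the dividend,
    // so a negative hash produces a negative "index".
    static int naiveIndex(int hash, int nodeCount) {
        return hash % nodeCount;
    }

    // Math.floorMod returns a result in [0, nodeCount) whenever nodeCount > 0.
    static int safeIndex(int hash, int nodeCount) {
        return Math.floorMod(hash, nodeCount);
    }

    // Hypothetical partition-by-attribute: FlowFiles with equal attribute
    // values always map to the same node.
    static String nodeFor(String attributeValue, List<String> nodes) {
        return nodes.get(safeIndex(attributeValue.hashCode(), nodes.size()));
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-1", "node-2"); // hypothetical two-node cluster
        int negativeHash = -7;
        System.out.println(naiveIndex(negativeHash, nodes.size())); // prints -1
        System.out.println(safeIndex(negativeHash, nodes.size()));  // prints 1
        System.out.println(nodeFor("customer-42", nodes));
    }
}
```

Using the naive result as an array index would throw ArrayIndexOutOfBoundsException. The floorMod version keeps the useful property that equal attribute values always land on the same node.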
> Any idea?
> ________________________________
> From: Jean-Sebastien Vachon
> Sent: Friday, March 22, 2019 10:34 AM
> To: Jean-Sebastien Vachon; [email protected]
> Subject: Re: Problem with load balancing option
>
> Hi,
>
> I stopped each node one by one, and the queue is now empty. I'm not sure 
> whether this is a bug or intended behavior, but it looks strange from a user's 
> point of view.
>
> Thanks
> ________________________________
> From: Jean-Sebastien Vachon <[email protected]>
> Sent: Friday, March 22, 2019 10:28 AM
> To: [email protected]
> Subject: Problem with load balancing option
>
> Hi all,
>
> I've configured one of my connections to use the "partition by attribute" load 
> balancing option.
> It was not working as expected, and after a few tests I realized I was missing 
> some dependencies on the cluster nodes (unrelated to load balancing or NiFi), 
> so I stopped everything.
>
> I stopped everything before fixing my dependency issues, and now the UI shows 
> 1906 items in the queue for that connection, but I can't list them or empty 
> the queue.
> NiFi tells me that there are no FlowFiles in the queue when I try to list 
> them, and that 0 FlowFiles out of 1906 were removed from the queue.
>
> I tried connecting the destination to another processor, such as LogMessage, 
> but nothing happens. The 1906 items are stuck, and I cannot delete the 
> connection because it's not empty.
>
> Any recommendations to fix this?
>
> thanks
>
