prod-5 and -6 don't appear to be receiving any data in that queue, based on the status history. Is there anything I should see in the logs to confirm this?
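A quick way to check, assuming the default logback configuration (which
writes to logs/nifi-app.log) and that the load-balancing classes log under
the org.apache.nifi.controller.queue.clustered package, would be something
like:

    grep -iE 'load.?balance|queue\.clustered' logs/nifi-app.log | grep -iE 'WARN|ERROR'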
On Tue, Jun 4, 2019 at 4:05 PM Mark Payne <[email protected]> wrote:

> Joe,
>
> So it looks like from the Diagnostics info that there are currently 500
> FlowFiles queued up. They all live on prod-8.ec2.internal:8443. Of those
> 500, 250 are waiting to go to prod-5.ec2.internal:8443, and 250 are
> waiting to go to prod-6.ec2.internal:8443.
>
> So this tells us that if there are any problems, they are likely
> occurring on one of those 3 nodes. It's also not related to swapping if
> it's in this state with only 500 FlowFiles queued.
>
> Are you able to confirm that you are indeed receiving data from the
> load-balanced queue on both prod-5 and prod-6?
>
> On Jun 4, 2019, at 11:47 AM, Joe Gresock <[email protected]> wrote:
>
> Thanks, Mark.
>
> I'm running on Linux. I've followed your suggestion and added an
> UpdateAttribute processor to the flow, and attached the diagnostics for
> it.
>
> I also don't see any errors in the logs.
>
> On Tue, Jun 4, 2019 at 3:34 PM Mark Payne <[email protected]> wrote:
>
>> Joe,
>>
>> The first thing that comes to mind would be NIFI-6285, as Bryan points
>> out. However, that would only affect you if you are running on Windows.
>> So the first question is: what operating system are you running on? :)
>>
>> If it's not Windows, I would recommend getting some diagnostics info if
>> possible. To do this, you can go to
>> http://<hostname>:<port>/nifi-api/processors/<processor-id>/diagnostics.
>> For example, if you get to NiFi by going to http://nifi01:8080/nifi, and
>> you want diagnostics for the processor with ID 1234, then try going to
>> http://nifi01:8080/nifi-api/processors/1234/diagnostics in your browser.
>>
>> But a couple of caveats on the 'diagnostics' approach above. It will
>> only work if you are running an insecure NiFi instance, or if you are
>> secured using certificates. We want the diagnostics for the Processor
>> that is either the source of the connection or the destination of the
>> connection - it doesn't matter which. This will give us a lot of
>> information about the internal structure of the connection's FlowFile
>> Queue. Of course, you said that your connection is between two Process
>> Groups, which means that neither the source nor the destination is a
>> Processor, so I would recommend creating a dummy Processor like
>> UpdateAttribute and temporarily dragging the Connection so that it
>> points to that Processor, just to get the diagnostic information, then
>> dragging the connection back.
>>
>> Of course, it would also be helpful to look for any errors in the logs.
>> But if you are able to get the diagnostics info as described above,
>> that's usually the best bet for debugging this sort of thing.
>>
>> Thanks
>> -Mark
>>
>> On Jun 4, 2019, at 11:13 AM, Bryan Bende <[email protected]> wrote:
>>
>> Joe,
>>
>> There are two known issues that could be related...
>>
>> The first was already addressed in 1.9.0, but the reason I mention it
>> is because it was specific to a connection between two ports:
>>
>> https://issues.apache.org/jira/browse/NIFI-5919
>>
>> The second is not in a release yet, but is addressed in master, and
>> has to do with swapping:
>>
>> https://issues.apache.org/jira/browse/NIFI-6285
>>
>> It seems like you wouldn't hit the first one since you are on 1.9.2,
>> but it does seem odd that it's the same scenario.
>>
>> Mark P probably knows best about debugging, but I'm guessing a thread
>> dump while in this state would be helpful.
>>
>> -Bryan
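As a concrete sketch of the diagnostics call Mark describes above (nifi01,
port 8080/8443, and processor ID 1234 are the placeholder values from his
example; the certificate paths are assumptions for a cert-secured
instance):

    # unsecured instance
    curl -s http://nifi01:8080/nifi-api/processors/1234/diagnostics -o diagnostics.json

    # cert-secured instance, authenticating with a client certificate
    # (add --cacert if the server cert isn't in the system trust store)
    curl -s --cert client.pem --key client.key \
        https://nifi01:8443/nifi-api/processors/1234/diagnostics -o diagnostics.json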
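And for the thread dump Bryan suggests, NiFi's own script can write one,
which avoids hunting for the right PID (run from the NiFi install
directory; the output filename is arbitrary):

    ./bin/nifi.sh dump thread-dump.txt

Running jstack against the NiFi Java process should work just as well.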
>> On Tue, Jun 4, 2019 at 10:56 AM Joe Gresock <[email protected]> wrote:
>>
>> I have round-robin load-balanced connections working on one cluster,
>> but on another, this type of connection seems to be stuck.
>>
>> What would be the best way to debug this problem? The connection is
>> from one process group to another, so it's from an Output Port to an
>> Input Port.
>>
>> My configuration is as follows:
>> nifi.cluster.load.balance.host=
>> nifi.cluster.load.balance.port=6342
>> nifi.cluster.load.balance.connections.per.node=4
>> nifi.cluster.load.balance.max.thread.count=8
>> nifi.cluster.load.balance.comms.timeout=30 sec
>>
>> And I ensured port 6342 is open from one node to another using the
>> cluster node addresses.
>>
>> Is there some error that should appear in the logs if flow files get
>> stuck here?
>>
>> I suspect they are actually stuck, not just missing, because the
>> remainder of the flow is back-pressured up to this point in the flow.
>>
>> Thanks!
>> Joe
>
> --
> I know what it is to be in need, and I know what it is to have plenty.
> I have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want. I can
> do all this through him who gives me strength. *-Philippians 4:12-13*
>
> <diagnostics.json.gz>
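Incidentally, the port check described in the quoted configuration above
can be repeated from every node against every other node. A minimal
sketch, using the prod-N hostnames mentioned earlier in this thread and
assuming nc is installed:

    for host in prod-5.ec2.internal prod-6.ec2.internal prod-8.ec2.internal; do
        nc -zv "$host" 6342
    done

Note also that with nifi.cluster.load.balance.host left blank, NiFi falls
back to the cluster node address, so each node must be reachable at that
hostname on port 6342.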
