Re: [EXT] Re: Cluster Warnings

Martijn Dekkers Tue, 06 Nov 2018 07:39:03 -0800

Did you ever get to the bottom of this Karthik? I have seen similar
issue, and turned out to be related to hardware not keeping up with the
increased volume of work (which is why a node was added to begin with)


On Fri, 26 Oct 2018, at 19:18, Karthik Kothareddy (karthikk) [CONT - Type 2] 
wrote:> Any light on this? We’re still seeing warnings every 10 seconds which
> is really annoying and no idea what’s triggering them.>  


> -Karthik


>  


> *From:* Karthik Kothareddy (karthikk) [CONT - Type 2] *Sent:* Tuesday,
> October 16, 2018 8:27 AM *To:* [email protected] *Subject:* RE:
> [EXT] Re: Cluster Warnings>  


> Below is some of the nifi-app.log when DEBUG mode is enabled.


>  


> *2018-10-16 14:21:11,577 DEBUG [Replicate Request Thread-3]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Received response from
> NIFI01:8443 for GET /nifi-api/site-to-site*> *2018-10-16 14:21:11,577 DEBUG 
> [Replicate Request Thread-3]
> o.a.n.c.c.h.r.StandardAsyncClusterResponse Received response 4 out of
> 4 for c910f24a-96fe-4efc-a8b8-6ef8d2674524 from NIFI01:8443*> *2018-10-16 
> 14:21:11,577 DEBUG [Replicate Request Thread-3]
> o.a.n.c.c.h.r.StandardAsyncClusterResponse Notifying all that merged
> response is ready for c910f24a-96fe-4efc-a8b8-6ef8d2674524*> *2018-10-16 
> 14:21:11,578 DEBUG [Replicate Request Thread-3]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator For GET /nifi-api/site-to-
> site (Request ID c910f24a-96fe-4efc-a8b8-6ef8d2674524), minimum
> response time = 2, max = 224, average = 59.0 ms*> *2018-10-16 14:21:11,578 
> DEBUG [Replicate Request Thread-3]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET 
> /nifi-api/site-to-
> site (Request ID c910f24a-96fe-4efc-a8b8-6ef8d2674524):*> *NIFI04:8443: 2 
> millis*


> *NIFI02:8443: 5 millis*


> *NIFI01:8443: 224 millis*


> *NIFI03:8443: 5 millis*


> * *


> *2018-10-16 14:21:11,578 DEBUG [Replicate Request Thread-3]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Notified monitor
> java.lang.Object@1da27fc4 because request GET
> https://NIFI03:8443/nifi-api/site-to-site has completed*> *2018-10-16 
> 14:21:11,578 DEBUG [NiFi Web Server-304048]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Unlocked  java.util.concurr-
> ent.locks.ReentrantReadWriteLock$ReadLock@3d507660[Read locks = 0]
> after replication completed for GET
> https://NIFI03:8443/nifi-api/site-to-site*> *2018-10-16 14:21:11,578 DEBUG 
> [NiFi Web Server-304048]
> o.a.n.c.c.h.r.StandardAsyncClusterResponse Notifying all that merged
> response is complete for c910f24a-96fe-4efc-a8b8-6ef8d2674524*> *2018-10-16 
> 14:21:11,578 DEBUG [NiFi Web Server-304048]
> o.a.n.c.c.h.r.StandardAsyncClusterResponse For GET
> https://NIFI03:8443/nifi-api/site-to-site Timing Info is as follows:*> 
> *Verify Cluster State for All Nodes took 0 millis*


> *Wait for HTTP Request Replication to be triggered for NIFI04:8443
> took 0 millis*> *Wait for HTTP Request Replication to be triggered for 
> NIFI02:8443
> took 0 millis*> *Wait for HTTP Request Replication to be triggered for 
> NIFI01:8443
> took 0 millis*> *Wait for HTTP Request Replication to be triggered for 
> NIFI03:8443
> took 0 millis*> *Perform HTTP Request for NIFI04:8443 took 2 millis*


> *Perform HTTP Request for NIFI03:8443 took 5 millis*


> *Perform HTTP Request for NIFI02:8443 took 5 millis*


> *Perform HTTP Request for NIFI01:8443 took 224 millis*


> *Phase Completed for All Nodes took 224 millis*


> *Notifying All Threads that Request is Complete took 0 millis*


> *Total Time for All Nodes took 224 millis*


> *Map/Merge Responses for All Nodes took 0 millis*


>  


> -Karthik


>  


> *From:* Joe Witt [mailto:[email protected]] *Sent:* Tuesday, October
> 16, 2018 8:03 AM *To:* [email protected] *Subject:* Re: [EXT] Re:
> Cluster Warnings>  


> karthik


>  


> understood.  do you have those logs?


>  


>  


>  


> On Tue, Oct 16, 2018, 9:59 AM Karthik Kothareddy (karthikk) [CONT -
> Type 2] <[email protected]> wrote:>> Joe,


>>  


>> The slow node is Node04 in this case but we get one such slow
>> response from a random node(Node01, Node02,Node03) every time we see
>> this warning.>>  


>> -Karthik


>>  


>> *From:* Joe Witt [mailto:[email protected]] *Sent:* Tuesday, October
>> 16, 2018 7:55 AM *To:* [email protected] *Subject:* [EXT] Re:
>> Cluster Warnings>>  


>> the logs show the fourth node is the slowest by far in all cases.
>> possibly a dns or other related issue?  but def focus on that node as
>> the outlier and presuming nifi config is identical it suggest
>> system/network differences from other nodes.>>  


>> thanks


>>  


>> On Tue, Oct 16, 2018, 9:51 AM Karthik Kothareddy (karthikk) [CONT -
>> Type 2] <[email protected]> wrote:>>>  


>>> Hello,


>>>  


>>> We’re running a 4-node cluster on NiFi 1.7.1. The fourth node was
>>> added recently and as soon as we added the 4th node, we started
>>> seeing below warnings>>>  


>>> *Response time from NODE2 was slow for each of the last 3 requests
>>> made. To see more information about timing, enable DEBUG logging for
>>> org.apache.nifi.cluster.coordination.http.replication.ThreadPoolReq-
>>> uestReplicator*>>> * *


>>> Initially we though the problem was with the recent node added and
>>> cross checked all the configs on the box and everything seemed to be
>>> just fine. After enabling the DEBUG mode for cluster logging we
>>> noticed that the warning is not specific to any node and every-time
>>> we see a warning like above there is one slow node which takes
>>> forever to send a response like below (in this case the slow node is
>>> NIFI04). Sometimes these will lead to node-disconnects needing a
>>> manual intervention.>>>  


>>> *DEBUG [Replicate Request Thread-50]
>>> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET 
>>> /nifi-api/site-to-
>>> site (Request ID b2c6e983-5233-4007-bd54-13d21b7068d5):*>>> *NIFI04:8443: 
>>> 1386 millis*


>>> *NIFI02:8443: 3 millis*


>>> *NIFI01:8443: 5 millis*


>>> *NIFI03:8443: 3 millis*


>>> *DEBUG [Replicate Request Thread-41]
>>> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET 
>>> /nifi-api/site-to-
>>> site (Request ID d182fdab-f1d4-4ac9-97fd-e24c41dc4622):*>>> *NIFI04:8443: 
>>> 1143 millis*


>>> *NIFI02:8443: 22 millis*


>>> *NIFI01:8443: 3 millis*


>>> *NIFI03:8443: 2 millis*


>>> *DEBUG [Replicate Request Thread-31]
>>> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET 
>>> /nifi-api/site-to-
>>> site (Request ID e4726027-27c7-4bbb-8ab6-d02bb41f1920):*>>> *NIFI04:8443: 
>>> 1053 millis*


>>> *NIFI02:8443: 3 millis*


>>> *NIFI01:8443: 3 millis*


>>> *NIFI03:8443: 2 millis*


>>>  


>>> We tried changing the configurations in nifi.properties like bumping
>>> up the “nifi.cluster.node.protocol.max.threads” but none of them
>>> seems to be working and we’re still stuck with the slow
>>> communication between the nodes. We use an external zookeeper as
>>> this is our production server.>>> Below are some of our configs


>>>  


>>> *# cluster node properties (only configure for cluster nodes) #*


>>> *nifi.cluster.is.node=true*


>>> *nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com*


>>> *nifi.cluster.node.protocol.port=11443*


>>> *nifi.cluster.node.protocol.threads=100*


>>> *nifi.cluster.node.protocol.max.threads=120*


>>> *nifi.cluster.node.event.history.size=25*


>>> *nifi.cluster.node.connection.timeout=90 sec*


>>> *nifi.cluster.node.read.timeout=90 sec*


>>> *nifi.cluster.node.max.concurrent.requests=1000*


>>> *nifi.cluster.firewall.file=*


>>> *nifi.cluster.flow.election.max.wait.time=30 sec*


>>> *nifi.cluster.flow.election.max.candidates=*


>>>  


>>> Any thoughts on why this is happening?


>>>  


>>>  


>>> -Karthik

Re: [EXT] Re: Cluster Warnings

Reply via email to