Did you ever get to the bottom of this Karthik? I have seen similar issue, and turned out to be related to hardware not keeping up with the increased volume of work (which is why a node was added to begin with)
On Fri, 26 Oct 2018, at 19:18, Karthik Kothareddy (karthikk) [CONT - Type 2] wrote:> Any light on this? We’re still seeing warnings every 10 seconds which > is really annoying and no idea what’s triggering them.> > -Karthik > > *From:* Karthik Kothareddy (karthikk) [CONT - Type 2] *Sent:* Tuesday, > October 16, 2018 8:27 AM *To:* [email protected] *Subject:* RE: > [EXT] Re: Cluster Warnings> > Below is some of the nifi-app.log when DEBUG mode is enabled. > > *2018-10-16 14:21:11,577 DEBUG [Replicate Request Thread-3] > o.a.n.c.c.h.r.ThreadPoolRequestReplicator Received response from > NIFI01:8443 for GET /nifi-api/site-to-site*> *2018-10-16 14:21:11,577 DEBUG > [Replicate Request Thread-3] > o.a.n.c.c.h.r.StandardAsyncClusterResponse Received response 4 out of > 4 for c910f24a-96fe-4efc-a8b8-6ef8d2674524 from NIFI01:8443*> *2018-10-16 > 14:21:11,577 DEBUG [Replicate Request Thread-3] > o.a.n.c.c.h.r.StandardAsyncClusterResponse Notifying all that merged > response is ready for c910f24a-96fe-4efc-a8b8-6ef8d2674524*> *2018-10-16 > 14:21:11,578 DEBUG [Replicate Request Thread-3] > o.a.n.c.c.h.r.ThreadPoolRequestReplicator For GET /nifi-api/site-to- > site (Request ID c910f24a-96fe-4efc-a8b8-6ef8d2674524), minimum > response time = 2, max = 224, average = 59.0 ms*> *2018-10-16 14:21:11,578 > DEBUG [Replicate Request Thread-3] > o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET > /nifi-api/site-to- > site (Request ID c910f24a-96fe-4efc-a8b8-6ef8d2674524):*> *NIFI04:8443: 2 > millis* > *NIFI02:8443: 5 millis* > *NIFI01:8443: 224 millis* > *NIFI03:8443: 5 millis* > * * > *2018-10-16 14:21:11,578 DEBUG [Replicate Request Thread-3] > o.a.n.c.c.h.r.ThreadPoolRequestReplicator Notified monitor > java.lang.Object@1da27fc4 because request GET > https://NIFI03:8443/nifi-api/site-to-site has completed*> *2018-10-16 > 14:21:11,578 DEBUG [NiFi Web Server-304048] > o.a.n.c.c.h.r.ThreadPoolRequestReplicator Unlocked java.util.concurr- > ent.locks.ReentrantReadWriteLock$ReadLock@3d507660[Read locks = 0] > after replication completed for GET > https://NIFI03:8443/nifi-api/site-to-site*> *2018-10-16 14:21:11,578 DEBUG > [NiFi Web Server-304048] > o.a.n.c.c.h.r.StandardAsyncClusterResponse Notifying all that merged > response is complete for c910f24a-96fe-4efc-a8b8-6ef8d2674524*> *2018-10-16 > 14:21:11,578 DEBUG [NiFi Web Server-304048] > o.a.n.c.c.h.r.StandardAsyncClusterResponse For GET > https://NIFI03:8443/nifi-api/site-to-site Timing Info is as follows:*> > *Verify Cluster State for All Nodes took 0 millis* > *Wait for HTTP Request Replication to be triggered for NIFI04:8443 > took 0 millis*> *Wait for HTTP Request Replication to be triggered for > NIFI02:8443 > took 0 millis*> *Wait for HTTP Request Replication to be triggered for > NIFI01:8443 > took 0 millis*> *Wait for HTTP Request Replication to be triggered for > NIFI03:8443 > took 0 millis*> *Perform HTTP Request for NIFI04:8443 took 2 millis* > *Perform HTTP Request for NIFI03:8443 took 5 millis* > *Perform HTTP Request for NIFI02:8443 took 5 millis* > *Perform HTTP Request for NIFI01:8443 took 224 millis* > *Phase Completed for All Nodes took 224 millis* > *Notifying All Threads that Request is Complete took 0 millis* > *Total Time for All Nodes took 224 millis* > *Map/Merge Responses for All Nodes took 0 millis* > > -Karthik > > *From:* Joe Witt [mailto:[email protected]] *Sent:* Tuesday, October > 16, 2018 8:03 AM *To:* [email protected] *Subject:* Re: [EXT] Re: > Cluster Warnings> > karthik > > understood. do you have those logs? > > > > On Tue, Oct 16, 2018, 9:59 AM Karthik Kothareddy (karthikk) [CONT - > Type 2] <[email protected]> wrote:>> Joe, >> >> The slow node is Node04 in this case but we get one such slow >> response from a random node(Node01, Node02,Node03) every time we see >> this warning.>> >> -Karthik >> >> *From:* Joe Witt [mailto:[email protected]] *Sent:* Tuesday, October >> 16, 2018 7:55 AM *To:* [email protected] *Subject:* [EXT] Re: >> Cluster Warnings>> >> the logs show the fourth node is the slowest by far in all cases. >> possibly a dns or other related issue? but def focus on that node as >> the outlier and presuming nifi config is identical it suggest >> system/network differences from other nodes.>> >> thanks >> >> On Tue, Oct 16, 2018, 9:51 AM Karthik Kothareddy (karthikk) [CONT - >> Type 2] <[email protected]> wrote:>>> >>> Hello, >>> >>> We’re running a 4-node cluster on NiFi 1.7.1. The fourth node was >>> added recently and as soon as we added the 4th node, we started >>> seeing below warnings>>> >>> *Response time from NODE2 was slow for each of the last 3 requests >>> made. To see more information about timing, enable DEBUG logging for >>> org.apache.nifi.cluster.coordination.http.replication.ThreadPoolReq- >>> uestReplicator*>>> * * >>> Initially we though the problem was with the recent node added and >>> cross checked all the configs on the box and everything seemed to be >>> just fine. After enabling the DEBUG mode for cluster logging we >>> noticed that the warning is not specific to any node and every-time >>> we see a warning like above there is one slow node which takes >>> forever to send a response like below (in this case the slow node is >>> NIFI04). Sometimes these will lead to node-disconnects needing a >>> manual intervention.>>> >>> *DEBUG [Replicate Request Thread-50] >>> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET >>> /nifi-api/site-to- >>> site (Request ID b2c6e983-5233-4007-bd54-13d21b7068d5):*>>> *NIFI04:8443: >>> 1386 millis* >>> *NIFI02:8443: 3 millis* >>> *NIFI01:8443: 5 millis* >>> *NIFI03:8443: 3 millis* >>> *DEBUG [Replicate Request Thread-41] >>> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET >>> /nifi-api/site-to- >>> site (Request ID d182fdab-f1d4-4ac9-97fd-e24c41dc4622):*>>> *NIFI04:8443: >>> 1143 millis* >>> *NIFI02:8443: 22 millis* >>> *NIFI01:8443: 3 millis* >>> *NIFI03:8443: 2 millis* >>> *DEBUG [Replicate Request Thread-31] >>> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET >>> /nifi-api/site-to- >>> site (Request ID e4726027-27c7-4bbb-8ab6-d02bb41f1920):*>>> *NIFI04:8443: >>> 1053 millis* >>> *NIFI02:8443: 3 millis* >>> *NIFI01:8443: 3 millis* >>> *NIFI03:8443: 2 millis* >>> >>> We tried changing the configurations in nifi.properties like bumping >>> up the “nifi.cluster.node.protocol.max.threads” but none of them >>> seems to be working and we’re still stuck with the slow >>> communication between the nodes. We use an external zookeeper as >>> this is our production server.>>> Below are some of our configs >>> >>> *# cluster node properties (only configure for cluster nodes) #* >>> *nifi.cluster.is.node=true* >>> *nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com* >>> *nifi.cluster.node.protocol.port=11443* >>> *nifi.cluster.node.protocol.threads=100* >>> *nifi.cluster.node.protocol.max.threads=120* >>> *nifi.cluster.node.event.history.size=25* >>> *nifi.cluster.node.connection.timeout=90 sec* >>> *nifi.cluster.node.read.timeout=90 sec* >>> *nifi.cluster.node.max.concurrent.requests=1000* >>> *nifi.cluster.firewall.file=* >>> *nifi.cluster.flow.election.max.wait.time=30 sec* >>> *nifi.cluster.flow.election.max.candidates=* >>> >>> Any thoughts on why this is happening? >>> >>> >>> -Karthik
