Karthik, understood. Do you have those logs?
On Tue, Oct 16, 2018, 9:59 AM Karthik Kothareddy (karthikk) [CONT - Type 2] <[email protected]> wrote:

> Joe,
>
> The slow node is Node04 in this case, but we get one such slow response
> from a random node (Node01, Node02, Node03) every time we see this warning.
>
> -Karthik
>
> From: Joe Witt [mailto:[email protected]]
> Sent: Tuesday, October 16, 2018 7:55 AM
> To: [email protected]
> Subject: [EXT] Re: Cluster Warnings
>
> The logs show the fourth node is the slowest by far in all cases.
> Possibly a DNS or other related issue? But definitely focus on that node
> as the outlier; presuming the NiFi config is identical, it suggests
> system/network differences from the other nodes.
>
> Thanks
>
> On Tue, Oct 16, 2018, 9:51 AM Karthik Kothareddy (karthikk) [CONT - Type 2] <[email protected]> wrote:
>
> Hello,
>
> We're running a 4-node cluster on NiFi 1.7.1. The fourth node was added
> recently, and as soon as we added it we started seeing the warning below:
>
> Response time from NODE2 was slow for each of the last 3 requests made.
> To see more information about timing, enable DEBUG logging for
> org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator
>
> Initially we thought the problem was with the recently added node, but we
> cross-checked all the configs on that box and everything seemed fine.
> After enabling DEBUG logging for the cluster, we noticed that the warning
> is not specific to any one node: every time we see a warning like the one
> above, there is one node that takes far longer than the others to respond
> (in this case the slow node is NIFI04). Sometimes these slow responses
> lead to node disconnects that need manual intervention.
> DEBUG [Replicate Request Thread-50]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET
> /nifi-api/site-to-site (Request ID b2c6e983-5233-4007-bd54-13d21b7068d5):
> NIFI04:8443: 1386 millis
> NIFI02:8443: 3 millis
> NIFI01:8443: 5 millis
> NIFI03:8443: 3 millis
>
> DEBUG [Replicate Request Thread-41]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET
> /nifi-api/site-to-site (Request ID d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
> NIFI04:8443: 1143 millis
> NIFI02:8443: 22 millis
> NIFI01:8443: 3 millis
> NIFI03:8443: 2 millis
>
> DEBUG [Replicate Request Thread-31]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET
> /nifi-api/site-to-site (Request ID e4726027-27c7-4bbb-8ab6-d02bb41f1920):
> NIFI04:8443: 1053 millis
> NIFI02:8443: 3 millis
> NIFI01:8443: 3 millis
> NIFI03:8443: 2 millis
>
> We tried changing configuration values in nifi.properties, such as bumping
> up "nifi.cluster.node.protocol.max.threads", but none of them seems to
> help and we're still stuck with the slow communication between the nodes.
> We use an external ZooKeeper, as this is our production server.
>
> Below are some of our configs:
>
> # cluster node properties (only configure for cluster nodes) #
> nifi.cluster.is.node=true
> nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com
> nifi.cluster.node.protocol.port=11443
> nifi.cluster.node.protocol.threads=100
> nifi.cluster.node.protocol.max.threads=120
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=90 sec
> nifi.cluster.node.read.timeout=90 sec
> nifi.cluster.node.max.concurrent.requests=1000
> nifi.cluster.firewall.file=
> nifi.cluster.flow.election.max.wait.time=30 sec
> nifi.cluster.flow.election.max.candidates=
>
> Any thoughts on why this is happening?
>
> -Karthik
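For anyone digging through a large nifi-app.log full of these replicator lines, a small script can rank the nodes by average response time and surface the outlier. This is a sketch, not part of NiFi itself; it assumes the exact "NODE:port: N millis" format shown in the DEBUG output above:

```python
import re
from collections import defaultdict

# Matches per-node timing lines emitted by ThreadPoolRequestReplicator at
# DEBUG level, e.g. "NIFI04:8443: 1386 millis".
TIMING_LINE = re.compile(r"^(\S+):\d+: (\d+) millis$")

def slowest_nodes(log_text):
    """Return node names sorted by average response time, slowest first."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        m = TIMING_LINE.match(line.strip())
        if m:
            times[m.group(1)].append(int(m.group(2)))
    return sorted(times, key=lambda n: sum(times[n]) / len(times[n]), reverse=True)

sample = """
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis
"""
print(slowest_nodes(sample))  # NIFI04 ranks first on this sample
```

Feeding it the whole log (rather than a single request's block) shows whether one node is consistently the outlier or whether the slowness moves around.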
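Joe's DNS hypothesis can also be sanity-checked directly from each node. A minimal sketch (the NIFI0x hostnames are placeholders for the cluster's actual node addresses) that times forward resolution, since a lookup that is consistently slow for one name or from one host points at a resolver or network difference rather than a NiFi configuration problem:

```python
import socket
import time

def resolve_times(hosts, port=8443):
    """Time forward DNS resolution per host; returns {host: (millis, status)}."""
    results = {}
    for host in hosts:
        start = time.monotonic()
        try:
            socket.getaddrinfo(host, port)
            status = "ok"
        except socket.gaierror:
            status = "unresolved"
        results[host] = ((time.monotonic() - start) * 1000, status)
    return results

# Placeholder names; substitute the cluster's real node addresses.
for host, (ms, status) in resolve_times(["NIFI01", "NIFI02", "NIFI03", "NIFI04"]).items():
    print(f"{host}: {ms:.1f} ms ({status})")
```

Running this on all four nodes (and also checking reverse lookups of each peer's IP) narrows down whether the slow replication responses line up with slow name resolution.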
