Below is an excerpt from nifi-app.log with DEBUG logging enabled.

2018-10-16 14:21:11,577 DEBUG [Replicate Request Thread-3] 
o.a.n.c.c.h.r.ThreadPoolRequestReplicator Received response from NIFI01:8443 
for GET /nifi-api/site-to-site
2018-10-16 14:21:11,577 DEBUG [Replicate Request Thread-3] 
o.a.n.c.c.h.r.StandardAsyncClusterResponse Received response 4 out of 4 for 
c910f24a-96fe-4efc-a8b8-6ef8d2674524 from NIFI01:8443
2018-10-16 14:21:11,577 DEBUG [Replicate Request Thread-3] 
o.a.n.c.c.h.r.StandardAsyncClusterResponse Notifying all that merged response 
is ready for c910f24a-96fe-4efc-a8b8-6ef8d2674524
2018-10-16 14:21:11,578 DEBUG [Replicate Request Thread-3] 
o.a.n.c.c.h.r.ThreadPoolRequestReplicator For GET /nifi-api/site-to-site 
(Request ID c910f24a-96fe-4efc-a8b8-6ef8d2674524), minimum response time = 2, 
max = 224, average = 59.0 ms
2018-10-16 14:21:11,578 DEBUG [Replicate Request Thread-3] 
o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET 
/nifi-api/site-to-site (Request ID c910f24a-96fe-4efc-a8b8-6ef8d2674524):
NIFI04:8443: 2 millis
NIFI02:8443: 5 millis
NIFI01:8443: 224 millis
NIFI03:8443: 5 millis

2018-10-16 14:21:11,578 DEBUG [Replicate Request Thread-3] 
o.a.n.c.c.h.r.ThreadPoolRequestReplicator Notified monitor 
java.lang.Object@1da27fc4 because request GET 
https://NIFI03:8443/nifi-api/site-to-site has completed
2018-10-16 14:21:11,578 DEBUG [NiFi Web Server-304048] 
o.a.n.c.c.h.r.ThreadPoolRequestReplicator Unlocked 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock@3d507660[Read locks 
= 0] after replication completed for GET 
https://NIFI03:8443/nifi-api/site-to-site
2018-10-16 14:21:11,578 DEBUG [NiFi Web Server-304048] 
o.a.n.c.c.h.r.StandardAsyncClusterResponse Notifying all that merged response 
is complete for c910f24a-96fe-4efc-a8b8-6ef8d2674524
2018-10-16 14:21:11,578 DEBUG [NiFi Web Server-304048] 
o.a.n.c.c.h.r.StandardAsyncClusterResponse For GET 
https://NIFI03:8443/nifi-api/site-to-site Timing Info is as follows:
Verify Cluster State for All Nodes took 0 millis
Wait for HTTP Request Replication to be triggered for NIFI04:8443 took 0 millis
Wait for HTTP Request Replication to be triggered for NIFI02:8443 took 0 millis
Wait for HTTP Request Replication to be triggered for NIFI01:8443 took 0 millis
Wait for HTTP Request Replication to be triggered for NIFI03:8443 took 0 millis
Perform HTTP Request for NIFI04:8443 took 2 millis
Perform HTTP Request for NIFI03:8443 took 5 millis
Perform HTTP Request for NIFI02:8443 took 5 millis
Perform HTTP Request for NIFI01:8443 took 224 millis
Phase Completed for All Nodes took 224 millis
Notifying All Threads that Request is Complete took 0 millis
Total Time for All Nodes took 224 millis
Map/Merge Responses for All Nodes took 0 millis
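A quick way to spot the outlier across many of these entries is to scrape the per-node timings out of the DEBUG log. A minimal sketch in Python (the regex assumes exactly the "HOST:PORT: N millis" line format shown above; the LOG string is just the sample from this thread):

```python
import re
from collections import defaultdict

# Sample "Node Responses" lines, copied from the DEBUG output above.
LOG = """\
NIFI04:8443: 2 millis
NIFI02:8443: 5 millis
NIFI01:8443: 224 millis
NIFI03:8443: 5 millis
"""

def parse_node_timings(text):
    """Return {node: [millis, ...]} from 'HOST:PORT: N millis' lines."""
    timings = defaultdict(list)
    for host, port, ms in re.findall(r"^(\S+):(\d+): (\d+) millis$", text, re.M):
        timings[f"{host}:{port}"].append(int(ms))
    return dict(timings)

def slowest(timings):
    """Node with the highest single response time."""
    return max(timings, key=lambda node: max(timings[node]))

timings = parse_node_timings(LOG)
print(slowest(timings))  # NIFI01:8443
```

Feeding it many request blocks at once (each node line simply appends to its list) makes it easy to see whether one node is consistently slow or whether the slow responder moves around, as described in the thread.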

-Karthik

From: Joe Witt [mailto:[email protected]]
Sent: Tuesday, October 16, 2018 8:03 AM
To: [email protected]
Subject: Re: [EXT] Re: Cluster Warnings

karthik

understood.  do you have those logs?



On Tue, Oct 16, 2018, 9:59 AM Karthik Kothareddy (karthikk) [CONT - Type 2] 
<[email protected]> wrote:
Joe,

The slow node is Node04 in this case, but we get one such slow response from a 
random node (Node01, Node02, Node03) every time we see this warning.

-Karthik

From: Joe Witt [mailto:[email protected]]
Sent: Tuesday, October 16, 2018 7:55 AM
To: [email protected]
Subject: [EXT] Re: Cluster Warnings

the logs show the fourth node is the slowest by far in all cases.  possibly a 
dns or other related issue?  but def focus on that node as the outlier; 
presuming the nifi config is identical, it suggests system/network differences 
from the other nodes.
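One quick way to test the DNS hypothesis is to time name resolution for each node's hostname from every box. A minimal sketch (the short node names are placeholders taken from this thread; in practice use the exact hostnames from nifi.properties):

```python
import socket
import time

def resolve_time_ms(hostname):
    """Time a single forward name lookup (DNS or /etc/hosts) in milliseconds."""
    start = time.monotonic()
    socket.getaddrinfo(hostname, None)
    return (time.monotonic() - start) * 1000.0

# Run this from each node; a consistently slow entry points at name
# resolution (DNS, /etc/hosts, nsswitch ordering) rather than NiFi itself.
for host in ["NIFI01", "NIFI02", "NIFI03", "NIFI04"]:
    try:
        print(f"{host}: {resolve_time_ms(host):.1f} ms")
    except socket.gaierror as err:
        print(f"{host}: resolution failed ({err})")
```

Note this only measures a warm or cold lookup at that instant; running it in a loop gives a better picture of intermittent resolver delays, which would match the pattern of only some requests being slow.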

thanks

On Tue, Oct 16, 2018, 9:51 AM Karthik Kothareddy (karthikk) [CONT - Type 2] 
<[email protected]> wrote:

Hello,

We’re running a 4-node cluster on NiFi 1.7.1. The fourth node was added 
recently, and as soon as it joined we started seeing the warnings below:

Response time from NODE2 was slow for each of the last 3 requests made. To see 
more information about timing, enable DEBUG logging for 
org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator
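For reference, enabling that logger in NiFi is normally a one-line addition to conf/logback.xml (a sketch; the exact file layout may differ per install):

```xml
<!-- conf/logback.xml: sketch for enabling replication timing details.
     Scoping DEBUG to the whole replication package also covers
     StandardAsyncClusterResponse, which logs the per-phase timing
     breakdown. NiFi's logback config typically rescans the file
     periodically, so a restart may not be required. -->
<logger name="org.apache.nifi.cluster.coordination.http.replication" level="DEBUG"/>
```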

Initially we thought the problem was with the recently added node, and we 
cross-checked all the configs on that box; everything seemed fine. After 
enabling DEBUG logging for the cluster we noticed that the warning is not 
specific to any one node: every time we see a warning like the one above, there 
is a single slow node that takes far longer than the others to respond, as 
below (in this case the slow node is NIFI04). Sometimes this leads to node 
disconnects that require manual intervention.

DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
b2c6e983-5233-4007-bd54-13d21b7068d5):
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis
DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
NIFI04:8443: 1143 millis
NIFI02:8443: 22 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
DEBUG [Replicate Request Thread-31] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
e4726027-27c7-4bbb-8ab6-d02bb41f1920):
NIFI04:8443: 1053 millis
NIFI02:8443: 3 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis

We tried changing configurations in nifi.properties, such as bumping up 
“nifi.cluster.node.protocol.max.threads”, but none of them seems to help and 
we’re still stuck with slow communication between the nodes. We use an external 
ZooKeeper, as this is our production cluster.
Below are some of our configs:

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=100
nifi.cluster.node.protocol.max.threads=120
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=90 sec
nifi.cluster.node.read.timeout=90 sec
nifi.cluster.node.max.concurrent.requests=1000
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=30 sec
nifi.cluster.flow.election.max.candidates=

Any thoughts on why this is happening?


-Karthik
