Hi: I have a 5-node ZooKeeper cluster deployed, and we monitor the cluster by turning on the Prometheus Adapter in ZooKeper. From the dashboard, I am seeing that after the ZK cluster gets restarted, the "zookeeper_MaxRequestLatency" keeps climbing up from 0ms to 2.7 seconds to 10 seconds to 15 seconds in about 40 minutes, and afterwards settled at 15 seconds for the last 12 hours with a straight line (that is, no value changed anymore).
All of the replicas share the similar behavior, but with small difference in terms of the max request latency value. So my questions are: (1) what is the meaning of "max request latency"? It seems that is the maximum latency experienced after the start of the process, and it is NOT calculated, say, in a 5-minute window? (2) what should be the right max request latency? I am concerned that the 15 seconds that I have is too high. Regards, Jun
