[jira] [Updated] (ZOOKEEPER-4768) Flaky test org.apache.zookeeper.metrics.prometheus.ExportJvmInfoTest#exportInfo

2023-11-02 Thread Yike Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yike Xiao updated ZOOKEEPER-4768:
-
Labels: pull-requests-available  (was: pull-request-available)

> Flaky test 
> org.apache.zookeeper.metrics.prometheus.ExportJvmInfoTest#exportInfo
> ---
>
> Key: ZOOKEEPER-4768
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4768
> Project: ZooKeeper
>  Issue Type: Test
>  Components: metric system
>Affects Versions: 3.10.0
>Reporter: Yike Xiao
>Priority: Major
>  Labels: pull-requests-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the {{io.prometheus.client.hotspot.DefaultExports#initialize}} method has 
> already been executed by another test case before the 
> {{org.apache.zookeeper.metrics.prometheus.ExportJvmInfoTest#exportInfo}} test 
> runs, that test will fail.





[jira] [Created] (ZOOKEEPER-4768) Flaky test org.apache.zookeeper.metrics.prometheus.ExportJvmInfoTest#exportInfo

2023-11-02 Thread Yike Xiao (Jira)
Yike Xiao created ZOOKEEPER-4768:


 Summary: Flaky test 
org.apache.zookeeper.metrics.prometheus.ExportJvmInfoTest#exportInfo
 Key: ZOOKEEPER-4768
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4768
 Project: ZooKeeper
  Issue Type: Test
  Components: metric system
Affects Versions: 3.10.0
Reporter: Yike Xiao


If the {{io.prometheus.client.hotspot.DefaultExports#initialize}} method has 
already been executed by another test case before the 
{{org.apache.zookeeper.metrics.prometheus.ExportJvmInfoTest#exportInfo}} test 
runs, that test will fail.
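
For illustration only (this is not the actual ZooKeeper test), a sketch of why the ordering matters and one possible way to make such a test order-independent. {{DefaultExports.initialize()}} keeps an internal "already initialized" flag, so once an earlier test has triggered it, a later call is a no-op and the JVM collectors may not end up in the registry the test inspects. Registering the hotspot collectors against a registry the test controls avoids the shared global state; the class name below is hypothetical.

{code:java}
// Illustrative sketch, not the actual ExportJvmInfoTest. Assumes JUnit 5 plus
// io.prometheus:simpleclient and simpleclient_hotspot on the classpath.
import static org.junit.jupiter.api.Assertions.assertTrue;

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.hotspot.MemoryPoolsExports;
import io.prometheus.client.hotspot.StandardExports;
import java.util.Collections;
import org.junit.jupiter.api.Test;

public class ExportJvmInfoOrderIndependenceTest {

    @Test
    public void exportInfo() {
        // Use a private registry instead of CollectorRegistry.defaultRegistry, so a
        // DefaultExports.initialize() call made by some earlier test (a one-shot,
        // guarded call) cannot change what this test observes.
        CollectorRegistry registry = new CollectorRegistry();
        new StandardExports().register(registry);
        new MemoryPoolsExports().register(registry);

        boolean hasJvmMetrics = Collections.list(registry.metricFamilySamples()).stream()
                .anyMatch(mfs -> mfs.name.startsWith("jvm") || mfs.name.startsWith("process"));
        assertTrue(hasJvmMetrics, "expected hotspot metric families to be registered");
    }
}
{code}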





[jira] [Updated] (ZOOKEEPER-4767) New implementation of prometheus quantile metrics based on DataSketches

2023-11-01 Thread Yike Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yike Xiao updated ZOOKEEPER-4767:
-
Labels: pull-request-available  (was: )

> New implementation of prometheus quantile metrics based on DataSketches
> ---
>
> Key: ZOOKEEPER-4767
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4767
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: metric system
>Reporter: Yike Xiao
>Priority: Major
>  Labels: pull-request-available
>
> If the built-in Prometheus metrics feature introduced in version 3.6 is 
> enabled, under high-load scenarios (such as when there are a large number of 
> read requests), the percentile metrics (Summary) used to collect request 
> latencies can easily become a bottleneck and impact the service itself. This 
> is because the internal implementation of Summary involves the overhead of 
> lock operations. In scenarios with a large number of requests, lock 
> contention can lead to a dramatic deterioration in request latency. The 
> details of this issue and related profiling can be viewed in ZOOKEEPER-4741.
> In ZOOKEEPER-4289, the updates to Summary were switched to be executed in a 
> separate thread pool. While this approach avoids the overhead of lock 
> contention caused by multiple threads updating Summary simultaneously, it 
> introduces the operational overhead of the thread pool queue and additional 
> garbage collection (GC) overhead. Especially when the thread pool queue is 
> full, a large number of RejectedExecutionException instances will be thrown, 
> further increasing the pressure on GC.
> To address the problems above, I have implemented an almost lock-free solution 
> based on DataSketches. Benchmark results show that it offers over a 10x speed 
> improvement compared to version 3.9.1 and avoids the frequent GC caused by 
> creating a large number of temporary objects. The trade-off is that the latency 
> percentiles are reported with a delay (60 seconds by default), and each Summary 
> metric incurs a fixed amount of permanent memory overhead.
> This solution is based on Matteo Merli's optimization of the percentile 
> latency metrics in BookKeeper, as detailed in 
> https://github.com/apache/bookkeeper/commit/3bff19956e70e37c025a8e29aa8428937af77aa1.





[jira] [Updated] (ZOOKEEPER-4767) New implementation of prometheus quantile metrics based on DataSketches

2023-11-01 Thread Yike Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yike Xiao updated ZOOKEEPER-4767:
-
Summary: New implementation of prometheus quantile metrics based on 
DataSketches  (was: New implementation of prometheus qunatile metrics based on 
DataSketches)

> New implementation of prometheus quantile metrics based on DataSketches
> ---
>
> Key: ZOOKEEPER-4767
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4767
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: metric system
>Reporter: Yike Xiao
>Priority: Major
>
> If the built-in Prometheus metrics feature introduced in version 3.6 is 
> enabled, under high-load scenarios (such as when there are a large number of 
> read requests), the percentile metrics (Summary) used to collect request 
> latencies can easily become a bottleneck and impact the service itself. This 
> is because the internal implementation of Summary involves the overhead of 
> lock operations. In scenarios with a large number of requests, lock 
> contention can lead to a dramatic deterioration in request latency. The 
> details of this issue and related profiling can be viewed in ZOOKEEPER-4741.
> In ZOOKEEPER-4289, the updates to Summary were switched to be executed in a 
> separate thread pool. While this approach avoids the overhead of lock 
> contention caused by multiple threads updating Summary simultaneously, it 
> introduces the operational overhead of the thread pool queue and additional 
> garbage collection (GC) overhead. Especially when the thread pool queue is 
> full, a large number of RejectedExecutionException instances will be thrown, 
> further increasing the pressure on GC.
> To address the problems above, I have implemented an almost lock-free solution 
> based on DataSketches. Benchmark results show that it offers over a 10x speed 
> improvement compared to version 3.9.1 and avoids the frequent GC caused by 
> creating a large number of temporary objects. The trade-off is that the latency 
> percentiles are reported with a delay (60 seconds by default), and each Summary 
> metric incurs a fixed amount of permanent memory overhead.
> This solution is based on Matteo Merli's optimization of the percentile 
> latency metrics in BookKeeper, as detailed in 
> https://github.com/apache/bookkeeper/commit/3bff19956e70e37c025a8e29aa8428937af77aa1.





[jira] [Created] (ZOOKEEPER-4767) New implementation of prometheus qunatile metrics based on DataSketches

2023-11-01 Thread Yike Xiao (Jira)
Yike Xiao created ZOOKEEPER-4767:


 Summary: New implementation of prometheus qunatile metrics based 
on DataSketches
 Key: ZOOKEEPER-4767
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4767
 Project: ZooKeeper
  Issue Type: Improvement
  Components: metric system
Reporter: Yike Xiao


If the built-in Prometheus metrics feature introduced in version 3.6 is 
enabled, under high-load scenarios (such as when there are a large number of 
read requests), the percentile metrics (Summary) used to collect request 
latencies can easily become a bottleneck and impact the service itself. This is 
because the internal implementation of Summary involves the overhead of lock 
operations. In scenarios with a large number of requests, lock contention can 
lead to a dramatic deterioration in request latency. The details of this issue 
and related profiling can be viewed in ZOOKEEPER-4741.

In ZOOKEEPER-4289, the updates to Summary were switched to be executed in a 
separate thread pool. While this approach avoids the overhead of lock 
contention caused by multiple threads updating Summary simultaneously, it 
introduces the operational overhead of the thread pool queue and additional 
garbage collection (GC) overhead. Especially when the thread pool queue is 
full, a large number of RejectedExecutionException instances will be thrown, 
further increasing the pressure on GC.

To address the problems above, I have implemented an almost lock-free solution 
based on DataSketches. Benchmark results show that it offers over a 10x speed 
improvement compared to version 3.9.1 and avoids the frequent GC caused by creating 
a large number of temporary objects. The trade-off is that the latency percentiles 
are reported with a delay (60 seconds by default), and each Summary metric incurs a 
fixed amount of permanent memory overhead.

This solution is based on Matteo Merli's optimization of the percentile 
latency metrics in BookKeeper, as detailed in 
https://github.com/apache/bookkeeper/commit/3bff19956e70e37c025a8e29aa8428937af77aa1.
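
To make the approach concrete, here is a minimal sketch (not the actual patch; the wrapper class and its method names are hypothetical) of a low-contention Summary replacement built on the DataSketches quantiles API: each writer thread records into its own sketch guarded by its own {{StampedLock}}, and a background task rotates the sketches periodically (e.g. every 60 seconds), merging them into a snapshot from which percentiles are served. The DataSketches calls follow the {{org.apache.datasketches.quantiles}} API as of datasketches-java 3.x.

{code:java}
// Hedged sketch of a DataSketches-based Summary replacement; illustrative only.
// Assumes org.apache.datasketches:datasketches-java (3.x API names) on the classpath.
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.StampedLock;

import org.apache.datasketches.quantiles.DoublesSketch;
import org.apache.datasketches.quantiles.DoublesUnion;
import org.apache.datasketches.quantiles.UpdateDoublesSketch;

public final class SketchSummary {

    /** Per-thread sketch plus its own lock, so writers almost never contend. */
    private static final class LocalSketch {
        final StampedLock lock = new StampedLock();
        UpdateDoublesSketch sketch = DoublesSketch.builder().build();
    }

    private final Queue<LocalSketch> allLocals = new ConcurrentLinkedQueue<>();
    private final ThreadLocal<LocalSketch> localSketch = ThreadLocal.withInitial(() -> {
        LocalSketch local = new LocalSketch();
        allLocals.add(local);
        return local;
    });

    // Last rotated snapshot; quantiles are served from here, so they lag by one
    // rotation interval (e.g. 60 seconds).
    private volatile DoublesSketch snapshot = DoublesSketch.builder().build();

    /** Hot path: record one observation, touching only this thread's sketch. */
    public void observe(double value) {
        LocalSketch local = localSketch.get();
        long stamp = local.lock.readLock(); // shared mode: blocks only the rotation task
        try {
            local.sketch.update(value);
        } finally {
            local.lock.unlockRead(stamp);
        }
    }

    /** Called from a scheduled task, e.g. every 60 seconds. */
    public void rotate() {
        DoublesUnion union = DoublesUnion.builder().build();
        for (LocalSketch local : allLocals) {
            long stamp = local.lock.writeLock(); // briefly exclude the owning writer
            try {
                union.update(local.sketch);
                local.sketch = DoublesSketch.builder().build(); // start a fresh window
            } finally {
                local.lock.unlockWrite(stamp);
            }
        }
        snapshot = union.getResult();
    }

    /** Serve quantiles (e.g. 0.5, 0.99) from the last completed window. */
    public double getQuantile(double q) {
        DoublesSketch current = snapshot;
        return current.isEmpty() ? Double.NaN : current.getQuantile(q);
    }
}
{code}

Writers only ever touch their own thread-local sketch, so the read lock they take is effectively uncontended; the rotation task briefly takes each per-thread write lock once per interval, which is where the reporting delay and the fixed per-metric memory cost described above come from.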





[jira] [Updated] (ZOOKEEPER-4741) High latency under heavy load when prometheus metrics enabled

2023-11-01 Thread Yike Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yike Xiao updated ZOOKEEPER-4741:
-
Affects Version/s: 3.6.2
   3.6.1
   3.6.3
   3.6.0
   (was: 3.8.2)

> High latency under heavy load when prometheus metrics enabled
> -
>
> Key: ZOOKEEPER-4741
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4741
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0, 3.6.3, 3.6.1, 3.6.2, 3.6.4
> Environment: zookeeper version: 3.6.4
> kernel: 3.10.0-1160.95.1.el7.x86_64
> java version "1.8.0_111"
>  
> metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
>Reporter: Yike Xiao
>Priority: Major
> Attachments: 32010.lock.profile.html, 32010.threaddump.001.txt, 
> 32010.wallclock.profile.html, image-2023-09-11-16-17-21-166.png
>
>
> In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
> to monitor ZooKeeper status. Recently we observed very high latency in one of our 
> ZooKeeper clusters, which serves a heavy load.
> Measured on a heavily loaded client, the latency could exceed 25 seconds.
> {noformat}
> [arthas@8]$ watch org.apache.zookeeper.ClientCnxn submitRequest
> Press Q or Ctrl+C to abort.
> Affect(class count: 1 , method count: 1) cost in 294 ms, listenerId: 1
> method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
> ts=2023-09-06 16:01:32; [cost=28180.767833ms] result=@ArrayList[
>     @Object[][isEmpty=false;size=4],
>     @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
> remoteserver:/:2181 lastZxid:146177807315 xid:589796 sent:589796 
> recv:973892 queuedpkts:0 pendingresp:4398 queuedevents:0],
>     @ReplyHeader[585397,146177807315,0
> ],
> ]
> method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
> ts=2023-09-06 16:01:34; [cost=29052.726493ms] result=@ArrayList[
>     @Object[][isEmpty=false;size=4],
>     @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
> remoteserver:/:2181 lastZxid:146177807862 xid:589989 sent:589989 
> recv:974165 queuedpkts:0 pendingresp:4400 queuedevents:0],
>     @ReplyHeader[585588,146177807862,0
> ]{noformat}
>  
> Observed many connections with *high Recv-Q* on the server side. 
> CommitProcWorkThread *BLOCKED* in 
> {{org.apache.zookeeper.server.ServerStats#updateLatency:}}
> {noformat}
> "CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 
> tid=0x7f86d804a000 nid=0x6bca waiting for monitor entry 
> [0x7f86deb95000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)
> - waiting to lock <0x000784dd1a18> (a 
> io.prometheus.client.CKMSQuantiles)
> at 
> io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
> at io.prometheus.client.Summary$Child.observe(Summary.java:281)
> at io.prometheus.client.Summary.observe(Summary.java:307)
> at 
> org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
> at 
> org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
> at 
> org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
> at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>Locked ownable synchronizers:
> - <0x000734923e80> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker){noformat}
>  
> The wall-clock profile shows that there is lock contention among the 
> `CommitProcWorkThread` threads.
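
The contention is easy to reproduce outside ZooKeeper. Below is a hedged, self-contained sketch (not taken from the ticket; the class name and workload numbers are made up) that hammers a single quantile-enabled {{Summary}} from a pool of threads, which is essentially what the CommitProcWorkThread pool does through {{ServerStats#updateLatency}}. With the CKMSQuantiles-based Summary in the bundled simpleclient, throughput can drop sharply as the thread count grows, because every {{observe()}} call funnels into a synchronized insert on the shared quantile estimator.

{code:java}
// Standalone contention demo; illustrative only, not from the ticket.
// Assumes io.prometheus:simpleclient on the classpath.
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Summary;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class SummaryContentionDemo {
    public static void main(String[] args) throws Exception {
        Summary latency = Summary.build()
                .name("demo_request_latency")
                .help("request latency")
                .quantile(0.5, 0.05)
                .quantile(0.99, 0.001)
                .register(new CollectorRegistry());

        int threads = Runtime.getRuntime().availableProcessors();
        long observationsPerThread = 2_000_000L;
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                for (long n = 0; n < observationsPerThread; n++) {
                    // Same call path as ServerStats#updateLatency -> PrometheusSummary.add
                    latency.observe(ThreadLocalRandom.current().nextDouble(0, 50));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);

        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.printf("%d threads x %d observations took %d ms%n",
                threads, observationsPerThread, elapsedMs);
    }
}
{code}

Running the same loop with a single thread and then with all cores gives a rough sense of how much of the added latency is pure lock contention rather than the cost of the quantile update itself.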





[jira] [Commented] (ZOOKEEPER-4741) High latency under heavy load when prometheus metrics enabled

2023-09-11 Thread Yike Xiao (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763621#comment-17763621
 ] 

Yike Xiao commented on ZOOKEEPER-4741:
--

ZooKeeper 3.6.4 includes io.prometheus:simpleclient:0.6.0; this patch is included 
in 0.7.0:

[https://github.com/prometheus/client_java/commit/62001352f37c22e01ff5c806148abca44b41ba81]

 

ZooKeeper 3.8.2 includes io.prometheus:simpleclient:0.9.0. I'll update here 
after upgrading the simpleclient component.

> High latency under heavy load when prometheus metrics enabled
> -
>
> Key: ZOOKEEPER-4741
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4741
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.4, 3.8.2
> Environment: zookeeper version: 3.6.4
> kernel: 3.10.0-1160.95.1.el7.x86_64
> java version "1.8.0_111"
>  
> metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
>Reporter: Yike Xiao
>Priority: Major
> Attachments: 32010.lock.profile.html, 32010.threaddump.001.txt, 
> 32010.wallclock.profile.html, image-2023-09-11-16-17-21-166.png
>
>
> In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
> to monitor ZooKeeper status. Recently we observed very high latency in one of our 
> ZooKeeper clusters, which serves a heavy load.
> Measured on a heavily loaded client, the latency could exceed 25 seconds.
> {noformat}
> [arthas@8]$ watch org.apache.zookeeper.ClientCnxn submitRequest
> Press Q or Ctrl+C to abort.
> Affect(class count: 1 , method count: 1) cost in 294 ms, listenerId: 1
> method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
> ts=2023-09-06 16:01:32; [cost=28180.767833ms] result=@ArrayList[
>     @Object[][isEmpty=false;size=4],
>     @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
> remoteserver:/:2181 lastZxid:146177807315 xid:589796 sent:589796 
> recv:973892 queuedpkts:0 pendingresp:4398 queuedevents:0],
>     @ReplyHeader[585397,146177807315,0
> ],
> ]
> method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
> ts=2023-09-06 16:01:34; [cost=29052.726493ms] result=@ArrayList[
>     @Object[][isEmpty=false;size=4],
>     @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
> remoteserver:/:2181 lastZxid:146177807862 xid:589989 sent:589989 
> recv:974165 queuedpkts:0 pendingresp:4400 queuedevents:0],
>     @ReplyHeader[585588,146177807862,0
> ]{noformat}
>  
> Observed many connections with *high Recv-Q* on the server side. 
> CommitProcWorkThread *BLOCKED* in 
> {{org.apache.zookeeper.server.ServerStats#updateLatency:}}
> {noformat}
> "CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 
> tid=0x7f86d804a000 nid=0x6bca waiting for monitor entry 
> [0x7f86deb95000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)
> - waiting to lock <0x000784dd1a18> (a 
> io.prometheus.client.CKMSQuantiles)
> at 
> io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
> at io.prometheus.client.Summary$Child.observe(Summary.java:281)
> at io.prometheus.client.Summary.observe(Summary.java:307)
> at 
> org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
> at 
> org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
> at 
> org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
> at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>Locked ownable synchronizers:
> - <0x000734923e80> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker){noformat}
>  
> The wall-clock profile shows that there is lock contention among the 
> `CommitProcWorkThread` threads.





[jira] [Commented] (ZOOKEEPER-4741) High latency under heavy load when prometheus metrics enabled

2023-09-11 Thread Yike Xiao (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763611#comment-17763611
 ] 

Yike Xiao commented on ZOOKEEPER-4741:
--

Workaround:

After disabling the PrometheusMetricsProvider, the problem disappeared.

> High latency under heavy load when prometheus metrics enabled
> -
>
> Key: ZOOKEEPER-4741
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4741
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.4, 3.8.2
> Environment: zookeeper version: 3.6.4
> kernel: 3.10.0-1160.95.1.el7.x86_64
> java version "1.8.0_111"
>  
> metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
>Reporter: Yike Xiao
>Priority: Major
> Attachments: 32010.lock.profile.html, 32010.threaddump.001.txt, 
> 32010.wallclock.profile.html, image-2023-09-11-16-17-21-166.png
>
>
> In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
> to monitor ZooKeeper status. Recently we observed very high latency in one of our 
> ZooKeeper clusters, which serves a heavy load.
> Measured on a heavily loaded client, the latency could exceed 25 seconds.
> {noformat}
> [arthas@8]$ watch org.apache.zookeeper.ClientCnxn submitRequest
> Press Q or Ctrl+C to abort.
> Affect(class count: 1 , method count: 1) cost in 294 ms, listenerId: 1
> method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
> ts=2023-09-06 16:01:32; [cost=28180.767833ms] result=@ArrayList[
>     @Object[][isEmpty=false;size=4],
>     @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
> remoteserver:/:2181 lastZxid:146177807315 xid:589796 sent:589796 
> recv:973892 queuedpkts:0 pendingresp:4398 queuedevents:0],
>     @ReplyHeader[585397,146177807315,0
> ],
> ]
> method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
> ts=2023-09-06 16:01:34; [cost=29052.726493ms] result=@ArrayList[
>     @Object[][isEmpty=false;size=4],
>     @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
> remoteserver:/:2181 lastZxid:146177807862 xid:589989 sent:589989 
> recv:974165 queuedpkts:0 pendingresp:4400 queuedevents:0],
>     @ReplyHeader[585588,146177807862,0
> ]{noformat}
>  
> Observed many connections with *high Recv-Q* on the server side. 
> CommitProcWorkThread *BLOCKED* in 
> {{org.apache.zookeeper.server.ServerStats#updateLatency:}}
> {noformat}
> "CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 
> tid=0x7f86d804a000 nid=0x6bca waiting for monitor entry 
> [0x7f86deb95000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)
> - waiting to lock <0x000784dd1a18> (a 
> io.prometheus.client.CKMSQuantiles)
> at 
> io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
> at io.prometheus.client.Summary$Child.observe(Summary.java:281)
> at io.prometheus.client.Summary.observe(Summary.java:307)
> at 
> org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
> at 
> org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
> at 
> org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
> at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>Locked ownable synchronizers:
> - <0x000734923e80> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker){noformat}
>  
> The wall-clock profile shows that there is lock contention among the 
> `CommitProcWorkThread` threads.





[jira] [Updated] (ZOOKEEPER-4741) High latency under heavy load when prometheus metrics enabled

2023-09-11 Thread Yike Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yike Xiao updated ZOOKEEPER-4741:
-
Description: 
In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
to monitor ZooKeeper status. Recently we observed very high latency in one of our 
ZooKeeper clusters, which serves a heavy load.

Measured on a heavily loaded client, the latency could exceed 25 seconds.
{noformat}
[arthas@8]$ watch org.apache.zookeeper.ClientCnxn submitRequest
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 294 ms, listenerId: 1
method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
ts=2023-09-06 16:01:32; [cost=28180.767833ms] result=@ArrayList[
    @Object[][isEmpty=false;size=4],
    @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
remoteserver:/:2181 lastZxid:146177807315 xid:589796 sent:589796 
recv:973892 queuedpkts:0 pendingresp:4398 queuedevents:0],
    @ReplyHeader[585397,146177807315,0
],
]
method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
ts=2023-09-06 16:01:34; [cost=29052.726493ms] result=@ArrayList[
    @Object[][isEmpty=false;size=4],
    @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
remoteserver:/:2181 lastZxid:146177807862 xid:589989 sent:589989 
recv:974165 queuedpkts:0 pendingresp:4400 queuedevents:0],
    @ReplyHeader[585588,146177807862,0
]{noformat}
 

Observed many connections with *high Recv-Q* on the server side. 

CommitProcWorkThread *BLOCKED* in 
{{org.apache.zookeeper.server.ServerStats#updateLatency:}}
{noformat}
"CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 tid=0x7f86d804a000 
nid=0x6bca waiting for monitor entry [0x7f86deb95000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)
- waiting to lock <0x000784dd1a18> (a 
io.prometheus.client.CKMSQuantiles)
at 
io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
at io.prometheus.client.Summary$Child.observe(Summary.java:281)
at io.prometheus.client.Summary.observe(Summary.java:307)
at 
org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
at 
org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153)
at 
org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
at 
org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
at 
org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- <0x000734923e80> (a 
java.util.concurrent.ThreadPoolExecutor$Worker){noformat}
 

The wall-clock profile shows that there is lock contention among the 
`CommitProcWorkThread` threads.

  was:
In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
to monitor ZooKeeper status. Recently we observed very high latency in one of our 
ZooKeeper clusters, which serves a heavy load.

Measured on a heavily loaded client, the latency could exceed 25 seconds.
{noformat}
[arthas@8]$ watch org.apache.zookeeper.ClientCnxn submitRequest
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 294 ms, listenerId: 1
method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
ts=2023-09-06 16:01:32; [cost=28180.767833ms] result=@ArrayList[
    @Object[][isEmpty=false;size=4],
    @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
remoteserver:/:2181 lastZxid:146177807315 xid:589796 sent:589796 
recv:973892 queuedpkts:0 pendingresp:4398 queuedevents:0],
    @ReplyHeader[585397,146177807315,0
],
]
method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
ts=2023-09-06 16:01:34; [cost=29052.726493ms] result=@ArrayList[
    @Object[][isEmpty=false;size=4],
    @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
remoteserver:/:2181 lastZxid:146177807862 xid:589989 sent:589989 
recv:974165 queuedpkts:0 pendingresp:4400 queuedevents:0],
    @ReplyHeader[585588,146177807862,0
]{noformat}
 

Observed many connections with *high Recv-Q* on the server side. 

CommitProcWorkThread *BLOCKED* in 
{{org.apache.zookeeper.server.ServerStats#updateLatency}}:

 
{noformat}
"CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 tid=0x7f86d804a000 
nid=0x6bca waiting for monitor entry [0x7f86deb95000]
   java.lang.Thread.State: BLOCKED (on object monitor)
   

[jira] [Updated] (ZOOKEEPER-4741) High latency under heavy load when prometheus metrics enabled

2023-09-11 Thread Yike Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yike Xiao updated ZOOKEEPER-4741:
-
Attachment: 32010.lock.profile.html

> High latency under heavy load when prometheus metrics enabled
> -
>
> Key: ZOOKEEPER-4741
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4741
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.4, 3.8.2
> Environment: zookeeper version: 3.6.4
> kernel: 3.10.0-1160.95.1.el7.x86_64
> java version "1.8.0_111"
>  
> metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
>Reporter: Yike Xiao
>Priority: Major
> Attachments: 32010.lock.profile.html, 32010.threaddump.001.txt, 
> 32010.wallclock.profile.html, image-2023-09-11-16-17-21-166.png
>
>
> In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
> to monitor ZooKeeper status. Recently we observed very high latency in one of our 
> ZooKeeper clusters, which serves a heavy load.
> Measured on a heavily loaded client, the latency could exceed 25 seconds.
> {noformat}
> [arthas@8]$ watch org.apache.zookeeper.ClientCnxn submitRequest
> Press Q or Ctrl+C to abort.
> Affect(class count: 1 , method count: 1) cost in 294 ms, listenerId: 1
> method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
> ts=2023-09-06 16:01:32; [cost=28180.767833ms] result=@ArrayList[
>     @Object[][isEmpty=false;size=4],
>     @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
> remoteserver:/:2181 lastZxid:146177807315 xid:589796 sent:589796 
> recv:973892 queuedpkts:0 pendingresp:4398 queuedevents:0],
>     @ReplyHeader[585397,146177807315,0
> ],
> ]
> method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
> ts=2023-09-06 16:01:34; [cost=29052.726493ms] result=@ArrayList[
>     @Object[][isEmpty=false;size=4],
>     @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
> remoteserver:/:2181 lastZxid:146177807862 xid:589989 sent:589989 
> recv:974165 queuedpkts:0 pendingresp:4400 queuedevents:0],
>     @ReplyHeader[585588,146177807862,0
> ]{noformat}
>  
> Observed many connections with *high Recv-Q* on the server side. 
> CommitProcWorkThread *BLOCKED* in 
> {{org.apache.zookeeper.server.ServerStats#updateLatency:}}
> {noformat}
> "CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 
> tid=0x7f86d804a000 nid=0x6bca waiting for monitor entry 
> [0x7f86deb95000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)
> - waiting to lock <0x000784dd1a18> (a 
> io.prometheus.client.CKMSQuantiles)
> at 
> io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
> at io.prometheus.client.Summary$Child.observe(Summary.java:281)
> at io.prometheus.client.Summary.observe(Summary.java:307)
> at 
> org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
> at 
> org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
> at 
> org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
> at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>Locked ownable synchronizers:
> - <0x000734923e80> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker){noformat}
>  
> The wall-clock profile shows that there is lock contention among the 
> `CommitProcWorkThread` threads.





[jira] [Updated] (ZOOKEEPER-4741) High latency under heavy load when prometheus metrics enabled

2023-09-11 Thread Yike Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yike Xiao updated ZOOKEEPER-4741:
-
Description: 
In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
to monitor ZooKeeper status. Recently we observed very high latency in one of our 
ZooKeeper clusters, which serves a heavy load.

Measured on a heavily loaded client, the latency could exceed 25 seconds.
{noformat}
[arthas@8]$ watch org.apache.zookeeper.ClientCnxn submitRequest
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 294 ms, listenerId: 1
method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
ts=2023-09-06 16:01:32; [cost=28180.767833ms] result=@ArrayList[
    @Object[][isEmpty=false;size=4],
    @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
remoteserver:/:2181 lastZxid:146177807315 xid:589796 sent:589796 
recv:973892 queuedpkts:0 pendingresp:4398 queuedevents:0],
    @ReplyHeader[585397,146177807315,0
],
]
method=org.apache.zookeeper.ClientCnxn.submitRequest location=AtExit
ts=2023-09-06 16:01:34; [cost=29052.726493ms] result=@ArrayList[
    @Object[][isEmpty=false;size=4],
    @ClientCnxn[sessionid:0x221f087c3d7eb6c local:/:40942 
remoteserver:/:2181 lastZxid:146177807862 xid:589989 sent:589989 
recv:974165 queuedpkts:0 pendingresp:4400 queuedevents:0],
    @ReplyHeader[585588,146177807862,0
]{noformat}
 

Observed many connections with *high Recv-Q* on the server side. 

CommitProcWorkThread *BLOCKED* in 
{{org.apache.zookeeper.server.ServerStats#updateLatency}}:

 
{noformat}
"CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 tid=0x7f86d804a000 
nid=0x6bca waiting for monitor entry [0x7f86deb95000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)
- waiting to lock <0x000784dd1a18> (a 
io.prometheus.client.CKMSQuantiles)
at 
io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
at io.prometheus.client.Summary$Child.observe(Summary.java:281)
at io.prometheus.client.Summary.observe(Summary.java:307)
at 
org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
at 
org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153)
at 
org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
at 
org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
at 
org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- <0x000734923e80> (a 
java.util.concurrent.ThreadPoolExecutor$Worker){noformat}
 

!https://gitlab.dev.zhaopin.com/sucheng.wang/notes/uploads/b9da2552d6b00c3f9130d87caf01325e/image.png!


  was:
In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
to monitor ZooKeeper status. Recently we observed very high latency in one of our 
ZooKeeper clusters, which serves a heavy load.

Measured on a heavily loaded client, the latency could exceed 25 seconds.

 

!image-2023-09-11-16-17-21-166.png! Observed many connections with high Recv-Q 
on the server side. 

CommitProcWorkThread *BLOCKED* in 
{{org.apache.zookeeper.server.ServerStats#updateLatency}}:

 
{noformat}
"CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 tid=0x7f86d804a000 
nid=0x6bca waiting for monitor entry [0x7f86deb95000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)
- waiting to lock <0x000784dd1a18> (a 
io.prometheus.client.CKMSQuantiles)
at 
io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
at io.prometheus.client.Summary$Child.observe(Summary.java:281)
at io.prometheus.client.Summary.observe(Summary.java:307)
at 
org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
at 
org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153)
at 
org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
at 

[jira] [Updated] (ZOOKEEPER-4741) High latency under heavy load when prometheus metrics enabled

2023-09-11 Thread Yike Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yike Xiao updated ZOOKEEPER-4741:
-
Description: 
In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
to monitor ZooKeeper status. Recently we observed very high latency in one of our 
ZooKeeper clusters, which serves a heavy load.

Measured on a heavily loaded client, the latency could exceed 25 seconds.

 

!image-2023-09-11-16-17-21-166.png! Observed many connections with high Recv-Q 
on the server side. 

CommitProcWorkThread *BLOCKED* in 
{{org.apache.zookeeper.server.ServerStats#updateLatency}}:

 
{noformat}
"CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 tid=0x7f86d804a000 
nid=0x6bca waiting for monitor entry [0x7f86deb95000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)
- waiting to lock <0x000784dd1a18> (a 
io.prometheus.client.CKMSQuantiles)
at 
io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
at io.prometheus.client.Summary$Child.observe(Summary.java:281)
at io.prometheus.client.Summary.observe(Summary.java:307)
at 
org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
at 
org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153)
at 
org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
at 
org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
at 
org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- <0x000734923e80> (a 
java.util.concurrent.ThreadPoolExecutor$Worker){noformat}
 

!https://gitlab.dev.zhaopin.com/sucheng.wang/notes/uploads/b9da2552d6b00c3f9130d87caf01325e/image.png!


  was:
In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
to monitor ZooKeeper status. Recently we observed very high latency in one of our 
ZooKeeper clusters, which serves a heavy load.

Measured on a heavily loaded client, the latency could exceed 25 seconds.

!image-2023-09-11-16-17-21-166.png! Observed many connections with high Recv-Q 
on the server side. 

CommitProcWorkThread *BLOCKED* in 
{{org.apache.zookeeper.server.ServerStats#updateLatency}}:
"CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 tid=0x7f86d804a000 
nid=0x6bca waiting for monitor entry [0x7f86deb95000]   
java.lang.Thread.State: BLOCKED (on object monitor)at 
io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)- 
waiting to lock <0x000784dd1a18> (a io.prometheus.client.CKMSQuantiles) 
   at 
io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
at io.prometheus.client.Summary$Child.observe(Summary.java:281)at 
io.prometheus.client.Summary.observe(Summary.java:307)at 
org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
at 
org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153) 
   at 
org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
at 
org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
at 
org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
   at java.lang.Thread.run(Thread.java:748)
The wall-clock profile shows that there is lock contention within the 
{{CommitProcWorkThread}} threads.

!https://gitlab.dev.zhaopin.com/sucheng.wang/notes/uploads/b9da2552d6b00c3f9130d87caf01325e/image.png!



> High latency under heavy load when prometheus metrics enabled
> -
>
> Key: ZOOKEEPER-4741
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4741
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.4, 3.8.2
> 

[jira] [Created] (ZOOKEEPER-4741) High latency under heavy load when prometheus metrics enabled

2023-09-11 Thread Yike Xiao (Jira)
Yike Xiao created ZOOKEEPER-4741:


 Summary: High latency under heavy load when prometheus metrics 
enabled
 Key: ZOOKEEPER-4741
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4741
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.8.2, 3.6.4
 Environment: zookeeper version: 3.6.4

kernel: 3.10.0-1160.95.1.el7.x86_64

java version "1.8.0_111"

 

metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
Reporter: Yike Xiao
 Attachments: 32010.threaddump.001.txt, 32010.wallclock.profile.html, 
image-2023-09-11-16-17-21-166.png

In our production environment, we use ZooKeeper's built-in PrometheusMetricsProvider 
to monitor ZooKeeper status. Recently we observed very high latency in one of our 
ZooKeeper clusters, which serves a heavy load.

Measured on a heavily loaded client, the latency could exceed 25 seconds.

!image-2023-09-11-16-17-21-166.png! Observed many connections with high Recv-Q 
on the server side. 

CommitProcWorkThread *BLOCKED* in 
{{org.apache.zookeeper.server.ServerStats#updateLatency}}:
"CommitProcWorkThread-15" #21595 daemon prio=5 os_prio=0 tid=0x7f86d804a000 
nid=0x6bca waiting for monitor entry [0x7f86deb95000]   
java.lang.Thread.State: BLOCKED (on object monitor)at 
io.prometheus.client.CKMSQuantiles.insert(CKMSQuantiles.java:91)- 
waiting to lock <0x000784dd1a18> (a io.prometheus.client.CKMSQuantiles) 
   at 
io.prometheus.client.TimeWindowQuantiles.insert(TimeWindowQuantiles.java:38)
at io.prometheus.client.Summary$Child.observe(Summary.java:281)at 
io.prometheus.client.Summary.observe(Summary.java:307)at 
org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider$PrometheusSummary.add(PrometheusMetricsProvider.java:355)
at 
org.apache.zookeeper.server.ServerStats.updateLatency(ServerStats.java:153) 
   at 
org.apache.zookeeper.server.FinalRequestProcessor.updateStats(FinalRequestProcessor.java:669)
at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:585)
at 
org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:545)
at 
org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
   at java.lang.Thread.run(Thread.java:748)
The wall-clock profile shows that there is lock contention within the 
{{CommitProcWorkThread}} threads.

!https://gitlab.dev.zhaopin.com/sucheng.wang/notes/uploads/b9da2552d6b00c3f9130d87caf01325e/image.png!




--
This message was sent by Atlassian Jira
(v8.20.10#820010)