>> because the tests were run with Prometheus enabled, which is new in 3.6
and has significant negative perf impact.
Interesting, let's see what the numbers are without Prometheus involved. It
could be that the increased latency we observed in CommitProcessor is just
a symptom rather than the root cause.
Hi Michael,
Thanks for the additional input.
On Mon, May 3, 2021 at 3:13 PM Michael Han wrote:
> Hi Li,
>
> Thanks for following up.
>
> >> write_commitproc_time_ms were large
>
> This measures how long it takes a local write op to hear back from the leader.
> If it's big, then either the leader is
Hi Li,
Thanks for following up.
>> write_commitproc_time_ms were large
This measures how long it takes a local write op to hear back from the leader.
If it's big, then either the leader is very busy acking the request, or your
network RTT is high.
What does the local fsync time (fsynctime) look like?
Hi Srikant,
1. Have you tried running the test without enabling Prometheus metrics? What
I observed is that enabling Prometheus has a significant performance impact
(about a 40%-60% degradation).
2. In addition to the session expiry errors and the max latency increase, did
you see any issue with
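For reference, in 3.6 the Prometheus exporter is only active when a metrics provider is configured in zoo.cfg, so disabling it for an A/B run is a config change. A sketch (class name and port key as documented in the 3.6 admin guide; comment the lines out to run without Prometheus):

```
# zoo.cfg — enables the Prometheus MetricsProvider (new in 3.6);
# remove/comment these lines to run the test without Prometheus
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
metricsProvider.httpPort=7000
```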
Hi Michael,
Thanks for your reply.
1. The workload is 500 concurrent users creating nodes with a data size of 4
bytes.
2. It's pure write.
3. The perf issue is that under the same load, there were many SessionExpired
and ConnectionLoss errors with ZK 3.6.2 but no such errors
with ZK 3.4.14.
Hi Andor,
Thanks for your reply.
We are planning to perform one more round of stress testing, and then I
will be able to provide the detailed logs needed for any troubleshooting.
Other details are provided against each question below.
- which version of Zookeeper is being used,
3.6.2 on the server side
Hi folks,
As previously mentioned the community won’t be able to help if you don’t share
more information about your scenario. We need to see the following:
- which version of Zookeeper is being used,
- how many nodes are you running in the ZK cluster,
- what is the server configuration? any
Can you explain why this is posted to the Arrow mailing-list? This
does not seem relevant to Arrow. If indeed it isn't, please remove the
Arrow mailing-list from the recipients.
Regards
Antoine.
On Wed, 21 Apr 2021 11:25:20 +0800, shrikant kalani wrote:
> Hello Everyone,
>
> We are also
Hello Everyone,
We are also using ZooKeeper 3.6.2, with SSL turned on on both sides. We
observed the same behaviour: under high write load the ZK server
starts expiring sessions. There are no JVM-related issues. During high
load the max latency increases significantly.
Also the session
What is the workload looking like? Is it pure write, or mixed read write?
A couple of ideas to move this forward:
* Publish the performance benchmark so the community can help.
* Bisect git commits to find the bad commit that caused the regression.
* Use the fine-grained metrics introduced in 3.6
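The fine-grained metrics mentioned above (write_commitproc_time_ms, fsynctime, and friends) show up in the 3.6 `mntr` output as tab-separated key/value lines. A minimal sketch of pulling them into a dict for side-by-side comparison; the sample values below are invented for illustration, and exact metric names can vary by version:

```python
def parse_mntr(text):
    """Parse tab-separated `mntr` key/value lines into a dict."""
    metrics = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("\t")
        try:
            metrics[key] = float(value)  # numeric metrics
        except ValueError:
            metrics[key] = value  # non-numeric, e.g. zk_server_state
    return metrics

# Invented sample of mntr-style output, for illustration only
sample = (
    "zk_avg_latency\t57\n"
    "zk_max_latency\t31000\n"
    "zk_server_state\tleader\n"
)
metrics = parse_mntr(sample)
print(metrics["zk_max_latency"])  # → 31000.0
```

The raw output can be fetched with `echo mntr | nc <host> 2181` once `mntr` is added to `4lw.commands.whitelist`, or from the AdminServer's `/commands/monitor` endpoint.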
The CPU usage of both server and client are normal (< 50%) during the test.
Based on the investigation, the server is too busy with the load.
The issue doesn't exist in 3.4.14. I wonder why there is a significant
write performance degradation from 3.4.14 to 3.6.2 and how we can address
the issue.
What is the CPU usage of both server and client during the test?
Looks like the server is dropping the clients because either the server or
both sides are too busy to deal with the load.
This log line is also concerning: "Too busy to snap, skipping"
If that's the case I believe you'll have to profile the
Thanks, Patrick.
Yes, we are using the same JVM version and GC configuration when
running the two tests. I have checked the GC metrics and also a heap dump
of 3.6; the GC pauses and the memory usage look okay.
Best,
Li
On Sun, Feb 21, 2021 at 3:34 PM Patrick Hunt wrote:
> On Sun, Feb
On Sun, Feb 21, 2021 at 3:28 PM Li Wang wrote:
> Hi Enrico, Sushant,
>
> I re-ran the perf test with the data consistency check feature disabled
> (i.e. -Dzookeeper.digest.enabled=false), the write performance issue of 3.6
> is still there.
>
> With everything exactly the same, the throughput of
Hi Enrico, Sushant,
I re-ran the perf test with the data consistency check feature disabled
(i.e. -Dzookeeper.digest.enabled=false), and the write performance issue of
3.6 is still there.
With everything exactly the same, the throughput of 3.6 was only 1/2 of 3.4
and the max latency was more than 8
Thanks Sushant and Enrico!
This is a really good point. According to the 3.6 documentation, the
feature is disabled by default.
https://zookeeper.apache.org/doc/r3.6.2/zookeeperAdmin.html#ch_administration.
However, checking the code, the default is enabled.
Let me set the
Hi Li,
On 3.6.2 consistency checker (adhash based) is enabled by default:
https://github.com/apache/zookeeper/blob/803c7f1a12f85978cb049af5e4ef23bd8b688715/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L136.
It is not present in ZK 3.4.14.
This feature does have
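If it helps anyone reproducing this: since the property is read as a JVM system property, one way to turn the checker off is via the server JVM flags (a sketch; in a stock install this is typically threaded through SERVER_JVMFLAGS, e.g. in conf/zookeeper-env.sh):

```
# Disable the AdHash-based consistency check that defaults to on in 3.6.2
SERVER_JVMFLAGS="-Dzookeeper.digest.enabled=false"
```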
Li,
I wonder if we have some new throttling/back-pressure mechanism that is
enabled by default.
Does anyone have a pointer to the relevant implementations?
Enrico
On Fri, Feb 19, 2021 at 19:46, Li Wang wrote:
> Hi,
>
> We switched to Netty on both client side and server side and the
>
Hi,
We switched to Netty on both the client side and the server side, and the
performance issue is still there. Does anyone have any insight into what
could be the cause of the higher latency?
Thanks,
Li
On Mon, Feb 15, 2021 at 2:17 PM Li Wang wrote:
> Hi Enrico,
>
>
> Thanks for the reply.
>
>
> 1. We are
Hi Enrico,
Thanks for the reply.
1. We are using the NIO-based stack, not the Netty-based one yet.
2. Yes, here are some metrics on the client side:
3.6: throughput: 7k, failures: 81215228, avg latency: 57 ms, max latency: 31 s
3.4: throughput: 15k, failures: 0, avg latency: 30 ms, max latency: 1.6 s
3.
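Spelling out the degradation implied by those numbers (simple arithmetic on the figures quoted above):

```python
# Figures from the 3.6-vs-3.4 client-side comparison above
tp_36, tp_34 = 7_000, 15_000                 # throughput (ops/s)
maxlat_36_ms, maxlat_34_ms = 31_000, 1_600   # max latency (ms)

print(f"throughput ratio 3.6/3.4: {tp_36 / tp_34:.2f}")                  # ~0.47, about half
print(f"max latency ratio 3.6/3.4: {maxlat_36_ms / maxlat_34_ms:.1f}x")  # ~19.4x worse
```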
Hi Enrico,
Thanks for the reply.
1. We are using the direct NIO-based stack, not the Netty-based one yet.
2. Yes, on the client side, here are the metrics
3.6:
On Mon, Feb 15, 2021 at 10:44 AM Enrico Olivelli wrote:
> IIRC The main difference is about the switch to Netty 4 and about using
> more
IIRC the main difference is the switch to Netty 4 and the use of more
DirectMemory. Are you using the Netty-based stack?
Apart from that macro difference there have been many, many changes since
3.4.
Do you have some metrics to share?
Are the JVM configurations and zoo.cfg configuration
Hi,
We want to upgrade from 3.4.14 to 3.6.2. During the performance/load
comparison test, we found that the write performance of 3.6 is
significantly degraded compared to 3.4. Under the
same load, there was a huge number of SessionExpired and ConnectionLoss
errors in 3.6