[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985697#comment-14985697 ]

Jay Kreps commented on KAFKA-2528:
----------------------------------

[~lindong] Great!

> Quota Performance Evaluation
> ----------------------------
>
>                 Key: KAFKA-2528
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2528
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>         Attachments: QuotaPerformanceEvaluationRelease.pdf
>
>
> In this document we present the results of experiments we did at LinkedIn, to
> validate the basic functionality of quota, as well as the performances
> benefits of using quota in a heterogeneous multi-tenant environment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984276#comment-14984276 ]

Dong Lin commented on KAFKA-2528:
---------------------------------

[~jkreps] Sorry for the late reply. I just tested quota using the latest trunk. Please find the results below.

Configuration: The test is run with one broker, one producer performance test configured with topic=test record-size=1 --throughput=10, and one console consumer that reads from topic “test” at the maximum possible throughput. The consumer always runs after the producer stops. Bytes-in and bytes-out rates are collected as one-minute averages after the values stabilize.

1) Unlimited quota. Broker’s bytes-in and bytes-out rates are 85 MBps and 250 MBps.
2) 1 MBps quota for both producer and consumer. Broker’s bytes-in and bytes-out rates are 0.95 MBps and 0.98 MBps.
3) 10 MBps quota for both producer and consumer. Broker’s bytes-in and bytes-out rates are 9.8 MBps and 9.9 MBps.
4) 50 MBps quota for both producer and consumer. Broker’s bytes-in and bytes-out rates are 49 MBps and 49 MBps.

It appears that quota from the latest trunk is working correctly now. I didn't try to reproduce the problem in the original report, where the broker may show a 2 MBps bytes-in rate in inGraph even when configured with a 1 MBps produce quota. The difference in results may be due to the change made to Rate.java in https://github.com/apache/kafka/pull/323.

> Quota Performance Evaluation
> ----------------------------
>
>                 Key: KAFKA-2528
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2528
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>         Attachments: QuotaPerformanceEvaluation.pdf
>
>
> In this document we present the results of experiments we did at LinkedIn, to
> validate the basic functionality of quota, as well as the performances
> benefits of using quota in a heterogeneous multi-tenant environment.
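As a reading aid for the numbers in the comment above, the following minimal sketch computes how far each measured rate sits from its configured quota. The figures are copied from the comment. The helper names (`relative_deviation`, `summarize`) are hypothetical and are not part of Kafka or its tooling.

```python
# Quota (MBps) -> (bytes-in, bytes-out) broker rates in MBps,
# copied from the results reported in the comment above.
results = {
    1: (0.95, 0.98),
    10: (9.8, 9.9),
    50: (49.0, 49.0),
}


def relative_deviation(measured, quota):
    """Signed deviation of a measured rate from the quota, as a fraction of the quota."""
    return (measured - quota) / quota


def summarize(results):
    """Return (quota, bytes-in deviation, bytes-out deviation) rows, sorted by quota."""
    rows = []
    for quota, (bytes_in, bytes_out) in sorted(results.items()):
        rows.append((
            quota,
            relative_deviation(bytes_in, quota),
            relative_deviation(bytes_out, quota),
        ))
    return rows


for quota, dev_in, dev_out in summarize(results):
    print(f"quota={quota:>2} MBps  bytes-in {dev_in:+.1%}  bytes-out {dev_out:+.1%}")
```

A negative deviation means the measured rate stayed below the quota. None of the three quota-limited runs exceeds its quota.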
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739886#comment-14739886 ]

Dong Lin commented on KAFKA-2528:
---------------------------------

Yeah, I think that is a reasonable expectation. We have done something similar to what you have described: in experiment 2, when 4 producers produce to a cluster of 4 brokers configured with a 10 MBps quota per clientId, the broker with the most traffic does have ~10 MBps total throughput. And when the quota is 50 MBps, the highest throughput is ~50 MBps as well. I didn't record the precision here, but I am pretty sure the deviation is < 0.5 MBps. That corresponds to an error of < 5% for 10 MBps total traffic and < 1% for 50 MBps total traffic. I think we can say that quota enforcement is accurate when quota >= 10 MBps.

However, it does appear problematic that the measured broker throughput can be 2 MBps when the quota is only 1 MBps. I don't have a definitive explanation yet. I will try to replicate this experiment and let you know.
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739660#comment-14739660 ]

Jay Kreps commented on KAFKA-2528:
----------------------------------

Yeah, sorry, separate thought. What I'm trying to figure out is how closely can we

At first I was pointing out that the measurement didn't seem to match the quota, and what I understood you to say was that we didn't really know what that measurement meant in comparison to the quota metric, and maybe it meant something different, so maybe the fact that they don't match isn't a bug. That's fine, but then that doesn't really confirm the accuracy of the quota enforcement either, right?

So I was saying that my expectation would be that if I used the perf tests, which do periodic throughput reporting, the throughput would match the quota I set (maybe off by a few percent, but not like 10%). If not, I would think it was a bug.
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739917#comment-14739917 ]

Jay Kreps commented on KAFKA-2528:
----------------------------------

Yeah that makes sense. So if that were the case the artifact would go away when measured over a longer timeframe, right?
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739899#comment-14739899 ]

Aditya Auradkar commented on KAFKA-2528:
----------------------------------------

One possible explanation for the difference is that we append to the log when the produce request is received. For example, in your experiment you have 12 mirror makers, each sending a batch of data. When a batch is recorded, the clients get throttled until the measured rate is back within the quota. After receiving a response, each of them immediately sends a large batch to the brokers. Because the quota is so low and the request size can be much larger, there is a small absolute difference in this example, which corresponds to the maximum size of the received request. I think that if measured over a period of time from the client's perspective, the actual throughput will be very similar to the 1MB quota.
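The burst effect described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Kafka's implementation: all clients share one quota, every round they send one batch each at the same instant, the broker appends the whole round immediately, and the next round is admitted only once the cumulative average rate has fallen back to the quota. The batch size (100,000 bytes) and the function name `accepted_bytes` are hypothetical; only the client count (12) and the 1 MB/s quota come from the thread.

```python
def accepted_bytes(duration_s, quota_bps, clients, batch_bytes):
    """Bytes appended to the log within [0, duration_s] under the toy model.

    Each round, every client sends one batch and the broker appends all of
    them at once. The next round is delayed until total_bytes / elapsed_time
    is back at the quota.
    """
    round_bytes = clients * batch_bytes
    total = 0
    t = 0.0  # time at which the next round is admitted
    while t <= duration_s:
        total += round_bytes
        t = total / quota_bps  # earliest time the average rate equals the quota
    return total


QUOTA_BPS = 1_000_000   # 1 MB/s quota, as in the experiment discussed above
CLIENTS = 12            # 12 mirror makers, as in the comment above
BATCH_BYTES = 100_000   # hypothetical batch size, chosen only for illustration

for duration_s in (1, 10, 60, 600):
    rate = accepted_bytes(duration_s, QUOTA_BPS, CLIENTS, BATCH_BYTES) / duration_s
    print(f"measured over {duration_s:>3} s: {rate / QUOTA_BPS:.3f} x quota")
```

In this model the overshoot is bounded by one round of batches divided by the measurement interval, so it shrinks as the interval grows. That is consistent with the expectation that the discrepancy fades when throughput is measured over a longer period.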
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739420#comment-14739420 ]

Jay Kreps commented on KAFKA-2528:
----------------------------------

This is a great evaluation. It looks from the results like there must be some bug in the quota backoff calculation, though, right? E.g. the 1MB quota was actually 2MB which is 2x the desired rate. Do we know why? The test validates that it is directionally correct, but it should be possible to make it exact, right?
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739493#comment-14739493 ]

Jay Kreps commented on KAFKA-2528:
----------------------------------

I mention it because I have had to implement that kind of throttling logic before, and getting the arithmetic right is pretty tricky, so it's really easy to have bugs that skew things a bit. It should be possible to get very close to the target (say ~1%) when the logic is right and you have heavy load, right?
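For readers unfamiliar with the arithmetic being referred to, here is one generic way to derive a throttle delay. It is a sketch under assumptions and is not taken from the Kafka source: the broker is assumed to know how many bytes it observed over a measurement window, and it delays the client just long enough that the average over the window plus the delay equals the quota. The function name `throttle_delay_s` and its parameters are hypothetical.

```python
def throttle_delay_s(observed_bytes, window_s, quota_bps):
    """Delay (seconds) after which observed_bytes / (window_s + delay) == quota_bps.

    Solving observed_bytes / (window_s + delay) = quota_bps for delay gives
    delay = observed_bytes / quota_bps - window_s. A negative result means the
    client is already under quota, so no delay is needed.
    """
    if window_s <= 0 or quota_bps <= 0:
        raise ValueError("window_s and quota_bps must be positive")
    return max(0.0, observed_bytes / quota_bps - window_s)


# A client that sent 2 MB in a 1 s window against a 1 MB/s quota.
delay = throttle_delay_s(observed_bytes=2_000_000, window_s=1.0, quota_bps=1_000_000)
print(f"delay = {delay} s, average afterwards = {2_000_000 / (1.0 + delay)} B/s")
```

Small mistakes in this kind of formula, such as dividing by the observed rate instead of the quota or forgetting to include the window itself in the elapsed time, produce delays that are slightly too short or too long, which would show up as a consistent skew from the target rate.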
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739438#comment-14739438 ]

Aditya Auradkar commented on KAFKA-2528:
----------------------------------------

I'm not quite sure why the actual rate is higher in this particular case. It seems to be a lot closer in the other tests Dong has posted. The difference is likely because of some measurement issue, perhaps a test issue. It should be straightforward to reproduce this.
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739488#comment-14739488 ]

Dong Lin commented on KAFKA-2528:
---------------------------------

I think this difference may be explained by the fact that ClientQuotaManager and BrokerTopicMetrics use two different classes, and thus different configurations and algorithms, to measure the bytes-in rate. It would definitely be a bug if the value of the metric used by ClientQuotaManager ever exceeded the quota limit. However, the throughput I used in the report is extracted from bytesInRate in BrokerTopicMetrics.

Also note that 1MB is a small absolute difference. As the quota and throughput increase, the relative difference gets smaller, as we can observe in the other experiments.
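To illustrate how two rate meters can disagree about the same traffic, the sketch below feeds one bursty event stream to two hypothetical meters: a fixed-window average and an exponentially weighted moving average (EWMA). This is an assumption made for illustration only; neither function is claimed to match the classes used by ClientQuotaManager or BrokerTopicMetrics. The traffic pattern (a 1.2 MB burst every 1.2 s, averaging 1 MB/s) and all names are likewise hypothetical.

```python
import math


def windowed_rate(events, now_ms, window_ms):
    """Average bytes/s over the half-open window [now_ms - window_ms, now_ms)."""
    total = sum(size for t, size in events if now_ms - window_ms <= t < now_ms)
    return total / (window_ms / 1000)


def ewma_rate(events, now_ms, tick_ms, tau_ms):
    """EWMA of per-tick rates, updated every tick_ms with time constant tau_ms.

    The average is seeded with the rate of the first tick.
    """
    alpha = 1.0 - math.exp(-tick_ms / tau_ms)
    rate = None
    for lo in range(0, now_ms, tick_ms):
        tick_bytes = sum(size for t, size in events if lo <= t < lo + tick_ms)
        instant = tick_bytes / (tick_ms / 1000)
        rate = instant if rate is None else rate + alpha * (instant - rate)
    return rate


# Hypothetical traffic: one 1.2 MB burst every 1.2 s for 60 s (1 MB/s on average).
events = [(k * 1200, 1_200_000) for k in range(50)]

print("60 s window:", windowed_rate(events, 60_000, 60_000))
print(" 5 s window:", windowed_rate(events, 60_000, 5_000))
print("EWMA       :", ewma_rate(events, 60_000, tick_ms=5_000, tau_ms=60_000))
```

All three readings describe the same 60 seconds of traffic, yet they differ because each meter weights recent bursts differently. A comparison between a quota and a measured throughput is therefore only meaningful when both use the same measurement definition.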
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739505#comment-14739505 ]

Dong Lin commented on KAFKA-2528:
---------------------------------

Yeah, I think it is certainly doable, i.e. we can get very close to the target. I can run the experiment with and without BrokerTopicMetrics.bytesInRate using the same metric class as ClientQuotaManager.quotaSensor for the calculation. This should tell us whether the difference is due to the metric or to the quota implementation.

I did this report months ago. I recall that I tried getting the metric value from ClientQuotaManager and that it should have been lower than 1MB for that experiment. But I didn't write down my observation, so I need to redo the test.
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739585#comment-14739585 ]

Jay Kreps commented on KAFKA-2528:
----------------------------------

I guess my expectation is that if I run a producer perf test or consumer perf test with a quota of 1MB/sec for 1 minute I would see 1 MB/sec throughput. I agree that you would need to make sure the definition of byte count was the same in the perf test. But this does seem to be the independent test of whether it's working right. Checking the quota manager doesn't really test what is happening on the client side...
[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation
[ https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739612#comment-14739612 ]

Dong Lin commented on KAFKA-2528:
---------------------------------

I am a bit confused. I thought you were referring to the first experiment in the report, "Broker throughput validation with production traffic", where the broker throughput can be 2 MB even though the quota-per-broker-clientId is 1 MB, right?

In this experiment, the throughput is obtained from inGraph, which reads BytesInPerSec from the broker. No measurement is performed on the client side in the first experiment. In other words, in my functionality test, I only compare the broker's throughput with the quota configuration.