[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-11-02 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985697#comment-14985697
 ] 

Jay Kreps commented on KAFKA-2528:
--

[~lindong] Great!

> Quota Performance Evaluation
> 
>
> Key: KAFKA-2528
> URL: https://issues.apache.org/jira/browse/KAFKA-2528
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dong Lin
>Assignee: Dong Lin
> Attachments: QuotaPerformanceEvaluationRelease.pdf
>
>
> In this document we present the results of experiments we did at LinkedIn, to 
> validate the basic functionality of quota, as well as the performances 
> benefits of using quota in a heterogeneous multi-tenant environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-11-01 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984276#comment-14984276
 ] 

Dong Lin commented on KAFKA-2528:
-

[~jkreps] Sorry for the late reply. I just tested the quota using the latest trunk. 
Please find the results below. 

Configuration: The test is run with one broker, one producer performance test 
configured with topic=test record-size=1 --throughput=10, and one 
console consumer which reads from topic “test” at the maximum possible throughput. 
The consumer always runs after the producer stops. Bytes-in and bytes-out rates are 
collected using a one-minute average after the values stabilize.

1) Unlimited quota. Broker’s bytes-in and bytes-out rates are 85 MBps and 250 
MBps.
2) 1 MBps quota for both producer and consumer. Broker’s bytes-in and bytes-out 
rates are 0.95 MBps and 0.98 MBps.
3) 10 MBps quota for both producer and consumer. Broker’s bytes-in and 
bytes-out rates are 9.8 MBps and 9.9 MBps.
4) 50 MBps quota for both producer and consumer. Broker’s bytes-in and 
bytes-out rates are 49 MBps and 49 MBps.

It appears that quota in the latest trunk is working correctly now. I didn't try 
to reproduce the problem in the original report, where the broker may have a 2 
MBps bytes-in rate in inGraph even when configured with a 1 MBps produce quota. 
The difference in results may be due to the change made to Rate.java in 
https://github.com/apache/kafka/pull/323.
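
As a quick check on the numbers above, here is a minimal Python sketch that computes how far each measured rate falls below its configured quota. The figures are copied from results 2) to 4); the function name `relative_shortfall` is just an illustrative helper, not anything from Kafka.

```python
# Measured broker rates in MBps from results 2) to 4), keyed by quota in MBps.
# Each value is (bytes-in rate, bytes-out rate).
results = {
    1: (0.95, 0.98),
    10: (9.8, 9.9),
    50: (49.0, 49.0),
}


def relative_shortfall(quota_mbps, measured_mbps):
    """Return how far the measured rate is below the quota, in percent."""
    return (quota_mbps - measured_mbps) / quota_mbps * 100.0


for quota, (bytes_in, bytes_out) in results.items():
    print(
        f"quota={quota} MBps: "
        f"bytes-in {relative_shortfall(quota, bytes_in):.1f}% below, "
        f"bytes-out {relative_shortfall(quota, bytes_out):.1f}% below"
    )
```

In every case the measured rate stays at or below the quota, and the gap is no more than about 5% of the configured value.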




> Quota Performance Evaluation
> 
>
> Key: KAFKA-2528
> URL: https://issues.apache.org/jira/browse/KAFKA-2528
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dong Lin
>Assignee: Dong Lin
> Attachments: QuotaPerformanceEvaluation.pdf
>
>
> In this document we present the results of experiments we did at LinkedIn, to 
> validate the basic functionality of quota, as well as the performances 
> benefits of using quota in a heterogeneous multi-tenant environment.





[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739886#comment-14739886
 ] 

Dong Lin commented on KAFKA-2528:
-

Yeah I think that is a reasonable expectation.

We have done something similar to what you have described -- in experiment 2, 
when 4 producers produce to a cluster of 4 brokers configured with a 10 MBps 
quota per clientId, the broker with the most traffic does have ~10 MBps total 
throughput. And when the quota is 50 MBps, the highest throughput is ~50 MBps as 
well. I didn't record the precision here, but I am pretty sure the deviation is 
< 0.5 MBps. The error is < 5% for 10 MBps total traffic and < 1% for 50 MBps 
total traffic. I think we can say that quota enforcement is accurate when the 
quota is >= 10 MBps.
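
The percentages above follow directly from the 0.5 MBps bound. A small Python sketch of that arithmetic, where `relative_error_bound` is a hypothetical helper name and the only inputs are the figures quoted in this comment:

```python
MAX_DEVIATION_MBPS = 0.5  # upper bound on the absolute deviation quoted above


def relative_error_bound(quota_mbps, deviation_mbps=MAX_DEVIATION_MBPS):
    """Upper bound on the relative error, in percent, for a fixed absolute deviation."""
    return deviation_mbps / quota_mbps * 100.0


for quota in (10, 50):
    print(f"{quota} MBps quota: error bound {relative_error_bound(quota):.0f}%")
```

The same absolute deviation is a smaller fraction of a larger quota, which is why the bound tightens from 5% to 1% as the quota goes from 10 MBps to 50 MBps.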

However, it does appear problematic that the measured broker throughput can be 
2 MBps when the quota is only 1 MBps. I don't have a definitive explanation yet. I 
will try to replicate this experiment and let you know.



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739660#comment-14739660
 ] 

Jay Kreps commented on KAFKA-2528:
--

Yeah, sorry, separate thought. What I'm trying to figure out is how closely can 
we ... At first I was pointing out that the measurement didn't seem to match the 
quota, and what I understood you to say was that we didn't really know what 
that measurement meant in comparison to the quota metric and maybe it meant 
something different so maybe the fact they don't match isn't a bug.

That's fine but then that doesn't really confirm the accuracy of the quota 
enforcement either, right? So I was saying that my expectation would be if I 
used the perf tests which do periodic throughput reporting that throughput 
would match the quota I set (maybe off by a few percent but not like 10%). If 
not I would think it was a bug.



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739917#comment-14739917
 ] 

Jay Kreps commented on KAFKA-2528:
--

Yeah that makes sense. So if that were the case the artifact would go away when 
measured over a longer timeframe, right?



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739899#comment-14739899
 ] 

Aditya Auradkar commented on KAFKA-2528:


One possible explanation for the difference is that we append to the log when 
the produce request is received. For example, in your experiment you have 12 
mirror makers each sending a batch of data. When a batch is recorded the 
clients get throttled until the quota is within the limit. After receiving a 
response, each of them immediately sends a large batch to the brokers. Because 
the quota is so low and the request size can be much larger, there is a small 
absolute difference in this example which corresponds to the maximum size of 
the received request. I think if measured over a period of time from the client 
perspective, the actual throughput will be very similar to the 1MB quota.
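
To illustrate the effect described above, here is a toy Python simulation. It is a sketch under stated assumptions, not Kafka's throttling code: a single client, a fixed batch size, a broker that records each batch on receipt and then delays the response until the cumulative average rate is back at the quota, and a client that sends its next batch as soon as the response arrives. The function name `simulate` and the 1 MB batch size are hypothetical.

```python
def simulate(quota_mbps, batch_mb, num_batches):
    """Return the cumulative average rate observed right after each batch is recorded.

    The first batch arrives at time zero, so no rate is reported for it.
    """
    now_s = 0.0      # time at which the current batch arrives
    total_mb = 0.0   # bytes recorded so far
    rates = []
    for _ in range(num_batches):
        # The batch is counted as soon as the request is received.
        total_mb += batch_mb
        if now_s > 0:
            rates.append(total_mb / now_s)
        # The response is held until total_mb / elapsed equals the quota;
        # the client then sends the next batch immediately.
        now_s = total_mb / quota_mbps
    return rates


rates = simulate(quota_mbps=1.0, batch_mb=1.0, num_batches=1001)
print(f"after 2 batches: {rates[0]:.3f} MBps")
print(f"after {len(rates) + 1} batches: {rates[-1]:.3f} MBps")
```

In this model the overshoot is always one batch, so it dominates when the batch is comparable to the quota and the measurement interval is short, and it shrinks toward zero as the interval grows.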





[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739420#comment-14739420
 ] 

Jay Kreps commented on KAFKA-2528:
--

This is a great evaluation. It looks from the results like there must be some 
bug in the quota backoff calculation, though, right? E.g. the 1MB quota was 
actually 2MB which is 2x the desired rate. Do we know why? The test validates 
that it is directionally correct, but it should be possible to make it exact, 
right?



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739493#comment-14739493
 ] 

Jay Kreps commented on KAFKA-2528:
--

I mention it because I have had to implement that kind of throttling logic 
before and getting the arithmetic right is pretty tricky, so it's really easy to 
have bugs that skew things a bit. It should be possible to get very close to 
the target (say ~1%) when the logic is right and you have heavy load, though, 
right?



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739438#comment-14739438
 ] 

Aditya Auradkar commented on KAFKA-2528:


I'm not quite sure why the actual rate is higher in this particular case. It 
seems to be a lot closer in the other tests Dong has posted. The difference is 
likely because of some measurement issue, or perhaps a test issue. It should be 
straightforward to reproduce this.



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739488#comment-14739488
 ] 

Dong Lin commented on KAFKA-2528:
-

I think this difference may be explained by the fact that ClientQuotaManager and 
BrokerTopicMetrics use two different classes, and thus different 
configurations and algorithms, to measure bytes-in-rate. It would definitely be a 
bug if the value of the metric used by ClientQuotaManager ever exceeded the quota 
limit. However, the throughput I used in the report is extracted using 
bytesInRate in BrokerTopicMetrics.
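
To show how two rate estimators can disagree on identical traffic, here is a minimal Python sketch. It assumes, purely for illustration, that one metric is a fixed-window average and the other is an exponentially weighted moving average; the function names, the window length, the time constant, and the bursty traffic pattern are all hypothetical and are not taken from ClientQuotaManager or BrokerTopicMetrics.

```python
import math


def windowed_rate(samples, window):
    """Mean of the last `window` per-second byte counts (MB)."""
    recent = samples[-window:]
    return sum(recent) / len(recent)


def ewma_rate(samples, time_constant_s):
    """Exponentially weighted moving average of per-second byte counts (MB)."""
    alpha = 1.0 - math.exp(-1.0 / time_constant_s)
    rate = samples[0]
    for value in samples[1:]:
        rate += alpha * (value - rate)
    return rate


# Hypothetical traffic: a 2 MB burst every other second, so 1 MB/s on average.
samples = [2.0, 0.0] * 30

print(f"windowed average: {windowed_rate(samples, window=10):.3f} MB/s")
print(f"EWMA:             {ewma_rate(samples, time_constant_s=5.0):.3f} MB/s")
```

Both estimators see the same bytes, yet they report different instantaneous rates because they weight recent samples differently. Comparing a quota enforced against one estimator with a graph drawn from the other can therefore show a mismatch even when enforcement is correct.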

Also note that 1 MB is a small absolute difference. As the quota and 
throughput increase, the relative difference gets smaller, as we may observe 
in the other experiments.



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739505#comment-14739505
 ] 

Dong Lin commented on KAFKA-2528:
-

Yeah I think it is certainly doable, i.e. we can get very close to the target. 
I can run the experiment with and without BrokerTopicMetrics.bytesInRate using 
the same metric class as ClientQuotaManager.quotaSensor for its calculation. This 
should tell us whether this difference is due to the metric or the quota 
implementation.

I did this report months ago. I recall that I tried getting the metric value from 
ClientQuotaManager and it should have been lower than 1MB for that experiment. But I 
didn't write down my observation, so I need to redo the test.



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739585#comment-14739585
 ] 

Jay Kreps commented on KAFKA-2528:
--

I guess my expectation is that if I run a producer perf test or consumer perf 
test with a quota of 1MB/sec for 1 minute I would see 1 MB/sec throughput. I 
agree that you would need to make sure the definition of byte count was the 
same in the perf test. But this does seem to be the independent test of whether 
it's working right. Checking the quota manager doesn't really test what is 
happening on the client side...



[jira] [Commented] (KAFKA-2528) Quota Performance Evaluation

2015-09-10 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739612#comment-14739612
 ] 

Dong Lin commented on KAFKA-2528:
-

I am a bit confused. I thought you were referring to the first experiment in 
the report, "Broker throughput validation with production traffic", where the 
broker throughput can be 2 MB even though the quota-per-broker-clientId is 1 
MB, right? In this experiment, the throughput is obtained from inGraph, which reads 
BytesInPerSec from the broker. No measurement is performed on the client side in 
the first experiment.

In other words, in my functionality test, I only compare the broker's 
throughput with the quota configuration.
