Hi all,
I would like to inform you that we have slightly changed our thoughts about
the implementation of the Token Bucket algorithm. Our initial idea was to
change our existing Rate to behave like a Token Bucket. That works as we
expected, but we have realized that the value of the Rate is
not rea
Hi all,
Just a quick update. We have made good progress regarding the implementation
of this KIP. The major parts are already in trunk modulo the new "rate"
implementation, which is still under development.
I would like to change the type of the `controller_mutations_rate` from a
Long to a Double.
Hi all,
The vote has passed with 5 binding votes (Gwen, Rajini, Mickael, Jun and
Colin) and 2 non-binding votes (Tom, Anna).
Thank you all for the fruitful discussion! I'd like to particularly thank
Anna, who has heavily contributed to the design of this KIP.
Regards,
David
On Fri, Jun 12, 2020
+1. Thanks, David!
best,
Colin
On Thu, Jun 11, 2020, at 23:51, David Jacot wrote:
> Colin, Jun,
>
> Do the proposed error code and the updated KIP look good to you guys? I’d
> like to wrap up and close the vote.
>
> Thanks,
> David
>
> On Wed, Jun 10, 2020 at 14:50, David Jacot wrote:
>
>
Hi, David,
Thanks for making those changes. They look fine to me. +1
Jun
On Thu, Jun 11, 2020 at 11:51 PM David Jacot wrote:
> Colin, Jun,
>
> Do the proposed error code and the updated KIP look good to you guys? I’d
> like to wrap up and close the vote.
>
> Thanks,
> David
>
> On Wed, Jun 10
Colin, Jun,
Do the proposed error code and the updated KIP look good to you guys? I’d
like to wrap up and close the vote.
Thanks,
David
On Wed, Jun 10, 2020 at 14:50, David Jacot wrote:
> Hi Colin and Jun,
>
> I have no problem if we have to rewrite part of it when the new controller
> comes
Hi Colin and Jun,
I have no problem if we have to rewrite part of it when the new controller
comes
out. I will be more than happy to help out.
Regarding KIP-590, I think that we can cope with a principal as a string
when the time comes. The user entity name is defined with a string already.
Rega
Hi, Colin,
Good point. Maybe something like THROTTLING_QUOTA_VIOLATED will make this clear.
Hi, David,
We added a new quota name in the KIP. You chose not to bump up the version
of DESCRIBE_CLIENT_QUOTAS and ALTER_CLIENT_QUOTAS, which seems ok since the
quota name is represented as a string. However,
On Tue, Jun 9, 2020, at 05:06, David Jacot wrote:
> Hi Colin,
>
> Thank you for your feedback.
>
> Jun has summarized the situation pretty well. Thanks Jun! I would like to
> complement it with the following points:
>
> 1. Indeed, when the quota is exceeded, the broker will reject the topic
> cr
Hi, David,
Sounds good then.
Thanks,
Jun
On Tue, Jun 9, 2020 at 10:59 AM David Jacot wrote:
> Hi Jun,
>
> Both are already in the KIP, see the "New Broker Configurations" chapter. I
> think that we need them in order to be able to define a different burst for
> the new quota.
>
> Best,
> David
>
Hi Jun,
Both are already in the KIP, see the "New Broker Configurations" chapter. I
think that we need them in order to be able to define a different burst for
the new quota.
Best,
David
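To make the link between the window settings and the burst concrete, here is a minimal sketch. It assumes the maximum burst is the quota rate multiplied by the number of samples and the sample length; that formula, the function name `max_burst` and the example values are assumptions for illustration, not something stated in this thread, and the values are not the actual defaults.

```python
def max_burst(quota_rate, window_num, window_size_seconds):
    """Largest burst allowed under the ASSUMED rule:
    burst = rate * number of samples * sample length (seconds)."""
    return quota_rate * window_num * window_size_seconds


# Hypothetical values: 5 mutations/sec, 11 samples of 1 second each.
print(max_burst(5.0, 11, 1))
```

Under that assumption, a separate pair of controller.quota.window.* settings lets the burst of the new quota be tuned without touching the windows used by the other quotas.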
On Tue, Jun 9, 2020 at 7:48 PM Jun Rao wrote:
> Hi, David,
>
> Another thing. Should we add controller.quota.wind
Hi, David,
Another thing. Should we add controller.quota.window.size.seconds and
controller.quota.window.num
or just reuse the existing quota.window.size.seconds and quota.window.num
that are used for other types of quotas?
Thanks,
Jun
On Tue, Jun 9, 2020 at 10:30 AM Jun Rao wrote:
> Hi, Davi
Hi, David,
Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 on
the KIP.
Jun
On Tue, Jun 9, 2020 at 5:07 AM David Jacot wrote:
> Hi Colin,
>
> Thank you for your feedback.
>
> Jun has summarized the situation pretty well. Thanks Jun! I would like to
> complement it with t
Hi, David,
Thanks for the explanation. The KIP looks good to me now.
Jun
On Tue, Jun 9, 2020 at 4:27 AM David Jacot wrote:
> Hi Jun,
>
> 40. Yes, ThrottleTimeMs is set when the error code is set to QuotaViolated.
> This is required to let the client know how long it must wait. This is explai
Hi Colin,
Thank you for your feedback.
Jun has summarized the situation pretty well. Thanks Jun! I would like to
complement it with the following points:
1. Indeed, when the quota is exceeded, the broker will reject the topic
creations, partition creations and topics deletions that are exceeding
Hi Jun,
40. Yes, ThrottleTimeMs is set when the error code is set to QuotaViolated.
This is required to let the client know how long it must wait. This is
explained in the "Handling of new/old clients" chapter.
Best,
David
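As an illustration of why the throttle time has to accompany the error, here is a minimal client-side sketch. It is not the Kafka client code: `QuotaViolated`, `call_with_throttle_retry` and the `send` callback are hypothetical names, and the only behavior taken from the thread is that the client waits for the throttle time reported with the error before retrying.

```python
import time


class QuotaViolated(Exception):
    """Hypothetical stand-in for a response carrying the quota error."""

    def __init__(self, throttle_time_ms):
        super().__init__(f"quota violated, retry in {throttle_time_ms} ms")
        self.throttle_time_ms = throttle_time_ms


def call_with_throttle_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send(); on QuotaViolated, wait the advertised time and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except QuotaViolated as err:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            sleep(err.throttle_time_ms / 1000.0)
```

Without the throttle time in the response, the client would have to guess a backoff, which either retries too early or waits longer than needed.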
On Mon, Jun 8, 2020 at 9:29 PM Jun Rao wrote:
> Hi, David,
>
> Thanks for the u
On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote:
> Hi, Colin,
>
> Thanks for the comment. You brought up several points.
>
> 1. Should we set up a per user quota? To me, it does seem we need some sort
> of a quota. When the controller runs out of resources, ideally, we only
> want to penalize the ba
Hi, Colin,
Thanks for the comment. You brought up several points.
1. Should we set up a per user quota? To me, it does seem we need some sort
of a quota. When the controller runs out of resources, ideally, we only
want to penalize the bad behaving applications, instead of every
application. To do
Hi, David,
Thanks for the updated KIP. Another minor comment below.
40. For the new `QUOTA_VIOLATED` error in the response to
CreateTopics/CreatePartitions/DeleteTopics, could you clarify
whether ThrottleTimeMs is set when the error code is set to QUOTA_VIOLATED?
Jun
On Mon, Jun 8, 2020 at 9:32
Hi Jun,
30. The rate is accumulated at the partition level. Let me clarify this in
the KIP.
Best,
David
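One way to read "accumulated at the partition level" is sketched below: a request is charged one mutation per partition it touches rather than one per topic or per request. The function name and the (topic, partition count) input shape are hypothetical, and the exact accounting is an assumption rather than something spelled out in this message.

```python
def mutations_for_create_topics(topics):
    """Count mutations as the total number of partitions being created.

    `topics` is a list of (topic_name, num_partitions) pairs.
    """
    return sum(num_partitions for _name, num_partitions in topics)


# Hypothetical request creating two topics with 3 and 5 partitions.
print(mutations_for_create_topics([("a", 3), ("b", 5)]))
```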
On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote:
> Hi David,
>
> The KIP looks good to me. I am going to the voting thread...
>
> Hi Jun,
>
> Yes, exactly. That's a separate thing from thi
Hi David,
Thanks for the KIP.
I thought about this for a while and I actually think this approach is not
quite right. The problem that I see here is that using an explicitly set quota
here requires careful tuning by the cluster operator. Even worse, this tuning
might be invalidated by change
+1 (not binding)
Thanks for the KIP!
-Anna
On Thu, Jun 4, 2020 at 8:26 AM Mickael Maison wrote:
> +1 (binding)
> Thanks David for looking into this important issue
>
> On Thu, Jun 4, 2020 at 3:59 PM Tom Bentley wrote:
> >
> > +1 (non binding).
> >
> > Thanks!
> >
> > On Wed, Jun 3, 2020 at 3:
Hi David,
The KIP looks good to me. I am going to the voting thread...
Hi Jun,
Yes, exactly. That's a separate thing from this KIP, so working on the fix.
Thanks,
Anna
On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote:
> Hi, Anna,
>
> Thanks for the comment. For the problem that you described, pe
Hi, Anna,
Thanks for the comment. For the problem that you described, perhaps we need
to make the quota checking and recording more atomic?
Hi, David,
Thanks for the updated KIP. Looks good to me now. Just one minor comment
below.
30. controller_mutations_rate: For topic creation and deletion,
Hi Anna and Jun,
You are right. We should allocate up to the quota for each old sample.
I have revamped the Throttling Algorithm section to better explain our
thought process and the token bucket inspiration.
I have also added a chapter with a few guidelines about how to define
the quota. There is
Hi David and Jun,
I dug a bit deeper into the Rate implementation, and wanted to confirm that
I do believe that the token bucket behavior is better for the reasons we
already discussed but wanted to summarize. The main difference between Rate
and token bucket is that the Rate implementation allows
Hi, David, Anna,
Thanks for the discussion and the updated wiki.
11. If we believe the token bucket behavior is better in terms of handling
the burst behavior, we probably don't need a separate KIP since it's just
an implementation detail.
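For readers following the comparison, the token bucket behavior discussed in this thread can be sketched roughly as follows. This is a minimal sketch, not the implementation in the KIP or in trunk: the class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and rejecting only while the token count is below zero is an assumption about the exact threshold. What it takes from the thread is that a request is admitted even if it drives the count negative, and that later requests are rejected with a wait time until the count recovers.

```python
class TokenBucket:
    def __init__(self, rate, burst, now=0.0):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum number of tokens
        self.tokens = burst   # start with a full bucket
        self.last = now       # time of the last refill

    def _refill(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, cost, now):
        """Return (accepted, throttle_seconds)."""
        self._refill(now)
        if self.tokens < 0:
            # Reject and report how long until the count is back at zero.
            return False, -self.tokens / self.rate
        # Admit the request even if it makes the count negative.
        self.tokens -= cost
        return True, 0.0


bucket = TokenBucket(rate=5, burst=10)
print(bucket.try_acquire(60, now=0.0))   # large request admitted
print(bucket.try_acquire(1, now=1.0))    # rejected with a wait time
```

The point of the design is that a single large request is not split or refused up front; its cost is paid back by throttling whatever comes next.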
Regarding "So instead of having one sample equal to 560
+1 (binding)
Thanks David for looking into this important issue
On Thu, Jun 4, 2020 at 3:59 PM Tom Bentley wrote:
>
> +1 (non binding).
>
> Thanks!
>
> On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram wrote:
>
> > +1 (binding)
> >
> > Thanks for the KIP, David!
> >
> > Regards,
> >
> > Rajini
> >
Hi all,
I just published an updated version of the KIP which includes:
* Using a slightly modified version of our Rate. I have tried to formalize
it based on our discussion. As Anna suggested, we may find a better way to
implement it.
* Handling of ValidateOnly as pointed out by Tom.
Please, chec
+1 (non binding).
Thanks!
On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram wrote:
> +1 (binding)
>
> Thanks for the KIP, David!
>
> Regards,
>
> Rajini
>
>
> On Sun, May 31, 2020 at 3:29 AM Gwen Shapira wrote:
>
> > +1 (binding)
> >
> > Looks great. Thank you for the in-depth design and discussio
Hi David,
As a user I might expect the validateOnly option to do everything except
actually make the changes. That interpretation would imply the quota should
be checked, but the check should obviously be side-effect free. I think
this interpretation could be useful because it gives the caller eit
Hi Tom,
That's a good question. As the validation does not create any load on the
controller, I was planning to do it without checking the quota at all. Does
that sound reasonable?
Best,
David
On Thu, Jun 4, 2020 at 4:23 PM David Jacot wrote:
> Hi Jun and Anna,
>
> Thank you both for your repl
Hi Jun and Anna,
Thank you both for your replies.
Based on our recent discussion, I agree that using a rate is better to
remain consistent with other quotas. As you both suggested, it seems that
changing the way we compute the rate to better handle spiky workloads and
behave a bit more similarly
Hi Jun and David,
Regarding token bucket vs. Rate behavior. We recently observed a couple of
cases where a bursty workload behavior would result in long-ish pauses in
between, resulting in lower overall bandwidth than the quota. I will need
to debug this a bit more to be 100% sure, but it does loo
Hi, David,
Thanks for the reply.
11. To match the behavior in the Token bucket approach, I was thinking that
requests that don't fit in the previous time windows will be accumulated in
the current time window. So, the 60 extra requests will be accumulated in
the latest window. Then, the client al
Hi David,
One quick question about the implementation (I don't think it's spelled out
in the KIP): Presumably if you make, for example, a create topics request
with validate only it will check for quota violation, but not count towards
quota violation, right?
Many thanks,
Tom
On Wed, Jun 3, 202
+1 (binding)
Thanks for the KIP, David!
Regards,
Rajini
On Sun, May 31, 2020 at 3:29 AM Gwen Shapira wrote:
> +1 (binding)
>
> Looks great. Thank you for the in-depth design and discussion.
>
> On Fri, May 29, 2020 at 7:58 AM David Jacot wrote:
>
> > Hi folks,
> >
> > I'd like to start the
Hi David,
2) sorry, that was my mistake.
Regards,
Rajini
On Wed, Jun 3, 2020 at 3:08 PM David Jacot wrote:
> Hi Rajini,
>
> Thanks for your prompt response.
> 1) Good catch, fixed.
> 2) The retry mechanism will be in the client so a new field is not
> required in the requests.
>
> Regards,
>
Hi Rajini,
Thanks for your prompt response.
1) Good catch, fixed.
2) The retry mechanism will be in the client so a new field is not
required in the requests.
Regards,
David
On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram wrote:
> Hi David,
>
> Thanks for the updates, looks good. Just a couple o
Hi David,
Thanks for the updates, looks good. Just a couple of minor comments:
1) There is a typo in "*The channel will be mutated as well when
`throttle_time_ms > 0`." * Should be *muted*?
2) Since the three requests will need a new field for `
*retryQuotaViolatedException*`, we should perhaps ad
Hi all,
I have updated the KIP based on our recent discussions. I have mainly
changed the following points:
* I have renamed the quota as suggested by Jun.
* I have changed the metrics to be "token bucket" agnostic. The idea is to
report the burst and the rate per principal/clientid.
* I have remo
Hi Rajini,
Thanks for your feedback. Please find my answers below:
1) Our main goal is to protect the controller from the extreme users
(DDoS). We want to protect it from large requests or repetitive requests
coming from a single user. That user could be used by multiple apps as you
pointed out w
Hi David,
Thanks for the KIP. A few questions below:
1) The KIP says: *`Typically, applications tend to send one request to
create all the topics that they need`*. What would the point of throttling
be in this case? If there was a user quota for the principal used by that
application, wouldn't we
Hi Jun,
Thanks for your reply.
10. I think that both options are likely equivalent from an accuracy point
of view. If we put the implementation aside, conceptually, I am not convinced
by the usage-based approach because resources don't have a clear owner
in AK at the moment. A topic can be created
+1 (binding)
Looks great. Thank you for the in-depth design and discussion.
On Fri, May 29, 2020 at 7:58 AM David Jacot wrote:
> Hi folks,
>
> I'd like to start the vote for KIP-599 which proposes a new quota to
> throttle create topic, create partition, and delete topics operations to
> protec
Hi, David, Anna,
Thanks for the response. Sorry for the late reply.
10. Regarding exposing rate or usage as quota. Your argument is that usage
is not very accurate anyway and is harder to implement. So, let's just be a
bit loose and expose rate. I am sort of neutral on that. (1) It seems to me
th
Hi folks,
I'd like to start the vote for KIP-599 which proposes a new quota to
throttle create topic, create partition, and delete topics operations to
protect the Kafka controller:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+To
Hi folks,
I have updated the KIP. As mentioned by Jun, I have made the
quota per principal/clientid similarly to the other quotas. I have
also explained how this will work in conjunction with the auto
topics creation.
Please, take a look at it. I plan to call a vote for it in the next few
days if
Hi David,
Thanks for the explanation and confirmation that evolving the APIs is not
off the table in the longer term.
Kind regards,
Tom
Hi Tom,
>> What exactly is the problem with having a huge backlog of pending
>> operations? I can see that the backlog would need persisting so that the
>> controller could change without losing track of the topics to be mutated,
>> and the mutations would need to be submitted in batches to the co
Hi Jun,
Coming back to your question regarding the differences between the token
bucket algorithm and our current quota mechanism. I did some tests and
they confirmed my first intuition that our current mechanism does not work
well with a bursty workload. Let me try to illustrate the difference wi
Hi David,
Thanks for the reply.
>> If I understand the proposed throttling algorithm, an initial request
> would
> >> be allowed (possibly making K negative) and only subsequent requests
> >> (before K became positive) would receive the QUOTA_VIOLATED. That would
> >> mean it was still possible t
Hi Tom,
Thanks for the feedback.
>> If I understand the proposed throttling algorithm, an initial request
would
>> be allowed (possibly making K negative) and only subsequent requests
>> (before K became positive) would receive the QUOTA_VIOLATED. That would
>> mean it was still possible to block
Hi Anna and Jun,
Anna, thanks for your thoughtful feedback. Overall, I agree with what you
said. If I summarize, you said that using time on server threads is not
easier to tune than a rate based approach and it does not really capture all
the load either, as the control requests are not taken int
Hi David and Jun,
I wanted to add to the discussion about using requests/sec vs. time on
server threads (similar to request quota) for expressing quota for topic
ops.
I think request quota does not protect the brokers from overload by itself
-- it still requires tuning and sometimes re-tuning, be
Hi David,
Thanks for the KIP.
If I understand the proposed throttling algorithm, an initial request would
be allowed (possibly making K negative) and only subsequent requests
(before K became positive) would receive the QUOTA_VIOLATED. That would
mean it was still possible to block the controller
Hi, David,
Thanks for the reply. A few more comments.
1. I am actually not sure if a quota based on request rate is easier for
the users. For context, in KIP-124, we started with a request rate quota,
but ended up not choosing it. The main issues are (a) requests are not
equal; some are more expe
Hi Jun,
Thank you for the feedback.
1. You are right. In the end, we do care about the percentage of time that
an operation ties up the controller thread. I thought about this but I was
not entirely convinced by it for the following reasons:
1.1. While I do agree that setting up a rate and a burst i
Hi, David,
Thanks for the KIP. A few quick comments.
1. About quota.partition.mutations.rate. I am not sure if it's very easy
for the user to set the quota as a rate. For example, each partition
mutation could take a different number of ZK operations (depending on
things like retry). The time to
Hi folks,
I'd like to start the discussion for KIP-599:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
It proposes to introduce quotas for the create topics, create partitions
and delete topics operations. Let me