Re: [VOTE] KIP-219 - Improve Quota Communication

2018-05-07 Thread Jonghyun Lee
Thanks for the detailed explanation, Rajini. I think it makes sense not to
make the protocol change for them. I'll revert those changes.

Jon


Re: [VOTE] KIP-219 - Improve Quota Communication

2018-05-07 Thread Rajini Sivaram
Hi Jon,

Thanks for the explanation. We had deliberately avoided adding
throttle_time_ms to LeaderAndIsrResponse, UpdateMetadataResponse,
SaslAuthenticateResponse and SaslHandshakeResponse since clients
(producer/consumer/admin client) would never get throttled on these.
Clients don't use LeaderAndIsr and UpdateMetadata requests, and they never
send SaslAuthenticate or SaslHandshake requests after the initial
authentication. We do throttle these, but only to guard against DoS attacks
or incorrectly implemented clients. In those cases, we would rely on
broker-side muting to enforce quotas rather than have clients mute
themselves. So perhaps we don't need the protocol change?


Re: [VOTE] KIP-219 - Improve Quota Communication

2018-05-04 Thread Jonghyun Lee
Per Rajini's request, I'd like to add one more discussion item.

I added the THROTTLE_TIME_MS field to LeaderAndIsrResponse,
UpdateMetadataResponse, SaslAuthenticateResponse and SaslHandshakeResponse.
These are cluster actions and are normally not throttled, but they can be
throttled for request quota violation in error cases (cluster authorization
failure for the first two, and requests arriving after authentication is
complete for the last two). When they are throttled, I want the throttle
time to be passed back to the client so that the client can refrain from
sending more requests to the broker until the throttle time is over.
For non-error cases they won't be throttled, as before, and there are no
behavior changes (except that the response will carry a zero throttle time).
If we do not include THROTTLE_TIME_MS in those responses, they will still be
throttled on the broker side, but the client won't know that and may keep
sending requests to the broker while throttling is in progress. I assume
this is not a big concern for these API types, though.

Please let me know what you think, and if we decide to keep this change,
I'll update the KIP.

Thanks,
Jon
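The client-side behavior described above (backing off until the throttle time is over) could look roughly like this. The class and method names are invented for illustration; this is not the actual Kafka client implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a client muting itself for the throttle time
// returned in a response, tracked per broker.
public class ClientBackoffSketch {
    private final Map<Integer, Long> throttledUntilMs = new HashMap<>();

    // Called when a response from brokerId carries throttleTimeMs > 0.
    public void onResponse(int brokerId, long nowMs, long throttleTimeMs) {
        if (throttleTimeMs > 0) {
            throttledUntilMs.put(brokerId, nowMs + throttleTimeMs);
        }
    }

    // Before sending, check whether the broker is still throttling us.
    public boolean canSend(int brokerId, long nowMs) {
        Long until = throttledUntilMs.get(brokerId);
        return until == null || nowMs >= until;
    }

    public static void main(String[] args) {
        ClientBackoffSketch c = new ClientBackoffSketch();
        c.onResponse(0, 0L, 200L);
        System.out.println(c.canSend(0, 100L)); // false, still throttled
        System.out.println(c.canSend(0, 250L)); // true, throttle time over
    }
}
```

Without the THROTTLE_TIME_MS field, `onResponse` never sees a non-zero value for these API types, so the client keeps sending and only the broker-side mute limits it.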


Re: [VOTE] KIP-219 - Improve Quota Communication

2018-05-02 Thread Jonghyun Lee
Hi Magnus and Jun, do you have any feedback on this?

Since two of the original voters (Becket and Rajini) showed a preference
for bumping up all request versions, I'll wait till tomorrow and start
implementing it unless someone expresses different opinions.

As for the implementation, the current plan is to add a method to
AbstractResponse that tells whether or not the client should throttle
itself, with an implementation in each response type based on its version.

Thanks,
Jon
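A minimal sketch of the AbstractResponse approach described above. The method name `shouldClientThrottle` and the version cutoff are assumptions for illustration, not the actual Kafka code:

```java
// Each response type reports whether a broker sending this version has
// already applied throttling, so the client should also mute itself.
abstract class AbstractResponse {
    public abstract boolean shouldClientThrottle(short version);
}

// Hypothetical example: suppose Fetch v8+ carries KIP-219 semantics.
class FetchResponseSketch extends AbstractResponse {
    @Override
    public boolean shouldClientThrottle(short version) {
        return version >= 8;
    }
}

public class VersionCheckDemo {
    public static void main(String[] args) {
        AbstractResponse r = new FetchResponseSketch();
        System.out.println(r.shouldClientThrottle((short) 7)); // false
        System.out.println(r.shouldClientThrottle((short) 8)); // true
    }
}
```

The appeal of this shape is that the version cutoff lives next to each response type, so no separate lookup table or per-broker state is needed at the call site.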



Re: [VOTE] KIP-219 - Improve Quota Communication

2018-04-30 Thread Jonghyun Lee
Hi Rajini,

Thanks for letting me know about the SASL use case. I think this is similar
to the point Becket raised (creating a dependency between different
requests), and it makes sense to me.

Will wait for some more feedback and finalize.

Thanks,
Jon




Re: [VOTE] KIP-219 - Improve Quota Communication

2018-04-30 Thread Rajini Sivaram
Hi Jon,

We currently send two ApiVersions requests for SASL: the first with version
0, to get the request versions for SASL, and another after authentication.
You are currently using the second and basing all throttling decisions on
that. This may get in the way if we wanted to change this code in the
future (e.g. throttle SASL requests or combine the two ApiVersions requests
into one). So if it is easy enough to check request versions against an
immutable map rather than track broker ApiVersions in a mutable set,
perhaps that would be better?

Hi Magnus, It will be good to know what works better for non-Java clients.



Re: [VOTE] KIP-219 - Improve Quota Communication

2018-04-27 Thread Jonghyun Lee
Thanks, Becket.

Assuming it is reasonable, as you mentioned, to require new client
implementations to issue ApiVersionsRequest before any other request, I am
wondering whether keeping track of which brokers support KIP-219 is
actually simpler than keeping a map from each request type to the minimum
version that supports KIP-219 and checking the version of every single
request to decide whether to throttle on the client side. To me, it looks
like a broker property that can be cached, and checking every single
request seems excessive.

Thanks,
Jon
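The two options compared above can be sketched side by side. The API keys and minimum versions below are invented for illustration and are not the actual Kafka values:

```java
import java.util.Map;

// Option A vs. Option B from the discussion above, in sketch form.
public class Kip219CheckSketch {
    // Option A: immutable map from API key to the first request version
    // carrying KIP-219 throttling semantics; consulted on every response.
    static final Map<Short, Short> MIN_KIP219_VERSION = Map.of(
            (short) 0, (short) 6,  // hypothetical: Produce
            (short) 1, (short) 8); // hypothetical: Fetch

    static boolean perRequestCheck(short apiKey, short version) {
        Short min = MIN_KIP219_VERSION.get(apiKey);
        return min != null && version >= min;
    }

    // Option B: cache one flag per broker after the ApiVersions
    // exchange, and skip the per-request lookup entirely.
    private boolean brokerSupportsKip219;

    void onApiVersionsResponse(boolean supportsKip219) {
        this.brokerSupportsKip219 = supportsKip219;
    }

    boolean cachedCheck() {
        return brokerSupportsKip219;
    }

    public static void main(String[] args) {
        System.out.println(perRequestCheck((short) 1, (short) 8)); // true
        System.out.println(perRequestCheck((short) 1, (short) 7)); // false
    }
}
```

Option A needs no mutable per-broker state (Rajini's preference), while Option B trades a per-request lookup for a single cached flag (Jon's suggestion); the thread settled on bumping all request versions, which corresponds to Option A.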




On Fri, Apr 27, 2018 at 11:36 AM, Becket Qin  wrote:

> The reason we wanted to bump up all the request versions was to let the
> clients know whether the broker has already throttled the request or not.
> This avoids double throttling by both brokers and clients when the client
> implementation supports KIP-219.
>
> The new change uses only ApiVersionsRequest, instead of all the requests, to
> indicate whether the brokers have applied throttling or not. The difference
> is that all clients must issue ApiVersionsRequest first, before issuing any
> other requests. This has no impact on existing clients. But client
> implementations that want to support KIP-219 need to follow this practice,
> which seems reasonable.
> Initially I thought the change was fine. However, thinking about it again,
> it means the client implementation needs to remember the ApiVersions of
> each broker, which may add some complexity. It also seems to introduce a
> behavior dependency between different requests, which seems unnecessary.
>
> Due to the above reasons, I think it might be better to bump up all the
> request versions.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Wed, Apr 25, 2018 at 3:19 PM, Jonghyun Lee  wrote:
>
> > Hello,
> >
> > Per Ismael's suggestion, I'd like to get comments from the original
> voters
> > for KIP-219 (Becket, Jun, Rajini) and others about the new interface
> change
> > proposed in the discussion thread (
> > https://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > 20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
> > Would you let me know?
> >
> > Thanks,
> > Jon
> >
> >
> > On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin  wrote:
> >
> > > +Jon
> > >
> > > On Mon, 22 Jan 2018 at 10:38 AM Becket Qin 
> wrote:
> > >
> > >> Thanks for the discussion and voting.
> > >>
> > >> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
> > >>
> > >> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Becket,
> > >> >
> > >> > Thanks for the update. Yes, it does address my concern.
> > >> >
> > >> > +1 (binding)
> > >> >
> > >> > Regards,
> > >> >
> > >> > Rajini
> > >> >
> > >> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin 
> > >> wrote:
> > >> >
> > >> > > Actually returning an empty fetch response may still be useful to
> > >> reduce
> > >> > the
> > >> > > throttle time due to request quota violation because the
> > FetchResponse
> > >> > send
> > >> > > time will be less. I just updated the KIP.
> > >> > >
> > >> > > Rajini, does that address your concern?
> > >> > >
> > >> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin  >
> > >> > wrote:
> > >> > >
> > >> > > > Thanks for the reply, Jun.
> > >> > > >
> > >> > > > Currently the byte rate quota does not apply to
> HeartbeatRequest,
> > >> > > > JoinGroupRequest/SyncGroupRequest. So the only case those
> > requests
> > >> are
> > >> > > > throttled is when the request quota is violated. In that
> case,
> > >> the
> > >> > > > throttle time does not really depend on whether we return a full
> > >> > > FetchResponse
> > >> > > > or an empty one. Would it be more consistent if we throttle
> based
> > on
> > >> > the
> > >> > > > actual throttle time / channel mute time?
> > >> > > >
> > >> > > > Thanks,
> > >> > > >
> > >> > > > Jiangjie (Becket) Qin
> > >> > > >
> > >> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao 
> > wrote:
> > >> > > >
> > >> > > >> Hi, Jiangjie,
> > >> > > >>
> > >> > > >> You are right that the heartbeat is done in a channel different
> > >> from
> > >> > the
> > >> > > >> fetch request. I think it's still useful to return an empty
> fetch
> > >> > > response
> > >> > > >> when the quota is violated. This way, the throttle time for the
> > >> > > heartbeat
> > >> > > >> request won't be large. I agree that we can just mute the
> channel
> > >> for
> > >> > > the
> > >> > > >> fetch request for the throttle time computed based on a full
> > fetch
> > >> > > >> response. This probably also partially addresses Rajini's #1
> > >> concern.
> > >> > > >>
> > >> > > >> Thanks,
> > >> > > >>
> > >> > > >> Jun
> > >> > > >>
> > >> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <
> > becket@gmail.com>
> > >> > > wrote:
> > >> > > >>
> > 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-04-27 Thread Becket Qin
The reason we wanted to bump up all the request versions was to let the
clients know whether the broker has already throttled the request or not.
This avoids double throttling in both brokers and clients, if the client
implementation supports KIP-219.

The new change uses only ApiVersionRequest, instead of all the requests, to
indicate whether the brokers have applied throttling or not. The difference
is that all the clients must issue ApiVersionRequest first before issuing
any other requests. This has no impact on the existing clients. But
client implementations that want to support KIP-219 need to follow
this practice, which seems to be reasonable.
Initially I thought the change was fine. However, thinking about this again,
that means the client implementation needs to remember the ApiVersions of
each broker, which may add some complexity. Also, this seems to introduce a
behavior dependency between different requests, which seems unnecessary.

Due to the above reasons, I think it might be better to bump up all the
request versions.
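The two signalling options weighed above can be sketched side by side. Function names and version numbers here are illustrative assumptions, not Kafka client internals:

```python
# Option A (bump every request version): each response carries enough version
# information by itself, so no cross-request state is needed.
def should_client_backoff_v1(response_version, kip219_min_version, throttle_ms):
    # A response at or above the KIP-219 version means the broker did NOT
    # mute the channel; the client must honor throttle_ms itself.
    return response_version >= kip219_min_version and throttle_ms > 0

# Option B (ApiVersionsRequest only): the client must remember, per broker,
# what the initial ApiVersions exchange said, which creates the behavior
# dependency between ApiVersions and every later request mentioned above.
def should_client_backoff_v2(broker_supports_kip219, throttle_ms):
    return broker_supports_kip219 and throttle_ms > 0
```

Either way the goal is the same: exactly one side, broker or client, applies the throttle, never both.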

Thanks,

Jiangjie (Becket) Qin


On Wed, Apr 25, 2018 at 3:19 PM, Jonghyun Lee  wrote:

> Hello,
>
> Per Ismael's suggestion, I'd like to get comments from the original voters
> for KIP-219 (Becket, Jun, Rajini) and others about the new interface change
> proposed in the discussion thread (
> https://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> 20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
> Would you let me know?
>
> Thanks,
> Jon
>
>
> On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin  wrote:
>
> > +Jon
> >
> > On Mon, 22 Jan 2018 at 10:38 AM Becket Qin  wrote:
> >
> >> Thanks for the discussion and voting.
> >>
> >> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
> >>
> >> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> >> wrote:
> >>
> >> > Hi Becket,
> >> >
> >> > Thanks for the update. Yes, it does address my concern.
> >> >
> >> > +1 (binding)
> >> >
> >> > Regards,
> >> >
> >> > Rajini
> >> >
> >> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin 
> >> wrote:
> >> >
> >> > > Actually returning an empty fetch response may still be useful to
> >> reduce
> >> > the
> >> > > throttle time due to request quota violation because the
> FetchResponse
> >> > send
> >> > > time will be less. I just updated the KIP.
> >> > >
> >> > > Rajini, does that address your concern?
> >> > >
> >> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin 
> >> > wrote:
> >> > >
> >> > > > Thanks for the reply, Jun.
> >> > > >
> >> > > > Currently the byte rate quota does not apply to HeartbeatRequest,
> >> > > > JoinGroupRequest/SyncGroupRequest. So the only case those
> requests
> >> are
> >> > > > throttled is when the request quota is violated. In that case,
> >> the
> >> > > > throttle time does not really depend on whether we return a full
> >> > > FetchResponse
> >> > > > or an empty one. Would it be more consistent if we throttle based
> on
> >> > the
> >> > > > actual throttle time / channel mute time?
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Jiangjie (Becket) Qin
> >> > > >
> >> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao 
> wrote:
> >> > > >
> >> > > >> Hi, Jiangjie,
> >> > > >>
> >> > > >> You are right that the heartbeat is done in a channel different
> >> from
> >> > the
> >> > > >> fetch request. I think it's still useful to return an empty fetch
> >> > > response
> >> > > >> when the quota is violated. This way, the throttle time for the
> >> > > heartbeat
> >> > > >> request won't be large. I agree that we can just mute the channel
> >> for
> >> > > the
> >> > > >> fetch request for the throttle time computed based on a full
> fetch
> >> > > >> response. This probably also partially addresses Rajini's #1
> >> concern.
> >> > > >>
> >> > > >> Thanks,
> >> > > >>
> >> > > >> Jun
> >> > > >>
> >> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <
> becket@gmail.com>
> >> > > wrote:
> >> > > >>
> >> > > >> > Hi Rajini,
> >> > > >> >
> >> > > >> > Thanks for the comments. Please see the reply inline.
> >> > > >> >
> >> > > >> > Hi Jun,
> >> > > >> >
> >> > > >> > Thinking about the consumer rebalance case a bit more, I am not
> >> sure
> >> > > if
> >> > > >> we
> >> > > >> > need to worry about the delayed rebalance due to quota
> violation
> >> or
> >> > > not.
> >> > > >> > The rebalance actually uses a separate channel to coordinator.
> >> > > Therefore
> >> > > >> > unless the request quota is hit, the rebalance won't be
> >> throttled.
> >> > > Even
> >> > > >> if
> >> > > >> > request quota is hit, it seems unlikely to be delayed long. So
> it
> >> > > looks
> >> > > >> > that we don't need to unmute the channel earlier than needed.
> >> Does
> >> > > this
> >> > > >> > address your concern?
> >> > > >> >
> >> > > >> > Thanks,
> >> > > >> >
> >> > > >> > Jiangjie (Becket) Qin
> >> 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-04-25 Thread Jonghyun Lee
Hello,

Per Ismael's suggestion, I'd like to get comments from the original voters
for KIP-219 (Becket, Jun, Rajini) and others about the new interface change
proposed in the discussion thread (
https://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
Would you let me know?

Thanks,
Jon


On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin  wrote:

> +Jon
>
> On Mon, 22 Jan 2018 at 10:38 AM Becket Qin  wrote:
>
>> Thanks for the discussion and voting.
>>
>> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
>>
>> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram 
>> wrote:
>>
>> > Hi Becket,
>> >
>> > Thanks for the update. Yes, it does address my concern.
>> >
>> > +1 (binding)
>> >
>> > Regards,
>> >
>> > Rajini
>> >
>> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin 
>> wrote:
>> >
>> > > Actually returning an empty fetch response may still be useful to
>> reduce
>> > the
>> > > throttle time due to request quota violation because the FetchResponse
>> > send
>> > > time will be less. I just updated the KIP.
>> > >
>> > > Rajini, does that address your concern?
>> > >
>> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin 
>> > wrote:
>> > >
>> > > > Thanks for the reply, Jun.
>> > > >
>> > > > Currently the byte rate quota does not apply to HeartbeatRequest,
>> > > > JoinGroupRequest/SyncGroupRequest. So the only case those requests
>> are
>> > > > throttled is when the request quota is violated. In that case,
>> the
>> > > > throttle time does not really depend on whether we return a full
>> > > FetchResponse
>> > > > or an empty one. Would it be more consistent if we throttle based on
>> > the
>> > > > actual throttle time / channel mute time?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao  wrote:
>> > > >
>> > > >> Hi, Jiangjie,
>> > > >>
>> > > >> You are right that the heartbeat is done in a channel different
>> from
>> > the
>> > > >> fetch request. I think it's still useful to return an empty fetch
>> > > response
>> > > >> when the quota is violated. This way, the throttle time for the
>> > > heartbeat
>> > > >> request won't be large. I agree that we can just mute the channel
>> for
>> > > the
>> > > >> fetch request for the throttle time computed based on a full fetch
>> > > >> response. This probably also partially addresses Rajini's #1
>> concern.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Jun
>> > > >>
>> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin 
>> > > wrote:
>> > > >>
>> > > >> > Hi Rajini,
>> > > >> >
>> > > >> > Thanks for the comments. Please see the reply inline.
>> > > >> >
>> > > >> > Hi Jun,
>> > > >> >
>> > > >> > Thinking about the consumer rebalance case a bit more, I am not
>> sure
>> > > if
>> > > >> we
>> > > >> > need to worry about the delayed rebalance due to quota violation
>> or
>> > > not.
>> > > >> > The rebalance actually uses a separate channel to coordinator.
>> > > Therefore
>> > > >> > unless the request quota is hit, the rebalance won't be
>> throttled.
>> > > Even
>> > > >> if
>> > > >> > request quota is hit, it seems unlikely to be delayed long. So it
>> > > looks
>> > > >> > that we don't need to unmute the channel earlier than needed.
>> Does
>> > > this
>> > > >> > address your concern?
>> > > >> >
>> > > >> > Thanks,
>> > > >> >
>> > > >> > Jiangjie (Becket) Qin
>> > > >> >
>> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
>> > > >> rajinisiva...@gmail.com>
>> > > >> > wrote:
>> > > >> >
>> > > >> > > Hi Becket,
>> > > >> > >
>> > > >> > > A few questions:
>> > > >> > >
>> > > >> > > 1. KIP says: *Although older client implementations (prior to
>> > > >> knowledge
>> > > >> > of
>> > > >> > > this KIP) will immediately send the next request after the
>> broker
>> > > >> > responds
>> > > >> > > without paying attention to the throttle time field, the
>> broker is
>> > > >> > > protected by virtue of muting the channel for time X. i.e., the
>> > next
>> > > >> > > request will not be processed until the channel is unmuted. *
>> > > >> > > For fetch requests, the response is sent immediately and the
>> mute
>> > > >> time on
>> > > >> > > the broker based on empty fetch response will often be zero
>> > (unlike
>> > > >> the
>> > > >> > > throttle time returned to the client based on non-empty
>> response).
>> > > >> Won't
>> > > >> > > that lead to a tight loop of fetch requests from old clients
>> > > >> > (particularly
>> > > >> > > expensive with SSL)? Wouldn't it be better to retain current
>> > > behaviour
>> > > >> > for
>> > > >> > > old clients? Also, if we change the behaviour for old clients,
>> > > client
>> > > >> > > metrics that track throttle time will report a lot of throttle
>> > > >> unrelated
>> > > >> > to
>> > > >> > >  actual throttle 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-04-25 Thread Dong Lin
+Jon

On Mon, 22 Jan 2018 at 10:38 AM Becket Qin  wrote:

> Thanks for the discussion and voting.
>
> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
>
> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram 
> wrote:
>
> > Hi Becket,
> >
> > Thanks for the update. Yes, it does address my concern.
> >
> > +1 (binding)
> >
> > Regards,
> >
> > Rajini
> >
> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin 
> wrote:
> >
> > > Actually returning an empty fetch response may still be useful to reduce
> > the
> > > throttle time due to request quota violation because the FetchResponse
> > send
> > > time will be less. I just updated the KIP.
> > >
> > > Rajini, does that address your concern?
> > >
> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin 
> > wrote:
> > >
> > > > Thanks for the reply, Jun.
> > > >
> > > > Currently the byte rate quota does not apply to HeartbeatRequest,
> > > > JoinGroupRequest/SyncGroupRequest. So the only case those requests
> are
> > > > throttled is when the request quota is violated. In that case, the
> > > > throttle time does not really depend on whether we return a full
> > > FetchResponse
> > > > or an empty one. Would it be more consistent if we throttle based on
> > the
> > > > actual throttle time / channel mute time?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao  wrote:
> > > >
> > > >> Hi, Jiangjie,
> > > >>
> > > >> You are right that the heartbeat is done in a channel different from
> > the
> > > >> fetch request. I think it's still useful to return an empty fetch
> > > response
> > > >> when the quota is violated. This way, the throttle time for the
> > > heartbeat
> > > >> request won't be large. I agree that we can just mute the channel
> for
> > > the
> > > >> fetch request for the throttle time computed based on a full fetch
> > > >> response. This probably also partially addresses Rajini's #1
> concern.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin 
> > > wrote:
> > > >>
> > > >> > Hi Rajini,
> > > >> >
> > > >> > Thanks for the comments. Please see the reply inline.
> > > >> >
> > > >> > Hi Jun,
> > > >> >
> > > >> > Thinking about the consumer rebalance case a bit more, I am not
> sure
> > > if
> > > >> we
> > > >> > need to worry about the delayed rebalance due to quota violation
> or
> > > not.
> > > >> > The rebalance actually uses a separate channel to coordinator.
> > > Therefore
> > > >> > unless the request quota is hit, the rebalance won't be throttled.
> > > Even
> > > >> if
> > > >> > request quota is hit, it seems unlikely to be delayed long. So it
> > > looks
> > > >> > that we don't need to unmute the channel earlier than needed. Does
> > > this
> > > >> > address your concern?
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> > > >> rajinisiva...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Becket,
> > > >> > >
> > > >> > > A few questions:
> > > >> > >
> > > >> > > 1. KIP says: *Although older client implementations (prior to
> > > >> knowledge
> > > >> > of
> > > >> > > this KIP) will immediately send the next request after the
> broker
> > > >> > responds
> > > >> > > without paying attention to the throttle time field, the broker
> is
> > > >> > > protected by virtue of muting the channel for time X. i.e., the
> > next
> > > >> > > request will not be processed until the channel is unmuted. *
> > > >> > > For fetch requests, the response is sent immediately and the
> mute
> > > >> time on
> > > >> > > the broker based on empty fetch response will often be zero
> > (unlike
> > > >> the
> > > >> > > throttle time returned to the client based on non-empty
> response).
> > > >> Won't
> > > >> > > that lead to a tight loop of fetch requests from old clients
> > > >> > (particularly
> > > >> > > expensive with SSL)? Wouldn't it be better to retain current
> > > behaviour
> > > >> > for
> > > >> > > old clients? Also, if we change the behaviour for old clients,
> > > client
> > > >> > > metrics that track throttle time will report a lot of throttle
> > > >> unrelated
> > > >> > to
> > > >> > >  actual throttle time.
> > > >> > >
> > > >> > For consumers, if quota is violated, the throttle time on the
> broker
> > > >> will
> > > >> > not be 0. It is just that the throttle time will not be increasing
> > > >> because
> > > >> > the consumer will return an empty response in this case. So there
> > > should
> > > >> > not be a tight loop.
> > > >> >
> > > >> >
> > > >> > > 2. KIP says: *The usual idle timeout i.e.,
> > connections.max.idle.ms
> > > >> > >  will still be honored during
> the
> > > >> > throttle
> > > >> > > time X. This makes sure 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-22 Thread Becket Qin
Thanks for the discussion and voting.

KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).

On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram 
wrote:

> Hi Becket,
>
> Thanks for the update. Yes, it does address my concern.
>
> +1 (binding)
>
> Regards,
>
> Rajini
>
> On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin  wrote:
>
> > Actually returning an empty fetch response may still be useful to reduce
> the
> > throttle time due to request quota violation because the FetchResponse
> send
> > time will be less. I just updated the KIP.
> >
> > Rajini, does that address your concern?
> >
> > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin 
> wrote:
> >
> > > Thanks for the reply, Jun.
> > >
> > > Currently the byte rate quota does not apply to HeartbeatRequest,
> > > JoinGroupRequest/SyncGroupRequest. So the only case those requests are
> > > throttled is when the request quota is violated. In that case, the
> > > throttle time does not really depend on whether we return a full
> > FetchResponse
> > > or an empty one. Would it be more consistent if we throttle based on
> the
> > > actual throttle time / channel mute time?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao  wrote:
> > >
> > >> Hi, Jiangjie,
> > >>
> > >> You are right that the heartbeat is done in a channel different from
> the
> > >> fetch request. I think it's still useful to return an empty fetch
> > response
> > >> when the quota is violated. This way, the throttle time for the
> > heartbeat
> > >> request won't be large. I agree that we can just mute the channel for
> > the
> > >> fetch request for the throttle time computed based on a full fetch
> > >> response. This probably also partially addresses Rajini's #1 concern.
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin 
> > wrote:
> > >>
> > >> > Hi Rajini,
> > >> >
> > >> > Thanks for the comments. Please see the reply inline.
> > >> >
> > >> > Hi Jun,
> > >> >
> > >> > Thinking about the consumer rebalance case a bit more, I am not sure
> > if
> > >> we
> > >> > need to worry about the delayed rebalance due to quota violation or
> > not.
> > >> > The rebalance actually uses a separate channel to coordinator.
> > Therefore
> > >> > unless the request quota is hit, the rebalance won't be throttled.
> > Even
> > >> if
> > >> > request quota is hit, it seems unlikely to be delayed long. So it
> > looks
> > >> > that we don't need to unmute the channel earlier than needed. Does
> > this
> > >> > address your concern?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> > >> rajinisiva...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Becket,
> > >> > >
> > >> > > A few questions:
> > >> > >
> > >> > > 1. KIP says: *Although older client implementations (prior to
> > >> knowledge
> > >> > of
> > >> > > this KIP) will immediately send the next request after the broker
> > >> > responds
> > >> > > without paying attention to the throttle time field, the broker is
> > >> > > protected by virtue of muting the channel for time X. i.e., the
> next
> > >> > > request will not be processed until the channel is unmuted. *
> > >> > > For fetch requests, the response is sent immediately and the mute
> > >> time on
> > >> > > the broker based on empty fetch response will often be zero
> (unlike
> > >> the
> > >> > > throttle time returned to the client based on non-empty response).
> > >> Won't
> > >> > > that lead to a tight loop of fetch requests from old clients
> > >> > (particularly
> > >> > > expensive with SSL)? Wouldn't it be better to retain current
> > behaviour
> > >> > for
> > >> > > old clients? Also, if we change the behaviour for old clients,
> > client
> > >> > > metrics that track throttle time will report a lot of throttle
> > >> unrelated
> > >> > to
> > >> > >  actual throttle time.
> > >> > >
> > >> > For consumers, if quota is violated, the throttle time on the broker
> > >> will
> > >> > not be 0. It is just that the throttle time will not be increasing
> > >> because
> > >> > the consumer will return an empty response in this case. So there
> > should
> > >> > not be a tight loop.
> > >> >
> > >> >
> > >> > > 2. KIP says: *The usual idle timeout i.e.,
> connections.max.idle.ms
> > >> > >  will still be honored during the
> > >> > throttle
> > >> > > time X. This makes sure that the brokers will detect client
> > connection
> > >> > > closure in a bounded time.*
> > >> > > Wouldn't it be better to bound maximum throttle time to
> > >> > > *connections.max.idle.ms
> > >> > > *? If we mute for a time greater
> > than
> > >> > > *connections.max.idle.ms
> > >> > > * and then close a client
> > 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-18 Thread Rajini Sivaram
Hi Becket,

Thanks for the update. Yes, it does address my concern.

+1 (binding)

Regards,

Rajini

On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin  wrote:

> Actually returning an empty fetch response may still be useful to reduce the
> throttle time due to request quota violation because the FetchResponse send
> time will be less. I just updated the KIP.
>
> Rajini, does that address your concern?
>
> On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin  wrote:
>
> > Thanks for the reply, Jun.
> >
> > Currently the byte rate quota does not apply to HeartbeatRequest,
> > JoinGroupRequest/SyncGroupRequest. So the only case those requests are
> > throttled is when the request quota is violated. In that case, the
> > throttle time does not really depend on whether we return a full
> FetchResponse
> > or an empty one. Would it be more consistent if we throttle based on the
> > actual throttle time / channel mute time?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao  wrote:
> >
> >> Hi, Jiangjie,
> >>
> >> You are right that the heartbeat is done in a channel different from the
> >> fetch request. I think it's still useful to return an empty fetch
> response
> >> when the quota is violated. This way, the throttle time for the
> heartbeat
> >> request won't be large. I agree that we can just mute the channel for
> the
> >> fetch request for the throttle time computed based on a full fetch
> >> response. This probably also partially addresses Rajini's #1 concern.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin 
> wrote:
> >>
> >> > Hi Rajini,
> >> >
> >> > Thanks for the comments. Please see the reply inline.
> >> >
> >> > Hi Jun,
> >> >
> >> > Thinking about the consumer rebalance case a bit more, I am not sure
> if
> >> we
> >> > need to worry about the delayed rebalance due to quota violation or
> not.
> >> > The rebalance actually uses a separate channel to coordinator.
> Therefore
> >> > unless the request quota is hit, the rebalance won't be throttled.
> Even
> >> if
> >> > request quota is hit, it seems unlikely to be delayed long. So it
> looks
> >> > that we don't need to unmute the channel earlier than needed. Does
> this
> >> > address your concern?
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> >> rajinisiva...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Becket,
> >> > >
> >> > > A few questions:
> >> > >
> >> > > 1. KIP says: *Although older client implementations (prior to
> >> knowledge
> >> > of
> >> > > this KIP) will immediately send the next request after the broker
> >> > responds
> >> > > without paying attention to the throttle time field, the broker is
> >> > > protected by virtue of muting the channel for time X. i.e., the next
> >> > > request will not be processed until the channel is unmuted. *
> >> > > For fetch requests, the response is sent immediately and the mute
> >> time on
> >> > > the broker based on empty fetch response will often be zero (unlike
> >> the
> >> > > throttle time returned to the client based on non-empty response).
> >> Won't
> >> > > that lead to a tight loop of fetch requests from old clients
> >> > (particularly
> >> > > expensive with SSL)? Wouldn't it be better to retain current
> behaviour
> >> > for
> >> > > old clients? Also, if we change the behaviour for old clients,
> client
> >> > > metrics that track throttle time will report a lot of throttle
> >> unrelated
> >> > to
> >> > >  actual throttle time.
> >> > >
> >> > For consumers, if quota is violated, the throttle time on the broker
> >> will
> >> > not be 0. It is just that the throttle time will not be increasing
> >> because
> >> > the consumer will return an empty response in this case. So there
> should
> >> > not be a tight loop.
> >> >
> >> >
> >> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
> >> > >  will still be honored during the
> >> > throttle
> >> > > time X. This makes sure that the brokers will detect client
> connection
> >> > > closure in a bounded time.*
> >> > > Wouldn't it be better to bound maximum throttle time to
> >> > > *connections.max.idle.ms
> >> > > *? If we mute for a time greater
> than
> >> > > *connections.max.idle.ms
> >> > > * and then close a client
> connection
> >> > > simply
> >> > > because we had muted it on the broker for a longer throttle time, we
> >> > force
> >> > > a reconnection and read the next request from the new connection
> >> straight
> >> > > away. This feels the same as a bound on the quota of *
> >> > > connections.max.idle.ms
> >> > > *, but with added load on the
> broker
> >> for
> >> > > authenticating another connection (expensive with SSL).
> >> > >
> >> 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-17 Thread Becket Qin
Actually returning an empty fetch response may still be useful to reduce the
throttle time due to request quota violation because the FetchResponse send
time will be less. I just updated the KIP.

Rajini, does that address your concern?
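As a rough illustration of the point above, the broker can charge the quota as if the full FetchResponse had been sent, while actually sending an empty one so the send itself is cheap. The formula and names here are assumptions for the sketch, not the real broker quota accounting:

```python
def fetch_throttle(quota_bytes_per_sec, observed_rate, full_response_bytes,
                   window_sec=1.0):
    """Return (throttle_seconds, bytes_to_send) for one fetch. Illustrative only."""
    if observed_rate <= quota_bytes_per_sec:
        # Under quota: send the full response, no throttling.
        return 0.0, full_response_bytes
    # Quota violated: compute the delay from the *full* response size, but
    # send an empty response, so the FetchResponse send time stays small and
    # does not further extend the mute time on the channel.
    excess = (observed_rate - quota_bytes_per_sec) * window_sec + full_response_bytes
    return excess / quota_bytes_per_sec, 0
```

The throttle time is thus based on what the client asked for, while the bytes actually written while throttled stay near zero.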

On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin  wrote:

> Thanks for the reply, Jun.
>
> Currently the byte rate quota does not apply to HeartbeatRequest,
> JoinGroupRequest/SyncGroupRequest. So the only case those requests are
> throttled is when the request quota is violated. In that case, the
> throttle time does not really depend on whether we return a full FetchResponse
> or an empty one. Would it be more consistent if we throttle based on the
> actual throttle time / channel mute time?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao  wrote:
>
>> Hi, Jiangjie,
>>
>> You are right that the heartbeat is done in a channel different from the
>> fetch request. I think it's still useful to return an empty fetch response
>> when the quota is violated. This way, the throttle time for the heartbeat
>> request won't be large. I agree that we can just mute the channel for the
>> fetch request for the throttle time computed based on a full fetch
>> response. This probably also partially addresses Rajini's #1 concern.
>>
>> Thanks,
>>
>> Jun
>>
>> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin  wrote:
>>
>> > Hi Rajini,
>> >
>> > Thanks for the comments. Please see the reply inline.
>> >
>> > Hi Jun,
>> >
>> > Thinking about the consumer rebalance case a bit more, I am not sure if
>> we
>> > need to worry about the delayed rebalance due to quota violation or not.
>> > The rebalance actually uses a separate channel to coordinator. Therefore
>> > unless the request quota is hit, the rebalance won't be throttled. Even
>> if
>> > request quota is hit, it seems unlikely to be delayed long. So it looks
>> > that we don't need to unmute the channel earlier than needed. Does this
>> > address your concern?
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
>> rajinisiva...@gmail.com>
>> > wrote:
>> >
>> > > Hi Becket,
>> > >
>> > > A few questions:
>> > >
>> > > 1. KIP says: *Although older client implementations (prior to
>> knowledge
>> > of
>> > > this KIP) will immediately send the next request after the broker
>> > responds
>> > > without paying attention to the throttle time field, the broker is
>> > > protected by virtue of muting the channel for time X. i.e., the next
>> > > request will not be processed until the channel is unmuted. *
>> > > For fetch requests, the response is sent immediately and the mute
>> time on
>> > > the broker based on empty fetch response will often be zero (unlike
>> the
>> > > throttle time returned to the client based on non-empty response).
>> Won't
>> > > that lead to a tight loop of fetch requests from old clients
>> > (particularly
>> > > expensive with SSL)? Wouldn't it be better to retain current behaviour
>> > for
>> > > old clients? Also, if we change the behaviour for old clients, client
>> > > metrics that track throttle time will report a lot of throttle
>> unrelated
>> > to
>> > >  actual throttle time.
>> > >
>> > For consumers, if quota is violated, the throttle time on the broker
>> will
>> > not be 0. It is just that the throttle time will not be increasing
>> because
>> > the consumer will return an empty response in this case. So there should
>> > not be a tight loop.
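The claim that old clients cannot end up in a tight loop can be seen with a toy timeline. This is an assumption-level simulation, not broker code: because the broker mutes the channel, a client that ignores throttle_time_ms still only gets one request processed per throttle interval.

```python
def simulate_old_client(num_requests, throttle_per_request_sec):
    """Toy model: an old client sends requests back-to-back, but the broker
    only reads each request once the channel mute from the previous response
    has expired. Returns the times at which requests are processed."""
    now = 0.0
    unmute_at = 0.0
    processed_times = []
    for _ in range(num_requests):
        # The request may arrive instantly, but it sits unread until unmute.
        now = max(now, unmute_at)
        processed_times.append(now)
        unmute_at = now + throttle_per_request_sec
    return processed_times
```

With a 2-second throttle per request, three back-to-back requests are processed at t = 0, 2, and 4 seconds, no matter how fast the client fires them off.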
>> >
>> >
>> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
>> > >  will still be honored during the
>> > throttle
>> > > time X. This makes sure that the brokers will detect client connection
>> > > closure in a bounded time.*
>> > > Wouldn't it be better to bound maximum throttle time to
>> > > *connections.max.idle.ms
>> > > *? If we mute for a time greater than
>> > > *connections.max.idle.ms
>> > > * and then close a client connection
>> > > simply
>> > > because we had muted it on the broker for a longer throttle time, we
>> > force
>> > > a reconnection and read the next request from the new connection
>> straight
>> > > away. This feels the same as a bound on the quota of *
>> > > connections.max.idle.ms
>> > > *, but with added load on the broker
>> for
>> > > authenticating another connection (expensive with SSL).
>> > >
>> > I think we need to think about consumers prior to and after this KIP
>> > separately.
>> >
>> > For consumers prior to this KIP, even if we unmute the channel after
>> > connections.max.idle.ms, it is likely that the consumers have already
>> > reached request.timeout.ms before that and have reconnected to the broker.
>> > So there is no real difference whether we close the throttled channel or
>> > not.
>> >
>> > For consumers after the 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-17 Thread Becket Qin
Actually returning an empty fetch response may still be useful to reduce the
throttle time due to request quota violation because the FetchResponse send
time will be less. I just updated the KIP.

Rajini, does that address your concern?

On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin  wrote:

> Thanks for the reply, Jun.
>
> Currently the byte rate quota does not apply to HeartbeatRequest,
> JoinGroupRequest/SyncGroupRequest. So the only case where those requests are
> throttled is when the request quota is violated. In that case, the throttle
> time does not really depend on whether we return a full FetchResponse or an
> empty one. Would it be more consistent if we throttle based on the actual
> throttle time / channel mute time?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao  wrote:
>
>> Hi, Jiangjie,
>>
>> You are right that the heartbeat is done in a channel different from the
>> fetch request. I think it's still useful to return an empty fetch response
>> when the quota is violated. This way, the throttle time for the heartbeat
>> request won't be large. I agree that we can just mute the channel for the
>> fetch request for the throttle time computed based on a full fetch
>> response. This probably also partially addresses Rajini's #1 concern.
>>
>> Thanks,
>>
>> Jun
>>
>> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin  wrote:
>>
>> > Hi Rajini,
>> >
>> > Thanks for the comments. Please see the reply inline.
>> >
>> > Hi Jun,
>> >
>> > Thinking about the consumer rebalance case a bit more, I am not sure if
>> we
>> > need to worry about the delayed rebalance due to quota violation or not.
>> > The rebalance actually uses a separate channel to coordinator. Therefore
>> > unless the request quota is hit, the rebalance won't be throttled. Even
>> if
>> > request quota is hit, it seems unlikely to be delayed long. So it looks
>> > that we don't need to unmute the channel earlier than needed. Does this
>> > address your concern?
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
>> rajinisiva...@gmail.com>
>> > wrote:
>> >
>> > > Hi Becket,
>> > >
>> > > A few questions:
>> > >
>> > > 1. KIP says: *Although older client implementations (prior to
>> knowledge
>> > of
>> > > this KIP) will immediately send the next request after the broker
>> > responds
>> > > without paying attention to the throttle time field, the broker is
>> > > protected by virtue of muting the channel for time X. i.e., the next
>> > > request will not be processed until the channel is unmuted. *
>> > > For fetch requests, the response is sent immediately and the mute
>> time on
>> > > the broker based on empty fetch response will often be zero (unlike
>> the
>> > > throttle time returned to the client based on non-empty response).
>> Won't
>> > > that lead to a tight loop of fetch requests from old clients
>> > (particularly
>> > > expensive with SSL)? Wouldn't it be better to retain current behaviour
>> > for
>> > > old clients? Also, if we change the behaviour for old clients, client
>> > > metrics that track throttle time will report a lot of throttle
>> unrelated
>> > to
>> > >  actual throttle time.
>> > >
>> > For consumers, if the quota is violated, the throttle time on the broker
>> > will not be 0. It is just that the throttle time will not keep increasing
>> > because the broker will return an empty response in this case. So there
>> > should not be a tight loop.
>> >
>> >
>> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms will
>> > > still be honored during the throttle time X. This makes sure that the
>> > > brokers will detect client connection closure in a bounded time.*
>> > > Wouldn't it be better to bound maximum throttle time to
>> > > *connections.max.idle.ms*? If we mute for a time greater than
>> > > *connections.max.idle.ms* and then close a client connection simply
>> > > because we had muted it on the broker for a longer throttle time, we
>> > > force a reconnection and read the next request from the new connection
>> > > straight away. This feels the same as a bound on the quota of
>> > > *connections.max.idle.ms*, but with added load on the broker for
>> > > authenticating another connection (expensive with SSL).
>> > >
>> > I think we need to think about consumers prior to and after this KIP
>> > separately.
>> >
>> > For consumers prior to this KIP, even if we unmute the channel after
>> > connections.max.idle.ms, it is likely that the consumers have already
>> > reached request.timeout.ms before that and have reconnected to the broker.
>> > So there is no real difference whether we close the throttled channel or
>> > not.
>> >
>> > For consumers after the 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-16 Thread Becket Qin
Thanks for the reply, Jun.

Currently the byte rate quota does not apply to HeartbeatRequest,
JoinGroupRequest/SyncGroupRequest. So the only case where those requests are
throttled is when the request quota is violated. In that case, the throttle
time does not really depend on whether we return a full FetchResponse or an
empty one. Would it be more consistent if we throttle based on the actual
throttle time / channel mute time?

Thanks,

Jiangjie (Becket) Qin

On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao  wrote:

> Hi, Jiangjie,
>
> You are right that the heartbeat is done in a channel different from the
> fetch request. I think it's still useful to return an empty fetch response
> when the quota is violated. This way, the throttle time for the heartbeat
> request won't be large. I agree that we can just mute the channel for the
> fetch request for the throttle time computed based on a full fetch
> response. This probably also partially addresses Rajini's #1 concern.
>
> Thanks,
>
> Jun
>
> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin  wrote:
>
> > Hi Rajini,
> >
> > Thanks for the comments. Please see the reply inline.
> >
> > Hi Jun,
> >
> > Thinking about the consumer rebalance case a bit more, I am not sure if
> we
> > need to worry about the delayed rebalance due to quota violation or not.
> > The rebalance actually uses a separate channel to coordinator. Therefore
> > unless the request quota is hit, the rebalance won't be throttled. Even
> if
> > request quota is hit, it seems unlikely to be delayed long. So it looks
> > that we don't need to unmute the channel earlier than needed. Does this
> > address your concern?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram  >
> > wrote:
> >
> > > Hi Becket,
> > >
> > > A few questions:
> > >
> > > 1. KIP says: *Although older client implementations (prior to knowledge
> > of
> > > this KIP) will immediately send the next request after the broker
> > responds
> > > without paying attention to the throttle time field, the broker is
> > > protected by virtue of muting the channel for time X. i.e., the next
> > > request will not be processed until the channel is unmuted. *
> > > For fetch requests, the response is sent immediately and the mute time
> on
> > > the broker based on empty fetch response will often be zero (unlike the
> > > throttle time returned to the client based on non-empty response).
> Won't
> > > that lead to a tight loop of fetch requests from old clients
> > (particularly
> > > expensive with SSL)? Wouldn't it be better to retain current behaviour
> > for
> > > old clients? Also, if we change the behaviour for old clients, client
> > > metrics that track throttle time will report a lot of throttle
> unrelated
> > to
> > >  actual throttle time.
> > >
> > For consumers, if the quota is violated, the throttle time on the broker
> > will not be 0. It is just that the throttle time will not keep increasing
> > because the broker will return an empty response in this case. So there
> > should not be a tight loop.
> >
> >
> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms will
> > > still be honored during the throttle time X. This makes sure that the
> > > brokers will detect client connection closure in a bounded time.*
> > > Wouldn't it be better to bound maximum throttle time to
> > > *connections.max.idle.ms*? If we mute for a time greater than
> > > *connections.max.idle.ms* and then close a client connection simply
> > > because we had muted it on the broker for a longer throttle time, we
> > > force a reconnection and read the next request from the new connection
> > > straight away. This feels the same as a bound on the quota of
> > > *connections.max.idle.ms*, but with added load on the broker for
> > > authenticating another connection (expensive with SSL).
> > >
> > I think we need to think about consumers prior to and after this KIP
> > separately.
> >
> > For consumers prior to this KIP, even if we unmute the channel after
> > connections.max.idle.ms, it is likely that the consumers have already
> > reached request.timeout.ms before that and have reconnected to the broker.
> > So there is no real difference whether we close the throttled channel or
> > not.
> >
> > For consumers after the KIP, because they will honor the throttle time,
> > they will back off until the throttle time is reached. If that throttle
> > time is longer than connections.max.idle.ms, reconnecting does not seem
> > like a big overhead because there will only be one connection
> > re-establishment in quite a few minutes. Compared with such overhead, it
> > seems more important to honor the quota so the broker can survive.
> >
> >
> > > 3. Are we changing the behaviour of 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-16 Thread Jun Rao
Hi, Jiangjie,

You are right that the heartbeat is done in a channel different from the
fetch request. I think it's still useful to return an empty fetch response
when the quota is violated. This way, the throttle time for the heartbeat
request won't be large. I agree that we can just mute the channel for the
fetch request for the throttle time computed based on a full fetch
response. This probably also partially addresses Rajini's #1 concern.
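A minimal sketch of this scheme, assuming a simple linear bytes-over-quota formula (Kafka's real accounting uses windowed rate metrics, so the function name and numbers here are only illustrative):

```python
def throttle_times_ms(full_response_bytes, empty_response_bytes, byte_rate_quota):
    """Simplified model of the scheme above: the client is told the
    throttle time a *full* fetch response would incur, while the broker
    mutes the channel only for the time the *empty* response it actually
    sent incurs. Illustrative linear formula, not Kafka's windowed-rate
    quota accounting."""
    client_throttle_ms = full_response_bytes * 1000 // byte_rate_quota
    broker_mute_ms = empty_response_bytes * 1000 // byte_rate_quota
    return client_throttle_ms, broker_mute_ms

# A 5 MB full response against a 1 MB/s quota: the new client backs off
# ~5 s based on the reported throttle time, but the broker only mutes the
# channel briefly for the tiny empty response, so the consumer is not
# blocked long enough to fall out of its group and rebalance.
client_ms, mute_ms = throttle_times_ms(5_000_000, 50, 1_000_000)
print(client_ms, mute_ms)
```

The asymmetry is the point: the long number protects the broker from well-behaved new clients, while the short mute keeps old clients and group membership responsive.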

Thanks,

Jun

On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin  wrote:

> Hi Rajini,
>
> Thanks for the comments. Please see the reply inline.
>
> Hi Jun,
>
> Thinking about the consumer rebalance case a bit more, I am not sure if we
> need to worry about the delayed rebalance due to quota violation or not.
> The rebalance actually uses a separate channel to coordinator. Therefore
> unless the request quota is hit, the rebalance won't be throttled. Even if
> request quota is hit, it seems unlikely to be delayed long. So it looks
> that we don't need to unmute the channel earlier than needed. Does this
> address your concern?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram 
> wrote:
>
> > Hi Becket,
> >
> > A few questions:
> >
> > 1. KIP says: *Although older client implementations (prior to knowledge
> of
> > this KIP) will immediately send the next request after the broker
> responds
> > without paying attention to the throttle time field, the broker is
> > protected by virtue of muting the channel for time X. i.e., the next
> > request will not be processed until the channel is unmuted. *
> > For fetch requests, the response is sent immediately and the mute time on
> > the broker based on empty fetch response will often be zero (unlike the
> > throttle time returned to the client based on non-empty response). Won't
> > that lead to a tight loop of fetch requests from old clients
> (particularly
> > expensive with SSL)? Wouldn't it be better to retain current behaviour
> for
> > old clients? Also, if we change the behaviour for old clients, client
> > metrics that track throttle time will report a lot of throttle unrelated
> to
> >  actual throttle time.
> >
> For consumers, if the quota is violated, the throttle time on the broker
> will not be 0. It is just that the throttle time will not keep increasing
> because the broker will return an empty response in this case. So there
> should not be a tight loop.
>
>
> > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms will
> > still be honored during the throttle time X. This makes sure that the
> > brokers will detect client connection closure in a bounded time.*
> > Wouldn't it be better to bound maximum throttle time to
> > *connections.max.idle.ms*? If we mute for a time greater than
> > *connections.max.idle.ms* and then close a client connection simply
> > because we had muted it on the broker for a longer throttle time, we force
> > a reconnection and read the next request from the new connection straight
> > away. This feels the same as a bound on the quota of
> > *connections.max.idle.ms*, but with added load on the broker for
> > authenticating another connection (expensive with SSL).
> >
> I think we need to think about consumers prior to and after this KIP
> separately.
>
> For consumers prior to this KIP, even if we unmute the channel after
> connections.max.idle.ms, it is likely that the consumers have already
> reached request.timeout.ms before that and have reconnected to the broker.
> So there is no real difference whether we close the throttled channel or
> not.
>
> For consumers after the KIP, because they will honor the throttle time,
> they will back off until the throttle time is reached. If that throttle time
> is longer than connections.max.idle.ms, reconnecting does not seem like a
> big overhead because there will only be one connection re-establishment in
> quite a few minutes. Compared with such overhead, it seems more important to
> honor the quota so the broker can survive.
>
>
> > 3. Are we changing the behaviour of network bandwidth quota for
> > Produce/Fetch and retaining current behaviour for request quotas?
> >
> This is going to be applied to all the quotas, including request quotas.
>
>
> >
> > Thanks,
> >
> > Rajini
> >
> >
> >
> > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao  wrote:
> >
> > > Hi, Jiangjie,
> > >
> > > Thanks for the updated KIP. +1
> > >
> > > Jun
> > >
> > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin 
> wrote:
> > >
> > > > Thanks for the comments, Jun.
> > > >
> > > > 1. Good point.
> > > > 2. Also makes sense. Usually connections.max.idle.ms is high enough
> > > > that the throttling is not impacted.
> > > >
> > > > I have updated the KIP to reflect the changes.
> > > >
> > > > Thanks,
> > > 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-15 Thread Becket Qin
Hi Rajini,

Thanks for the comments. Please see the reply inline.

Hi Jun,

Thinking about the consumer rebalance case a bit more, I am not sure if we
need to worry about the delayed rebalance due to quota violation or not.
The rebalance actually uses a separate channel to coordinator. Therefore
unless the request quota is hit, the rebalance won't be throttled. Even if
request quota is hit, it seems unlikely to be delayed long. So it looks
that we don't need to unmute the channel earlier than needed. Does this
address your concern?

Thanks,

Jiangjie (Becket) Qin

On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram 
wrote:

> Hi Becket,
>
> A few questions:
>
> 1. KIP says: *Although older client implementations (prior to knowledge of
> this KIP) will immediately send the next request after the broker responds
> without paying attention to the throttle time field, the broker is
> protected by virtue of muting the channel for time X. i.e., the next
> request will not be processed until the channel is unmuted. *
> For fetch requests, the response is sent immediately and the mute time on
> the broker based on empty fetch response will often be zero (unlike the
> throttle time returned to the client based on non-empty response). Won't
> that lead to a tight loop of fetch requests from old clients (particularly
> expensive with SSL)? Wouldn't it be better to retain current behaviour for
> old clients? Also, if we change the behaviour for old clients, client
> metrics that track throttle time will report a lot of throttle unrelated to
>  actual throttle time.
>
For consumers, if the quota is violated, the throttle time on the broker
will not be 0. It is just that the throttle time will not keep increasing
because the broker will return an empty response in this case. So there
should not be a tight loop.


> 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms will
> still be honored during the throttle time X. This makes sure that the
> brokers will detect client connection closure in a bounded time.*
> Wouldn't it be better to bound maximum throttle time to
> *connections.max.idle.ms*? If we mute for a time greater than
> *connections.max.idle.ms* and then close a client connection simply
> because we had muted it on the broker for a longer throttle time, we force
> a reconnection and read the next request from the new connection straight
> away. This feels the same as a bound on the quota of
> *connections.max.idle.ms*, but with added load on the broker for
> authenticating another connection (expensive with SSL).
>
I think we need to think about consumers prior to and after this KIP
separately.

For consumers prior to this KIP, even if we unmute the channel after
connections.max.idle.ms, it is likely that the consumers have already
reached request.timeout.ms before that and have reconnected to the broker.
So there is no real difference whether we close the throttled channel or
not.

For consumers after the KIP, because they will honor the throttle time,
they will back off until the throttle time is reached. If that throttle time
is longer than connections.max.idle.ms, reconnecting does not seem like a
big overhead because there will only be one connection re-establishment in
quite a few minutes. Compared with such overhead, it seems more important to
honor the quota so the broker can survive.
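The back-off behaviour described for post-KIP consumers can be sketched as a small model (illustrative Python, not the actual Kafka client code; `throttle_time_ms` is the response field the KIP relies on, everything else here is a hypothetical name):

```python
import time

class ThrottleAwareClient:
    """Illustrative model of a post-KIP-219 client: it reads the
    throttle_time_ms field from each response and refrains from
    sending until that time has elapsed. A sketch only, not the
    real Kafka client implementation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._muted_until = 0.0  # deadline before which we must not send

    def on_response(self, throttle_time_ms):
        # Pre-KIP clients ignored this field and relied solely on the
        # broker muting the channel; post-KIP clients mute themselves.
        if throttle_time_ms > 0:
            deadline = self._clock() + throttle_time_ms / 1000.0
            self._muted_until = max(self._muted_until, deadline)

    def ready(self):
        return self._clock() >= self._muted_until

client = ThrottleAwareClient()
client.on_response(0)
print(client.ready())        # unthrottled response: client stays ready
client.on_response(30_000)   # broker reports a 30 s throttle
print(client.ready())        # client backs off instead of retrying
```

This is exactly the trade-off discussed above: the client voluntarily stays quiet for the whole throttle time, even when that exceeds connections.max.idle.ms, at the cost of at most one reconnection.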


> 3. Are we changing the behaviour of network bandwidth quota for
> Produce/Fetch and retaining current behaviour for request quotas?
>
This is going to be applied to all the quotas, including request quotas.


>
> Thanks,
>
> Rajini
>
>
>
> On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao  wrote:
>
> > Hi, Jiangjie,
> >
> > Thanks for the updated KIP. +1
> >
> > Jun
> >
> > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin  wrote:
> >
> > > Thanks for the comments, Jun.
> > >
> > > 1. Good point.
> > > 2. Also makes sense. Usually connections.max.idle.ms is high enough
> > > that the throttling is not impacted.
> > >
> > > I have updated the KIP to reflect the changes.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao  wrote:
> > >
> > > > Hi, Jiangjie,
> > > >
> > > > Sorry for the late response. The proposal sounds good overall. A
> couple
> > > of
> > > > minor comments below.
> > > >
> > > > 1. For throttling a fetch request, we could potentially just send an
> > > empty
> > > > response. We can return a throttle time calculated from a full
> > response,
> > > > but only mute the channel on the server based on a throttle time
> > > calculated
> > > > based on the empty response. This has the benefit that the server
> will
> > > mute
> > > > the channel much shorter, which will prevent the consumer from
> > > rebalancing
> > > > when throttled.
> > > >
> > > > 

Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-15 Thread Rajini Sivaram
Hi Becket,

A few questions:

1. KIP says: *Although older client implementations (prior to knowledge of
this KIP) will immediately send the next request after the broker responds
without paying attention to the throttle time field, the broker is
protected by virtue of muting the channel for time X. i.e., the next
request will not be processed until the channel is unmuted. *
For fetch requests, the response is sent immediately and the mute time on
the broker based on empty fetch response will often be zero (unlike the
throttle time returned to the client based on non-empty response). Won't
that lead to a tight loop of fetch requests from old clients (particularly
expensive with SSL)? Wouldn't it be better to retain current behaviour for
old clients? Also, if we change the behaviour for old clients, client
metrics that track throttle time will report a lot of throttle unrelated to
 actual throttle time.

2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms will
still be honored during the throttle time X. This makes sure that the
brokers will detect client connection closure in a bounded time.*
Wouldn't it be better to bound maximum throttle time to
*connections.max.idle.ms*? If we mute for a time greater than
*connections.max.idle.ms* and then close a client connection simply
because we had muted it on the broker for a longer throttle time, we force
a reconnection and read the next request from the new connection straight
away. This feels the same as a bound on the quota of
*connections.max.idle.ms*, but with added load on the broker for
authenticating another connection (expensive with SSL).

3. Are we changing the behaviour of network bandwidth quota for
Produce/Fetch and retaining current behaviour for request quotas?

Thanks,

Rajini



On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao  wrote:

> Hi, Jiangjie,
>
> Thanks for the updated KIP. +1
>
> Jun
>
> On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin  wrote:
>
> > Thanks for the comments, Jun.
> >
> > 1. Good point.
> > 2. Also makes sense. Usually connections.max.idle.ms is high enough
> > that the throttling is not impacted.
> >
> > I have updated the KIP to reflect the changes.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao  wrote:
> >
> > > Hi, Jiangjie,
> > >
> > > Sorry for the late response. The proposal sounds good overall. A couple
> > of
> > > minor comments below.
> > >
> > > 1. For throttling a fetch request, we could potentially just send an
> > empty
> > > response. We can return a throttle time calculated from a full
> response,
> > > but only mute the channel on the server based on a throttle time
> > calculated
> > > based on the empty response. This has the benefit that the server will
> > mute
> > > the channel much shorter, which will prevent the consumer from
> > rebalancing
> > > when throttled.
> > >
> > > 2. The wiki says "connections.max.idle.ms should be ignored during the
> > > throttle time X." This has the potential issue that a server may not
> > detect
> > > that a client connection is already gone until after an arbitrary
> amount
> > of
> > > time. Perhaps we could still just close a connection if the server has
> > > muted it for longer than connections.max.idle.ms. This will at least
> > bound
> > > the time for a server to detect closed client connections.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > We would like to start the voting thread for KIP-219. The KIP
> proposes
> > to
> > > > improve the quota communication between the brokers and clients,
> > > especially
> > > > for cases of long throttling time.
> > > >
> > > > The KIP wiki is the following:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-219+-+Improve+quota+communication
> > > >
> > > > The discussion thread is here:
> > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > >
> >
>


Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-10 Thread Jun Rao
Hi, Jiangjie,

Thanks for the updated KIP. +1

Jun

On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin  wrote:

> Thanks for the comments, Jun.
>
> 1. Good point.
> 2. Also makes sense. Usually connections.max.idle.ms is high enough that
> the throttling is not impacted.
>
> I have updated the KIP to reflect the changes.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao  wrote:
>
> > Hi, Jiangjie,
> >
> > Sorry for the late response. The proposal sounds good overall. A couple
> of
> > minor comments below.
> >
> > 1. For throttling a fetch request, we could potentially just send an
> empty
> > response. We can return a throttle time calculated from a full response,
> > but only mute the channel on the server based on a throttle time
> calculated
> > based on the empty response. This has the benefit that the server will
> mute
> > the channel much shorter, which will prevent the consumer from
> rebalancing
> > when throttled.
> >
> > 2. The wiki says "connections.max.idle.ms should be ignored during the
> > throttle time X." This has the potential issue that a server may not
> detect
> > that a client connection is already gone until after an arbitrary amount
> of
> > time. Perhaps we could still just close a connection if the server has
> > muted it for longer than connections.max.idle.ms. This will at least
> bound
> > the time for a server to detect closed client connections.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin 
> wrote:
> >
> > > Hi,
> > >
> > > We would like to start the voting thread for KIP-219. The KIP proposes
> to
> > > improve the quota communication between the brokers and clients,
> > especially
> > > for cases of long throttling time.
> > >
> > > The KIP wiki is the following:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-219+-+Improve+quota+communication
> > >
> > > The discussion thread is here:
> > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> >
>


Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-08 Thread Becket Qin
Thanks for the comments, Jun.

1. Good point.
2. Also makes sense. Usually connections.max.idle.ms is high enough that
the throttling is not impacted.

I have updated the KIP to reflect the changes.

Thanks,

Jiangjie (Becket) Qin


On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao  wrote:

> Hi, Jiangjie,
>
> Sorry for the late response. The proposal sounds good overall. A couple of
> minor comments below.
>
> 1. For throttling a fetch request, we could potentially just send an empty
> response. We can return a throttle time calculated from a full response,
> but only mute the channel on the server based on a throttle time calculated
> based on the empty response. This has the benefit that the server will mute
> the channel much shorter, which will prevent the consumer from rebalancing
> when throttled.
>
> 2. The wiki says "connections.max.idle.ms should be ignored during the
> throttle time X." This has the potential issue that a server may not detect
> that a client connection is already gone until after an arbitrary amount of
> time. Perhaps we could still just close a connection if the server has
> muted it for longer than connections.max.idle.ms. This will at least bound
> the time for a server to detect closed client connections.
>
> Thanks,
>
> Jun
>
>
> On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin  wrote:
>
> > Hi,
> >
> > We would like to start the voting thread for KIP-219. The KIP proposes to
> > improve the quota communication between the brokers and clients,
> especially
> > for cases of long throttling time.
> >
> > The KIP wiki is the following:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-219+-+Improve+quota+communication
> >
> > The discussion thread is here:
> > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
>


Re: [VOTE] KIP-219 - Improve Quota Communication

2018-01-08 Thread Jun Rao
Hi, Jiangjie,

Sorry for the late response. The proposal sounds good overall. A couple of
minor comments below.

1. For throttling a fetch request, we could potentially just send an empty
response. We can return a throttle time calculated from a full response,
but only mute the channel on the server based on a throttle time calculated
based on the empty response. This has the benefit that the server will mute
the channel much shorter, which will prevent the consumer from rebalancing
when throttled.

2. The wiki says "connections.max.idle.ms should be ignored during the
throttle time X." This has the potential issue that a server may not detect
that a client connection is already gone until after an arbitrary amount of
time. Perhaps we could still just close a connection if the server has
muted it for longer than connections.max.idle.ms. This will at least bound
the time for a server to detect closed client connections.
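The bounded-detection behaviour can be sketched as follows (a hypothetical helper under the assumption that the broker simply compares the mute deadline against connections.max.idle.ms; not actual broker code):

```python
def effective_mute_ms(throttle_ms, connections_max_idle_ms):
    """Sketch of the behaviour described above: the broker mutes the
    channel for the throttle time, but if that exceeds
    connections.max.idle.ms it closes the connection at the idle
    timeout instead, which bounds how long a dead client connection
    can go undetected. Hypothetical helper for illustration."""
    if throttle_ms > connections_max_idle_ms:
        return connections_max_idle_ms, True   # close at the idle timeout
    return throttle_ms, False                  # unmute when the throttle expires

# A 10-minute throttle against an example 9-minute (540 s) idle timeout:
# the broker gives up on the muted connection at the idle timeout.
mute_ms, closed = effective_mute_ms(600_000, 540_000)
print(mute_ms, closed)
```

The cost of closing, as noted in the thread, is one extra reconnection and (with SSL) an extra handshake, which is why honoring the idle timeout only matters for unusually long throttle times.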

Thanks,

Jun


On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin  wrote:

> Hi,
>
> We would like to start the voting thread for KIP-219. The KIP proposes to
> improve the quota communication between the brokers and clients, especially
> for cases of long throttling time.
>
> The KIP wiki is the following:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-219+-+Improve+quota+communication
>
> The discussion thread is here:
> http://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
>
> Thanks,
>
> Jiangjie (Becket) Qin
>