Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-26 Thread Konstantine Karantasis
> > not
> > > > > > > > > > right?
> > > > > > > > > > > > In
> > > > > > > > > > > > > any case, I agree that we can use the maximum of
> the two
> > > > values
> > > > > as
> > > > > > > > > > the
> > > > > > > > > > > > > effective `retry.backoff.max.ms` to handle the
> case when
> > > the
> > > > > > > > > > configured
> > > > > > > > > > > > > value of `retry.backoff.ms` is larger than the
> default of
> > > 1s.
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Jason
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 19, 2020 at 3:29 PM Guozhang Wang <
> > > > > wangg...@gmail.com
> > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hey Jason,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > My understanding is a bit different here: even
> if user has
> > > an
> > > > > > > > > > explicit
> > > > > > > > > > > > > > overridden "retry.backoff.ms", the exponential
> mechanism
> > > > still
> > > > > > > > > > > > triggers
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > the backoff would be increased till "
> retry.backoff.max.ms
> > > ";
> > > > > and
> > > > > > > > > > if the
> > > > > > > > > > > > > > specified "retry.backoff.ms" is already larger
> than the "
> > > > > > > > > > > > > > retry.backoff.max.ms", we would still take "
> > > > > retry.backoff.max.ms
> > > > > > > > > ".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So if the user does override the "
> retry.backoff.ms" to a
> > > > value
> > > > > > > > > > larger
> > > > > > > > > > > > > than
> > > > > > > > > > > > > > 1s and is not aware of the new config, she would
> be
> > > surprised
> > > > > to
> > > > > > > > > > see
> > > > > > > > > > > > the
> > > > > > > > > > > > > > specified value seemingly not being respected,
> but she
> > > could
> > > > > > > > > still
> > > > > > > > > > > > learn
> > > > > > > > > > > > > > that afterwards by reading the release notes
> introducing
> > > this
> > > > > KIP
> > > > > > > > > > > > > anyways.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
> > > > > > > > > > ja...@confluent.io>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Sanjana,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The KIP looks good to me. I had just one
> question about
> > > the
> > > > > > > > > > default
> > > > > > > > > > > > > >

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-25 Thread Sanjana Kaundinya
> > > > > > > > > > > > > > My understanding is a bit different here: even if user has an
> > > > > > > > > explicit
> > > > > > > > > > > > > overridden "retry.backoff.ms", the exponential 
> > > > > > > > > > > > > mechanism
> > > still
> > > > > > > > > > > triggers
> > > > > > > > > > > > > and
> > > > > > > > > > > > > the backoff would be increased till 
> > > > > > > > > > > > > "retry.backoff.max.ms
> > ";
> > > > and
> > > > > > > > > if the
> > > > > > > > > > > > > specified "retry.backoff.ms" is already larger than 
> > > > > > > > > > > > > the "
> > > > > > > > > > > > > retry.backoff.max.ms", we would still take "
> > > > retry.backoff.max.ms
> > > > > > > > ".
> > > > > > > > > > > > >
> > > > > > > > > > > > > So if the user does override the "retry.backoff.ms" 
> > > > > > > > > > > > > to a
> > > value
> > > > > > > > > larger
> > > > > > > > > > > > than
> > > > > > > > > > > > > 1s and is not aware of the new config, she would be
> > surprised
> > > > to
> > > > > > > > > see
> > > > > > > > > > > the
> > > > > > > > > > > > > specified value seemingly not being respected, but she
> > could
> > > > > > > > still
> > > > > > > > > > > learn
> > > > > > > > > > > > > that afterwards by reading the release notes 
> > > > > > > > > > > > > introducing
> > this
> > > > KIP
> > > > > > > > > > > > anyways.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
> > > > > > > > > ja...@confluent.io>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Sanjana,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The KIP looks good to me. I had just one question 
> > > > > > > > > > > > > > about
> > the
> > > > > > > > > default
> > > > > > > > > > > > > > behavior. As I understand, if the user has 
> > > > > > > > > > > > > > specified `
> > > > > > > > > > > retry.backoff.ms
> > > > > > > > > > > > `
> > > > > > > > > > > > > > explicitly, then we will not apply the default max
> > backoff.
> > > As
> > > > > > > > > such,
> > > > > > > > > > > > > > there's no way to get the benefit of this feature 
> > > > > > > > > > > > > > if you
> > are
> > > > > > > > > > > providing
> > > > > > > > > > > > a
> > > > > > > > > > > > > `
> > > > > > > > > > > > > > retry.backoff.ms` unless you also provide `
> > > > > > > > retry.backoff.max.ms
> > > > > > > > > `.
> > > > > > > > > > > That
> > > > > > > > > > > > > > makes sense if you assume the user is unaware of 
> > > > > > > > > > > > > > the new
> > > > > > > > > > > configuration,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > > it is surprising otherwise.
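
Editorial sketch of the behavior Guozhang describes in the quoted text above: the backoff starts at `retry.backoff.ms`, grows exponentially, and is capped at `retry.backoff.max.ms`, even when the configured initial value is already larger than the cap. This is only an illustration, not the KIP's implementation. The function name, the multiplier of 2, and the absence of jitter are assumptions; the 1s cap is the default mentioned in the thread.

```python
def capped_backoff_ms(attempt: int, retry_backoff_ms: int,
                      retry_backoff_max_ms: int, multiplier: int = 2) -> int:
    """Backoff before retry number `attempt` (0-based).

    Hypothetical helper: grows exponentially from retry_backoff_ms and is
    always clamped to retry_backoff_max_ms (multiplier of 2 is an assumption).
    """
    return min(retry_backoff_ms * multiplier ** attempt, retry_backoff_max_ms)


# A 100 ms initial backoff with a 1 s cap.
schedule = [capped_backoff_ms(n, 100, 1000) for n in range(6)]

# A user override of 5 s with the 1 s cap: the cap wins, which is the
# "specified value seemingly not being respected" case from the thread.
overridden = [capped_backoff_ms(n, 5000, 1000) for n in range(3)]

print(schedule)
print(overridden)
```

Under this reading a user who set `retry.backoff.ms` above 1s would see shorter backoffs than configured unless they also raise `retry.backoff.max.ms`.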

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-25 Thread Konstantine Karantasis
> > > > > >
> > > > > >>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Hey Jason,
> > > > > >>>>>>>
> > > > > >>>>>>> My understanding is a bit different here: even if user has
> an
> > > > > >>> explicit
> > > > > >>>>>>> overridden "retry.backoff.ms", the exponential mechanism
> > still
> > > > > >>>>> triggers
> > > > > >>>>>>> and
> > > > > >>>>>>> the backoff would be increased till "retry.backoff.max.ms
> ";
> > > and
> > > > > >>> if the
> > > > > >>>>>>> specified "retry.backoff.ms" is already larger than the "
> > > > > >>>>>>> retry.backoff.max.ms", we would still take "
> > > retry.backoff.max.ms
> > > > > >> ".
> > > > > >>>>>>>
> > > > > >>>>>>> So if the user does override the "retry.backoff.ms" to a
> > value
> > > > > >>> larger
> > > > > >>>>>> than
> > > > > >>>>>>> 1s and is not aware of the new config, she would be
> surprised
> > > to
> > > > > >>> see
> > > > > >>>>> the
> > > > > >>>>>>> specified value seemingly not being respected, but she
> could
> > > > > >> still
> > > > > >>>>> learn
> > > > > >>>>>>> that afterwards by reading the release notes introducing
> this
> > > KIP
> > > > > >>>>>> anyways.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> Guozhang
> > > > > >>>>>>>
> > > > > >>>>>>> On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
> > > > > >>> ja...@confluent.io>
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi Sanjana,
> > > > > >>>>>>>>
> > > > > >>>>>>>> The KIP looks good to me. I had just one question about
> the
> > > > > >>> default
> > > > > >>>>>>>> behavior. As I understand, if the user has specified `
> > > > > >>>>> retry.backoff.ms
> > > > > >>>>>> `
> > > > > >>>>>>>> explicitly, then we will not apply the default max
> backoff.
> > As
> > > > > >>> such,
> > > > > >>>>>>>> there's no way to get the benefit of this feature if you
> are
> > > > > >>>>> providing
> > > > > >>>>>> a
> > > > > >>>>>>> `
> > > > > >>>>>>>> retry.backoff.ms` unless you also provide `
> > > > > >> retry.backoff.max.ms
> > > > > >>> `.
> > > > > >>>>> That
> > > > > >>>>>>>> makes sense if you assume the user is unaware of the new
> > > > > >>>>> configuration,
> > > > > >>>>>>> but
> > > > > >>>>>>>> it is surprising otherwise. Since it's not a semantic
> change
> > > > > >> and
> > > > > >>>>> since
> > > > > >>>>>>> the
> > > > > >>>>>>>> default you're proposing of 1s is fairly low already, I
> > wonder
> > > > > >> if
> > > > > >>>>> it's
> > > > > >>>>>>> good
> > > > > >>>>>>>> enough to mention the new configuration in the release
> notes
> > > > > >> and
> > > > > >>> not
> > > > > >>>>>> add
> > > > > >>>>>>>> any special logic. What do you think?

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-24 Thread Guozhang Wang
> > > > >>>>>> any case, I agree that we can use the maximum of the two
> values
> > as
> > > > >>> the
> > > > >>>>>> effective `retry.backoff.max.ms` to handle the case when the
> > > > >>> configured
> > > > >>>>>> value of `retry.backoff.ms` is larger than the default of 1s.
> > > > >>>>>>
> > > > >>>>>> -Jason
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Thu, Mar 19, 2020 at 3:29 PM Guozhang Wang <
> > wangg...@gmail.com
> > > >
> > > > >>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> Hey Jason,
> > > > >>>>>>>
> > > > >>>>>>> My understanding is a bit different here: even if user has an
> > > > >>> explicit
> > > > >>>>>>> overridden "retry.backoff.ms", the exponential mechanism
> still
> > > > >>>>> triggers
> > > > >>>>>>> and
> > > > >>>>>>> the backoff would be increased till "retry.backoff.max.ms";
> > and
> > > > >>> if the
> > > > >>>>>>> specified "retry.backoff.ms" is already larger than the "
> > > > >>>>>>> retry.backoff.max.ms", we would still take "
> > retry.backoff.max.ms
> > > > >> ".
> > > > >>>>>>>
> > > > >>>>>>> So if the user does override the "retry.backoff.ms" to a
> value
> > > > >>> larger
> > > > >>>>>> than
> > > > >>>>>>> 1s and is not aware of the new config, she would be surprised
> > to
> > > > >>> see
> > > > >>>>> the
> > > > >>>>>>> specified value seemingly not being respected, but she could
> > > > >> still
> > > > >>>>> learn
> > > > >>>>>>> that afterwards by reading the release notes introducing this
> > KIP
> > > > >>>>>> anyways.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Guozhang
> > > > >>>>>>>
> > > > >>>>>>> On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
> > > > >>> ja...@confluent.io>
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi Sanjana,
> > > > >>>>>>>>
> > > > >>>>>>>> The KIP looks good to me. I had just one question about the
> > > > >>> default
> > > > >>>>>>>> behavior. As I understand, if the user has specified `
> > > > >>>>> retry.backoff.ms
> > > > >>>>>> `
> > > > >>>>>>>> explicitly, then we will not apply the default max backoff.
> As
> > > > >>> such,
> > > > >>>>>>>> there's no way to get the benefit of this feature if you are
> > > > >>>>> providing
> > > > >>>>>> a
> > > > >>>>>>> `
> > > > >>>>>>>> retry.backoff.ms` unless you also provide `
> > > > >> retry.backoff.max.ms
> > > > >>> `.
> > > > >>>>> That
> > > > >>>>>>>> makes sense if you assume the user is unaware of the new
> > > > >>>>> configuration,
> > > > >>>>>>> but
> > > > >>>>>>>> it is surprising otherwise. Since it's not a semantic change
> > > > >> and
> > > > >>>>> since
> > > > >>>>>>> the
> > > > >>>>>>>> default you're proposing of 1s is fairly low already, I
> wonder
> > > > >> if
> > > > >>>>> it's
> >

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-23 Thread Boyang Chen
> > > >>>>>>> specified "retry.backoff.ms" is already larger than the "
> > > >>>>>>> retry.backoff.max.ms", we would still take "
> retry.backoff.max.ms
> > > >> ".
> > > >>>>>>>
> > > >>>>>>> So if the user does override the "retry.backoff.ms" to a value
> > > >>> larger
> > > >>>>>> than
> > > >>>>>>> 1s and is not aware of the new config, she would be surprised
> to
> > > >>> see
> > > >>>>> the
> > > >>>>>>> specified value seemingly not being respected, but she could
> > > >> still
> > > >>>>> learn
> > > >>>>>>> that afterwards by reading the release notes introducing this
> KIP
> > > >>>>>> anyways.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Guozhang
> > > >>>>>>>
> > > >>>>>>> On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
> > > >>> ja...@confluent.io>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi Sanjana,
> > > >>>>>>>>
> > > >>>>>>>> The KIP looks good to me. I had just one question about the
> > > >>> default
> > > >>>>>>>> behavior. As I understand, if the user has specified `
> > > >>>>> retry.backoff.ms
> > > >>>>>> `
> > > >>>>>>>> explicitly, then we will not apply the default max backoff. As
> > > >>> such,
> > > >>>>>>>> there's no way to get the benefit of this feature if you are
> > > >>>>> providing
> > > >>>>>> a
> > > >>>>>>> `
> > > >>>>>>>> retry.backoff.ms` unless you also provide `
> > > >> retry.backoff.max.ms
> > > >>> `.
> > > >>>>> That
> > > >>>>>>>> makes sense if you assume the user is unaware of the new
> > > >>>>> configuration,
> > > >>>>>>> but
> > > >>>>>>>> it is surprising otherwise. Since it's not a semantic change
> > > >> and
> > > >>>>> since
> > > >>>>>>> the
> > > >>>>>>>> default you're proposing of 1s is fairly low already, I wonder
> > > >> if
> > > >>>>> it's
> > > >>>>>>> good
> > > >>>>>>>> enough to mention the new configuration in the release notes
> > > >> and
> > > >>> not
> > > >>>>>> add
> > > >>>>>>>> any special logic. What do you think?
> > > >>>>>>>>
> > > >>>>>>>> -Jason
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya <
> > > >>>>>> skaundi...@gmail.com>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Thank you for the comments Guozhang.
> > > >>>>>>>>>
> > > >>>>>>>>> I’ll leave this KIP out for discussion till the end of the
> > > >>> week and
> > > >>>>>>> then
> > > >>>>>>>>> start a vote for this early next week.
> > > >>>>>>>>>
> > > >>>>>>>>> Sanjana
> > > >>>>>>>>>
> > > >>>>>>>>> On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang <
> > > >>> wangg...@gmail.com
> > > >>>>>> ,
> > > >>>>>>>> wrote:
> > > >>>>>>>>>> Hello Sanjana,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks for the proposed KIP, I think that makes a lot of
> > > >>> sense --
> > > >>>>>> as
> > > >>>>>>

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-23 Thread Sanjana Kaundinya
> > >>>>>>>>
> > >>>>>>>> The KIP looks good to me. I had just one question about the
> > >>> default
> > >>>>>>>> behavior. As I understand, if the user has specified `
> > >>>>> retry.backoff.ms
> > >>>>>> `
> > >>>>>>>> explicitly, then we will not apply the default max backoff. As
> > >>> such,
> > >>>>>>>> there's no way to get the benefit of this feature if you are
> > >>>>> providing
> > >>>>>> a
> > >>>>>>> `
> > >>>>>>>> retry.backoff.ms` unless you also provide `
> > >> retry.backoff.max.ms
> > >>> `.
> > >>>>> That
> > >>>>>>>> makes sense if you assume the user is unaware of the new
> > >>>>> configuration,
> > >>>>>>> but
> > >>>>>>>> it is surprising otherwise. Since it's not a semantic change
> > >> and
> > >>>>> since
> > >>>>>>> the
> > >>>>>>>> default you're proposing of 1s is fairly low already, I wonder
> > >> if
> > >>>>> it's
> > >>>>>>> good
> > >>>>>>>> enough to mention the new configuration in the release notes
> > >> and
> > >>> not
> > >>>>>> add
> > >>>>>>>> any special logic. What do you think?
> > >>>>>>>>
> > >>>>>>>> -Jason
> > >>>>>>>>
> > >>>>>>>> On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya <
> > >>>>>> skaundi...@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thank you for the comments Guozhang.
> > >>>>>>>>>
> > >>>>>>>>> I’ll leave this KIP out for discussion till the end of the
> > >>> week and
> > >>>>>>> then
> > >>>>>>>>> start a vote for this early next week.
> > >>>>>>>>>
> > >>>>>>>>> Sanjana
> > >>>>>>>>>
> > >>>>>>>>> On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang <
> > >>> wangg...@gmail.com
> > >>>>>> ,
> > >>>>>>>> wrote:
> > >>>>>>>>>> Hello Sanjana,
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks for the proposed KIP, I think that makes a lot of
> > >>> sense --
> > >>>>>> as
> > >>>>>>>> you
> > >>>>>>>>>> mentioned in the motivation, we've indeed seen many issues
> > >>> with
> > >>>>>>> regard
> > >>>>>>>> to
> > >>>>>>>>>> the frequent retries, with bounded exponential backoff in
> > >> the
> > >>>>>>> scenario
> > >>>>>>>>>> where there's a long connectivity issue we would
> > >> effectively
> > >>>>> reduce
> > >>>>>>> the
> > >>>>>>>>>> request load by a factor of 10 given the default configs.
> > >>>>>>>>>>
> > >>>>>>>>>> For higher-level Streams client and Connect frameworks,
> > >>> today we
> > >>>>>> also
> > >>>>>>>>> have
> > >>>>>>>>>> a retry logic but that's used in a slightly different way.
> > >>> For
> > >>>>>>> example
> > >>>>>>>> in
> > >>>>>>>>>> Streams, we tend to handle the retry logic at the
> > >>> thread-level
> > >>>>> and
> > >>>>>>>> hence
> > >>>>>>>>>> very likely we'd like to change that mechanism in KIP-572
> > >>>>> anyways.
> > >>>>>>> For
> > >>>>>>>>>> producer / consumer 

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-23 Thread Boyang Chen
I took this to mean that for users who have overridden `
> >>> retry.backoff.ms
> >>>>> `
> >>>>>> to 50ms (say), we will change the default `retry.backoff.max.ms`
> >> to
> >>> 50ms
> >>>>>> as
> >>>>>> well in order to preserve existing backoff behavior. Is that not
> >>> right?
> >>>>> In
> >>>>>> any case, I agree that we can use the maximum of the two values as
> >>> the
> >>>>>> effective `retry.backoff.max.ms` to handle the case when the
> >>> configured
> >>>>>> value of `retry.backoff.ms` is larger than the default of 1s.
> >>>>>>
> >>>>>> -Jason
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Mar 19, 2020 at 3:29 PM Guozhang Wang <wangg...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Hey Jason,
> >>>>>>>
> >>>>>>> My understanding is a bit different here: even if user has an
> >>> explicit
> >>>>>>> overridden "retry.backoff.ms", the exponential mechanism still
> >>>>> triggers
> >>>>>>> and
> >>>>>>> the backoff would be increased till "retry.backoff.max.ms"; and
> >>> if the
> >>>>>>> specified "retry.backoff.ms" is already larger than the "
> >>>>>>> retry.backoff.max.ms", we would still take "retry.backoff.max.ms
> >> ".
> >>>>>>>
> >>>>>>> So if the user does override the "retry.backoff.ms" to a value
> >>> larger
> >>>>>> than
> >>>>>>> 1s and is not aware of the new config, she would be surprised to
> >>> see
> >>>>> the
> >>>>>>> specified value seemingly not being respected, but she could
> >> still
> >>>>> learn
> >>>>>>> that afterwards by reading the release notes introducing this KIP
> >>>>>> anyways.
> >>>>>>>
> >>>>>>>
> >>>>>>> Guozhang
> >>>>>>>
> >>>>>>> On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
> >>> ja...@confluent.io>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Sanjana,
> >>>>>>>>
> >>>>>>>> The KIP looks good to me. I had just one question about the
> >>> default
> >>>>>>>> behavior. As I understand, if the user has specified `
> >>>>> retry.backoff.ms
> >>>>>> `
> >>>>>>>> explicitly, then we will not apply the default max backoff. As
> >>> such,
> >>>>>>>> there's no way to get the benefit of this feature if you are
> >>>>> providing
> >>>>>> a
> >>>>>>> `
> >>>>>>>> retry.backoff.ms` unless you also provide `
> >> retry.backoff.max.ms
> >>> `.
> >>>>> That
> >>>>>>>> makes sense if you assume the user is unaware of the new
> >>>>> configuration,
> >>>>>>> but
> >>>>>>>> it is surprising otherwise. Since it's not a semantic change
> >> and
> >>>>> since
> >>>>>>> the
> >>>>>>>> default you're proposing of 1s is fairly low already, I wonder
> >> if
> >>>>> it's
> >>>>>>> good
> >>>>>>>> enough to mention the new configuration in the release notes
> >> and
> >>> not
> >>>>>> add
> >>>>>>>> any special logic. What do you think?
> >>>>>>>>
> >>>>>>>> -Jason
> >>>>>>>>
> >>>>>>>> On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya <
> >>>>>> skaundi...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Thank you for the comments Guozhang.
> >>>>>>>>>
> >
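
Editorial sketch of the rule Jason agrees to in the quoted text above: use the larger of `retry.backoff.ms` and `retry.backoff.max.ms` as the effective maximum, so that a configured `retry.backoff.ms` above the 1s default is still respected. The function names, the multiplier of 2, and the omission of jitter are assumptions made for illustration and are not taken from the KIP text.

```python
DEFAULT_RETRY_BACKOFF_MAX_MS = 1000  # the 1s default discussed in the thread


def effective_max_ms(retry_backoff_ms: int,
                     retry_backoff_max_ms: int = DEFAULT_RETRY_BACKOFF_MAX_MS) -> int:
    """Effective cap: the maximum of the two configured values."""
    return max(retry_backoff_ms, retry_backoff_max_ms)


def backoff_with_effective_max_ms(attempt: int, retry_backoff_ms: int,
                                  retry_backoff_max_ms: int = DEFAULT_RETRY_BACKOFF_MAX_MS,
                                  multiplier: int = 2) -> int:
    """Exponential backoff (hypothetical multiplier of 2) under the max-of-two cap."""
    cap = effective_max_ms(retry_backoff_ms, retry_backoff_max_ms)
    return min(retry_backoff_ms * multiplier ** attempt, cap)


# Override above the default cap: the configured 5 s is kept as a fixed backoff.
large_override = [backoff_with_effective_max_ms(n, 5000) for n in range(3)]

# Override below the default cap: backoff still grows up to 1 s.
small_override = [backoff_with_effective_max_ms(n, 50) for n in range(6)]

print(large_override)
print(small_override)
```

With this rule an existing override larger than 1s behaves as it did before the change, while smaller overrides pick up the exponential growth without any extra configuration.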

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-23 Thread Cheng Tan
>>>>>>> overridden "retry.backoff.ms", the exponential mechanism still
>>>>> triggers
>>>>>>> and
>>>>>>> the backoff would be increased till "retry.backoff.max.ms"; and
>>> if the
>>>>>>> specified "retry.backoff.ms" is already larger than the "
>>>>>>> retry.backoff.max.ms", we would still take "retry.backoff.max.ms
>> ".
>>>>>>> 
>>>>>>> So if the user does override the "retry.backoff.ms" to a value
>>> larger
>>>>>> than
>>>>>>> 1s and is not aware of the new config, she would be surprised to
>>> see
>>>>> the
>>>>>>> specified value seemingly not being respected, but she could
>> still
>>>>> learn
>>>>>>> that afterwards by reading the release notes introducing this KIP
>>>>>> anyways.
>>>>>>> 
>>>>>>> 
>>>>>>> Guozhang
>>>>>>> 
>>>>>>> On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
>>> ja...@confluent.io>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Sanjana,
>>>>>>>> 
>>>>>>>> The KIP looks good to me. I had just one question about the
>>> default
>>>>>>>> behavior. As I understand, if the user has specified `
>>>>> retry.backoff.ms
>>>>>> `
>>>>>>>> explicitly, then we will not apply the default max backoff. As
>>> such,
>>>>>>>> there's no way to get the benefit of this feature if you are
>>>>> providing
>>>>>> a
>>>>>>> `
>>>>>>>> retry.backoff.ms` unless you also provide `
>> retry.backoff.max.ms
>>> `.
>>>>> That
>>>>>>>> makes sense if you assume the user is unaware of the new
>>>>> configuration,
>>>>>>> but
>>>>>>>> it is surprising otherwise. Since it's not a semantic change
>> and
>>>>> since
>>>>>>> the
>>>>>>>> default you're proposing of 1s is fairly low already, I wonder
>> if
>>>>> it's
>>>>>>> good
>>>>>>>> enough to mention the new configuration in the release notes
>> and
>>> not
>>>>>> add
>>>>>>>> any special logic. What do you think?
>>>>>>>> 
>>>>>>>> -Jason
>>>>>>>> 
>>>>>>>> On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya <
>>>>>> skaundi...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Thank you for the comments Guozhang.
>>>>>>>>> 
>>>>>>>>> I’ll leave this KIP out for discussion till the end of the
>>> week and
>>>>>>> then
>>>>>>>>> start a vote for this early next week.
>>>>>>>>> 
>>>>>>>>> Sanjana
>>>>>>>>> 
>>>>>>>>> On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang <
>>> wangg...@gmail.com
>>>>>> ,
>>>>>>>> wrote:
>>>>>>>>>> Hello Sanjana,
>>>>>>>>>> 
>>>>>>>>>> Thanks for the proposed KIP, I think that makes a lot of
>>> sense --
>>>>>> as
>>>>>>>> you
>>>>>>>>>> mentioned in the motivation, we've indeed seen many issues
>>> with
>>>>>>> regard
>>>>>>>> to
>>>>>>>>>> the frequent retries, with bounded exponential backoff in
>> the
>>>>>>> scenario
>>>>>>>>>> where there's a long connectivity issue we would
>> effectively
>>>>> reduce
>>>>>>> the
>>>>>>>>>> request load by a factor of 10 given the default configs.
>>>>>>>>>> 
>>>>>>>>>> For higher-level Streams client and Connect frameworks,
>>> today we
>>>>>> also
>>>>>>>>> have
>>>>>>>>>

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Sanjana Kaundinya
> > > > > > > specified "retry.backoff.ms" is already larger than the "
> > > > > > retry.backoff.max.ms", we would still take "retry.backoff.max.ms
> ".
> > > > > >
> > > > > > So if the user does override the "retry.backoff.ms" to a value
> > larger
> > > > > than
> > > > > > 1s and is not aware of the new config, she would be surprised to
> > see
> > > > the
> > > > > > specified value seemingly not being respected, but she could
> still
> > > > learn
> > > > > > that afterwards by reading the release notes introducing this KIP
> > > > > anyways.
> > > > > >
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > > On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
> > ja...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Sanjana,
> > > > > > >
> > > > > > > The KIP looks good to me. I had just one question about the
> > default
> > > > > > > behavior. As I understand, if the user has specified `
> > > > retry.backoff.ms
> > > > > `
> > > > > > > explicitly, then we will not apply the default max backoff. As
> > such,
> > > > > > > there's no way to get the benefit of this feature if you are
> > > > providing
> > > > > a
> > > > > > `
> > > > > > > retry.backoff.ms` unless you also provide `
> retry.backoff.max.ms
> > `.
> > > > That
> > > > > > > makes sense if you assume the user is unaware of the new
> > > > configuration,
> > > > > > but
> > > > > > > it is surprising otherwise. Since it's not a semantic change
> and
> > > > since
> > > > > > the
> > > > > > > default you're proposing of 1s is fairly low already, I wonder
> if
> > > > it's
> > > > > > good
> > > > > > > enough to mention the new configuration in the release notes
> and
> > not
> > > > > add
> > > > > > > any special logic. What do you think?
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya <
> > > > > skaundi...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thank you for the comments Guozhang.
> > > > > > > >
> > > > > > > > I’ll leave this KIP out for discussion till the end of the
> > week and
> > > > > > then
> > > > > > > > start a vote for this early next week.
> > > > > > > >
> > > > > > > > Sanjana
> > > > > > > >
> > > > > > > > On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang <
> > wangg...@gmail.com
> > > > > ,
> > > > > > > wrote:
> > > > > > > > > Hello Sanjana,
> > > > > > > > >
> > > > > > > > > Thanks for the proposed KIP, I think that makes a lot of
> > sense --
> > > > > as
> > > > > > > you
> > > > > > > > > mentioned in the motivation, we've indeed seen many issues
> > with
> > > > > > regard
> > > > > > > to
> > > > > > > > > the frequent retries, with bounded exponential backoff in
> the
> > > > > > scenario
> > > > > > > > > where there's a long connectivity issue we would
> effectively
> > > > reduce
> > > > > > the
> > > > > > > > > > request load by a factor of 10 given the default configs.
> > > > > > > > >
> > > > > > > > > For higher-level Streams client and Connect frameworks,
> > today we
> > > > > also
> > > > > > > > have
> > > > > > > > > a retry logic but that's used in a slightly different way.
> > For
> > > > > > example
> > > > > > > in
> > > > > > > > > Streams, we tend to handle the retry logic at the
> > thread-level
> > > >

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Guozhang Wang
> > > > > >
> > > > > On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <
> ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Sanjana,
> > > > > >
> > > > > > The KIP looks good to me. I had just one question about the
> default
> > > > > > behavior. As I understand, if the user has specified `
> > > retry.backoff.ms
> > > > `
> > > > > > explicitly, then we will not apply the default max backoff. As
> such,
> > > > > > there's no way to get the benefit of this feature if you are
> > > providing
> > > > a
> > > > > `
> > > > > > retry.backoff.ms` unless you also provide `retry.backoff.max.ms
> `.
> > > That
> > > > > > makes sense if you assume the user is unaware of the new
> > > configuration,
> > > > > but
> > > > > > it is surprising otherwise. Since it's not a semantic change and
> > > since
> > > > > the
> > > > > > default you're proposing of 1s is fairly low already, I wonder if
> > > it's
> > > > > good
> > > > > > enough to mention the new configuration in the release notes and
> not
> > > > add
> > > > > > any special logic. What do you think?
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya <
> > > > skaundi...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thank you for the comments Guozhang.
> > > > > > >
> > > > > > > I’ll leave this KIP out for discussion till the end of the
> week and
> > > > > then
> > > > > > > start a vote for this early next week.
> > > > > > >
> > > > > > > Sanjana
> > > > > > >
> > > > > > > On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang <
> wangg...@gmail.com
> > > > ,
> > > > > > wrote:
> > > > > > > > Hello Sanjana,
> > > > > > > >
> > > > > > > > Thanks for the proposed KIP, I think that makes a lot of
> sense --
> > > > as
> > > > > > you
> > > > > > > > mentioned in the motivation, we've indeed seen many issues
> with
> > > > > regard
> > > > > > to
> > > > > > > > the frequent retries, with bounded exponential backoff in the
> > > > > scenario
> > > > > > > > where there's a long connectivity issue we would effectively
> > > reduce
> > > > > the
> > > > > > > > > request load by a factor of 10 given the default configs.
> > > > > > > >
> > > > > > > > For higher-level Streams client and Connect frameworks,
> today we
> > > > also
> > > > > > > have
> > > > > > > > a retry logic but that's used in a slightly different way.
> For
> > > > > example
> > > > > > in
> > > > > > > > Streams, we tend to handle the retry logic at the
> thread-level
> > > and
> > > > > > hence
> > > > > > > > very likely we'd like to change that mechanism in KIP-572
> > > anyways.
> > > > > For
> > > > > > > > producer / consumer / admin clients, I think just applying
> this
> > > > > > > behavioral
> > > > > > > > > change across these clients makes a lot of sense. So I think we
> can
> > > just
> > > > > > leave
> > > > > > > > the Streams / Connect out of the scope of this KIP to be
> > > addressed
> > > > in
> > > > > > > > separate discussions.
> > > > > > > >
> > > > > > > > I do not have further comments about this KIP :) LGTM.
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya <
> > > > > > skaundi...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> >
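
Editorial note on the "request load by 10" figure in Guozhang's quoted comment above: a rough way to arrive at it is to compare steady-state retry rates during a long outage. This is only a back-of-the-envelope sketch. It assumes a fixed backoff of 100 ms before the change (Kafka's documented default for `retry.backoff.ms`, which is not stated in this thread), a 1s cap after the change (the default discussed in the thread), and it ignores request round-trip time and jitter. The function name is hypothetical.

```python
def steady_state_retries_per_second(backoff_ms: int) -> float:
    """Retries per second once the backoff has stopped growing."""
    return 1000 / backoff_ms


fixed_rate = steady_state_retries_per_second(100)    # static backoff, assumed default
capped_rate = steady_state_retries_per_second(1000)  # backoff pinned at the 1 s cap
reduction_factor = fixed_rate / capped_rate

print(fixed_rate, capped_rate, reduction_factor)
```

The ratio only holds once the backoff has reached the cap; the first few retries after a failure are spaced more closely.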

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Sanjana Kaundinya
> > > > > > makes sense if you assume the user is unaware of the new
> > configuration,
> > > > but
> > > > > it is surprising otherwise. Since it's not a semantic change and
> > since
> > > > the
> > > > > default you're proposing of 1s is fairly low already, I wonder if
> > it's
> > > > good
> > > > > enough to mention the new configuration in the release notes and not
> > > add
> > > > > any special logic. What do you think?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya <
> > > skaundi...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thank you for the comments Guozhang.
> > > > > >
> > > > > > I’ll leave this KIP out for discussion till the end of the week and
> > > > then
> > > > > > start a vote for this early next week.
> > > > > >
> > > > > > Sanjana
> > > > > >
> > > > > > > On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang <wangg...@gmail.com>,
> > > > > wrote:
> > > > > > > Hello Sanjana,
> > > > > > >
> > > > > > > Thanks for the proposed KIP, I think that makes a lot of sense --
> > > as
> > > > > you
> > > > > > > mentioned in the motivation, we've indeed seen many issues with
> > > > regard
> > > > > to
> > > > > > > the frequent retries, with bounded exponential backoff in the
> > > > scenario
> > > > > > > where there's a long connectivity issue we would effectively
> > reduce
> > > > the
> > > > > > > > request load by a factor of 10 given the default configs.
> > > > > > >
> > > > > > > For higher-level Streams client and Connect frameworks, today we
> > > also
> > > > > > have
> > > > > > > a retry logic but that's used in a slightly different way. For
> > > > example
> > > > > in
> > > > > > > Streams, we tend to handle the retry logic at the thread-level
> > and
> > > > > hence
> > > > > > > very likely we'd like to change that mechanism in KIP-572
> > anyways.
> > > > For
> > > > > > > producer / consumer / admin clients, I think just applying this
> > > > > > behavioral
> > > > > > > > change across these clients makes a lot of sense. So I think we can
> > just
> > > > > leave
> > > > > > > the Streams / Connect out of the scope of this KIP to be
> > addressed
> > > in
> > > > > > > separate discussions.
> > > > > > >
> > > > > > > I do not have further comments about this KIP :) LGTM.
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya <
> > > > > skaundi...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the feedback Boyang.
> > > > > > > >
> > > > > > > > > If there’s anyone else who has feedback regarding this KIP, I
> > would
> > > > > > really
> > > > > > > > > appreciate hearing it!
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Sanjana
> > > > > > > >
> > > > > > > > On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen <
> > > bche...@outlook.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Sounds great!
> > > > > > > > >
> > > > > > > > > Get Outlook for iOS<https://aka.ms/o0ukef>
> > > > > > > > > 
> > > > > > > > > From: Sanjana Kaundinya 
> > > > > > > > > Sent: Tuesday, March 17, 2020 5:54:35 PM
> > > > > > > > > To: dev@kafka.apache.org 
> > > > > > > > > Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka
> > > > > Clients
> > > > >

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Guozhang Wang
> > > then
> > > > > start a vote for this early next week.
> > > > >
> > > > > Sanjana
> > > > >
> > > > > On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang  >,
> > > > wrote:
> > > > > > Hello Sanjana,
> > > > > >
> > > > > > Thanks for the proposed KIP, I think that makes a lot of sense --
> > as
> > > > you
> > > > > > mentioned in the motivation, we've indeed seen many issues with
> > > regard
> > > > to
> > > > > > the frequent retries, with bounded exponential backoff in the
> > > scenario
> > > > > > where there's a long connectivity issue we would effectively
> reduce
> > > the
> > > > > > request load by 10 given the default configs.
> > > > > >
> > > > > > For higher-level Streams client and Connect frameworks, today we
> > also
> > > > > have
> > > > > > a retry logic but that's used in a slightly different way. For
> > > example
> > > > in
> > > > > > Streams, we tend to handle the retry logic at the thread-level
> and
> > > > hence
> > > > > > very likely we'd like to change that mechanism in KIP-572
> anyways.
> > > For
> > > > > > producer / consumer / admin clients, I think just applying this
> > > > > behavioral
> > > > > > change across these clients makes lot of sense. So I think can
> just
> > > > leave
> > > > > > the Streams / Connect out of the scope of this KIP to be
> addressed
> > in
> > > > > > separate discussions.
> > > > > >
> > > > > > I do not have further comments about this KIP :) LGTM.
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > > On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya <
> > > > skaundi...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for the feedback Boyang.
> > > > > > >
> > > > > > > If there’s anyone else who has feedback regarding this KIP,
> would
> > > > > really
> > > > > > > appreciate it hearing it!
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Sanjana
> > > > > > >
> > > > > > > On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen <
> > bche...@outlook.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Sounds great!
> > > > > > > >
> > > > > > > > Get Outlook for iOS<https://aka.ms/o0ukef>
> > > > > > > > 
> > > > > > > > From: Sanjana Kaundinya 
> > > > > > > > Sent: Tuesday, March 17, 2020 5:54:35 PM
> > > > > > > > To: dev@kafka.apache.org 
> > > > > > > > Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka
> > > > Clients
> > > > > > > >
> > > > > > > > Thanks for the explanation Boyang. One of the most common
> > > problems
> > > > > that
> > > > > > > we
> > > > > > > > have in Kafka is with respect to metadata fetches. For
> example,
> > > if
> > > > > there
> > > > > > > is
> > > > > > > > a broker failure, all clients start to fetch metadata at the
> > same
> > > > > time
> > > > > > > and
> > > > > > > > it often takes a while for the metadata to converge. In a
> high
> > > load
> > > > > > > > cluster, there are also issues where the volume of metadata
> has
> > > > made
> > > > > > > > convergence of metadata slower.
> > > > > > > >
> > > > > > > > For this case, exponential backoff helps as it reduces the
> > retry
> > > > > rate and
> > > > > > > > spaces out how often clients will retry, thereby bringing
> down
> > > the
> > > > > time
> > > > > > > for
> > > > > > > > co

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Sanjana Kaundinya
ic but that's used in a slightly different way. For
> > example
> > > in
> > > > > Streams, we tend to handle the retry logic at the thread-level and
> > > hence
> > > > > very likely we'd like to change that mechanism in KIP-572 anyways.
> > For
> > > > > producer / consumer / admin clients, I think just applying this
> > > > behavioral
> > > > > change across these clients makes lot of sense. So I think can just
> > > leave
> > > > > the Streams / Connect out of the scope of this KIP to be addressed
> in
> > > > > separate discussions.
> > > > >
> > > > > I do not have further comments about this KIP :) LGTM.
> > > > >
> > > > > Guozhang
> > > > >
> > > > >
> > > > > On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya <
> > > skaundi...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Thanks for the feedback Boyang.
> > > > > >
> > > > > > If there’s anyone else who has feedback regarding this KIP, would
> > > > really
> > > > > > appreciate it hearing it!
> > > > > >
> > > > > > Thanks,
> > > > > > Sanjana
> > > > > >
> > > > > > On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen <
> bche...@outlook.com>
> > > > wrote:
> > > > > >
> > > > > > > Sounds great!
> > > > > > >
> > > > > > > Get Outlook for iOS<https://aka.ms/o0ukef>
> > > > > > > 
> > > > > > > From: Sanjana Kaundinya 
> > > > > > > Sent: Tuesday, March 17, 2020 5:54:35 PM
> > > > > > > To: dev@kafka.apache.org 
> > > > > > > Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka
> > > Clients
> > > > > > >
> > > > > > > Thanks for the explanation Boyang. One of the most common
> > problems
> > > > that
> > > > > > we
> > > > > > > have in Kafka is with respect to metadata fetches. For example,
> > if
> > > > there
> > > > > > is
> > > > > > > a broker failure, all clients start to fetch metadata at the
> same
> > > > time
> > > > > > and
> > > > > > > it often takes a while for the metadata to converge. In a high
> > load
> > > > > > > cluster, there are also issues where the volume of metadata has
> > > made
> > > > > > > convergence of metadata slower.
> > > > > > >
> > > > > > > For this case, exponential backoff helps as it reduces the
> retry
> > > > rate and
> > > > > > > spaces out how often clients will retry, thereby bringing down
> > the
> > > > time
> > > > > > for
> > > > > > > convergence. Something that Jason mentioned that would be a
> great
> > > > > > addition
> > > > > > > here would be if the backoff should be “jittered” as it was in
> > > > KIP-144
> > > > > > with
> > > > > > > respect to exponential reconnect backoff. This would help
> prevent
> > > the
> > > > > > > clients from being synchronized on when they retry, thereby
> > spacing
> > > > out
> > > > > > the
> > > > > > > number of requests being sent to the broker at the same time.
> > > > > > >
> > > > > > > I’ll add this example to the KIP and flush out more of the
> > details
> > > -
> > > > so
> > > > > > > it’s more clear.
> > > > > > >
> > > > > > > On Mar 17, 2020, 1:24 PM -0700, Boyang Chen <
> > > > reluctanthero...@gmail.com
> > > > > > > ,
> > > > > > > wrote:
> > > > > > > > Thanks for the reply Sanjana. I guess I would like to
> rephrase
> > my
> > > > > > > question
> > > > > > > > 2 and 3 as my previous response is a little bit unactionable.
> > > > > > > >
> > > > > > > > My specific point is that exponential backoff is not 

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Jason Gustafson
Hey Guozhang,

I was referring to this:

> For users who have not set retry.backoff.ms explicitly, the default
> behavior will change so that the backoff will grow up to 1000 ms. For users
> who have set retry.backoff.ms explicitly, the behavior will remain the same
> as they could have specific requirements.

I took this to mean that for users who have overridden `retry.backoff.ms`
to 50ms (say), we will change the default `retry.backoff.max.ms` to 50ms as
well in order to preserve existing backoff behavior. Is that not right? In
any case, I agree that we can use the maximum of the two values as the
effective `retry.backoff.max.ms` to handle the case when the configured
value of `retry.backoff.ms` is larger than the default of 1s.
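
A minimal sketch of the "maximum of the two values" rule described above. The
function name and the constant are hypothetical and are not part of any Kafka
API; only the config names and the 1s default come from this thread.

```python
# Default cap of 1s, as proposed in this discussion (assumed here).
DEFAULT_RETRY_BACKOFF_MAX_MS = 1000


def effective_max_backoff_ms(retry_backoff_ms,
                             retry_backoff_max_ms=DEFAULT_RETRY_BACKOFF_MAX_MS):
    """Return the cap actually applied to the backoff.

    Taking the larger value means a user who configured a large
    retry.backoff.ms never sees it silently reduced by the default cap.
    """
    return max(retry_backoff_ms, retry_backoff_max_ms)
```

Under this rule a user with retry.backoff.ms=50 gets a cap of 1000 ms, while a
user with retry.backoff.ms=5000 keeps 5000 ms as the effective cap.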

-Jason




On Thu, Mar 19, 2020 at 3:29 PM Guozhang Wang  wrote:

> Hey Jason,
>
> My understanding is a bit different here: even if user has an explicit
> overridden "retry.backoff.ms", the exponential mechanism still triggers
> and
> the backoff would be increased till "retry.backoff.max.ms"; and if the
> specified "retry.backoff.ms" is already larger than the "
> retry.backoff.max.ms", we would still take "retry.backoff.max.ms".
>
> So if the user does override the "retry.backoff.ms" to a value larger than
> 1s and is not aware of the new config, she would be surprised to see the
> specified value seemingly not being respected, but she could still learn
> that afterwards by reading the release notes introducing this KIP anyways.
>
>
> Guozhang
>
> On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson 
> wrote:
>
> > Hi Sanjana,
> >
> > The KIP looks good to me. I had just one question about the default
> > behavior. As I understand, if the user has specified `retry.backoff.ms`
> > explicitly, then we will not apply the default max backoff. As such,
> > there's no way to get the benefit of this feature if you are providing a
> `
> > retry.backoff.ms` unless you also provide `retry.backoff.max.ms`. That
> > makes sense if you assume the user is unaware of the new configuration,
> but
> > it is surprising otherwise. Since it's not a semantic change and since
> the
> > default you're proposing of 1s is fairly low already, I wonder if it's
> good
> > enough to mention the new configuration in the release notes and not add
> > any special logic. What do you think?
> >
> > -Jason
> >
> > On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya 
> > wrote:
> >
> > > Thank you for the comments Guozhang.
> > >
> > > I’ll leave this KIP out for discussion till the end of the week and
> then
> > > start a vote for this early next week.
> > >
> > > Sanjana
> > >
> > > On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang ,
> > wrote:
> > > > Hello Sanjana,
> > > >
> > > > Thanks for the proposed KIP, I think that makes a lot of sense -- as
> > you
> > > > mentioned in the motivation, we've indeed seen many issues with
> regard
> > to
> > > > the frequent retries, with bounded exponential backoff in the
> scenario
> > > > where there's a long connectivity issue we would effectively reduce
> the
> > > > request load by 10 given the default configs.
> > > >
> > > > For higher-level Streams client and Connect frameworks, today we also
> > > have
> > > > a retry logic but that's used in a slightly different way. For
> example
> > in
> > > > Streams, we tend to handle the retry logic at the thread-level and
> > hence
> > > > very likely we'd like to change that mechanism in KIP-572 anyways.
> For
> > > > producer / consumer / admin clients, I think just applying this
> > > behavioral
> > > > change across these clients makes lot of sense. So I think can just
> > leave
> > > > the Streams / Connect out of the scope of this KIP to be addressed in
> > > > separate discussions.
> > > >
> > > > I do not have further comments about this KIP :) LGTM.
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya <
> > skaundi...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Thanks for the feedback Boyang.
> > > > >
> > > > > If there’s anyone else who has feedback regarding this KIP, would
> > > really
> > > > > appreciate it hearing it!
> > > > >
> > > > > Thanks,

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Guozhang Wang
Hey Jason,

My understanding is a bit different here: even if the user has explicitly
overridden "retry.backoff.ms", the exponential mechanism still triggers and
the backoff would be increased till "retry.backoff.max.ms"; and if the
specified "retry.backoff.ms" is already larger than
"retry.backoff.max.ms", we would still take "retry.backoff.max.ms".

So if the user does override the "retry.backoff.ms" to a value larger than
1s and is not aware of the new config, she would be surprised to see the
specified value seemingly not being respected, but she could still learn
that afterwards by reading the release notes introducing this KIP anyways.
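
A rough sketch of the semantics described above, where the cap always wins.
The function name is hypothetical, the multiplier of 2 is an assumption taken
from the retry schedule quoted elsewhere in this thread, and jitter is left
out for simplicity.

```python
def backoff_ms(failures, retry_backoff_ms, retry_backoff_max_ms, multiplier=2):
    """Backoff before the next retry after `failures` consecutive failures.

    The delay starts at retry.backoff.ms, grows exponentially, and is
    clamped to retry.backoff.max.ms, even when retry.backoff.ms itself
    is already larger than the cap.
    """
    return min(retry_backoff_ms * multiplier ** failures, retry_backoff_max_ms)
```

With the defaults discussed here (100 ms initial, 1000 ms cap) the delays are
100, 200, 400, 800 and then 1000 ms. With retry.backoff.ms=5000 and the cap
left at 1000 ms, every delay is 1000 ms, which is the surprising case
mentioned above.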


Guozhang

On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson  wrote:

> Hi Sanjana,
>
> The KIP looks good to me. I had just one question about the default
> behavior. As I understand, if the user has specified `retry.backoff.ms`
> explicitly, then we will not apply the default max backoff. As such,
> there's no way to get the benefit of this feature if you are providing a `
> retry.backoff.ms` unless you also provide `retry.backoff.max.ms`. That
> makes sense if you assume the user is unaware of the new configuration, but
> it is surprising otherwise. Since it's not a semantic change and since the
> default you're proposing of 1s is fairly low already, I wonder if it's good
> enough to mention the new configuration in the release notes and not add
> any special logic. What do you think?
>
> -Jason
>
> On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya 
> wrote:
>
> > Thank you for the comments Guozhang.
> >
> > I’ll leave this KIP out for discussion till the end of the week and then
> > start a vote for this early next week.
> >
> > Sanjana
> >
> > On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang ,
> wrote:
> > > Hello Sanjana,
> > >
> > > Thanks for the proposed KIP, I think that makes a lot of sense -- as
> you
> > > mentioned in the motivation, we've indeed seen many issues with regard
> to
> > > the frequent retries, with bounded exponential backoff in the scenario
> > > where there's a long connectivity issue we would effectively reduce the
> > > request load by 10 given the default configs.
> > >
> > > For higher-level Streams client and Connect frameworks, today we also
> > have
> > > a retry logic but that's used in a slightly different way. For example
> in
> > > Streams, we tend to handle the retry logic at the thread-level and
> hence
> > > very likely we'd like to change that mechanism in KIP-572 anyways. For
> > > producer / consumer / admin clients, I think just applying this
> > behavioral
> > > change across these clients makes lot of sense. So I think can just
> leave
> > > the Streams / Connect out of the scope of this KIP to be addressed in
> > > separate discussions.
> > >
> > > I do not have further comments about this KIP :) LGTM.
> > >
> > > Guozhang
> > >
> > >
> > > On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya <
> skaundi...@gmail.com
> > >
> > > wrote:
> > >
> > > > Thanks for the feedback Boyang.
> > > >
> > > > If there’s anyone else who has feedback regarding this KIP, would
> > really
> > > > appreciate it hearing it!
> > > >
> > > > Thanks,
> > > > Sanjana
> > > >
> > > > On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen 
> > wrote:
> > > >
> > > > > Sounds great!
> > > > >
> > > > > Get Outlook for iOS<https://aka.ms/o0ukef>
> > > > > 
> > > > > From: Sanjana Kaundinya 
> > > > > Sent: Tuesday, March 17, 2020 5:54:35 PM
> > > > > To: dev@kafka.apache.org 
> > > > > Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka
> Clients
> > > > >
> > > > > Thanks for the explanation Boyang. One of the most common problems
> > that
> > > > we
> > > > > have in Kafka is with respect to metadata fetches. For example, if
> > there
> > > > is
> > > > > a broker failure, all clients start to fetch metadata at the same
> > time
> > > > and
> > > > > it often takes a while for the metadata to converge. In a high load
> > > > > cluster, there are also issues where the volume of metadata has
> made
> > > > > convergence of metadata slower.
> > > > >
> >

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Jason Gustafson
Hi Sanjana,

The KIP looks good to me. I had just one question about the default
behavior. As I understand, if the user has specified `retry.backoff.ms`
explicitly, then we will not apply the default max backoff. As such,
there's no way to get the benefit of this feature if you are providing a
`retry.backoff.ms` unless you also provide `retry.backoff.max.ms`. That
makes sense if you assume the user is unaware of the new configuration, but
it is surprising otherwise. Since it's not a semantic change and since the
default you're proposing of 1s is fairly low already, I wonder if it's good
enough to mention the new configuration in the release notes and not add
any special logic. What do you think?

-Jason

On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya 
wrote:

> Thank you for the comments Guozhang.
>
> I’ll leave this KIP out for discussion till the end of the week and then
> start a vote for this early next week.
>
> Sanjana
>
> On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang , wrote:
> > Hello Sanjana,
> >
> > Thanks for the proposed KIP, I think that makes a lot of sense -- as you
> > mentioned in the motivation, we've indeed seen many issues with regard to
> > the frequent retries, with bounded exponential backoff in the scenario
> > where there's a long connectivity issue we would effectively reduce the
> > request load by 10 given the default configs.
> >
> > For higher-level Streams client and Connect frameworks, today we also
> have
> > a retry logic but that's used in a slightly different way. For example in
> > Streams, we tend to handle the retry logic at the thread-level and hence
> > very likely we'd like to change that mechanism in KIP-572 anyways. For
> > producer / consumer / admin clients, I think just applying this
> behavioral
> > change across these clients makes lot of sense. So I think can just leave
> > the Streams / Connect out of the scope of this KIP to be addressed in
> > separate discussions.
> >
> > I do not have further comments about this KIP :) LGTM.
> >
> > Guozhang
> >
> >
> > On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya  >
> > wrote:
> >
> > > Thanks for the feedback Boyang.
> > >
> > > If there’s anyone else who has feedback regarding this KIP, would
> really
> > > appreciate it hearing it!
> > >
> > > Thanks,
> > > Sanjana
> > >
> > > On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen 
> wrote:
> > >
> > > > Sounds great!
> > > >
> > > > Get Outlook for iOS<https://aka.ms/o0ukef>
> > > > 
> > > > From: Sanjana Kaundinya 
> > > > Sent: Tuesday, March 17, 2020 5:54:35 PM
> > > > To: dev@kafka.apache.org 
> > > > Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients
> > > >
> > > > Thanks for the explanation Boyang. One of the most common problems
> that
> > > we
> > > > have in Kafka is with respect to metadata fetches. For example, if
> there
> > > is
> > > > a broker failure, all clients start to fetch metadata at the same
> time
> > > and
> > > > it often takes a while for the metadata to converge. In a high load
> > > > cluster, there are also issues where the volume of metadata has made
> > > > convergence of metadata slower.
> > > >
> > > > For this case, exponential backoff helps as it reduces the retry
> rate and
> > > > spaces out how often clients will retry, thereby bringing down the
> time
> > > for
> > > > convergence. Something that Jason mentioned that would be a great
> > > addition
> > > > here would be if the backoff should be “jittered” as it was in
> KIP-144
> > > with
> > > > respect to exponential reconnect backoff. This would help prevent the
> > > > clients from being synchronized on when they retry, thereby spacing
> out
> > > the
> > > > number of requests being sent to the broker at the same time.
> > > >
> > > > I’ll add this example to the KIP and flush out more of the details -
> so
> > > > it’s more clear.
> > > >
> > > > On Mar 17, 2020, 1:24 PM -0700, Boyang Chen <
> reluctanthero...@gmail.com
> > > > ,
> > > > wrote:
> > > > > Thanks for the reply Sanjana. I guess I would like to rephrase my
> > > > question
> > > > > 2 and 3 as my previous response is a little bit unactionable.
> > > > >

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-19 Thread Sanjana Kaundinya
Thank you for the comments Guozhang.

I’ll leave this KIP out for discussion till the end of the week and then start 
a vote for this early next week.

Sanjana

On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang , wrote:
> Hello Sanjana,
>
> Thanks for the proposed KIP, I think that makes a lot of sense -- as you
> mentioned in the motivation, we've indeed seen many issues with regard to
> the frequent retries, with bounded exponential backoff in the scenario
> where there's a long connectivity issue we would effectively reduce the
> request load by 10 given the default configs.
>
> For higher-level Streams client and Connect frameworks, today we also have
> a retry logic but that's used in a slightly different way. For example in
> Streams, we tend to handle the retry logic at the thread-level and hence
> very likely we'd like to change that mechanism in KIP-572 anyways. For
> producer / consumer / admin clients, I think just applying this behavioral
> change across these clients makes lot of sense. So I think can just leave
> the Streams / Connect out of the scope of this KIP to be addressed in
> separate discussions.
>
> I do not have further comments about this KIP :) LGTM.
>
> Guozhang
>
>
> On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya 
> wrote:
>
> > Thanks for the feedback Boyang.
> >
> > If there’s anyone else who has feedback regarding this KIP, would really
> > appreciate it hearing it!
> >
> > Thanks,
> > Sanjana
> >
> > On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen  wrote:
> >
> > > Sounds great!
> > >
> > > Get Outlook for iOS<https://aka.ms/o0ukef>
> > > ____________
> > > From: Sanjana Kaundinya 
> > > Sent: Tuesday, March 17, 2020 5:54:35 PM
> > > To: dev@kafka.apache.org 
> > > Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients
> > >
> > > Thanks for the explanation Boyang. One of the most common problems that
> > we
> > > have in Kafka is with respect to metadata fetches. For example, if there
> > is
> > > a broker failure, all clients start to fetch metadata at the same time
> > and
> > > it often takes a while for the metadata to converge. In a high load
> > > cluster, there are also issues where the volume of metadata has made
> > > convergence of metadata slower.
> > >
> > > For this case, exponential backoff helps as it reduces the retry rate and
> > > spaces out how often clients will retry, thereby bringing down the time
> > for
> > > convergence. Something that Jason mentioned that would be a great
> > addition
> > > here would be if the backoff should be “jittered” as it was in KIP-144
> > with
> > > respect to exponential reconnect backoff. This would help prevent the
> > > clients from being synchronized on when they retry, thereby spacing out
> > the
> > > number of requests being sent to the broker at the same time.
> > >
> > > I’ll add this example to the KIP and flush out more of the details - so
> > > it’s more clear.
> > >
> > > On Mar 17, 2020, 1:24 PM -0700, Boyang Chen  > > ,
> > > wrote:
> > > > Thanks for the reply Sanjana. I guess I would like to rephrase my
> > > question
> > > > 2 and 3 as my previous response is a little bit unactionable.
> > > >
> > > > My specific point is that exponential backoff is not a silver bullet
> > and
> > > we
> > > > should consider using it to solve known problems, instead of making the
> > > > holistic changes to all clients in Kafka ecosystem. I do like the
> > > > exponential backoff idea and believe this would be of great value, but
> > > > maybe we should focus on proposing some existing modules that are
> > > suffering
> > > > from static retry, and only change them in this first KIP. If in the
> > > > future, some other component users believe they are also suffering, we
> > > > could get more minor KIPs to change the behavior as well.
> > > >
> > > > Boyang
> > > >
> > > > On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya <
> > skaundi...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Thanks for the feedback Boyang, I will revise the KIP with the
> > > > > mathematical relations as per your suggestion. To address your
> > > feedback:
> > > > >
> > > > > 1. Currently, with the default of 100 ms per retry backoff, in 

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-18 Thread Guozhang Wang
Hello Sanjana,

Thanks for the proposed KIP, I think that makes a lot of sense. As you
mentioned in the motivation, we've indeed seen many issues with regard to
the frequent retries. With bounded exponential backoff, in the scenario
where there's a long connectivity issue we would effectively reduce the
request load by a factor of 10 given the default configs.
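
As a rough check of that estimate, the sketch below counts attempts during an
outage. The function name is hypothetical, and the model assumes a multiplier
of 2, no jitter, and requests that fail instantly.

```python
def attempts_in_window(window_ms, retry_backoff_ms=100,
                       retry_backoff_max_ms=1000, exponential=True):
    """Count attempts that start before `window_ms` has elapsed."""
    t, failures, attempts = 0, 0, 0
    while t < window_ms:
        attempts += 1
        if exponential:
            # Double the delay on each failure, up to the configured cap.
            delay = min(retry_backoff_ms * 2 ** failures, retry_backoff_max_ms)
        else:
            # Static backoff: the same delay before every retry.
            delay = retry_backoff_ms
        t += delay
        failures += 1
    return attempts
```

Over a 60 second outage this model gives 600 attempts with the static 100 ms
backoff and 63 with the bounded exponential backoff. Once the cap is reached
the rate settles at one attempt per second instead of ten.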

For the higher-level Streams client and Connect frameworks, today we also have
retry logic but that's used in a slightly different way. For example in
Streams, we tend to handle the retry logic at the thread-level and hence
very likely we'd like to change that mechanism in KIP-572 anyways. For
producer / consumer / admin clients, I think just applying this behavioral
change across these clients makes a lot of sense. So I think we can just leave
the Streams / Connect out of the scope of this KIP to be addressed in
separate discussions.

I do not have further comments about this KIP :) LGTM.

Guozhang


On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya 
wrote:

> Thanks for the feedback Boyang.
>
> If there’s anyone else who has feedback regarding this KIP, would really
> appreciate it hearing it!
>
> Thanks,
> Sanjana
>
> On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen  wrote:
>
> > Sounds great!
> >
> > Get Outlook for iOS<https://aka.ms/o0ukef>
> > 
> > From: Sanjana Kaundinya 
> > Sent: Tuesday, March 17, 2020 5:54:35 PM
> > To: dev@kafka.apache.org 
> > Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients
> >
> > Thanks for the explanation Boyang. One of the most common problems that
> we
> > have in Kafka is with respect to metadata fetches. For example, if there
> is
> > a broker failure, all clients start to fetch metadata at the same time
> and
> > it often takes a while for the metadata to converge. In a high load
> > cluster, there are also issues where the volume of metadata has made
> > convergence of metadata slower.
> >
> > For this case, exponential backoff helps as it reduces the retry rate and
> > spaces out how often clients will retry, thereby bringing down the time
> for
> > convergence. Something that Jason mentioned that would be a great
> addition
> > here would be if the backoff should be “jittered” as it was in KIP-144
> with
> > respect to exponential reconnect backoff. This would help prevent the
> > clients from being synchronized on when they retry, thereby spacing out
> the
> > number of requests being sent to the broker at the same time.
> >
> > I’ll add this example to the KIP and flush out more of the details - so
> > it’s more clear.
> >
> > On Mar 17, 2020, 1:24 PM -0700, Boyang Chen  >,
> > wrote:
> > > Thanks for the reply Sanjana. I guess I would like to rephrase my
> > question
> > > 2 and 3 as my previous response is a little bit unactionable.
> > >
> > > My specific point is that exponential backoff is not a silver bullet
> and
> > we
> > > should consider using it to solve known problems, instead of making the
> > > holistic changes to all clients in Kafka ecosystem. I do like the
> > > exponential backoff idea and believe this would be of great value, but
> > > maybe we should focus on proposing some existing modules that are
> > suffering
> > > from static retry, and only change them in this first KIP. If in the
> > > future, some other component users believe they are also suffering, we
> > > could get more minor KIPs to change the behavior as well.
> > >
> > > Boyang
> > >
> > > On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya <
> skaundi...@gmail.com
> > >
> > > wrote:
> > >
> > > > Thanks for the feedback Boyang, I will revise the KIP with the
> > > > mathematical relations as per your suggestion. To address your
> > feedback:
> > > >
> > > > 1. Currently, with the default of 100 ms per retry backoff, in 1
> second
> > > > we would have 10 retries. In the case of using an exponential
> backoff,
> > we
> > > > would have a total of 4 retries in 1 second. Thus we have less than
> > half of
> > > > the amount of retries in the same timeframe and can lessen broker
> > pressure.
> > > > This calculation is done as following (using the formula laid out in
> > the
> > > > KIP:
> > > >
> > > > Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry
> > ms
> > > > is initially 100 ms)
> > > > Try 2 at time 100 ms, failures = 1, next retry in 200 ms

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-18 Thread Sanjana Kaundinya
Thanks for the feedback Boyang.

If there’s anyone else who has feedback regarding this KIP, I would really
appreciate hearing it!

Thanks,
Sanjana

On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen  wrote:

> Sounds great!
>
> Get Outlook for iOS<https://aka.ms/o0ukef>
> 
> From: Sanjana Kaundinya 
> Sent: Tuesday, March 17, 2020 5:54:35 PM
> To: dev@kafka.apache.org 
> Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients
>
> Thanks for the explanation Boyang. One of the most common problems that we
> have in Kafka is with respect to metadata fetches. For example, if there is
> a broker failure, all clients start to fetch metadata at the same time and
> it often takes a while for the metadata to converge. In a high load
> cluster, there are also issues where the volume of metadata has made
> convergence of metadata slower.
>
> For this case, exponential backoff helps as it reduces the retry rate and
> spaces out how often clients will retry, thereby bringing down the time for
> convergence. Something that Jason mentioned that would be a great addition
> here would be if the backoff should be “jittered” as it was in KIP-144 with
> respect to exponential reconnect backoff. This would help prevent the
> clients from being synchronized on when they retry, thereby spacing out the
> number of requests being sent to the broker at the same time.
>
> I’ll add this example to the KIP and flush out more of the details - so
> it’s more clear.
>
> On Mar 17, 2020, 1:24 PM -0700, Boyang Chen ,
> wrote:
> > Thanks for the reply Sanjana. I guess I would like to rephrase my
> question
> > 2 and 3 as my previous response is a little bit unactionable.
> >
> > My specific point is that exponential backoff is not a silver bullet and
> we
> > should consider using it to solve known problems, instead of making the
> > holistic changes to all clients in Kafka ecosystem. I do like the
> > exponential backoff idea and believe this would be of great value, but
> > maybe we should focus on proposing some existing modules that are
> suffering
> > from static retry, and only change them in this first KIP. If in the
> > future, some other component users believe they are also suffering, we
> > could get more minor KIPs to change the behavior as well.
> >
> > Boyang
> >
> > On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya
> > wrote:
> >
> > > Thanks for the feedback Boyang, I will revise the KIP with the
> > > mathematical relations as per your suggestion. To address your feedback:
> > >
> > > 1. Currently, with the default of 100 ms per retry backoff, in 1 second
> > > we would have 10 retries. In the case of using an exponential backoff, we
> > > would have a total of 4 retries in 1 second. Thus we have less than half of
> > > the amount of retries in the same timeframe and can lessen broker pressure.
> > > This calculation is done as follows (using the formula laid out in the
> > > KIP):
> > >
> > > Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry ms
> > > is initially 100 ms)
> > > Try 2 at time 100 ms, failures = 1, next retry in 200 ms
> > > Try 3 at time 300 ms, failures = 2, next retry in 400 ms
> > > Try 4 at time 700 ms, failures = 3, next retry in 800 ms
> > > Try 5 at time 1500 ms, failures = 4, next retry in 1000 ms (default max
> > > retry ms is 1000 ms)
> > >
> > > For 2 and 3, could you elaborate more about what you mean with respect to
> > > client timeouts? I’m not very familiar with the Streams framework, so would
> > > love to get more insight to how that currently works, with respect to
> > > producer transactions, so I can appropriately update the KIP to address
> > > these scenarios.
> > > On Mar 13, 2020, 7:15 PM -0700, Boyang Chen <reluctanthero...@gmail.com>,
> > > wrote:
> > > > Thanks for the KIP Sanjana. I think the motivation is good, but it lacks
> > > > quantitative analysis. For instance:
> > > >
> > > > 1. How many retries are we saving by applying the exponential retry vs
> > > > static retry? There should be some mathematical relations between the
> > > > static retry ms, the initial exponential retry ms, the max exponential
> > > > retry ms in a given time interval.
> > > > 2. How does this affect the client timeout? With exponential retry, the
> > > > client shall be getting easier to timeout on a parent level caller,

Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-18 Thread Boyang Chen
Sounds great!

From: Sanjana Kaundinya 
Sent: Tuesday, March 17, 2020 5:54:35 PM
To: dev@kafka.apache.org 
Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

Thanks for the explanation Boyang. One of the most common problems that we have 
in Kafka is with respect to metadata fetches. For example, if there is a broker 
failure, all clients start to fetch metadata at the same time and it often 
takes a while for the metadata to converge. In a high load cluster, there are 
also issues where the volume of metadata has made convergence of metadata 
slower.

For this case, exponential backoff helps as it reduces the retry rate and 
spaces out how often clients will retry, thereby bringing down the time for 
convergence. Something that Jason mentioned that would be a great addition here 
would be if the backoff should be “jittered” as it was in KIP-144 with respect 
to exponential reconnect backoff. This would help prevent the clients from 
being synchronized on when they retry, thereby spacing out the number of 
requests being sent to the broker at the same time.

I’ll add this example to the KIP and flesh out more of the details so that it’s 
clearer.

On Mar 17, 2020, 1:24 PM -0700, Boyang Chen , wrote:
> Thanks for the reply Sanjana. I guess I would like to rephrase my questions
> 2 and 3 as my previous response is a little bit unactionable.
>
> My specific point is that exponential backoff is not a silver bullet and we
> should consider using it to solve known problems, instead of making
> holistic changes to all clients in the Kafka ecosystem. I do like the
> exponential backoff idea and believe this would be of great value, but
> maybe we should focus on proposing some existing modules that are suffering
> from static retry, and only change them in this first KIP. If in the
> future, some other component users believe they are also suffering, we
> could get more minor KIPs to change the behavior as well.
>
> Boyang
>
> On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya 
> wrote:
>
> > Thanks for the feedback Boyang, I will revise the KIP with the
> > mathematical relations as per your suggestion. To address your feedback:
> >
> > 1. Currently, with the default of 100 ms per retry backoff, in 1 second
> > we would have 10 retries. In the case of using an exponential backoff, we
> > would have a total of 4 retries in 1 second. Thus we have less than half of
> > the amount of retries in the same timeframe and can lessen broker pressure.
> > This calculation is done as follows (using the formula laid out in the
> > KIP):
> >
> > Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry ms
> > is initially 100 ms)
> > Try 2 at time 100 ms, failures = 1, next retry in 200 ms
> > Try 3 at time 300 ms, failures = 2, next retry in 400 ms
> > Try 4 at time 700 ms, failures = 3, next retry in 800 ms
> > Try 5 at time 1500 ms, failures = 4, next retry in 1000 ms (default max
> > retry ms is 1000 ms)
> >
> > For 2 and 3, could you elaborate more about what you mean with respect to
> > client timeouts? I’m not very familiar with the Streams framework, so would
> > love to get more insight to how that currently works, with respect to
> > producer transactions, so I can appropriately update the KIP to address
> > these scenarios.
> > On Mar 13, 2020, 7:15 PM -0700, Boyang Chen ,
> > wrote:
> > > Thanks for the KIP Sanjana. I think the motivation is good, but it lacks
> > > quantitative analysis. For instance:
> > >
> > > 1. How many retries are we saving by applying the exponential retry vs
> > > static retry? There should be some mathematical relations between the
> > > static retry ms, the initial exponential retry ms, the max exponential
> > > retry ms in a given time interval.
> > > 2. How does this affect the client timeout? With exponential retry, the
> > > client shall be getting easier to timeout on a parent level caller, for
> > > instance stream attempts to retry initializing producer transactions with
> > > given 5 minute interval. With exponential retry this mechanism could
> > > experience more frequent timeout which we should be careful with.
> > > 3. With regards to #2, we should have more detailed checklist of all the
> > > existing static retry scenarios, and adjust the initial exponential retry
> > > ms to make sure we won't get easily timeout in high level due to too few
> > > attempts.
> > >
> > > Boyang
> > >
> > > On Fri, Mar 13, 2020 at 4:38 PM Sanjana Kaundinya 
> > > wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > I’ve written a KIP about introducing exponential backoff for Kafka
> > > > clients. Would appreciate any feedback on this.
> > > >
> > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients
> > > >
> > > > Thanks,
> > > > Sanjana
> > > >
> >


Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-17 Thread Sanjana Kaundinya
Thanks for the explanation Boyang. One of the most common problems that we have 
in Kafka is with respect to metadata fetches. For example, if there is a broker 
failure, all clients start to fetch metadata at the same time and it often 
takes a while for the metadata to converge. In a high load cluster, there are 
also issues where the volume of metadata has made convergence of metadata 
slower.

For this case, exponential backoff helps as it reduces the retry rate and 
spaces out how often clients will retry, thereby bringing down the time for 
convergence. Something that Jason mentioned that would be a great addition here 
would be if the backoff should be “jittered” as it was in KIP-144 with respect 
to exponential reconnect backoff. This would help prevent the clients from 
being synchronized on when they retry, thereby spacing out the number of 
requests being sent to the broker at the same time.
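
The effect of jitter described above can be illustrated with a minimal sketch. It assumes a backoff of min(max, initial * 2^failures) multiplied by a random factor in [1 - jitter, 1 + jitter]. The 20% jitter value and the function name `jittered_backoff_ms` are illustrative assumptions, not something taken from the KIP text.

```python
import random


def jittered_backoff_ms(failures, initial_ms=100, max_ms=1000, jitter=0.2, rng=random):
    """Return a randomized backoff in ms after `failures` failed attempts.

    Assumption: jitter is applied after the cap, so the result may exceed
    max_ms by up to `jitter` (20% here). A real client may choose differently.
    """
    base = min(max_ms, initial_ms * 2 ** failures)
    return base * rng.uniform(1 - jitter, 1 + jitter)


# Five hypothetical clients that all failed 3 times at the same moment.
# Without jitter they would all retry after exactly 800 ms; with jitter
# their retries are spread over roughly 640 ms to 960 ms.
rng = random.Random(42)
for client in range(5):
    print(f"client {client}: retry in {jittered_backoff_ms(3, rng=rng):.0f} ms")
```

Because each client draws its own random factor, retries that started in lockstep after a broker failure drift apart instead of arriving at the broker in the same instant.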

I’ll add this example to the KIP and flesh out more of the details so that it’s 
clearer.

On Mar 17, 2020, 1:24 PM -0700, Boyang Chen , wrote:
> Thanks for the reply Sanjana. I guess I would like to rephrase my questions
> 2 and 3 as my previous response is a little bit unactionable.
>
> My specific point is that exponential backoff is not a silver bullet and we
> should consider using it to solve known problems, instead of making
> holistic changes to all clients in the Kafka ecosystem. I do like the
> exponential backoff idea and believe this would be of great value, but
> maybe we should focus on proposing some existing modules that are suffering
> from static retry, and only change them in this first KIP. If in the
> future, some other component users believe they are also suffering, we
> could get more minor KIPs to change the behavior as well.
>
> Boyang
>
> On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya 
> wrote:
>
> > Thanks for the feedback Boyang, I will revise the KIP with the
> > mathematical relations as per your suggestion. To address your feedback:
> >
> > 1. Currently, with the default of 100 ms per retry backoff, in 1 second
> > we would have 10 retries. In the case of using an exponential backoff, we
> > would have a total of 4 retries in 1 second. Thus we have less than half of
> > the amount of retries in the same timeframe and can lessen broker pressure.
> > This calculation is done as follows (using the formula laid out in the
> > KIP):
> >
> > Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry ms
> > is initially 100 ms)
> > Try 2 at time 100 ms, failures = 1, next retry in 200 ms
> > Try 3 at time 300 ms, failures = 2, next retry in 400 ms
> > Try 4 at time 700 ms, failures = 3, next retry in 800 ms
> > Try 5 at time 1500 ms, failures = 4, next retry in 1000 ms (default max
> > retry ms is 1000 ms)
> >
> > For 2 and 3, could you elaborate more about what you mean with respect to
> > client timeouts? I’m not very familiar with the Streams framework, so would
> > love to get more insight to how that currently works, with respect to
> > producer transactions, so I can appropriately update the KIP to address
> > these scenarios.
> > On Mar 13, 2020, 7:15 PM -0700, Boyang Chen ,
> > wrote:
> > > Thanks for the KIP Sanjana. I think the motivation is good, but it lacks
> > > quantitative analysis. For instance:
> > >
> > > 1. How many retries are we saving by applying the exponential retry vs
> > > static retry? There should be some mathematical relations between the
> > > static retry ms, the initial exponential retry ms, the max exponential
> > > retry ms in a given time interval.
> > > 2. How does this affect the client timeout? With exponential retry, the
> > > client shall be getting easier to timeout on a parent level caller, for
> > > instance stream attempts to retry initializing producer transactions with
> > > given 5 minute interval. With exponential retry this mechanism could
> > > experience more frequent timeout which we should be careful with.
> > > 3. With regards to #2, we should have more detailed checklist of all the
> > > existing static retry scenarios, and adjust the initial exponential retry
> > > ms to make sure we won't get easily timeout in high level due to too few
> > > attempts.
> > >
> > > Boyang
> > >
> > > On Fri, Mar 13, 2020 at 4:38 PM Sanjana Kaundinya 
> > > wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > I’ve written a KIP about introducing exponential backoff for Kafka
> > > > clients. Would appreciate any feedback on this.
> > > >
> > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients
> > > >
> > > > Thanks,
> > > > Sanjana
> > > >
> >


Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-17 Thread Boyang Chen
Thanks for the reply Sanjana. I guess I would like to rephrase my questions
2 and 3 as my previous response is a little bit unactionable.

My specific point is that exponential backoff is not a silver bullet and we
should consider using it to solve known problems, instead of making
holistic changes to all clients in the Kafka ecosystem. I do like the
exponential backoff idea and believe this would be of great value, but
maybe we should focus on proposing some existing modules that are suffering
from static retry, and only change them in this first KIP. If in the
future, some other component users believe they are also suffering, we
could get more minor KIPs to change the behavior as well.

Boyang

On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya 
wrote:

> Thanks for the feedback Boyang, I will revise the KIP with the
> mathematical relations as per your suggestion. To address your feedback:
>
> 1.  Currently, with the default of 100 ms per retry backoff, in 1 second
> we would have 10 retries. In the case of using an exponential backoff, we
> would have a total of 4 retries in 1 second. Thus we have less than half of
> the amount of retries in the same timeframe and can lessen broker pressure.
> This calculation is done as follows (using the formula laid out in the
> KIP):
>
> Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry ms
> is initially 100 ms)
> Try 2 at time 100 ms, failures = 1, next retry in 200 ms
> Try 3 at time 300 ms, failures = 2, next retry in 400 ms
> Try 4 at time 700 ms, failures = 3, next retry in 800 ms
> Try 5 at time 1500 ms, failures = 4, next retry in 1000 ms (default max
> retry ms is 1000 ms)
>
> For 2 and 3, could you elaborate more about what you mean with respect to
> client timeouts? I’m not very familiar with the Streams framework, so would
> love to get more insight to how that currently works, with respect to
> producer transactions, so I can appropriately update the KIP to address
> these scenarios.
> On Mar 13, 2020, 7:15 PM -0700, Boyang Chen ,
> wrote:
> > Thanks for the KIP Sanjana. I think the motivation is good, but it lacks
> > quantitative analysis. For instance:
> >
> > 1. How many retries are we saving by applying the exponential retry vs
> > static retry? There should be some mathematical relations between the
> > static retry ms, the initial exponential retry ms, the max exponential
> > retry ms in a given time interval.
> > 2. How does this affect the client timeout? With exponential retry, the
> > client shall be getting easier to timeout on a parent level caller, for
> > instance stream attempts to retry initializing producer transactions with
> > given 5 minute interval. With exponential retry this mechanism could
> > experience more frequent timeout which we should be careful with.
> > 3. With regards to #2, we should have more detailed checklist of all the
> > existing static retry scenarios, and adjust the initial exponential retry
> > ms to make sure we won't get easily timeout in high level due to too few
> > attempts.
> >
> > Boyang
> >
> > On Fri, Mar 13, 2020 at 4:38 PM Sanjana Kaundinya 
> > wrote:
> >
> > > Hi Everyone,
> > >
> > > I’ve written a KIP about introducing exponential backoff for Kafka
> > > clients. Would appreciate any feedback on this.
> > >
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients
> > >
> > > Thanks,
> > > Sanjana
> > >
>


Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-15 Thread Sanjana Kaundinya
Thanks for the feedback Boyang, I will revise the KIP with the mathematical 
relations as per your suggestion. To address your feedback:

1.  Currently, with the default of 100 ms per retry backoff, in 1 second we 
would have 10 retries. In the case of using an exponential backoff, we would 
have a total of 4 retries in 1 second. Thus we have less than half of the 
amount of retries in the same timeframe and can lessen broker pressure. This 
calculation is done as follows (using the formula laid out in the KIP):

Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry ms is 
initially 100 ms)
Try 2 at time 100 ms, failures = 1, next retry in 200 ms
Try 3 at time 300 ms, failures = 2, next retry in 400 ms
Try 4 at time 700 ms, failures = 3, next retry in 800 ms
Try 5 at time 1500 ms, failures = 4, next retry in 1000 ms (default max retry 
ms is 1000 ms)
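
The schedule above can be reproduced with a short sketch. It assumes the backoff is min(max, initial * 2^failures) with no jitter, which is inferred from the numbers listed here rather than quoted from the KIP. The names `backoff_ms` and `schedule` are hypothetical.

```python
def backoff_ms(failures, initial_ms=100, max_ms=1000):
    """Backoff before the next try, doubling per failure and capped at max_ms."""
    return min(max_ms, initial_ms * 2 ** failures)


def schedule(tries, initial_ms=100, max_ms=1000):
    """Return (start_time_ms, failures_so_far, next_backoff_ms) for each try."""
    rows, t = [], 0
    for failures in range(tries):
        wait = backoff_ms(failures, initial_ms, max_ms)
        rows.append((t, failures, wait))
        t += wait
    return rows


for n, (t, failures, wait) in enumerate(schedule(5), start=1):
    print(f"Try {n} at time {t} ms, failures = {failures}, next retry in {wait} ms")
```

With the defaults, the uncapped value for failures = 4 would be 1600 ms, so the 1000 ms cap is what produces the last line of the schedule.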

For 2 and 3, could you elaborate more about what you mean with respect to 
client timeouts? I’m not very familiar with the Streams framework, so would 
love to get more insight to how that currently works, with respect to producer 
transactions, so I can appropriately update the KIP to address these scenarios.
On Mar 13, 2020, 7:15 PM -0700, Boyang Chen , wrote:
> Thanks for the KIP Sanjana. I think the motivation is good, but it lacks
> quantitative analysis. For instance:
>
> 1. How many retries are we saving by applying the exponential retry vs
> static retry? There should be some mathematical relations between the
> static retry ms, the initial exponential retry ms, the max exponential
> retry ms in a given time interval.
> 2. How does this affect the client timeout? With exponential retry, the
> client shall be getting easier to timeout on a parent level caller, for
> instance stream attempts to retry initializing producer transactions with
> given 5 minute interval. With exponential retry this mechanism could
> experience more frequent timeout which we should be careful with.
> 3. With regards to #2, we should have more detailed checklist of all the
> existing static retry scenarios, and adjust the initial exponential retry
> ms to make sure we won't get easily timeout in high level due to too few
> attempts.
>
> Boyang
>
> On Fri, Mar 13, 2020 at 4:38 PM Sanjana Kaundinya 
> wrote:
>
> > Hi Everyone,
> >
> > I’ve written a KIP about introducing exponential backoff for Kafka
> > clients. Would appreciate any feedback on this.
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients
> >
> > Thanks,
> > Sanjana
> >


Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

2020-03-13 Thread Boyang Chen
Thanks for the KIP Sanjana. I think the motivation is good, but it lacks
quantitative analysis. For instance:

1. How many retries are we saving by applying the exponential retry vs
static retry? There should be some mathematical relation between the
static retry ms, the initial exponential retry ms, and the max exponential
retry ms in a given time interval.
2. How does this affect the client timeout? With exponential retry, the
client could more easily time out in a parent-level caller; for
instance, Streams attempts to retry initializing producer transactions with
a given 5 minute interval. With exponential retry this mechanism could
experience more frequent timeouts, which we should be careful with.
3. With regards to #2, we should have a more detailed checklist of all the
existing static retry scenarios, and adjust the initial exponential retry
ms to make sure we don't time out too easily at a higher level due to too few
attempts.
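
One way to make points 1 and 2 concrete is to count how many attempts fit into a caller's time window under each policy. The sketch below is only an illustration: it assumes a 100 ms static backoff, an exponential backoff of min(max, initial * 2^failures) with a 1000 ms cap, no jitter, and attempts that fail instantly. The name `attempts_within` is hypothetical.

```python
def backoff_ms(failures, initial_ms=100, max_ms=1000):
    """Exponential backoff, doubling per failure and capped at max_ms."""
    return min(max_ms, initial_ms * 2 ** failures)


def attempts_within(window_ms, initial_ms=100, max_ms=1000, exponential=True):
    """Count attempts that start strictly before window_ms (first one at t=0)."""
    t, failures, attempts = 0, 0, 0
    while t < window_ms:
        attempts += 1
        wait = backoff_ms(failures, initial_ms, max_ms) if exponential else initial_ms
        t += wait
        failures += 1
    return attempts


for window in (1_000, 300_000):  # 1 second and a 5 minute caller timeout
    static = attempts_within(window, exponential=False)
    expo = attempts_within(window, exponential=True)
    print(f"{window} ms window: static = {static} attempts, exponential = {expo} attempts")
```

Under these assumptions a 5 minute window still leaves room for a few hundred attempts with exponential backoff, so the concern in point 2 depends mostly on how large the cap is relative to the caller's timeout.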

Boyang

On Fri, Mar 13, 2020 at 4:38 PM Sanjana Kaundinya 
wrote:

> Hi Everyone,
>
> I’ve written a KIP about introducing exponential backoff for Kafka
> clients. Would appreciate any feedback on this.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients
>
> Thanks,
> Sanjana
>