[DISCUSS] KIP-480 : Sticky Partitioner

2019-06-24 Thread Justine Olshan
Hello,
This is the discussion thread for KIP-480: Sticky Partitioner.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner

Thank you,
Justine Olshan


Permission to create KIP

2019-06-24 Thread Justine Olshan
Hi, I was wondering if I could have permission to create a KIP. My wiki
username is jolshan.

Thank you,
Justine Olshan


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-25 Thread Justine Olshan
Thank you for looking at my KIP!

I will get to work on these changes.

In addition, here is the JIRA ticket:
https://issues.apache.org/jira/browse/KAFKA-8601

Thanks again,
Justine

On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe  wrote:

> Hi Justine,
>
> The KIP discusses adding a new method to the partitioner interface.
>
> > default public Integer onNewBatch(String topic, Cluster cluster) { ... }
>
> However, this new method doesn't give the partitioner access to the key
> and value of the message.  While this works for the case described here (no
> key), in general we might need this information when re-assigning a
> partition based on the batch completing.  So I think we should add these
> methods to onNewBatch.
>
> Also, it would be nice to call this something like "repartitionOnNewBatch"
> or something, to make it clearer what is going on.
>
> best,
> Colin
>
> On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > Thank you Justine for the KIP! Do you mind creating a corresponding JIRA
> > ticket too?
> >
> > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe  wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP.  This looks great!
> > >
> > > In one place in the KIP, you write: "Remove
> > > testRoundRobinWithUnavailablePartitions() and testRoundRobin() since
> the
> > > round robin functionality of the partitioner has been removed."  You
> can
> > > skip this and similar lines.  We don't need to describe changes to
> internal
> > > test classes in the KIP since they're not visible to users or external
> > > developers.
> > >
> > > It seems like maybe the performance tests should get their own section.
> > > Right now, the way the layout is makes it look like they are part of
> the
> > > "Compatibility, Deprecation, and Migration Plan"
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > > Hello,
> > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > >
> >
>
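For context, the interface change under discussion looks roughly like the sketch below. It mirrors the shape of org.apache.kafka.clients.producer.Partitioner; the added method, its name, and whether the serialized key/value should be passed (as Colin suggests) were all still open at this point, so treat it as illustrative rather than the final KIP signature.

    import java.io.Closeable;
    import org.apache.kafka.common.Cluster;
    import org.apache.kafka.common.Configurable;

    // Sketch only: mirrors the existing producer Partitioner interface with the
    // proposed callback added.
    public interface PartitionerSketch extends Configurable, Closeable {

        int partition(String topic, Object key, byte[] keyBytes,
                      Object value, byte[] valueBytes, Cluster cluster);

        void close();

        // Proposed addition (possibly renamed to repartitionOnNewBatch, and possibly
        // taking the serialized key/value as well, per the discussion above).
        default Integer onNewBatch(String topic, Cluster cluster) {
            return null;   // null = keep the current partition
        }
    }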


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-25 Thread Justine Olshan
I also just noticed that if we want to use this method on the keyed record
case, I will need to move the method outside of the sticky (no key, no set
partition) check. Not a big problem, but something to keep in mind.
Perhaps, we should encapsulate the sticky vs. not behavior inside the
method? More things to think about.

On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe  wrote:

> Hi Justine,
>
> The KIP discusses adding a new method to the partitioner interface.
>
> > default public Integer onNewBatch(String topic, Cluster cluster) { ... }
>
> However, this new method doesn't give the partitioner access to the key
> and value of the message.  While this works for the case described here (no
> key), in general we might need this information when re-assigning a
> partition based on the batch completing.  So I think we should add these
> methods to onNewBatch.
>
> Also, it would be nice to call this something like "repartitionOnNewBatch"
> or something, to make it clearer what is going on.
>
> best,
> Colin
>
> On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > Thank you Justine for the KIP! Do you mind creating a corresponding JIRA
> > ticket too?
> >
> > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe  wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP.  This looks great!
> > >
> > > In one place in the KIP, you write: "Remove
> > > testRoundRobinWithUnavailablePartitions() and testRoundRobin() since
> the
> > > round robin functionality of the partitioner has been removed."  You
> can
> > > skip this and similar lines.  We don't need to describe changes to
> internal
> > > test classes in the KIP since they're not visible to users or external
> > > developers.
> > >
> > > It seems like maybe the performance tests should get their own section.
> > > Right now, the way the layout is makes it look like they are part of
> the
> > > "Compatibility, Deprecation, and Migration Plan"
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > > Hello,
> > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-25 Thread Justine Olshan
I came up with a good solution for this and will push the commit soon. The
repartition will be called only when a partition is not manually sent.
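A minimal sketch of that guard, assuming the Integer-returning onNewBatch shape quoted below; the class and field names are illustrative stand-ins, not the actual KafkaProducer internals:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.Cluster;

    // Illustrative only: re-pick the sticky partition on a new batch, but only when
    // the record did not carry an explicit partition.
    class RepartitionGuardSketch {
        interface NewBatchCallback {
            Integer onNewBatch(String topic, Cluster cluster);
        }

        private final NewBatchCallback partitioner;
        private final Map<String, Integer> stickyPartition = new ConcurrentHashMap<>();

        RepartitionGuardSketch(NewBatchCallback partitioner) {
            this.partitioner = partitioner;
        }

        // Called when the current batch for the record's topic has just filled up.
        void onBatchFull(ProducerRecord<?, ?> record, Cluster cluster) {
            if (record.partition() == null) {           // partition not manually set
                Integer next = partitioner.onNewBatch(record.topic(), cluster);
                if (next != null) {
                    stickyPartition.put(record.topic(), next);
                }
            }
            // (Whether keyed records should also be skipped was still being discussed.)
        }
    }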

On Tue, Jun 25, 2019 at 1:39 PM Colin McCabe  wrote:

> Well, this is a generic partitioner method, so it shouldn't dictate any
> particular behavior.
>
> Colin
>
>
> On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
> > I also just noticed that if we want to use this method on the keyed
> record
> > case, I will need to move the method outside of the sticky (no key, no
> set
> > partition) check. Not a big problem, but something to keep in mind.
> > Perhaps, we should encapsulate the sticky vs. not behavior inside the
> > method? More things to think about.
> >
> > On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe 
> wrote:
> >
> > > Hi Justine,
> > >
> > > The KIP discusses adding a new method to the partitioner interface.
> > >
> > > > default public Integer onNewBatch(String topic, Cluster cluster) {
> ... }
> > >
> > > However, this new method doesn't give the partitioner access to the key
> > > and value of the message.  While this works for the case described
> here (no
> > > key), in general we might need this information when re-assigning a
> > > partition based on the batch completing.  So I think we should add
> these
> > > methods to onNewBatch.
> > >
> > > Also, it would be nice to call this something like
> "repartitionOnNewBatch"
> > > or something, to make it clearer what is going on.
> > >
> > > best,
> > > Colin
> > >
> > > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > > > Thank you Justine for the KIP! Do you mind creating a corresponding
> JIRA
> > > > ticket too?
> > > >
> > > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe 
> wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the KIP.  This looks great!
> > > > >
> > > > > In one place in the KIP, you write: "Remove
> > > > > testRoundRobinWithUnavailablePartitions() and testRoundRobin()
> since
> > > the
> > > > > round robin functionality of the partitioner has been removed."
> You
> > > can
> > > > > skip this and similar lines.  We don't need to describe changes to
> > > internal
> > > > > test classes in the KIP since they're not visible to users or
> external
> > > > > developers.
> > > > >
> > > > > It seems like maybe the performance tests should get their own
> section.
> > > > > Right now, the way the layout is makes it look like they are part
> of
> > > the
> > > > > "Compatibility, Deprecation, and Migration Plan"
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > > > > Hello,
> > > > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > >
> > > > > > Thank you,
> > > > > > Justine Olshan
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-25 Thread Justine Olshan
Also apologies on the late link to the jira, but apparently https links do
not work and it kept defaulting to an image on my desktop even when it
looked like I put the correct link in. Weird...

On Tue, Jun 25, 2019 at 1:41 PM Justine Olshan  wrote:

> I came up with a good solution for this and will push the commit soon. The
> repartition will be called only when a partition is not manually sent.
>
> On Tue, Jun 25, 2019 at 1:39 PM Colin McCabe  wrote:
>
>> Well, this is a generic partitioner method, so it shouldn't dictate any
>> particular behavior.
>>
>> Colin
>>
>>
>> On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
>> > I also just noticed that if we want to use this method on the keyed
>> record
>> > case, I will need to move the method outside of the sticky (no key, no
>> set
>> > partition) check. Not a big problem, but something to keep in mind.
>> > Perhaps, we should encapsulate the sticky vs. not behavior inside the
>> > method? More things to think about.
>> >
>> > On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe 
>> wrote:
>> >
>> > > Hi Justine,
>> > >
>> > > The KIP discusses adding a new method to the partitioner interface.
>> > >
>> > > > default public Integer onNewBatch(String topic, Cluster cluster) {
>> ... }
>> > >
>> > > However, this new method doesn't give the partitioner access to the
>> key
>> > > and value of the message.  While this works for the case described
>> here (no
>> > > key), in general we might need this information when re-assigning a
>> > > partition based on the batch completing.  So I think we should add
>> these
>> > > methods to onNewBatch.
>> > >
>> > > Also, it would be nice to call this something like
>> "repartitionOnNewBatch"
>> > > or something, to make it clearer what is going on.
>> > >
>> > > best,
>> > > Colin
>> > >
>> > > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
>> > > > Thank you Justine for the KIP! Do you mind creating a corresponding
>> JIRA
>> > > > ticket too?
>> > > >
>> > > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe 
>> wrote:
>> > > >
>> > > > > Hi Justine,
>> > > > >
>> > > > > Thanks for the KIP.  This looks great!
>> > > > >
>> > > > > In one place in the KIP, you write: "Remove
>> > > > > testRoundRobinWithUnavailablePartitions() and testRoundRobin()
>> since
>> > > the
>> > > > > round robin functionality of the partitioner has been removed."
>> You
>> > > can
>> > > > > skip this and similar lines.  We don't need to describe changes to
>> > > internal
>> > > > > test classes in the KIP since they're not visible to users or
>> external
>> > > > > developers.
>> > > > >
>> > > > > It seems like maybe the performance tests should get their own
>> section.
>> > > > > Right now, the way the layout is makes it look like they are part
>> of
>> > > the
>> > > > > "Compatibility, Deprecation, and Migration Plan"
>> > > > >
>> > > > > best,
>> > > > > Colin
>> > > > >
>> > > > >
>> > > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
>> > > > > > Hello,
>> > > > > > This is the discussion thread for KIP-480: Sticky Partitioner.
>> > > > > >
>> > > > > >
>> > > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>> > > > > >
>> > > > > > Thank you,
>> > > > > > Justine Olshan
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-27 Thread Justine Olshan
I was going through fixing some of the overloaded methods and I realized I
overloaded the RecordAccumulator constructor. I added a parameter to
include the partitioner so I can call my method. However, the tests for the
record accumulator do not have a partitioner. There is the potential for an
NPE when calling this method. Currently, none of the tests will enter the
code block, but I was wondering if it would be a good idea to include a
partitioner != null check in the if statement as well. I'm open to other
suggestions if what is going on here is not clear.
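A short sketch of the null guard in question; the names here are illustrative stand-ins rather than the actual RecordAccumulator fields:

    import org.apache.kafka.common.Cluster;

    final class NewBatchGuardSketch {
        interface NewBatchCallback {
            void onNewBatch(String topic, Cluster cluster);
        }

        static void maybeNotify(NewBatchCallback partitioner, boolean newBatchCreated,
                                String topic, Cluster cluster) {
            // Some tests construct the accumulator without a partitioner, so guard
            // against null before invoking the callback to avoid an NPE.
            if (newBatchCreated && partitioner != null) {
                partitioner.onNewBatch(topic, cluster);
            }
        }
    }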

Ismael,
Oh I see now. It seems like Netflix just checks if the leader is available
as well.
I'll look into the case where no replica is down.



On Thu, Jun 27, 2019 at 10:39 AM Ismael Juma  wrote:

> Hey Justine.
>
> Available could mean that some replicas are down but the leader is
> available. The suggestion was to try a partition where no replica was down
> if it's available. Such partitions are safer in general. There could be
> some downsides too, so worth thinking about the trade-offs.
>
> Ismael
>
> On Thu, Jun 27, 2019, 10:24 AM Justine Olshan 
> wrote:
>
> > Ismael,
> >
> > Thanks for the feedback!
> >
> > For 1, currently the sticky partitioner favors "available partitions."
> From
> > my understanding, these are partitions that are not under-replicated. If
> > that is not the same, please let me know.
> > As for 2, I've switched to Optional, and the few tests I've run so far
> > suggest the performance is the same.
> > And for 3, I've added a javadoc to my next commit, so that should be up
> > soon.
> >
> > Thanks,
> > Justine
> >
> > On Thu, Jun 27, 2019 at 1:31 AM Ismael Juma  wrote:
> >
> > > Thanks for the KIP Justine. It looks pretty good. A few comments:
> > >
> > > 1. Should we favor partitions that are not under replicated? This is
> > > something that Netflix did too.
> > >
> > > 2. If there's no measurable performance difference, I agree with
> > Stanislav
> > > that Optional would be better than Integer.
> > >
> > > 3. We should include the javadoc for the newly introduced method that
> > > specifies it and its parameters. In particular, it would be good to
> specify
> > if
> > > it gets called when an explicit partition id has been provided.
> > >
> > > Ismael
> > >
> > > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan 
> > wrote:
> > >
> > > > Hello,
> > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-27 Thread Justine Olshan
Moving the previous comment to the PR discussion. :)

On Thu, Jun 27, 2019 at 10:51 AM Justine Olshan 
wrote:

> I was going through fixing some of the overloaded methods and I realized I
> overloaded the RecordAccumulator constructor. I added a parameter to
> include the partitioner so I can call my method. However, the tests for the
> record accumulator do not have a partitioner. There is the potential for an
> NPE when calling this method. Currently, none of the tests will enter the
> code block, but I was wondering if it would be a good idea to include a
> partitioner != null check in the if statement as well. I'm open to other
> suggestions if this is not clear about what is going on.
>
> Ismael,
> Oh I see now. It seems like Netflix just checks if the leader is available
> as well.
> I'll look into the case where no replica is down.
>
>
>
> On Thu, Jun 27, 2019 at 10:39 AM Ismael Juma  wrote:
>
>> Hey Justine.
>>
>> Available could mean that some replicas are down but the leader is
>> available. The suggestion was to try a partition where no replica was down
>> if it's available. Such partitions are safer in general. There could be
>> some downsides too, so worth thinking about the trade-offs.
>>
>> Ismael
>>
>> On Thu, Jun 27, 2019, 10:24 AM Justine Olshan 
>> wrote:
>>
>> > Ismael,
>> >
>> > Thanks for the feedback!
>> >
>> > For 1, currently the sticky partitioner favors "available partitions."
>> From
>> > my understanding, these are partitions that are not under-replicated. If
>> > that is not the same, please let me know.
>> > As for 2, I've switched to Optional, and the few tests I've run so far
>> > suggest the performance is the same.
>> > And for 3, I've added a javadoc to my next commit, so that should be up
>> > soon.
>> >
>> > Thanks,
>> > Justine
>> >
>> > On Thu, Jun 27, 2019 at 1:31 AM Ismael Juma  wrote:
>> >
>> > > Thanks for the KIP Justine. It looks pretty good. A few comments:
>> > >
>> > > 1. Should we favor partitions that are not under replicated? This is
>> > > something that Netflix did too.
>> > >
>> > > 2. If there's no measurable performance difference, I agree with
>> > Stanislav
>> > > that Optional would be better than Integer.
>> > >
>> > > 3. We should include the javadoc for the newly introduced method that
>> > > specifies it and its parameters. In particular, it would be good to
>> specify
>> > if
>> > > it gets called when an explicit partition id has been provided.
>> > >
>> > > Ismael
>> > >
>> > > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan 
>> > wrote:
>> > >
>> > > > Hello,
>> > > > This is the discussion thread for KIP-480: Sticky Partitioner.
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>> > > >
>> > > > Thank you,
>> > > > Justine Olshan
>> > > >
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-27 Thread Justine Olshan
Ismael,

Thanks for the feedback!

For 1, currently the sticky partitioner favors "available partitions." From
my understanding, these are partitions that are not under-replicated. If
that is not the same, please let me know.
As for 2, I've switched to Optional, and the few tests I've run so far
suggest the performance is the same.
And for 3, I've added a javadoc to my next commit, so that should be up
soon.

Thanks,
Justine
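For illustration, the kind of javadoc being discussed (Ismael's point 3, quoted below) might read like the following. This is a sketch assuming the Optional-returning shape mentioned above; the KIP page carries the authoritative signature and wording.

    import java.util.Optional;
    import org.apache.kafka.common.Cluster;

    public interface OnNewBatchSketch {
        /**
         * Invoked when the producer is about to create a new batch for the given
         * topic, giving the partitioner a chance to choose a different partition
         * for the records that follow.  Sketch assumption: this is not invoked
         * when the record specified an explicit partition id.
         *
         * @param topic   the topic the record is being sent to
         * @param cluster the current cluster metadata
         * @return the partition to switch to, or {@code Optional.empty()} to keep
         *         the current partition
         */
        default Optional<Integer> onNewBatch(String topic, Cluster cluster) {
            return Optional.empty();
        }
    }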

On Thu, Jun 27, 2019 at 1:31 AM Ismael Juma  wrote:

> Thanks for the KIP Justine. It looks pretty good. A few comments:
>
> 1. Should we favor partitions that are not under replicated? This is
> something that Netflix did too.
>
> 2. If there's no measurable performance difference, I agree with Stanislav
> that Optional would be better than Integer.
>
> 3. We should include the javadoc for the newly introduced method that
> specifies it and its parameters. In particular, it would be good to specify if
> it gets called when an explicit partition id has been provided.
>
> Ismael
>
> On Mon, Jun 24, 2019, 2:04 PM Justine Olshan  wrote:
>
> > Hello,
> > This is the discussion thread for KIP-480: Sticky Partitioner.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> >
> > Thank you,
> > Justine Olshan
> >
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-07-10 Thread Justine Olshan
Hi M,

I'm a little confused by what you mean by extending the behavior on to the
RoundRobinPartitioner.
The sticky partitioner plans to remove the round-robin behavior from
records with no keys. Instead of sending them to each partition in order,
it sends them all to the same partition until the batch is sent.
I don't think you can have both round-robin and sticky partition behavior.

Thank you,
Justine Olshan
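To make that concrete, a minimal sketch of the sticky selection idea (illustrative only; the real DefaultPartitioner changes in the KIP and PR differ in detail, e.g. avoiding re-picking the same partition and handling keyed records through the usual hash path):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ThreadLocalRandom;
    import org.apache.kafka.common.Cluster;
    import org.apache.kafka.common.PartitionInfo;

    // Stick to one partition per topic until a batch completes, then jump to a new
    // randomly chosen partition.  Assumes the topic has at least one partition.
    public class StickyPartitionSketch {
        private final Map<String, Integer> current = new ConcurrentHashMap<>();

        // Used for records with no key and no explicit partition.
        public int partition(String topic, Cluster cluster) {
            return current.computeIfAbsent(topic, t -> pick(t, cluster));
        }

        // Called when the current batch for this topic has been filled.
        public int onNewBatch(String topic, Cluster cluster) {
            int next = pick(topic, cluster);
            current.put(topic, next);
            return next;
        }

        private int pick(String topic, Cluster cluster) {
            List<PartitionInfo> avail = cluster.availablePartitionsForTopic(topic);
            List<PartitionInfo> parts = avail.isEmpty() ? cluster.partitionsForTopic(topic) : avail;
            return parts.get(ThreadLocalRandom.current().nextInt(parts.size())).partition();
        }
    }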

On Wed, Jul 10, 2019 at 1:54 AM M. Manna  wrote:

> Thanks for the comments Colin.
>
> My only concern is that this KIP is addressing a good feature and having
> that extended to RoundRobinPartitioner means 1 less KIP in the future.
>
> Would it be appropriate to extend the support to RoundRobinPartitioner too?
>
> Thanks,
>
> On Tue, 9 Jul 2019 at 17:24, Colin McCabe  wrote:
>
> > Hi M,
> >
> > The RoundRobinPartitioner added by KIP-369 doesn't interact with this
> > KIP.  If you configure your producer to use RoundRobinPartitioner, then
> the
> > DefaultPartitioner will not be used.  And the "sticky" behavior is
> > implemented only in the DefaultPartitioner.
> >
> > regards,
> > Colin
> >
> >
> > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > Hello Justine,
> > >
> > > I have one item I wanted to discuss.
> > >
> > > We are currently in review stage for KAFKA- where we can choose
> > always
> > > RoundRobin regardless of null/usable key.
> > >
> > > If I understood this KIP motivation correctly, you are still honouring
> > how
> > > the hashing of key works for DefaultPartitioner. Would you say that
> > having
> > > an always "Round-Robin" partitioning with "Sticky" assignment
> (efficient
> > > batching of records for a partition) doesn't deviate from your original
> > > intention?
> > >
> > > Thanks,
> > >
> > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan 
> > wrote:
> > >
> > > > Hello all,
> > > >
> > > > If there are no more comments or concerns, I would like to start the
> > vote
> > > > on this tomorrow afternoon.
> > > >
> > > > However, if there are still topics to discuss, feel free to bring
> them
> > up
> > > > now.
> > > >
> > > > Thank you,
> > > > Justine
> > > >
> > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan 
> > > > wrote:
> > > >
> > > > > Hello again,
> > > > >
> > > > > Another update to the interface has been made to the KIP.
> > > > > Please let me know if you have any feedback!
> > > > >
> > > > > Thank you,
> > > > > Justine
> > > > >
> > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <
> jols...@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >> I made some changes to the KIP.
> > > > >> The idea is to clean up the code, make behavior more explicit,
> > provide
> > > > >> more flexibility, and to keep default behavior the same.
> > > > >>
> > > > >> Now we will change the partition in onNewBatch, and specify the
> > > > >> conditions for this function call (non-keyed values, no explicit
> > > > >> partitions) in willCallOnNewBatch.
> > > > >> This clears up some of the issues with the implementation. I'm
> > happy to
> > > > >> hear further opinions and discuss this change!
> > > > >>
> > > > >> Thank you,
> > > > >> Justine
> > > > >>
> > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe 
> > > > wrote:
> > > > >>
> > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > > >>> > Thanks for the KIP Justine. It looks pretty good. A few
> comments:
> > > > >>> >
> > > > >>> > 1. Should we favor partitions that are not under replicated?
> > This is
> > > > >>> > something that Netflix did too.
> > > > >>>
> > > > >>> This seems like it could lead to cascading failures, right?  If a
> > > > >>> partition becomes under-replicated because there is too much traffic,
> > > > >>> the producer stops sending to it, which puts even more load on the
> > > > >>> remaining partitions, which are even more likely to fail then, etc.  It
> > > > >>> also will create unbalanced load patterns on the consumers.


[DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-11 Thread Justine Olshan
Hello all,

I'd like to start a discussion thread for KIP-487.
This KIP plans to deprecate the current system of auto-creating topics
through requests to the metadata and give the producer the ability to
automatically create topics instead.

More information can be found here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer

Thank you,
Justine Olshan


Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-11 Thread Justine Olshan
Hi Dhruvil,

Thanks for reading the KIP!
That was the general idea for deprecation. We would log a warning when the
config is enabled on the broker.
I also believe that there would be a change to documentation.
If there is anything else that should be done, please let me know!

Justine

On Thu, Jul 11, 2019 at 4:17 PM Dhruvil Shah  wrote:

> Hi Justine,
>
> Thanks for the KIP, this is great!
>
> Could you add some more information about what deprecating the broker
> configuration means? Would we log a warning in the logs when auto topic
> creation is enabled on the broker, for example?
>
> Thanks,
> Dhruvil
>
> On Thu, Jul 11, 2019 at 10:28 AM Justine Olshan 
> wrote:
>
> > Hello all,
> >
> > I'd like to start a discussion thread for KIP-487.
> > This KIP plans to deprecate the current system of auto-creating topics
> > through requests to the metadata and give the producer the ability to
> > automatically create topics instead.
> >
> > More information can be found here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer
> >
> > Thank you,
> > Justine Olshan
> >
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-07-12 Thread Justine Olshan
Hello all,

Jun, thanks for taking a look at my KIP! We were also concerned about
batches containing a single record so we kept this in mind for the
implementation. The decision to switch the sticky partition actually
involves returning from the record accumulator and assigning the new
partition before the new batch is created. That way all of the records will
go to this new partition's batch. If you would like to get a better look at
how this works, please check out the PR:
https://github.com/apache/kafka/pull/6997/files. The most important lines
are in the append method of the RecordAccumulator and doSend in
KafkaProducer.
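In pseudo-code, the control flow described above looks roughly like this. Everything here is a stubbed, simplified sketch: the real types and signatures live in the linked PR (RecordAccumulator.append and KafkaProducer.doSend), not in these stand-ins.

    class StickyAppendFlowSketch {
        interface Accumulator {                      // stand-in for RecordAccumulator
            // Returns true if the append was aborted because a new batch was needed.
            boolean append(String topic, int partition, byte[] record, boolean abortOnNewBatch);
        }

        interface StickyChooser {                    // stand-in for the partitioner
            int currentPartition(String topic);
            int onNewBatch(String topic);            // picks and returns a new sticky partition
        }

        static int send(Accumulator acc, StickyChooser sticky, String topic, byte[] record) {
            int partition = sticky.currentPartition(topic);
            boolean abortedForNewBatch = acc.append(topic, partition, record, true);
            if (abortedForNewBatch) {
                // Switch partitions *before* the new batch is created, so this record
                // and the ones that follow all land in the new partition's batch.
                partition = sticky.onNewBatch(topic);
                acc.append(topic, partition, record, false);
            }
            return partition;
        }
    }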

Colin, I think this makes sense to me, except that the name
StickyRoundRobinPartitioner doesn't really explain the behavior of
what would be implemented. Perhaps a name indicating that the sticky behavior
is always used, or that it also applies to keyed records, would be more
descriptive. Calling it "RoundRobin" seems a bit misleading to me.

Thanks again for reviewing,
Justine

On Thu, Jul 11, 2019 at 6:07 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the KIP. Nice writeup and great results. Just one comment.
>
> 100. To add a record to the accumulator, the producer needs to know the
> partition id. The decision of whether the record can be added to the
> current batch is only made after the accumulator.append() call. So, when a
> batch is full, it seems that the KIP will try to append the next record to
> the same partition, which will trigger the creation of a new batch with a
> single record. After that, new records will be routed to a new partition.
> If the producer doesn't come back to the first partition in time, the
> producer will send a single record batch. In the worse case, it can be that
> every other batch has only a single record. Is this correct? If so, could
> we avoid that?
>
> Jun
>
> On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe  wrote:
>
> > Hi Justine,
> >
> > I agree that we shouldn't change RoundRobinPartitioner, since its
> behavior
> > is already specified.
> >
> > However, we could add a new, separate StickyRoundRobinPartitioner class
> to
> > KIP-480 which just implemented the sticky behavior regardless of whether
> > the key was null.  That seems pretty easy to add (and it wouldn't have to
> > implemented right away in the first PR, of course.)  It would be an
> option
> > for people who wanted to configure this behavior.
> >
> > best,
> > Colin
> >
> >
> > On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote:
> > > Hi M,
> > >
> > > I'm a little confused by what you mean by extending the behavior on to
> > the
> > > RoundRobinPartitioner.
> > > The sticky partitioner plans to remove the round-robin behavior from
> > > records with no keys. Instead of sending them to each partition in
> order,
> > > it sends them all to the same partition until the batch is sent.
> > > I don't think you can have both round-robin and sticky partition
> > behavior.
> > >
> > > Thank you,
> > > Justine Olshan
> > >
> > > On Wed, Jul 10, 2019 at 1:54 AM M. Manna  wrote:
> > >
> > > > Thanks for the comments Colin.
> > > >
> > > > My only concern is that this KIP is addressing a good feature and
> > having
> > > > that extended to RoundRobinPartitioner means 1 less KIP in the
> future.
> > > >
> > > > Would it be appropriate to extend the support to
> RoundRobinPartitioner
> > too?
> > > >
> > > > Thanks,
> > > >
> > > > On Tue, 9 Jul 2019 at 17:24, Colin McCabe 
> wrote:
> > > >
> > > > > Hi M,
> > > > >
> > > > > The RoundRobinPartitioner added by KIP-369 doesn't interact with
> this
> > > > > KIP.  If you configure your producer to use RoundRobinPartitioner,
> > then
> > > > the
> > > > > DefaultPartitioner will not be used.  And the "sticky" behavior is
> > > > > implemented only in the DefaultPartitioner.
> > > > >
> > > > > regards,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > > > > Hello Justine,
> > > > > >
> > > > > > I have one item I wanted to discuss.
> > > > > >
> > > > > > We are currently in review stage for KAFKA- where we can
> choose
> > > > > always
> > > > > > RoundRobin regardless of null/usable key.
> > > 

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-12 Thread Justine Olshan
Hi Colin,

Thanks for looking at the KIP. I can definitely add to the title to make it
more clear.

It makes sense that both configurations could be turned on since there are
many cases where the user can not control the server-side configurations. I
was a little concerned about how both interacting would work out -- if
there would be too many requests for new topics, for example. But since
it does make sense to allow both configurations enabled, I will test out
some scenarios and I'll change the KIP to support this.

I also agree with documentation about distinguishing the differences. I was
having some trouble with the wording but I like the phrases "server-side"
and "client-side." That's a good distinction I can use when describing.

I'll try to update the KIP soon keeping everyone's input in mind.

Thanks,
Justine

On Thu, Jul 11, 2019 at 5:39 PM Colin McCabe  wrote:

> Hi Justine,
>
> Thanks for the KIP.  This seems like a good step towards removing
> server-side topic auto-creation.
>
> We should include "client-side" in the title of the KIP somewhere, to
> make it clear that we're talking about client-side auto creation.
>
> The KIP says:
> > In order to automatically create topics with the producer, the
> producer's
> > auto.create.topics.enable config must be set to true and the broker
> config should be set to false
>
> From a user's point of view, this seems counter-intuitive.  In order to
> auto-create topics the broker's auto.create.topics.enable config should be
> set to false?  It seems like the server-side auto-create is unrelated to
> the client-side auto-create.  We could have both turned on (and I'm sure
> that in the real world, people will try this configuration...)  There's no
> reason not to support this, I think.
>
> We should add some documentation explaining the difference between
> server-side and client-side auto-creation.  Without documentation, an admin
> might think that they had disabled all forms of auto-creation by setting
> the server-side setting to false -- but this is not the case, of course.
>
> best,
> Colin
>
>
> On Thu, Jul 11, 2019, at 16:22, Justine Olshan wrote:
> > Hi Dhruvil,
> >
> > Thanks for reading the KIP!
> > That was the general idea for deprecation. We would log a warning when
> the
> > config is enabled on the broker.
> > I also believe that there would be a change to documentation.
> > If there is anything else that should be done, please let me know!
> >
> > Justine
> >
> > On Thu, Jul 11, 2019 at 4:17 PM Dhruvil Shah 
> wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP, this is great!
> > >
> > > Could you add some more information about what deprecating the broker
> > > configuration means? Would we log a warning in the logs when auto topic
> > > creation is enabled on the broker, for example?
> > >
> > > Thanks,
> > > Dhruvil
> > >
> > > On Thu, Jul 11, 2019 at 10:28 AM Justine Olshan 
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I'd like to start a discussion thread for KIP-487.
> > > > This KIP plans to deprecate the current system of auto-creating
> topics
> > > > through requests to the metadata and give the producer the ability to
> > > > automatically create topics instead.
> > > >
> > > > More information can be found here:
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-07-12 Thread Justine Olshan
Hi Colin,

The first thing that comes to mind is AlwaysStickyPartitioner, but maybe I
could think about it a bit more and come up with something better.

The only reason why I hesitate about the RoundRobin part is that it doesn't
really distinguish that this partitioner will have this behavior on keyed
values. I know this is the case for the RoundRobinPartitioner, but unless
one is familiar with the code, it might not be obvious on first glance.

I can definitely include this in the KIP once we get a name settled.

Thanks,
Justine

On Fri, Jul 12, 2019 at 11:45 AM Colin McCabe  wrote:

> On Fri, Jul 12, 2019, at 09:02, Justine Olshan wrote:
> > Hello all,
> >
> > Jun, thanks for taking a look at my KIP! We were also concerned about
> > batches containing a single record so we kept this in mind for the
> > implementation. The decision to switch the sticky partition actually
> > involves returning from the record accumulator and assigning the new
> > partition before the new batch is created. That way all of the records
> will
> > go to this new partition's batch. If you would like to get a better look
> at
> > how this works, please check out the PR:
> > https://github.com/apache/kafka/pull/6997/files. The most important
> lines
> > are in the append method of the RecordAccumulator and doSend in
> > KafkaProducer.
>
> Thanks for the explanation.
>
> >
> > Colin, I think this makes sense to me except for the name
> > StickyRoundRobinPartitioner seems to not really explain the behavior of
> > what would be implemented. Perhaps a name indicating the sticky behavior
> is
> > always used, or that it will be used on keys is more descriptive. Calling
> > it "RoundRobin" seems a bit misleading to me.
>
> Hmm, what name would you propose here?
>
> Keep in mind we don't have to implement the new configurable partitioner
> in the initial PR :)
>
> best,
> Colin
>
> >
> > Thanks again for reviewing,
> > Justine
> >
> > On Thu, Jul 11, 2019 at 6:07 PM Jun Rao  wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the KIP. Nice writeup and great results. Just one comment.
> > >
> > > 100. To add a record to the accumulator, the producer needs to know the
> > > partition id. The decision of whether the record can be added to the
> > > current batch is only made after the accumulator.append() call. So,
> when a
> > > batch is full, it seems that the KIP will try to append the next
> record to
> > > the same partition, which will trigger the creation of a new batch
> with a
> > > single record. After that, new records will be routed to a new
> partition.
> > > If the producer doesn't come back to the first partition in time, the
> > > producer will send a single record batch. In the worse case, it can be
> that
> > > every other batch has only a single record. Is this correct? If so,
> could
> > > we avoid that?
> > >
> > > Jun
> > >
> > > On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe 
> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > I agree that we shouldn't change RoundRobinPartitioner, since its
> > > behavior
> > > > is already specified.
> > > >
> > > > However, we could add a new, separate StickyRoundRobinPartitioner
> class
> > > to
> > > > KIP-480 which just implemented the sticky behavior regardless of
> whether
> > > > the key was null.  That seems pretty easy to add (and it wouldn't
> have to
> > > > implemented right away in the first PR, of course.)  It would be an
> > > option
> > > > for people who wanted to configure this behavior.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote:
> > > > > Hi M,
> > > > >
> > > > > I'm a little confused by what you mean by extending the behavior
> on to
> > > > the
> > > > > RoundRobinPartitioner.
> > > > > The sticky partitioner plans to remove the round-robin behavior
> from
> > > > > records with no keys. Instead of sending them to each partition in
> > > order,
> > > > > it sends them all to the same partition until the batch is sent.
> > > > > I don't think you can have both round-robin and sticky partition
> > > > behavior.
> > > > >
> > > > > Thank you,
> > > > > Justine Olshan

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-12 Thread Justine Olshan
Just a quick update--

It seems that enabling both the broker and producer configs works fine,
except that the broker configurations for partitions and replication factor
take precedence.
I don't know if that is something we would want to change, but I'll be
updating the KIP for now to reflect this. Perhaps we would want to add more
to the documentation of the producer configs to clarify.

Thank you,
Justine

On Fri, Jul 12, 2019 at 9:28 AM Justine Olshan  wrote:

> Hi Colin,
>
> Thanks for looking at the KIP. I can definitely add to the title to make
> it more clear.
>
> It makes sense that both configurations could be turned on since there are
> many cases where the user can not control the server-side configurations. I
> was a little concerned about how both interacting would work out -- if
> there would be too many requests for new topics, for example. But since
> it does make sense to allow both configurations enabled, I will test out
> some scenarios and I'll change the KIP to support this.
>
> I also agree with documentation about distinguishing the differences. I
> was having some trouble with the wording but I like the phrases
> "server-side" and "client-side." That's a good distinction I can use when
> describing.
>
> I'll try to update the KIP soon keeping everyone's input in mind.
>
> Thanks,
> Justine
>
> On Thu, Jul 11, 2019 at 5:39 PM Colin McCabe  wrote:
>
>> Hi Justine,
>>
>> Thanks for the KIP.  This seems like a good step towards removing
>> server-side topic auto-creation.
>>
>> We should include "client-side" in the title of the KIP somewhere,
>> to make it clear that we're talking about client-side auto creation.
>>
>> The KIP says:
>> > In order to automatically create topics with the producer, the
>> producer's
>> > auto.create.topics.enable config must be set to true and the broker
>> config should be set to false
>>
>> From a user's point of view, this seems counter-intuitive.  In order to
>> auto-create topics the broker's auto.create.topics.enable config should be
>> set to false?  It seems like the server-side auto-create is unrelated to
>> the client-side auto-create.  We could have both turned on (and I'm sure
>> that in the real world, people will try this configuration...)  There's no
>> reason not to support this, I think.
>>
>> We should add some documentation explaining the difference between
>> server-side and client-side auto-creation.  Without documentation, an admin
>> might think that they had disabled all forms of auto-creation by setting
>> the server-side setting to false -- but this is not the case, of course.
>>
>> best,
>> Colin
>>
>>
>> On Thu, Jul 11, 2019, at 16:22, Justine Olshan wrote:
>> > Hi Dhruvil,
>> >
>> > Thanks for reading the KIP!
>> > That was the general idea for deprecation. We would log a warning when
>> the
>> > config is enabled on the broker.
>> > I also believe that there would be a change to documentation.
>> > If there is anything else that should be done, please let me know!
>> >
>> > Justine
>> >
>> > On Thu, Jul 11, 2019 at 4:17 PM Dhruvil Shah 
>> wrote:
>> >
>> > > Hi Justine,
>> > >
>> > > Thanks for the KIP, this is great!
>> > >
>> > > Could you add some more information about what deprecating the broker
>> > > configuration means? Would we log a warning in the logs when auto
>> topic
>> > > creation is enabled on the broker, for example?
>> > >
>> > > Thanks,
>> > > Dhruvil
>> > >
>> > > On Thu, Jul 11, 2019 at 10:28 AM Justine Olshan > >
>> > > wrote:
>> > >
>> > > > Hello all,
>> > > >
>> > > > I'd like to start a discussion thread for KIP-487.
>> > > > This KIP plans to deprecate the current system of auto-creating
>> topics
>> > > > through requests to the metadata and give the producer the ability
>> to
>> > > > automatically create topics instead.
>> > > >
>> > > > More information can be found here:
>> > > >
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Automatic+Topic+Creation+on+Producer
>> > > >
>> > > > Thank you,
>> > > > Justine Olshan
>> > > >
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-07-08 Thread Justine Olshan
Hello all,

If there are no more comments or concerns, I would like to start the vote
on this tomorrow afternoon.

However, if there are still topics to discuss, feel free to bring them up
now.

Thank you,
Justine

On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan  wrote:

> Hello again,
>
> Another update to the interface has been made to the KIP.
> Please let me know if you have any feedback!
>
> Thank you,
> Justine
>
> On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan 
> wrote:
>
>> Hi all,
>> I made some changes to the KIP.
>> The idea is to clean up the code, make behavior more explicit, provide
>> more flexibility, and to keep default behavior the same.
>>
>> Now we will change the partition in onNewBatch, and specify the
>> conditions for this function call (non-keyed values, no explicit
>> partitions) in willCallOnNewBatch.
>> This clears up some of the issues with the implementation. I'm happy to
>> hear further opinions and discuss this change!
>>
>> Thank you,
>> Justine
>>
>> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe  wrote:
>>
>>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
>>> > Thanks for the KIP Justine. It looks pretty good. A few comments:
>>> >
>>> > 1. Should we favor partitions that are not under replicated? This is
>>> > something that Netflix did too.
>>>
>>> This seems like it could lead to cascading failures, right?  If a
>>> partition becomes under-replicated because there is too much traffic, the
>>> producer stops sending to it, which puts even more load on the remaining
>>> partitions, which are even more likely to fail then, etc.  It also will
>>> create unbalanced load patterns on the consumers.
>>>
>>> >
>>> > 2. If there's no measurable performance difference, I agree with
>>> Stanislav
>>> > that Optional would be better than Integer.
>>> >
>>> > 3. We should include the javadoc for the newly introduced method that
>>> > specifies it and its parameters. In particular, it would be good to
>>> specify if
>>> > it gets called when an explicit partition id has been provided.
>>>
>>> Agreed.
>>>
>>> best,
>>> Colin
>>>
>>> >
>>> > Ismael
>>> >
>>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan 
>>> wrote:
>>> >
>>> > > Hello,
>>> > > This is the discussion thread for KIP-480: Sticky Partitioner.
>>> > >
>>> > >
>>> > >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>>> > >
>>> > > Thank you,
>>> > > Justine Olshan
>>> > >
>>> >
>>>
>>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-07-02 Thread Justine Olshan
Hello again,

Another update to the interface has been made to the KIP.
Please let me know if you have any feedback!

Thank you,
Justine

On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan  wrote:

> Hi all,
> I made some changes to the KIP.
> The idea is to clean up the code, make behavior more explicit, provide
> more flexibility, and to keep default behavior the same.
>
> Now we will change the partition in onNewBatch, and specify the conditions
> for this function call (non-keyed values, no explicit partitions) in
> willCallOnNewBatch.
> This clears up some of the issues with the implementation. I'm happy to
> hear further opinions and discuss this change!
>
> Thank you,
> Justine
>
> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe  wrote:
>
>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
>> > Thanks for the KIP Justine. It looks pretty good. A few comments:
>> >
>> > 1. Should we favor partitions that are not under replicated? This is
>> > something that Netflix did too.
>>
>> This seems like it could lead to cascading failures, right?  If a
>> partition becomes under-replicated because there is too much traffic, the
>> producer stops sending to it, which puts even more load on the remaining
>> partitions, which are even more likely to fail then, etc.  It also will
>> create unbalanced load patterns on the consumers.
>>
>> >
>> > 2. If there's no measurable performance difference, I agree with
>> Stanislav
>> > that Optional would be better than Integer.
>> >
>> > 3. We should include the javadoc for the newly introduced method that
>> > specifies it and its parameters. In particular, it would be good to
>> specify if
>> > it gets called when an explicit partition id has been provided.
>>
>> Agreed.
>>
>> best,
>> Colin
>>
>> >
>> > Ismael
>> >
>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan 
>> wrote:
>> >
>> > > Hello,
>> > > This is the discussion thread for KIP-480: Sticky Partitioner.
>> > >
>> > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>> > >
>> > > Thank you,
>> > > Justine Olshan
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-26 Thread Justine Olshan
Stanislav,
Thank you for looking at my KIP!

I did discuss with Colin about whether the null vs. Optional types and we
did not come to a strong conclusion either way.
I'd be happy to change it if it makes the logic more clear.

Thanks,
Justine

On Wed, Jun 26, 2019 at 2:46 PM Stanislav Kozlovski 
wrote:

> Hey Justine,
>
> Thanks for the KIP! I am impressed by the performance results linked in the
> KIP and I like the data-driven approach. This looks like a great
> improvement.
>
> I had one minor question regarding the public interface
> `repartitionOnNewBatch` where we return null in the case of no change
> needed. Have we considered using Java's Optional type to avoid null values?
>
> Best,
> Stanislav
>
> On Tue, Jun 25, 2019 at 11:29 PM Colin McCabe  wrote:
>
> > No worries.  Thanks for fixing it!
> > C.
> >
> > On Tue, Jun 25, 2019, at 13:47, Justine Olshan wrote:
> > > Also apologies on the late link to the jira, but apparently https links
> > do
> > > not work and it kept defaulting to an image on my desktop even when it
> > > looked like I put the correct link in. Weird...
> > >
> > > On Tue, Jun 25, 2019 at 1:41 PM Justine Olshan 
> > wrote:
> > >
> > > > I came up with a good solution for this and will push the commit
> soon.
> > The
> > > > repartition will be called only when a partition is not manually
> sent.
> > > >
> > > > On Tue, Jun 25, 2019 at 1:39 PM Colin McCabe 
> > wrote:
> > > >
> > > >> Well, this is a generic partitioner method, so it shouldn't dictate
> > any
> > > >> particular behavior.
> > > >>
> > > >> Colin
> > > >>
> > > >>
> > > >> On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
> > > >> > I also just noticed that if we want to use this method on the
> keyed
> > > >> record
> > > >> > case, I will need to move the method outside of the sticky (no
> key,
> > no
> > > >> set
> > > >> > partition) check. Not a big problem, but something to keep in
> mind.
> > > >> > Perhaps, we should encapsulate the sticky vs. not behavior inside
> > the
> > > >> > method? More things to think about.
> > > >> >
> > > >> > On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe  >
> > > >> wrote:
> > > >> >
> > > >> > > Hi Justine,
> > > >> > >
> > > >> > > The KIP discusses adding a new method to the partitioner
> > interface.
> > > >> > >
> > > >> > > > default public Integer onNewBatch(String topic, Cluster
> > cluster) {
> > > >> ... }
> > > >> > >
> > > >> > > However, this new method doesn't give the partitioner access to
> > the
> > > >> key
> > > >> > > and value of the message.  While this works for the case
> described
> > > >> here (no
> > > >> > > key), in general we might need this information when
> re-assigning
> > a
> > > >> > > partition based on the batch completing.  So I think we should
> > add
> > > >> these
> > > >> > > methods to onNewBatch.
> > > >> > >
> > > >> > > Also, it would be nice to call this something like
> > > >> "repartitionOnNewBatch"
> > > >> > > or something, to make it clearer what is going on.
> > > >> > >
> > > >> > > best,
> > > >> > > Colin
> > > >> > >
> > > >> > > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > > >> > > > Thank you Justine for the KIP! Do you mind creating a
> > corresponding
> > > >> JIRA
> > > >> > > > ticket too?
> > > >> > > >
> > > >> > > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <
> > cmcc...@apache.org>
> > > >> wrote:
> > > >> > > >
> > > >> > > > > Hi Justine,
> > > >> > > > >
> > > >> > > > > Thanks for the KIP.  This looks great!
> > > >> > > > >
> > > >> > > > > In one place in the KIP, you write: "Remove
> > > >> > > > > testRoundRobinWithUnavailablePartitions() and
> testRoundRobin()
> > > >> since
> > > >> > > the
> > > >> > > > > round robin functionality of the partitioner has been
> > removed."
> > > >> You
> > > >> > > can
> > > >> > > > > skip this and similar lines.  We don't need to describe
> > changes to
> > > >> > > internal
> > > >> > > > > test classes in the KIP since they're not visible to users
> or
> > > >> external
> > > >> > > > > developers.
> > > >> > > > >
> > > >> > > > > It seems like maybe the performance tests should get their
> own
> > > >> section.
> > > >> > > > > Right now, the way the layout is makes it look like they are
> > part
> > > >> of
> > > >> > > the
> > > >> > > > > "Compatibility, Deprecation, and Migration Plan"
> > > >> > > > >
> > > >> > > > > best,
> > > >> > > > > Colin
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > >> > > > > > Hello,
> > > >> > > > > > This is the discussion thread for KIP-480: Sticky
> > Partitioner.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > >
> > > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >> > > > > >
> > > >> > > > > > Thank you,
> > > >> > > > > > Justine Olshan
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
>
> --
> Best,
> Stanislav
>


Re: [DISCUSS] KIP-480 : Sticky Partitioner

2019-06-28 Thread Justine Olshan
Hi all,
I made some changes to the KIP.
The idea is to clean up the code, make behavior more explicit, provide more
flexibility, and to keep default behavior the same.

Now we will change the partition in onNewBatch, and specify the conditions
for this function call (non-keyed values, no explicit partitions) in
willCallOnNewBatch.
This clears up some of the issues with the implementation. I'm happy to
hear further opinions and discuss this change!

Thank you,
Justine
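The revised interface shape being described would look roughly like the sketch below. Method names follow the KIP text at this stage of the discussion; the exact parameters and return types were still under review, so this is illustrative only.

    import org.apache.kafka.common.Cluster;

    public interface UpdatedPartitionerSketch {
        // Sketch: lets the producer ask up front whether onNewBatch applies, e.g.
        // only for records with no key and no explicitly set partition.
        default boolean willCallOnNewBatch(String topic, Object key, byte[] keyBytes,
                                           Object value, byte[] valueBytes, Cluster cluster) {
            return false;
        }

        // Sketch: invoked when a new batch is about to be created, so the
        // partitioner can switch its "sticky" partition for the topic.
        default void onNewBatch(String topic, Cluster cluster) {
        }
    }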

On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe  wrote:

> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > Thanks for the KIP Justine. It looks pretty good. A few comments:
> >
> > 1. Should we favor partitions that are not under replicated? This is
> > something that Netflix did too.
>
> This seems like it could lead to cascading failures, right?  If a
> partition becomes under-replicated because there is too much traffic, the
> producer stops sending to it, which puts even more load on the remaining
> partitions, which are even more likely to fail then, etc.  It also will
> create unbalanced load patterns on the consumers.
>
> >
> > 2. If there's no measurable performance difference, I agree with
> Stanislav
> > that Optional would be better than Integer.
> >
> > 3. We should include the javadoc for the newly introduced method that
> > specifies it and its parameters. In particular, it would be good to specify
> if
> > it gets called when an explicit partition id has been provided.
>
> Agreed.
>
> best,
> Colin
>
> >
> > Ismael
> >
> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan 
> wrote:
> >
> > > Hello,
> > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > >
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > >
> > > Thank you,
> > > Justine Olshan
> > >
> >
>


Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-31 Thread Justine Olshan
Hi Mickael,
I agree that KIP-464 works on newer brokers, but I was a bit worried how
things would play out on older brokers that *do not* have KIP-464 included.
Is it enough to throw an error in this case when producer configs are not
specified?

Thanks,
Justine

On Wed, Jul 31, 2019 at 9:10 AM Mickael Maison 
wrote:

> Hi Justine,
>
> We can rely on KIP-464 which allows to omit the partition count or
> replication factor when creating a topic. In that case, the broker
> defaults are used.
>
> On Wed, Jul 31, 2019 at 4:55 PM Justine Olshan 
> wrote:
> >
> > Mickael,
> > That makes sense to me!
> > To clarify, in the current state of the KIP, the producer does not rely
> on
> > the broker to autocreate--if the broker's config is disabled, then the
> > producer can autocreate on its own with a create topic request (the same
> > type of request the admin client uses).
> > However, if both configs are enabled, the broker will autocreate through
> a
> > metadata request before the producer gets a chance.
> > Of course, the way to avoid this, is to do as you suggested, and set the
> > "allow_auto_topic_creation" field to false.
> >
> > I think the only thing we need to be careful with in this setup is
> without
> > KIP 464, we can not use broker defaults for this topic. A user needs to
> > specify the number of partitions and replication factor in the config.
> > An alternative to this is to have coded defaults for when these configs
> are
> > unspecified, but it is not immediately apparent what these defaults
> should
> > be.
> >
> > Thanks again for reading my KIP,
> > Justine
> >
> > On Wed, Jul 31, 2019 at 4:19 AM Mickael Maison  >
> > wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the response!
> > > In my opinion, it would be better if the producer did not rely at all
> > > on the broker auto create feature as this is what we're aiming to
> > > deprecate. When requesting metadata we can set the
> > > "allow_auto_topic_creation" field to false to avoid the broker auto
> > > creation. Then if the topic is not existing, send a
> > > CreateTopicRequest.
> > >
> > > What do you think?
> > >
> > > On Mon, Jul 29, 2019 at 6:34 PM Justine Olshan 
> > > wrote:
> > > >
> > > > Currently the way it is implemented, the broker auto-creation
> > > configuration
> > > > takes precedence. The producer will not use the CreateTopics request.
> > > > (Technically it can--but the topic will already be created through
> the
> > > > broker, so it will never try to create the topic.)
> > > > It is possible to change this however, and I'd be happy to discuss
> the
> > > > benefits of this alternative.
> > > >
> > > > Thank you,
> > > > Justine
> > > >
> > > > On Mon, Jul 29, 2019 at 10:26 AM Mickael Maison <
> > > mickael.mai...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > In case auto creation is enabled on both the client and server,
> will
> > > > > the producer still use the AdminClient (CreateTopics request) to
> > > > > create topics? and not rely on the broker auto create.
> > > > > I'm guessing the answer is yes but can you make it explicit.
> > > > >
> > > > > Thank you
> > > > >
> > > > > On Wed, Jul 24, 2019 at 6:23 PM Justine Olshan <
> jols...@confluent.io>
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > > Just a friendly reminder to take a look at this KIP if you have
> the
> > > time.
> > > > > >
> > > > > > I was thinking about broker vs. client default precedence, and I
> > > think it
> > > > > > makes sense to keep the broker as the default used when both
> > > client-side
> > > > > > and broker-side defaults are configured. The idea is that there
> > > would be
> > > > > > pretty clear documentation, and that many systems with
> configurations
> > > > > that
> > > > > > the client could not change would likely have the auto-create
> default
> > > > > off.
> > > > > > (In cloud for example).
> > > > > >
> > >

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-31 Thread Justine Olshan
Mickael,
That makes sense to me!
To clarify, in the current state of the KIP, the producer does not rely on
the broker to autocreate--if the broker's config is disabled, then the
producer can autocreate on its own with a create topic request (the same
type of request the admin client uses).
However, if both configs are enabled, the broker will autocreate through a
metadata request before the producer gets a chance.
Of course, the way to avoid this, is to do as you suggested, and set the
"allow_auto_topic_creation" field to false.

I think the only thing we need to be careful with in this setup is that,
without KIP-464, we cannot use broker defaults for this topic. A user needs to
specify the number of partitions and the replication factor in the config.
An alternative to this is to have coded defaults for when these configs are
unspecified, but it is not immediately apparent what these defaults should
be.
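
For illustration, the difference KIP-464 makes can be sketched with the admin
client, which sends the same kind of CreateTopics request the producer would
send. The topic names, sizes, and bootstrap address below are placeholders,
and the Optional-based constructor is the KIP-464 addition that lets the
broker fill in its own defaults:

import java.util.Arrays;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // Without KIP-464: the client must choose the partition count and
            // replication factor itself.
            NewTopic explicit = new NewTopic("demo-topic-explicit", 6, (short) 3);

            // With KIP-464: leave both empty so the broker applies its own
            // num.partitions / default.replication.factor defaults.
            NewTopic brokerDefaults = new NewTopic(
                    "demo-topic-defaults", Optional.empty(), Optional.empty());

            admin.createTopics(Arrays.asList(explicit, brokerDefaults)).all().get();
        }
    }
}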

Thanks again for reading my KIP,
Justine

On Wed, Jul 31, 2019 at 4:19 AM Mickael Maison 
wrote:

> Hi Justine,
>
> Thanks for the response!
> In my opinion, it would be better if the producer did not rely at all
> on the broker auto create feature as this is what we're aiming to
> deprecate. When requesting metadata we can set the
> "allow_auto_topic_creation" field to false to avoid the broker auto
> creation. Then if the topic is not existing, send a
> CreateTopicRequest.
>
> What do you think?
>
> On Mon, Jul 29, 2019 at 6:34 PM Justine Olshan 
> wrote:
> >
> > Currently the way it is implemented, the broker auto-creation
> configuration
> > takes precedence. The producer will not use the CreateTopics request.
> > (Technically it can--but the topic will already be created through the
> > broker, so it will never try to create the topic.)
> > It is possible to change this however, and I'd be happy to discuss the
> > benefits of this alternative.
> >
> > Thank you,
> > Justine
> >
> > On Mon, Jul 29, 2019 at 10:26 AM Mickael Maison <
> mickael.mai...@gmail.com>
> > wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP!
> > >
> > > In case auto creation is enabled on both the client and server, will
> > > the producer still use the AdminClient (CreateTopics request) to
> > > create topics? and not rely on the broker auto create.
> > > I'm guessing the answer is yes but can you make it explicit.
> > >
> > > Thank you
> > >
> > > On Wed, Jul 24, 2019 at 6:23 PM Justine Olshan 
> > > wrote:
> > > >
> > > > Hi,
> > > > Just a friendly reminder to take a look at this KIP if you have the
> time.
> > > >
> > > > I was thinking about broker vs. client default precedence, and I
> think it
> > > > makes sense to keep the broker as the default used when both
> client-side
> > > > and broker-side defaults are configured. The idea is that there
> would be
> > > > pretty clear documentation, and that many systems with configurations
> > > that
> > > > the client could not change would likely have the auto-create default
> > > off.
> > > > (In cloud for example).
> > > >
> > > > It also seems like in most cases, the consumer config
> > > > 'allow.auto.create.topics' was created to actually prevent the
> creation
> > > of
> > > > topics, so the loss of creation functionality will not be a big
> problem.
> > > >
> > > >  I'm happy to discuss any other compatibility problems or components
> of
> > > > this KIP.
> > > >
> > > > Thank you,
> > > > Justine
> > > >
> > > > On Wed, Jul 17, 2019 at 9:11 AM Justine Olshan  >
> > > wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I was looking at this KIP again, and there is a decision I made
> that I
> > > > > think is worth discussing.
> > > > >
> > > > > In the case where both the broker and producer's
> > > > > 'auto.create.topics.enable' are set to true, we have to choose
> either
> > > the
> > > > > broker configs or the producer configs for the replication
> > > > > factor/partitions.
> > > > >
> > > > > Currently, the decision is to have the broker defaults take
> precedence.
> > > > > (It is easier to do this in the implementation.) It also makes some
> > > sense
> > > > > for this behavior to take precedence since this behavior a

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-08-07 Thread Justine Olshan
> On Tue, Aug 06, 2019 at 7:41 AM, Ismael Juma 
> > > wrote:
> > > > > >
> > > > > > > Hi Harsha,
> > > > > > >
> > > > > > > I mentioned policies and the authorizer. For example, with
> > > > > > > CreateTopicPolicy, you can implement the limits you describe.
> If
> > > you
> > > > > have
> > > > > > > ideas of how that should be improved, please submit a KIP. My
> > > point is
> > > > > that
> > > > > > > this KIP is not introducing any new functionality with regards
> to
> > > what
> > > > > > > rogue clients can do. It's using the existing protocol that is
> > > already
> > > > > > > exposed via the AdminClient. So, I don't think we need to
> address
> > > it in
> > > > > > > this KIP. Does that make sense?
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Tue, Aug 6, 2019 at 7:13 AM Harsha Chintalapani <
> > > ka...@harsha.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Ismael,
> > > > > > > Sure AdminClient can do that and we should've shipped a config
> or
> > > use
> > > > > the
> > > > > > > existing one to block that. Not all users are yet to upgrade to
> > > > > AdminClient
> > > > > > > and start using that to cause issues yet. In shared
> environment we
> > > > > should
> > > > > > > allow server to set sane defaults and not allow every client
> to go
> > > > > ahead
> > > > > > > create random no.of topic/partitions and replication factor.
> Even
> > > if
> > > > > the
> > > > > > > users want to allow topic creation proposed in the KIP , it
> makes
> > > > > sense to
> > > > > > > have some guards against the no.of partitions and replication
> > > factor.
> > > > > > > Authorizer is not always an answer to block requests and having
> > > users
> > > > > set
> > > > > > > server side configs to protect a multi-tenant environment is
> > > required.
> > > > > In a
> > > > > > > non-secure environment Authorizer is a blunt instrument either
> you
> > > end
> > > > > up
> > > > > > > blocking everyone or allowing everyone.
> > > > > > > I am asking to have server side that allow clients to create
> > > topics or
> > > > > not
> > > > > > > , if they are allowed set a ceiling on max no.of partitions and
> > > > > > > replication-factor.
> > > > > > >
> > > > > > > -Harsha
> > > > > > >
> > > > > > > On Mon, Aug 5 2019 at 8:58 PM,  wrote:
> > > > > > >
> > > > > > > Harsha,
> > > > > > >
> > > > > > > Rogue clients can use the admin client to create topics and
> > > partitions.
> > > > > > > ACLs and policies can help in that case as well as this one.
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Mon, Aug 5, 2019, 7:48 PM Harsha Chintalapani <
> ka...@harsha.io>
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > Hi Justine,
> > > > > > > Thanks for the KIP.
> > > > > > > "When server-side auto-creation is disabled, client-side
> > > auto-creation
> > > > > > > will try to use client-side configurations"
> > > > > > > If I understand correctly, this KIP is removing any server-side
> > > > > blocking
> > > > > > > client auto creation of topic?
> > > > > > > if so this will present potential issue of rogue client
> creating
> > > ton of
> > > > > > > topic-partitions and potentially bringing down the service for
> > > everyone
> > > > > > >
> > > > > > > or
> > > > > > >
> > > > > > > degrade the service itself.
> > >

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-08-07 Thread Justine Olshan
> >
> > delete records with the kafka-delete-records.sh command, or delete topics
> > with kafka-topics.sh. Trusting them not to set a certain config value seems
> > minor in comparison, right?
> >
> > best,
> > Colin
> >
> > On Tue, Aug 6, 2019, at 10:49, Harsha Chintalapani wrote:
> >
> > Hi,
> > Even with policies one needs to implement that, so for every user who
> > doesn't want a producer to create topics or have limits around partitions &
> > replication factor they have to implement a policy. The KIP is changing the
> > behavior , it might not be introducing the new functionality but it will
> > enable producers to override the create topic config settings on the broker.
> > What I am asking for to provide a config that will disable auto creation of
> > topics and if its enabled set some sane defaults so that clients can create
> > a topic with in those limits. I don't see how this not related to this KIP.
> > If the server config options as I suggested doesn't interest you at least
> > have a default CreateTopicPolices in place.
> > To give an example, In our environment we disable the
> > auto.create.topic.enable and force users to go through a centralized
> > service as we want capture more details about the topic creation and
> > requirements. With this KIP, a producer can create a topic with no bounds.
> > Another example max.message.size we define that at cluster level and one
> > can override max.messsage.size at topic level, any misconfiguration at this
> > will cause service degradation. Its not always about the rogue clients,
> > users can easily misconfigure and can cause an outage. Again we can talk
> > about CreateTopicPolicy but without having a default implementation and
> > asking everyone to implement their own while changing the behavior in
> > producer doesn't make sense to me.
> >
> > Thanks,
> > Harsha
> >
> > On Tue, Aug 06, 2019 at 7:41 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > Hi Harsha,
> >
> > I mentioned policies and the authorizer. For example, with
> > CreateTopicPolicy, you can implement the limits you describe. If you have
> > ideas of how that should be improved, please submit a KIP. My point is that
> > this KIP is not introducing any new functionality with regards to what
> > rogue clients can do. It's using the existing protocol that is already
> > exposed via the AdminClient. So, I don't think we need to address it in
> > this KIP. Does that make sense?
> >
> > Ismael
> >
> > On Tue, Aug 6, 2019 at 7:13 AM Harsha Chintalapani <ka...@harsha.io> wrote:
> >
> > Ismael,
> > Sure AdminClient can do that and we should've shipped a config or use the
> > existing one to block that. Not all users are yet to upgrade to AdminClient
> > and start using that to cause issues yet. In shared environment we should
> > allow server to set sane defaults and not allow every client to go ahead
> > create random no.of topic/partitions and replication factor. Even if the
> > users want to allow topic creation proposed in the KIP , it makes sense to
> > have some guards against the no.of partitions and replication factor.
> > Authorizer is not always an answer to block requests and having users set
> > server side configs to protect a multi-tenant environment is required. In a
> > non-secure environment Authorizer is a blunt instrument either you end up

Re: [VOTE] KIP-480 : Sticky Partitioner

2019-07-26 Thread Justine Olshan
Hello all,
I've just added the proposed changes to the KIP page
https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
.
The PR has been updated as well. https://github.com/apache/kafka/pull/6997.

The idea is that there will just be a separate void method to change the
partition, and the partition method will be left alone.
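
As a rough sketch (assuming a callback shaped roughly like
onNewBatch(topic, cluster, prevPartition), as described on the KIP page), a
custom partitioner could keep the existing partition() method and move its
sticky choice only when the new method is called:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Illustrative only: stays on one partition per topic until the producer
// signals that a new batch is needed, then advances to the next partition.
public class StickyRoundRobinPartitioner implements Partitioner {
    private final Map<String, Integer> stickyPartitions = new ConcurrentHashMap<>();

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // The existing method keeps its signature and simply returns the
        // current sticky choice for the topic.
        return stickyPartitions.computeIfAbsent(topic, t -> 0);
    }

    // The separate void callback from the KIP: invoked when a new batch is
    // created, so the partitioner can switch to another partition.
    public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
        int numPartitions = Math.max(1, cluster.partitionsForTopic(topic).size());
        stickyPartitions.put(topic, (prevPartition + 1) % numPartitions);
    }

    @Override
    public void close() {}
}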

Please take a look when you get a chance and let me know what you think.

Thank you,
Justine

On Fri, Jul 26, 2019 at 9:31 AM Justine Olshan  wrote:

> Hi Jun,
> I agree that it is confusing. I think there might be a way to not
> deprecate the partition method after all, and instead create a separate
> method to perform the necessary actions on new batches. I will try to
> update the KIP with the details as soon as I can.
>
> Thank you,
> Justine
>
> On Fri, Jul 26, 2019 at 9:28 AM Jun Rao  wrote:
>
>> Hi, Justine,
>>
>> Thanks for the KIP. It looks good overall. Just a followup comment.
>>
>> Should we mark Partitioner.partition() as deprecated? If someone tries to
>> implement a new Partitioner on the new interface. They will see both
>> partition() and computePartition(). It's not clear to them which one they
>> should be using and which one takes precedence.
>>
>> Jun
>>
>> On Fri, Jul 19, 2019 at 9:39 AM Justine Olshan 
>> wrote:
>>
>> > Thanks everyone for reviewing and voting!
>> >
>> > I'm marking this KIP as accepted.
>> > There were 4 binding votes from Colin, Gwen, David and Bill, and 3
>> > non-binding votes from Stanislav, M, and Mickael.
>> > There were no +0 or -1 votes.
>> >
>> > Thanks again,
>> > Justine
>> >
>> > On Fri, Jul 19, 2019 at 9:10 AM Bill Bejeck  wrote:
>> >
>> > > Thanks for the KIP, looks like a great addition.
>> > >
>> > > +1 (binding)
>> > >
>> > > -Bill
>> > >
>> > > On Fri, Jul 19, 2019 at 5:55 AM Mickael Maison <
>> mickael.mai...@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > +1 (non binding)
>> > > > Thanks for the KIP!
>> > > >
>> > > > On Fri, Jul 19, 2019 at 2:23 AM David Arthur <
>> davidart...@apache.org>
>> > > > wrote:
>> > > > >
>> > > > > +1 binding, looks like a nice improvement. Thanks!
>> > > > >
>> > > > > -David
>> > > > >
>> > > > > On Wed, Jul 17, 2019 at 6:17 PM Justine Olshan <
>> jols...@confluent.io
>> > >
>> > > > wrote:
>> > > > >
>> > > > > > Hello all,
>> > > > > >
>> > > > > > I wanted to let you all know the KIP has been updated. The
>> > > > > > ComputedPartition class has been removed in favor of simply
>> > returning
>> > > > an
>> > > > > > integer to represent the record's partition.
>> > > > > > In short, the implications of this change mean that keyed
>> records
>> > > will
>> > > > also
>> > > > > > trigger a change in the sticky partition. This was done for a
>> case
>> > in
>> > > > which
>> > > > > > there may be keyed and non-keyed records.
>> > > > > > Upon testing, this did not significantly change the latency for
>> > > records
>> > > > > > with keyed values.
>> > > > > >
>> > > > > > Thank you,
>> > > > > > Justine
>> > > > > >
>> > > > > > On Sun, Jul 14, 2019 at 3:07 AM M. Manna 
>> > wrote:
>> > > > > >
>> > > > > > > +1(na)
>> > > > > > >
>> > > > > > > On Sat, 13 Jul 2019 at 22:17, Stanislav Kozlovski <
>> > > > > > stanis...@confluent.io>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > +1 (non-binding)
>> > > > > > > >
>> > > > > > > > Thanks!
>> > > > > > > >
>> > > > > > > > On Fri, Jul 12, 2019 at 6:02 PM Gwen Shapira <
>> > g...@confluent.io>
>> > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > +1 (binding)
>> > > > > > > > >
>> > > > > > > > > Thank you for the KIP. This was long awaited.
>> > > > > > > > >
>> > > > > > > > > On Tue, Jul 9, 2019 at 5:15 PM Justine Olshan <
>> > > > jols...@confluent.io>
>> > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > Hello all,
>> > > > > > > > > >
>> > > > > > > > > > I'd like to start the vote for KIP-480 : Sticky
>> > Partitioner.
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>> > > > > > > > > >
>> > > > > > > > > > Thank you,
>> > > > > > > > > > Justine Olshan
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Gwen Shapira
>> > > > > > > > > Product Manager | Confluent
>> > > > > > > > > 650.450.2760 | @gwenshap
>> > > > > > > > > Follow us: Twitter | blog
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Best,
>> > > > > > > > Stanislav
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > >
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-29 Thread Justine Olshan
Currently the way it is implemented, the broker auto-creation configuration
takes precedence. The producer will not use the CreateTopics request.
(Technically it can--but the topic will already be created through the
broker, so it will never try to create the topic.)
It is possible to change this however, and I'd be happy to discuss the
benefits of this alternative.

Thank you,
Justine

On Mon, Jul 29, 2019 at 10:26 AM Mickael Maison 
wrote:

> Hi Justine,
>
> Thanks for the KIP!
>
> In case auto creation is enabled on both the client and server, will
> the producer still use the AdminClient (CreateTopics request) to
> create topics? and not rely on the broker auto create.
> I'm guessing the answer is yes but can you make it explicit.
>
> Thank you
>
> On Wed, Jul 24, 2019 at 6:23 PM Justine Olshan 
> wrote:
> >
> > Hi,
> > Just a friendly reminder to take a look at this KIP if you have the time.
> >
> > I was thinking about broker vs. client default precedence, and I think it
> > makes sense to keep the broker as the default used when both client-side
> > and broker-side defaults are configured. The idea is that there would be
> > pretty clear documentation, and that many systems with configurations
> that
> > the client could not change would likely have the auto-create default
> off.
> > (In cloud for example).
> >
> > It also seems like in most cases, the consumer config
> > 'allow.auto.create.topics' was created to actually prevent the creation
> of
> > topics, so the loss of creation functionality will not be a big problem.
> >
> >  I'm happy to discuss any other compatibility problems or components of
> > this KIP.
> >
> > Thank you,
> > Justine
> >
> > On Wed, Jul 17, 2019 at 9:11 AM Justine Olshan 
> wrote:
> >
> > > Hello all,
> > >
> > > I was looking at this KIP again, and there is a decision I made that I
> > > think is worth discussing.
> > >
> > > In the case where both the broker and producer's
> > > 'auto.create.topics.enable' are set to true, we have to choose either
> the
> > > broker configs or the producer configs for the replication
> > > factor/partitions.
> > >
> > > Currently, the decision is to have the broker defaults take precedence.
> > > (It is easier to do this in the implementation.) It also makes some
> sense
> > > for this behavior to take precedence since this behavior already
> occurs as
> > > the default.
> > >
> > > However, I was wondering if it would be odd for those who can only see
> the
> > > producer side to set configs for replication factor/partitions and see
> > > different results. Currently the documentation for the config states
> that
> > > the config values are only used when the broker config is not enabled,
> but
> > > this might not always be clear to the user.  Changing the code to have
> the
> > > producer's configurations take precedence is possible, but I was
> wondering
> > > what everyone thought.
> > >
> > > Thank you,
> > > Justine
> > >
> > > On Fri, Jul 12, 2019 at 2:49 PM Justine Olshan 
> > > wrote:
> > >
> > >> Just a quick update--
> > >>
> > >> It seems that enabling both the broker and producer configs works
> fine,
> > >> except that the broker configurations for partitions, replication
> factor
> > >> take precedence.
> > >> I don't know if that is something we would want to change, but I'll be
> > >> updating the KIP for now to reflect this. Perhaps we would want to
> add more
> > >> to the documentation of the the producer configs to clarify.
> > >>
> > >> Thank you,
> > >> Justine
> > >>
> > >> On Fri, Jul 12, 2019 at 9:28 AM Justine Olshan 
> > >> wrote:
> > >>
> > >>> Hi Colin,
> > >>>
> > >>> Thanks for looking at the KIP. I can definitely add to the title to
> make
> > >>> it more clear.
> > >>>
> > >>> It makes sense that both configurations could be turned on since
> there
> > >>> are many cases where the user can not control the server-side
> > >>> configurations. I was a little concerned about how both interacting
> would
> > >>> work out -- if there would be to many requests for new topics, for
> example.
> > >>> But it since it does make sense to allow both configurations
> enabled, I
> > >>> 

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-08-06 Thread Justine Olshan
> > degrade the service itself.
> > By reading the KIP its not clear to me that there is a clear way to block
> > auto creation topics of all together from clients by server side config.
> > Server side configs of default topic, partitions should take higher
> > precedence and client shouldn't be able to create a topic with higher
> > no.of partitions, replication than what server config specifies.
> >
> > Thanks,
> > Harsha
> >
> > On Mon, Aug 05, 2019 at 5:24 PM, Justine Olshan  wrote:
> >
> > Hi all,
> > I made some changes to the KIP. Hopefully this configuration change will
> > make things a little clearer.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Client-side+Automatic+Topic+Creation+on+Producer
> >
> > Please let me know if you have any feedback or questions!
> >
> > Thank you,
> > Justine
> >
> > On Wed, Jul 31, 2019 at 1:44 PM Colin McCabe  wrote:
> >
> > Hi Mickael,
> >
> > I think you bring up a good point. It would be better if we didn't ever
> > have to set up client-side configuration for this feature, and KIP-464
> > would let us skip this entirely.
> >
> > On Wed, Jul 31, 2019, at 09:19, Justine Olshan wrote:
> >
> > Hi Mickael,
> > I agree that KIP-464 works on newer brokers, but I was a bit worried how
> > things would play out on older brokers that* do not *have KIP 464 included.
> > Is it enough to throw an error in this case when producer configs are not
> > specified?
> >
> > I think the right thing to do would be to log an error message in the
> > client. We will need to have that capability in any case, to cover
> > scenarios like the client trying to auto-create a topic that they don't
> > have permission to create. Or a client trying to create a topic on a broker
> > so old that CreateTopicsRequest is not supported.
> >
> > The big downside to relying on KIP-464 is that it is a very recent feature
> > -- so recent that it hasn't even made its way to any official Apache
> > release. It's scheduled for the upcoming 2.4 release in a few months.
> >
> > So if you view this KIP as a step towards removing broker-side
> > auto-create, you might want to support older brokers just to accelerate
> > adoption, and hasten the day when we can finally flip broker-side
> > auto-create to off (or even remove it entirely).
> >
> > I have to agree, though, that having client-side configurations for number
> > of partitions and replication factor is messy. Maybe it would be worth it
> > to restrict support to post-KIP-464 brokers, if we could avoid creating
> > more configs.
> >
> > best,
> > Colin
> >
> > On Wed, Jul 31, 2019 at 9:10 AM Mickael Maison <mickael.mai...@gmail.com>
> > wrote:
> >
> > Hi Justine,
> >
> > We can rely on KIP-464 which allows to omit the partition count or
> > replication factor when creating a topic. In that case, the broker
> > defaults are used.
> >
> > On Wed, Jul 31, 2019 at 4:55 PM Justine Olshan  wrote:
> >
> > Michael,
> > That makes sense to me!
> > To clarify, in the current state of the KIP, the producer does not rely on
> > the broker to autocreate--if the broker's config is disabled, then the
> > producer can autocreate on its own with a create topic request (the same
> > type of request the admin client uses).
> > However, if both configs are enabled, the broker will autocreate through a
> > metadata request before the producer gets a chance. Of course, the way to
> > avoid this, is to do as you suggested, and set the
> > "allow_auto_topic_creation" field to false.
> >
> > I think the only thing we need to be careful with in this setup is without
> > KIP 464, we can not use broker defaults for this topic. A user needs to
> > specify the number of partition and replication factor in the config. An
> > alternative to this is to have coded default

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-08-06 Thread Justine Olshan
Hi Satish,

Thanks for looking at the KIP.

Yes, the producer will wait for the topic to be created before it can send
any messages to it.

I would like to clarify "overriding" broker behavior. If the client enables
client-side autocreation, the only difference will be that the topic
auto-creation will no longer occur in the metadata request and will instead
come from a CreateTopic request on the producer.
Partitions and replication factor will be determined by the broker configs.
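
To make that concrete, a sketch of the setup this KIP proposes is below. Note
that auto.create.topics.enable as a producer property is the new client-side
config proposed here, not an existing producer setting, and the topic name and
bootstrap address are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClientSideAutoCreateSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Proposed by this KIP: let the producer send a CreateTopics request
        // (the admin-client style request) when the topic is missing, instead
        // of relying on broker-side metadata auto-creation. Partitions and
        // replication factor come from the broker defaults.
        props.put("auto.create.topics.enable", "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // If "new-topic" does not exist yet, the producer first waits for
            // its create-topic request to succeed, then sends the record.
            producer.send(new ProducerRecord<>("new-topic", "key", "value"));
        }
    }
}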

Is this similar to what you were thinking? Please let me know if there is
something you think I missed.

Thank you,
Justine

On Tue, Aug 6, 2019 at 12:01 PM Satish Duggana 
wrote:

> Hi Justine,
> Thanks for the KIP. This is a nice addition to the producer client
> without running admin-client’s create topic APIs. Does producer wait
> for the topic to be created successfully before it tries to publish
> messages to that topic? I assume that this will not throw an error
> that the topic does not exist.
>
> As mentioned by others, overriding broker behavior by producer looks
> to be a concern. IMHO, broker should have a way to use either default
> constraints or configure custom constraints before these can be
> overridden by clients but not vice versa. There should be an option on
> brokers whether those constraints can be overridden by producers or
> not.
>
> Thanks,
> Satish.
>
> On Tue, Aug 6, 2019 at 11:39 PM Justine Olshan 
> wrote:
> >
> > Hi Harsha,
> >
> > After taking this all into consideration, I've updated the KIP to no
> longer
> > allow client-side configuration of replication factor and partitions.
> > Instead, the broker defaults will be used as long as the broker supports
> > KIP 464.
> > If the broker does not support this KIP, then the client can not create
> > topics on its own. (Behavior that exists now)
> >
> > I think this will help with your concerns. Please let me know if you have
> > any further feedback.
> >
> > Thank you,
> > Justine
> >
> > On Tue, Aug 6, 2019 at 10:49 AM Harsha Chintalapani 
> wrote:
> >
> > > Hi,
> > > Even with policies one needs to implement that, so for every user
> who
> > > doesn't want a producer to create topics or have limits around
> partitions &
> > > replication factor they have to implement a policy.
> > >   The KIP is changing the behavior , it might not be introducing
> the
> > > new functionality but it will enable producers to override the create
> topic
> > > config settings on the broker. What I am asking for to provide a config
> > > that will disable auto creation of topics and if its enabled set some
> sane
> > > defaults so that clients can create a topic with in those limits. I
> don't
> > > see how this not related to this KIP.
> > >  If the server config options as I suggested doesn't interest you
> at
> > > least have a default CreateTopicPolices in place.
> > >To give an example, In our environment we disable the
> > > auto.create.topic.enable and force users to go through a centralized
> > > service as we want capture more details about the topic creation and
> > > requirements. With this KIP, a producer can create a topic with no
> bounds.
> > >  Another example max.message.size we define that at cluster level and
> one
> > > can override max.messsage.size at topic level, any misconfiguration at
> this
> > > will cause service degradation.  Its not always about the rogue
> clients,
> > > users can easily misconfigure and can cause an outage.
> > > Again we can talk about CreateTopicPolicy but without having a default
> > > implementation and asking everyone to implement their own while
> changing
> > > the behavior in producer  doesn't make sense to me.
> > >
> > > Thanks,
> > > Harsha
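
For what it's worth, a minimal sketch of the kind of default guard being asked
for here, written against the existing pluggable CreateTopicPolicy interface
(enabled on the broker via create.topic.policy.class.name); the limits below
are made up for illustration:

import java.util.Map;
import org.apache.kafka.common.errors.PolicyViolationException;
import org.apache.kafka.server.policy.CreateTopicPolicy;

// Rejects topic creation requests that exceed illustrative limits on the
// number of partitions and the replication factor.
public class MaxSizeCreateTopicPolicy implements CreateTopicPolicy {
    private static final int MAX_PARTITIONS = 100;          // illustrative limit
    private static final short MAX_REPLICATION_FACTOR = 3;  // illustrative limit

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void validate(RequestMetadata requestMetadata) throws PolicyViolationException {
        Integer partitions = requestMetadata.numPartitions();
        Short replicationFactor = requestMetadata.replicationFactor();
        if (partitions != null && partitions > MAX_PARTITIONS) {
            throw new PolicyViolationException("Topic " + requestMetadata.topic()
                    + " requests " + partitions + " partitions, but at most "
                    + MAX_PARTITIONS + " are allowed.");
        }
        if (replicationFactor != null && replicationFactor > MAX_REPLICATION_FACTOR) {
            throw new PolicyViolationException("Replication factor " + replicationFactor
                    + " exceeds the allowed maximum of " + MAX_REPLICATION_FACTOR + ".");
        }
    }

    @Override
    public void close() {}
}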
> > >
> > >
> > > On Tue, Aug 06, 2019 at 7:41 AM, Ismael Juma 
> wrote:
> > >
> > > > Hi Harsha,
> > > >
> > > > I mentioned policies and the authorizer. For example, with
> > > > CreateTopicPolicy, you can implement the limits you describe. If you
> have
> > > > ideas of how that should be improved, please submit a KIP. My point
> is
> > > that
> > > > this KIP is not introducing any new functionality with regards to
> what
> > > > rogue clients can do. It's using the existing protocol that is
> already
> > > > exposed via the AdminClient. So, I don't think we need to address it
> in
> > > > this KIP. Does that make sense?
> >

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-08-05 Thread Justine Olshan
Hi all,
I made some changes to the KIP. Hopefully this configuration change will
make things a little clearer.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-487%3A+Client-side+Automatic+Topic+Creation+on+Producer

Please let me know if you have any feedback or questions!

Thank you,
Justine

On Wed, Jul 31, 2019 at 1:44 PM Colin McCabe  wrote:

> Hi Mickael,
>
> I think you bring up a good point.  It would be better if we didn't ever
> have to set up client-side configuration for this feature, and KIP-464
> would let us skip this entirely.
>
> On Wed, Jul 31, 2019, at 09:19, Justine Olshan wrote:
> > Hi Mickael,
> > I agree that KIP-464 works on newer brokers, but I was a bit worried how
> > things would play out on older brokers that* do not *have KIP 464
> included.
> > Is it enough to throw an error in this case when producer configs are not
> > specified?
>
> I think the right thing to do would be to log an error message in the
> client.  We will need to have that capability in any case, to cover
> scenarios like the client trying to auto-create a topic that they don't
> have permission to create.  Or a client trying to create a topic on a
> broker so old that CreateTopicsRequest is not supported.
>
> The big downside to relying on KIP-464 is that it is a very recent feature
> -- so recent that it hasn't even made its way to any official Apache
> release.  It's scheduled for the upcoming 2.4 release in a few months.
>
> So if you view this KIP as a step towards removing broker-side
> auto-create, you might want to support older brokers just to accelerate
> adoption, and hasten the day when we can finally flip broker-side
> auto-create to off (or even remove it entirely).
>
> I have to agree, though, that having client-side configurations for number
> of partitions and replication factor is messy.  Maybe it would be worth it
> to restrict support to post-KIP-464 brokers, if we could avoid creating
> more configs.
>
> best,
> Colin
>
>
> > On Wed, Jul 31, 2019 at 9:10 AM Mickael Maison  >
> > wrote:
> >
> > > Hi Justine,
> > >
> > > We can rely on KIP-464 which allows to omit the partition count or
> > > replication factor when creating a topic. In that case, the broker
> > > defaults are used.
> > >
> > > On Wed, Jul 31, 2019 at 4:55 PM Justine Olshan 
> > > wrote:
> > > >
> > > > Michael,
> > > > That makes sense to me!
> > > > To clarify, in the current state of the KIP, the producer does not
> rely
> > > on
> > > > the broker to autocreate--if the broker's config is disabled, then
> the
> > > > producer can autocreate on its own with a create topic request (the
> same
> > > > type of request the admin client uses).
> > > > However, if both configs are enabled, the broker will autocreate
> through
> > > a
> > > > metadata request before the producer gets a chance.
> > > > Of course, the way to avoid this, is to do as you suggested, and set
> the
> > > > "allow_auto_topic_creation" field to false.
> > > >
> > > > I think the only thing we need to be careful with in this setup is
> > > without
> > > > KIP 464, we can not use broker defaults for this topic. A user needs
> to
> > > > specify the number of partition and replication factor in the config.
> > > > An alternative to this is to have coded defaults for when these
> configs
> > > are
> > > > unspecified, but it is not immediately apparent what these defaults
> > > should
> > > > be.
> > > >
> > > > Thanks again for reading my KIP,
> > > > Justine
> > > >
> > > > On Wed, Jul 31, 2019 at 4:19 AM Mickael Maison <
> mickael.mai...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the response!
> > > > > In my opinion, it would be better if the producer did not rely at
> all
> > > > > on the broker auto create feature as this is what we're aiming to
> > > > > deprecate. When requesting metadata we can set the
> > > > > "allow_auto_topic_creation" field to false to avoid the broker auto
> > > > > creation. Then if the topic is not existing, send a
> > > > > CreateTopicRequest.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > On Mon, Jul 29, 2019 at 6:34 PM Justine Olshan <
> 

Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-17 Thread Justine Olshan
Hello all,

I was looking at this KIP again, and there is a decision I made that I
think is worth discussing.

In the case where both the broker and producer's
'auto.create.topics.enable' are set to true, we have to choose either the
broker configs or the producer configs for the replication
factor/partitions.

Currently, the decision is to have the broker defaults take precedence. (It
is easier to do this in the implementation.) It also makes some sense for
this behavior to take precedence since this behavior already occurs as the
default.

However, I was wondering if it would be odd for those who can only see the
producer side to set configs for replication factor/partitions and see
different results. Currently the documentation for the config states that
the config values are only used when the broker config is not enabled, but
this might not always be clear to the user.  Changing the code to have the
producer's configurations take precedence is possible, but I was wondering
what everyone thought.

Thank you,
Justine

On Fri, Jul 12, 2019 at 2:49 PM Justine Olshan  wrote:

> Just a quick update--
>
> It seems that enabling both the broker and producer configs works fine,
> except that the broker configurations for partitions, replication factor
> take precedence.
> I don't know if that is something we would want to change, but I'll be
> updating the KIP for now to reflect this. Perhaps we would want to add more
> to the documentation of the producer configs to clarify.
>
> Thank you,
> Justine
>
> On Fri, Jul 12, 2019 at 9:28 AM Justine Olshan 
> wrote:
>
>> Hi Colin,
>>
>> Thanks for looking at the KIP. I can definitely add to the title to make
>> it more clear.
>>
>> It makes sense that both configurations could be turned on since there
>> are many cases where the user can not control the server-side
>> configurations. I was a little concerned about how both interacting would
>> work out -- if there would be too many requests for new topics, for example.
>> But since it does make sense to allow both configurations enabled, I
>> will test out some scenarios and I'll change the KIP to support this.
>>
>> I also agree with documentation about distinguishing the differences. I
>> was having some trouble with the wording but I like the phrases
>> "server-side" and "client-side." That's a good distinction I can use when
>> describing.
>>
>> I'll try to update the KIP soon keeping everyone's input in mind.
>>
>> Thanks,
>> Justine
>>
>> On Thu, Jul 11, 2019 at 5:39 PM Colin McCabe  wrote:
>>
>>> Hi Justine,
>>>
>>> Thanks for the KIP.  This seems like a good step towards removing
>>> server-side topic auto-creation.
>>>
>>> We should add included "client-side" to the title of the KIP somewhere,
>>> to make it clear that we're talking about client-side auto creation.
>>>
>>> The KIP says:
>>> > In order to automatically create topics with the producer, the
>>> producer's
>>> > auto.create.topics.enable config must be set to true and the broker
>>> config should be set to false
>>>
>>> From a user's point of view, this seems counter-intuitive.  In order to
>>> auto-create topics the broker's auto.create.topics.enable config should be
>>> set to false?  It seems like the server-side auto-create is unrelated to
>>> the client-side auto-create.  We could have both turned on (and I'm sure
>>> that in the real world, people will try this configuration...)  There's no
>>> reason not to support this, I think.
>>>
>>> We should add some documentation explaining the difference between
>>> server-side and client-side auto-creation.  Without documentation, an admin
>>> might think that they had disabled all forms of auto-creation by setting
>>> the server-side setting to false-- but this is not the case, of course.
>>>
>>> best,
>>> Colin
>>>
>>>
>>> On Thu, Jul 11, 2019, at 16:22, Justine Olshan wrote:
>>> > Hi Dhruvil,
>>> >
>>> > Thanks for reading the KIP!
>>> > That was the general idea for deprecation. We would log a warning when
>>> the
>>> > config is enabled on the broker.
>>> > I also believe that there would be a change to documentation.
>>> > If there is anything else that should be done, please let me know!
>>> >
>>> > Justine
>>> >
>>> > On Thu, Jul 11, 2019 at 4:17 PM Dhruvil Shah 
>>> wrote:
>>> >
>>> > > Hi Justine,
>

Re: [VOTE] KIP-480 : Sticky Partitioner

2019-07-17 Thread Justine Olshan
Hello all,

I wanted to let you all know the KIP has been updated. The
ComputedPartition class has been removed in favor of simply returning an
integer to represent the record's partition.
In short, the implications of this change mean that keyed records will also
trigger a change in the sticky partition. This was done for a case in which
there may be keyed and non-keyed records.
Upon testing, this did not significantly change the latency for records
with keyed values.

Thank you,
Justine

On Sun, Jul 14, 2019 at 3:07 AM M. Manna  wrote:

> +1(na)
>
> On Sat, 13 Jul 2019 at 22:17, Stanislav Kozlovski 
> wrote:
>
> > +1 (non-binding)
> >
> > Thanks!
> >
> > On Fri, Jul 12, 2019 at 6:02 PM Gwen Shapira  wrote:
> >
> > > +1 (binding)
> > >
> > > Thank you for the KIP. This was long awaited.
> > >
> > > On Tue, Jul 9, 2019 at 5:15 PM Justine Olshan 
> > > wrote:
> > > >
> > > > Hello all,
> > > >
> > > > I'd like to start the vote for KIP-480 : Sticky Partitioner.
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


Re: [VOTE] KIP-480 : Sticky Partitioner

2019-07-19 Thread Justine Olshan
Thanks everyone for reviewing and voting!

I'm marking this KIP as accepted.
There were 4 binding votes from Colin, Gwen, David and Bill, and 3
non-binding votes from Stanislav, M, and Mickael.
There were no +0 or -1 votes.

Thanks again,
Justine

On Fri, Jul 19, 2019 at 9:10 AM Bill Bejeck  wrote:

> Thanks for the KIP, looks like a great addition.
>
> +1 (binding)
>
> -Bill
>
> On Fri, Jul 19, 2019 at 5:55 AM Mickael Maison 
> wrote:
>
> > +1 (non binding)
> > Thanks for the KIP!
> >
> > On Fri, Jul 19, 2019 at 2:23 AM David Arthur 
> > wrote:
> > >
> > > +1 binding, looks like a nice improvement. Thanks!
> > >
> > > -David
> > >
> > > On Wed, Jul 17, 2019 at 6:17 PM Justine Olshan 
> > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I wanted to let you all know the KIP has been updated. The
> > > > ComputedPartition class has been removed in favor of simply returning
> > an
> > > > integer to represent the record's partition.
> > > > In short, the implications of this change mean that keyed records
> will
> > also
> > > > trigger a change in the sticky partition. This was done for a case in
> > which
> > > > there may be keyed and non-keyed records.
> > > > Upon testing, this did not significantly change the latency for
> records
> > > > with keyed values.
> > > >
> > > > Thank you,
> > > > Justine
> > > >
> > > > On Sun, Jul 14, 2019 at 3:07 AM M. Manna  wrote:
> > > >
> > > > > +1(na)
> > > > >
> > > > > On Sat, 13 Jul 2019 at 22:17, Stanislav Kozlovski <
> > > > stanis...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > On Fri, Jul 12, 2019 at 6:02 PM Gwen Shapira 
> > > > wrote:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > Thank you for the KIP. This was long awaited.
> > > > > > >
> > > > > > > On Tue, Jul 9, 2019 at 5:15 PM Justine Olshan <
> > jols...@confluent.io>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hello all,
> > > > > > > >
> > > > > > > > I'd like to start the vote for KIP-480 : Sticky Partitioner.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > > Justine Olshan
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Gwen Shapira
> > > > > > > Product Manager | Confluent
> > > > > > > 650.450.2760 | @gwenshap
> > > > > > > Follow us: Twitter | blog
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> >
>


Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-24 Thread Justine Olshan
Hi,
Just a friendly reminder to take a look at this KIP if you have the time.

I was thinking about broker vs. client default precedence, and I think it
makes sense to keep the broker as the default used when both client-side
and broker-side defaults are configured. The idea is that there would be
pretty clear documentation, and that many systems with configurations that
the client could not change would likely have the auto-create default off.
(In cloud for example).

It also seems like in most cases, the consumer config
'allow.auto.create.topics' was created to actually prevent the creation of
topics, so the loss of creation functionality will not be a big problem.

 I'm happy to discuss any other compatibility problems or components of
this KIP.

Thank you,
Justine

On Wed, Jul 17, 2019 at 9:11 AM Justine Olshan  wrote:

> Hello all,
>
> I was looking at this KIP again, and there is a decision I made that I
> think is worth discussing.
>
> In the case where both the broker and producer's
> 'auto.create.topics.enable' are set to true, we have to choose either the
> broker configs or the producer configs for the replication
> factor/partitions.
>
> Currently, the decision is to have the broker defaults take precedence.
> (It is easier to do this in the implementation.) It also makes some sense
> for this behavior to take precedence since this behavior already occurs as
> the default.
>
> However, I was wondering if it would be odd for those who can only see the
> producer side to set configs for replication factor/partitions and see
> different results. Currently the documentation for the config states that
> the config values are only used when the broker config is not enabled, but
> this might not always be clear to the user.  Changing the code to have the
> producer's configurations take precedence is possible, but I was wondering
> what everyone thought.
>
> Thank you,
> Justine
>
> On Fri, Jul 12, 2019 at 2:49 PM Justine Olshan 
> wrote:
>
>> Just a quick update--
>>
>> It seems that enabling both the broker and producer configs works fine,
>> except that the broker configurations for partitions, replication factor
>> take precedence.
>> I don't know if that is something we would want to change, but I'll be
>> updating the KIP for now to reflect this. Perhaps we would want to add more
>> to the documentation of the producer configs to clarify.
>>
>> Thank you,
>> Justine
>>
>> On Fri, Jul 12, 2019 at 9:28 AM Justine Olshan 
>> wrote:
>>
>>> Hi Colin,
>>>
>>> Thanks for looking at the KIP. I can definitely add to the title to make
>>> it more clear.
>>>
>>> It makes sense that both configurations could be turned on since there
>>> are many cases where the user can not control the server-side
>>> configurations. I was a little concerned about how both interacting would
>>> work out -- if there would be too many requests for new topics, for example.
>>> But since it does make sense to allow both configurations enabled, I
>>> will test out some scenarios and I'll change the KIP to support this.
>>>
>>> I also agree with documentation about distinguishing the differences. I
>>> was having some trouble with the wording but I like the phrases
>>> "server-side" and "client-side." That's a good distinction I can use when
>>> describing.
>>>
>>> I'll try to update the KIP soon keeping everyone's input in mind.
>>>
>>> Thanks,
>>> Justine
>>>
>>> On Thu, Jul 11, 2019 at 5:39 PM Colin McCabe  wrote:
>>>
>>>> Hi Justine,
>>>>
>>>> Thanks for the KIP.  This seems like a good step towards removing
>>>> server-side topic auto-creation.
>>>>
>>>> We should add included "client-side" to the title of the KIP somewhere,
>>>> to make it clear that we're talking about client-side auto creation.
>>>>
>>>> The KIP says:
>>>> > In order to automatically create topics with the producer, the
>>>> producer's
>>>> > auto.create.topics.enable config must be set to true and the broker
>>>> config should be set to false
>>>>
>>>> From a user's point of view, this seems counter-intuitive.  In order to
>>>> auto-create topics the broker's auto.create.topics.enable config should be
>>>> set to false?  It seems like the server-side auto-create is unrelated to
>>>> the client-side auto-create.  We could have both turned on (and I'm sure
>>>> that in the real world, people will

Re: [VOTE] KIP-480 : Sticky Partitioner

2019-07-26 Thread Justine Olshan
Hi Jun,
I agree that it is confusing. I think there might be a way to not deprecate
the partition method after all, and instead create a separate method to
perform the necessary actions on new batches. I will try to update the KIP
with the details as soon as I can.

Thank you,
Justine

On Fri, Jul 26, 2019 at 9:28 AM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the KIP. It looks good overall. Just a followup comment.
>
> Should we mark Partitioner.partition() as deprecated? If someone tries to
> implement a new Partitioner on the new interface. They will see both
> partition() and computePartition(). It's not clear to them which one they
> should be using and which one takes precedence.
>
> Jun
>
> On Fri, Jul 19, 2019 at 9:39 AM Justine Olshan 
> wrote:
>
> > Thanks everyone for reviewing and voting!
> >
> > I'm marking this KIP as accepted.
> > There were 4 binding votes from Colin, Gwen, David and Bill, and 3
> > non-binding votes from Stanislav, M, and Mickael.
> > There were no +0 or -1 votes.
> >
> > Thanks again,
> > Justine
> >
> > On Fri, Jul 19, 2019 at 9:10 AM Bill Bejeck  wrote:
> >
> > > Thanks for the KIP, looks like a great addition.
> > >
> > > +1 (binding)
> > >
> > > -Bill
> > >
> > > On Fri, Jul 19, 2019 at 5:55 AM Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > > wrote:
> > >
> > > > +1 (non binding)
> > > > Thanks for the KIP!
> > > >
> > > > On Fri, Jul 19, 2019 at 2:23 AM David Arthur  >
> > > > wrote:
> > > > >
> > > > > +1 binding, looks like a nice improvement. Thanks!
> > > > >
> > > > > -David
> > > > >
> > > > > On Wed, Jul 17, 2019 at 6:17 PM Justine Olshan <
> jols...@confluent.io
> > >
> > > > wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > I wanted to let you all know the KIP has been updated. The
> > > > > > ComputedPartition class has been removed in favor of simply
> > returning
> > > > an
> > > > > > integer to represent the record's partition.
> > > > > > In short, the implications of this change mean that keyed records
> > > will
> > > > also
> > > > > > trigger a change in the sticky partition. This was done for a
> case
> > in
> > > > which
> > > > > > there may be keyed and non-keyed records.
> > > > > > Upon testing, this did not significantly change the latency for
> > > records
> > > > > > with keyed values.
> > > > > >
> > > > > > Thank you,
> > > > > > Justine
> > > > > >
> > > > > > On Sun, Jul 14, 2019 at 3:07 AM M. Manna 
> > wrote:
> > > > > >
> > > > > > > +1(na)
> > > > > > >
> > > > > > > On Sat, 13 Jul 2019 at 22:17, Stanislav Kozlovski <
> > > > > > stanis...@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > On Fri, Jul 12, 2019 at 6:02 PM Gwen Shapira <
> > g...@confluent.io>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 (binding)
> > > > > > > > >
> > > > > > > > > Thank you for the KIP. This was long awaited.
> > > > > > > > >
> > > > > > > > > On Tue, Jul 9, 2019 at 5:15 PM Justine Olshan <
> > > > jols...@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hello all,
> > > > > > > > > >
> > > > > > > > > > I'd like to start the vote for KIP-480 : Sticky
> > Partitioner.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > > > > > >
> > > > > > > > > > Thank you,
> > > > > > > > > > Justine Olshan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Gwen Shapira
> > > > > > > > > Product Manager | Confluent
> > > > > > > > > 650.450.2760 | @gwenshap
> > > > > > > > > Follow us: Twitter | blog
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
>


[VOTE] KIP-480 : Sticky Partitioner

2019-07-09 Thread Justine Olshan
Hello all,

I'd like to start the vote for KIP-480 : Sticky Partitioner.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner

Thank you,
Justine Olshan


Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-11 Thread Justine Olshan
Hello all,
Thanks for continuing the discussion! I have a few responses to your points.

Tom: You are correct in that this KIP has not mentioned the
DeleteTopicsRequest. I think that this would be out of scope for now, but
may be something worth adding in the future.

John: We did consider sequence ids, but there are a few reasons to favor
UUIDs. There are several cases where topics from different clusters may
interact now and in the future. For example, Mirror Maker 2 may benefit
from being able to detect when a cluster being mirrored is deleted and
recreated and globally unique identifiers would make resolving issues
easier than sequence IDs which may collide between clusters. KIP-405
(tiered storage) will also benefit from globally unique IDs as shared
buckets may be used between clusters.

Globally unique IDs would also make functionality like moving topics
between disparate clusters easier in the future, simplify any future
implementations of backups and restores, and more. In general, unique IDs
would ensure that the source cluster topics do not conflict with the
destination cluster topics.

If we were to use sequence ids, we would need sufficiently large cluster
ids to be stored with the topic identifiers or we run the risk of
collisions. This will give up any advantage in compactness that sequence
numbers may bring. Given these advantages I think it makes sense to use
UUIDs.

Gokul: This is an interesting idea, but this is a breaking change. Out of
scope for now, but maybe worth discussing in the future.

Hope this explains some of the decisions,

Justine



On Fri, Sep 11, 2020 at 8:27 AM Gokul Ramanan Subramanian <
gokul24...@gmail.com> wrote:

> Hi.
>
> Thanks for the KIP.
>
> Have you thought about whether it makes sense to support authorizing a
> principal for a topic ID rather than a topic name to achieve tighter
> security?
>
> Or is the topic ID fundamentally an internal detail similar to epochs used
> in a bunch of other places in Kafka?
>
> Thanks.
>
> On Fri, Sep 11, 2020 at 4:06 PM John Roesler  wrote:
>
> > Hello Justine,
> >
> > Thanks for the KIP!
> >
> > I happen to have been confronted recently with the need to keep track of
> a
> > large number of topics as compactly as possible. I was going to come up
> > with some way to dictionary encode the topic names as integers, but this
> > seems much better!
> >
> > Apologies if this has been raised before, but I’m wondering about the
> > choice of UUID vs sequence number for the ids. Typically, I’ve seen UUIDs
> > in two situations:
> > 1. When processes need to generate non-colliding identifiers without
> > coordination.
> > 2. When the identifier needs to be “universally unique”; I.e., the
> > identifier needs to distinguish the entity from all other entities that
> > could ever exist. This is useful in cases where entities from all kinds
> of
> > systems get mixed together, such as when dumping logs from all processes
> in
> > a company into a common system.
> >
> > Maybe I’m being short-sighted, but it doesn’t seem like either really
> > applies here. It seems like the brokers could and would achieve consensus
> > when creating a topic anyway, which is all that’s required to generate
> > non-colliding sequence ids. For the second, as you mention, we could
> assign
> > a UUID to the cluster as a whole, which would render any resource scoped
> to
> > the broker universally unique as well.
> >
> > The reason I mention this is that, although a UUID is way more compact
> > than topic names, it’s still 16 bytes. In contrast, a 4-byte integer
> > sequence id would give us 4 billion unique topics per cluster, which
> seems
> > like enough ;)
> >
> > Considering the number of different times these topic identifiers are
> sent
> > over the wire or stored in memory, it seems like it might be worth the
> > additional 4x space savings.
> >
> > What do you think about this?
> >
> > Thanks,
> > John
> >
> > On Fri, Sep 11, 2020, at 03:20, Tom Bentley wrote:
> > > Hi Justine,
> > >
> > > This looks like a very welcome improvement. Thanks!
> > >
> > > Maybe I missed it, but the KIP doesn't seem to mention changing
> > > DeleteTopicsRequest to identify the topic using an id. Maybe that's out
> > of
> > > scope, but DeleteTopicsRequest is not listed among the Future Work APIs
> > > either.
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> > > On Thu, Sep 10, 2020 at 3:59 PM Satish Duggana <
> satish.dugg...@gmail.com
> > >
> > > wrote:
> > >
> > > > Thank

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-09 Thread Justine Olshan
Hello all, it's been almost a year! I've made some changes to this KIP and hope 
to continue the discussion. 

One of the main changes is that the metadata response will now include
the topic ID (as Colin suggested). Clients can obtain the topicID of a given 
topic through a TopicDescription. The topicId will also be included with the 
UpdateMetadata request. 
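
As a sketch of how a client could read the new ID once this lands (topicId()
is the accessor this KIP would add to TopicDescription, so treat the method
name as illustrative until the KIP is finalized):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopicIdSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (Admin admin = Admin.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("my-topic"))
                    .all().get().get("my-topic");
            // The topic ID comes back with the rest of the topic metadata.
            System.out.println("Topic ID for my-topic: " + description.topicId());
        }
    }
}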

Let me know what you all think.
Thank you,
Justine

On 2019/09/13 16:38:26, "Colin McCabe"  wrote: 
> Hi Lucas,
> 
> Thanks for tackling this.  Topic IDs are a great idea, and this is a really 
> good writeup.
> 
> For /brokers/topics/[topic], the schema version should be bumped to version 
> 3, rather than 2.  KIP-455 bumped the version of this znode to 2 already :)
> 
> Given that we're going to be seeing these things as strings as lot (in logs, 
> in ZooKeeper, on the command-line, etc.), does it make sense to use base64 
> when converting them to strings?
> 
> Here is an example of the hex representation:
> 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8
> 
> And here is an example in base64.
> b8tRS7h4TJ2Vt43Dp85v2A
> 
> The base64 version saves 15 letters (to be fair, 4 of those were dashes that 
> we could have elided in the hex representation.)
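
A small Java sketch of that conversion, using the URL-safe base64 alphabet
without padding (it reproduces the example strings above):

import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class UuidBase64Sketch {
    public static void main(String[] args) {
        UUID id = UUID.fromString("6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8");

        // Pack the 128-bit UUID into 16 bytes, then encode with the URL-safe
        // base64 alphabet and no padding, giving a 22-character string.
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(id.getMostSignificantBits());
        buffer.putLong(id.getLeastSignificantBits());
        String base64 = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(buffer.array());

        System.out.println(base64); // prints b8tRS7h4TJ2Vt43Dp85v2A
    }
}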
> 
> Another thing to consider is that we should specify that the all-zeroes UUID 
> is not a valid topic UUID.   We can't use null for this because we can't pass 
> a null UUID over the RPC protocol (there is no special pattern for null, nor 
> do we want to waste space reserving such a pattern.)
> 
> Maybe I missed it, but did you describe "migration of... existing topic[s] 
> without topic IDs" in detail in any section?  It seems like when the new 
> controller becomes active, it should just generate random UUIDs for these, 
> and write the random UUIDs back to ZooKeeper.  It would be good to spell that 
> out.  We should make it clear that this happens regardless of the 
> inter-broker protocol version (it's a compatible change).
> 
> "LeaderAndIsrRequests including an is_every_partition flag" seems a bit 
> wordy.  Can we just call these "full LeaderAndIsrRequests"?  Then the RPC 
> field could be named "full".  Also, it would probably be better for the RPC 
> field to be an enum of { UNSPECIFIED, INCREMENTAL, FULL }, so that we can 
> cleanly handle old versions (by treating them as UNSPECIFIED)
> 
> In the LeaderAndIsrRequest section, you write "A final deletion event will be 
> secheduled for X ms after the LeaderAndIsrRequest was first received..."  I 
> guess the X was a placeholder that you intended to replace before posting? :) 
>  In any case, this seems like the kind of thing we'd want a configuration 
> for.  Let's describe that configuration key somewhere in this KIP, including 
> what its default value is.
> 
> We should probably also log a bunch of messages at WARN level when something 
> is scheduled for deletion, as well.  (Maybe this was assumed, but it would be 
> good to mention it).
> 
> I feel like there are a few sections that should be moved to "rejected 
> alternatives."  For example, in the DeleteTopics section, since we're not 
> going to do option 1 or 2, these should be moved into "rejected 
> alternatives,"  rather than appearing inline.  Another case is the "Should we 
> remove topic name from the protocol where possible" section.  This is clearly 
> discussing a design alternative that we're not proposing to implement: 
> removing the topic name from those protocols.
> 
> Is it really necessary to have a new /admin/delete_topics_by_id path in 
> ZooKeeper?  It seems like we don't really need this.  Whenever there is a new 
> controller, we'll send out full LeaderAndIsrRequests which will trigger the 
> stale topics to be cleaned up.   The active controller will also send the 
> full LeaderAndIsrRequest to brokers that are just starting up.So we don't 
> really need this kind of two-phase commit (send out StopReplicasRequest, get 
> ACKs from all nodes, commit by removing /admin/delete_topics node) any more.
> 
> You mention that FetchRequest will now include UUID to avoid issues where 
> requests are made to stale partitions.  However, adding a UUID to 
> MetadataRequest is listed as future work, out of scope for this KIP.  How 
> will the client learn what the topic UUID is, if the metadata response 
> doesn't include that information?  It seems like adding the UUID to 
> MetadataResponse would be an improvement here that might not be too hard to 
> make.
> 
> best,
> Colin
> 
> 
> On Mon, Sep 9, 2019, at 17:48, Ryanne Dolan wrote:
> > Lucas, this would be great. I've run into issues with topics being
> > resurrected accidentally, since a client cannot easily distinguish between
> > a deleted topic and a new topic with the same name. I'd need the ID
> > accessible from the client to solve that issue, but this is a good first
> > step.
> > 
> > Ryanne
> > 
> > On Wed, Sep 4, 2019 at 1:41 PM Lucas Bradstreet  wrote:
> > 
> > > Hi all,
> > >
> > > I would like to kick 

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-15 Thread Justine Olshan
Hi all,
Jun brought up a good point in his last email about tagged fields, and I've
updated the KIP to reflect that the changes to requests and responses will
be in the form of tagged fields to avoid changing IBP.
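
For anyone less familiar with tagged fields, an entry in one of the request 
schemas would look roughly like the following (the field name, versions, and 
tag number here are illustrative only, not the final schema):

    { "name": "TopicId", "type": "uuid", "versions": "0+",
      "taggedVersions": "0+", "tag": 0,
      "about": "The unique topic ID." }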

Jun: I plan on sending a followup email to address some of the other points.

Thanks,
Justine

On Mon, Sep 14, 2020 at 4:25 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the updated KIP. A few comments below.
>
> 10. LeaderAndIsr Response: Do we need the topic name?
>
> 11. For the changed request/response, other than LeaderAndIsr,
> UpdateMetadata, Metadata, do we need to include the topic name?
>
> 12. It seems that upgrades don't require IBP. Does that mean the new fields
> in all the request/response are added as tagged fields without bumping up
> the request version? It would be useful to make that clear.
>
> 13. Partition Metadata file: Do we need to include the topic name and the
> partition id since they are implied in the directory name?
>
> 14. In the JBOD mode, we support moving a partition's data from one disk to
> another. Will the new partition metadata file be copied during that
> process?
>
> 15. The KIP says "Remove deleted topics from replicas by sending
> StopReplicaRequest V2 for any topics which do not contain a topic ID, and
> V3 for any topics which do contain a topic ID.". However, it seems the
> updated controller will create all missing topic IDs first before doing
> other actions. So, is StopReplicaRequest V2 needed?
>
> Jun
>
> On Fri, Sep 11, 2020 at 10:31 AM John Roesler  wrote:
>
> > Thanks, Justine!
> >
> > Your response seems compelling to me.
> >
> > -John
> >
> > On Fri, 2020-09-11 at 10:17 -0700, Justine Olshan wrote:
> > > Hello all,
> > > Thanks for continuing the discussion! I have a few responses to your
> > points.
> > >
> > > Tom: You are correct in that this KIP has not mentioned the
> > > DeleteTopicsRequest. I think that this would be out of scope for now,
> but
> > > may be something worth adding in the future.
> > >
> > > John: We did consider sequence ids, but there are a few reasons to
> favor
> > > UUIDs. There are several cases where topics from different clusters may
> > > interact now and in the future. For example, Mirror Maker 2 may benefit
> > > from being able to detect when a cluster being mirrored is deleted and
> > > recreated and globally unique identifiers would make resolving issues
> > > easier than sequence IDs which may collide between clusters. KIP-405
> > > (tiered storage) will also benefit from globally unique IDs as shared
> > > buckets may be used between clusters.
> > >
> > > Globally unique IDs would also make functionality like moving topics
> > > between disparate clusters easier in the future, simplify any future
> > > implementations of backups and restores, and more. In general, unique
> IDs
> > > would ensure that the source cluster topics do not conflict with the
> > > destination cluster topics.
> > >
> > > If we were to use sequence ids, we would need sufficiently large
> cluster
> > > ids to be stored with the topic identifiers or we run the risk of
> > > collisions. This will give up any advantage in compactness that
> sequence
> > > numbers may bring. Given these advantages I think it makes sense to use
> > > UUIDs.
> > >
> > > Gokul: This is an interesting idea, but this is a breaking change. Out
> of
> > > scope for now, but maybe worth discussing in the future.
> > >
> > > Hope this explains some of the decisions,
> > >
> > > Justine
> > >
> > >
> > >
> > > On Fri, Sep 11, 2020 at 8:27 AM Gokul Ramanan Subramanian <
> > > gokul24...@gmail.com> wrote:
> > >
> > > > Hi.
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > Have you thought about whether it makes sense to support authorizing
> a
> > > > principal for a topic ID rather than a topic name to achieve tighter
> > > > security?
> > > >
> > > > Or is the topic ID fundamentally an internal detail similar to epochs
> > used
> > > > in a bunch of other places in Kafka?
> > > >
> > > > Thanks.
> > > >
> > > > On Fri, Sep 11, 2020 at 4:06 PM John Roesler 
> > wrote:
> > > >
> > > > > Hello Justine,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > I happen

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-15 Thread Justine Olshan
Hello again,
To follow up on some of the other comments:

10/11) We can remove the topic name from these requests/responses, and that
means we will just have to make a few internal changes to make partitions
accessible by topic id and partition. I can update the KIP to remove them
unless anyone thinks they should stay.

12) Addressed in the previous email. I've updated the KIP to include tagged
fields for the requests and responses. (More on that below)

13) I think part of the idea for including this information is to prepare
for future changes. Perhaps the directory structure might change from
topicName_partitionNumber to something like topicID_partitionNumber. Then
it would be useful to have the topic name in the file since it would not be
in the directory structure. Supporting topic renames might be easier if the
other fields are included. Would there be any downsides to including this
information?
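
Purely as an illustration (this is not a format taken from the KIP), a file 
that carries the extra fields might look something like:

    version: 0
    topic_id: 46bdb63f-9e8d-4a38-bf7b-ee4eb2a794e4
    topic_name: my-topic
    partition: 1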

14)  Yes, we would need to copy the partition metadata file in this
process. I've updated the KIP to include this.

15) I believe Lucas meant v1 and v2 here. He was referring to how the
requests would fall under different IBP and meant that older brokers would
have to use the older version of the request and the existing topic
deletion process. At first, it seemed like tagged fields would resolve
the IBP issue. However, we may need IBP for this request after all since
the controller handles the topic deletion differently depending on the IBP
version. In an older version, we can't just send a StopReplica request to
delete the topic immediately like we'd want to for this KIP.

This makes me wonder if we want tagged fields on all the requests after
all. Let me know your thoughts!

Justine

On Tue, Sep 15, 2020 at 1:03 PM Justine Olshan  wrote:

> Hi all,
> Jun brought up a good point in his last email about tagged fields, and
> I've updated the KIP to reflect that the changes to requests and responses
> will be in the form of tagged fields to avoid changing IBP.
>
> Jun: I plan on sending a followup email to address some of the other
> points.
>
> Thanks,
> Justine
>
> On Mon, Sep 14, 2020 at 4:25 PM Jun Rao  wrote:
>
>> Hi, Justine,
>>
>> Thanks for the updated KIP. A few comments below.
>>
>> 10. LeaderAndIsr Response: Do we need the topic name?
>>
>> 11. For the changed request/response, other than LeaderAndIsr,
>> UpdateMetadata, Metadata, do we need to include the topic name?
>>
>> 12. It seems that upgrades don't require IBP. Does that mean the new
>> fields
>> in all the request/response are added as tagged fields without bumping up
>> the request version? It would be useful to make that clear.
>>
>> 13. Partition Metadata file: Do we need to include the topic name and the
>> partition id since they are implied in the directory name?
>>
>> 14. In the JBOD mode, we support moving a partition's data from one disk
>> to
>> another. Will the new partition metadata file be copied during that
>> process?
>>
>> 15. The KIP says "Remove deleted topics from replicas by sending
>> StopReplicaRequest V2 for any topics which do not contain a topic ID, and
>> V3 for any topics which do contain a topic ID.". However, it seems the
>> updated controller will create all missing topic IDs first before doing
>> other actions. So, is StopReplicaRequest V2 needed?
>>
>> Jun
>>
>> On Fri, Sep 11, 2020 at 10:31 AM John Roesler 
>> wrote:
>>
>> > Thanks, Justine!
>> >
>> > Your response seems compelling to me.
>> >
>> > -John
>> >
>> > On Fri, 2020-09-11 at 10:17 -0700, Justine Olshan wrote:
>> > > Hello all,
>> > > Thanks for continuing the discussion! I have a few responses to your
>> > points.
>> > >
>> > > Tom: You are correct in that this KIP has not mentioned the
>> > > DeleteTopicsRequest. I think that this would be out of scope for now,
>> but
>> > > may be something worth adding in the future.
>> > >
>> > > John: We did consider sequence ids, but there are a few reasons to
>> favor
>> > > UUIDs. There are several cases where topics from different clusters
>> may
>> > > interact now and in the future. For example, Mirror Maker 2 may
>> benefit
>> > > from being able to detect when a cluster being mirrored is deleted and
>> > > recreated and globally unique identifiers would make resolving issues
>> > > easier than sequence IDs which may collide between clusters. KIP-405
>> > > (tiered storage) will also benefit from globally unique IDs as shared
>> > > buckets may be used between clusters.

Re: [VOTE] KIP-516: Topic Identifiers

2020-10-12 Thread Justine Olshan
Hi all,

After further discussion and changes to this KIP, I think we are ready to
restart this vote.

Again, here is the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers

The discussion thread is here:
https://lists.apache.org/thread.html/7efa8cd169cadc7dc9cf86a7c0dbbab1836ddb5024d310fcebacf80c@%3Cdev.kafka.apache.org%3E

Please take a look and vote if you have a chance.

Thanks,
Justine

On Tue, Sep 22, 2020 at 8:52 AM Justine Olshan  wrote:

> Hi all,
>
> I'd like to call a vote on KIP-516: Topic Identifiers. Here is the KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
>
> The discussion thread is here:
>
> https://lists.apache.org/thread.html/7efa8cd169cadc7dc9cf86a7c0dbbab1836ddb5024d310fcebacf80c@%3Cdev.kafka.apache.org%3E
>
> Please take a look and vote if you have a chance.
>
> Thank you,
> Justine
>


Re: [VOTE] KIP-516: Topic Identifiers

2020-10-19 Thread Justine Olshan
Thanks everyone for the votes. KIP-516 has been accepted.

Binding: Jun, Rajini, David
Non-binding: Lucas, Satish, Tom

Justine

On Sat, Oct 17, 2020 at 3:22 AM Tom Bentley  wrote:

> +1 non-binding. Thanks!
>
> On Sat, Oct 17, 2020 at 7:55 AM David Jacot  wrote:
>
> > Hi Justine,
> >
> > Thanks for the KIP! This is a great and long awaited improvement.
> >
> > +1 (binding)
> >
> > Best,
> > David
> >
> > Le ven. 16 oct. 2020 à 17:36, Rajini Sivaram  a
> > écrit :
> >
> > > Hi Justine,
> > >
> > > +1 (binding)
> > >
> > > Thanks for all the work you put into this KIP!
> > >
> > > btw, there is a typo in the DeleteTopics Request/Response schema in the
> > > KIP, it says Metadata request.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Fri, Oct 16, 2020 at 4:06 PM Satish Duggana <
> satish.dugg...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Justine,
> > > > Thanks for the KIP,  +1 (non-binding)
> > > >
> > > > On Thu, Oct 15, 2020 at 10:48 PM Lucas Bradstreet <
> lu...@confluent.io>
> > > > wrote:
> > > > >
> > > > > Hi Justine,
> > > > >
> > > > > +1 (non-binding). Thanks for all your hard work on this KIP!
> > > > >
> > > > > Lucas
> > > > >
> > > > > On Wed, Oct 14, 2020 at 8:59 AM Jun Rao  wrote:
> > > > >
> > > > > > Hi, Justine,
> > > > > >
> > > > > > Thanks for the updated KIP. +1 from me.
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Oct 13, 2020 at 2:38 PM Jun Rao 
> wrote:
> > > > > >
> > > > > > > Hi, Justine,
> > > > > > >
> > > > > > > Thanks for starting the vote. Just a few minor comments.
> > > > > > >
> > > > > > > 1. It seems that we should remove the topic field from the
> > > > > > > StopReplicaResponse below?
> > > > > > > StopReplica Response (Version: 4) => error_code [topics]
> > > > > > >   error_code => INT16
> > > > > > > topics => topic topic_id* [partitions]
> > > > > > >
> > > > > > > 2. "After controller election, upon receiving the result,
> assign
> > > the
> > > > > > > metadata topic its unique topic ID". Will the UUID for the
> > metadata
> > > > topic
> > > > > > > be written to the metadata topic itself?
> > > > > > >
> > > > > > > 3. The vote request is designed to support multiple topics,
> each
> > of
> > > > them
> > > > > > > may require a different sentinel ID. Should we reserve more
> than
> > > one
> > > > > > > sentinel ID for future usage?
> > > > > > >
> > > > > > > 4. UUID.randomUUID(): Could we clarify whether this method
> > returns
> > > > any
> > > > > > > sentinel ID? Also, how do we expect the user to use it?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Mon, Oct 12, 2020 at 9:54 AM Justine Olshan <
> > > jols...@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> After further discussion and changes to this KIP, I think we
> are
> > > > ready
> > > > > > to
> > > > > > >> restart this vote.
> > > > > > >>
> > > > > > >> Again, here is the KIP:
> > > > > > >>
> > > > > > >>
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
> > > > > > >>
> > > > > > >> The discussion thread is here:
> > > > > > >>
> > > > > > >>
> > > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/7efa8cd169cadc7dc9cf86a7c0dbbab1836ddb5024d310fcebacf80c@%3Cdev.kafka.apache.org%3E
> > > > > > >>
> > > > > > >> Please take a look and vote if you have a chance.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Justine
> > > > > > >>
> > > > > > >> On Tue, Sep 22, 2020 at 8:52 AM Justine Olshan <
> > > > jols...@confluent.io>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi all,
> > > > > > >> >
> > > > > > >> > I'd like to call a vote on KIP-516: Topic Identifiers. Here
> is
> > > the
> > > > > > KIP:
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
> > > > > > >> >
> > > > > > >> > The discussion thread is here:
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/7efa8cd169cadc7dc9cf86a7c0dbbab1836ddb5024d310fcebacf80c@%3Cdev.kafka.apache.org%3E
> > > > > > >> >
> > > > > > >> > Please take a look and vote if you have a chance.
> > > > > > >> >
> > > > > > >> > Thank you,
> > > > > > >> > Justine
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-24 Thread Justine Olshan
Hi all,

Thanks for the discussion. I'm glad we are able to get our best ideas out
there.

David Jacot
1. I apologize for the incorrect information. I have fixed the KIP.
2. Yes. The difference between full and incremental is that on a full request
we check for two kinds of stale topics: topics on the broker that are not
contained in the request, and topics in the request whose ID does not match the
ID stored on the broker. In the incremental case, we can only delete in the
second scenario since not all topics are in the request.
3. Yes we should use delete.stale.topic.delay.ms in both FULL and
INCREMENTAL requests.

I’ve updated the KIP to make some of these things more clear.


Ismael
Removing the topic name has been requested by a few in this discussion so
I’ve decided to do this. I’ve updated the KIP to reflect this new protocol
and explained the reasoning.

As for the partition metadata file: on LeaderAndIsr requests, we will check
this file for all the topic partitions included in the request. If the topic
ID in the file does not match the topic ID in the request, it implies that
the local topic partition is stale, as the topic must have been deleted to
create a new topic with a different topic ID. We will mark this topic for
deletion. I’ve updated the KIP to make this more clear in the partition
metadata file section.
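
To make the reconciliation rule concrete, here is a rough sketch (this is not
broker code, and the helper methods are stand-ins made up for illustration):

    import java.util.UUID;

    class TopicIdReconciliation {
        // Stand-in for reading the ID persisted in the partition metadata file.
        UUID persistedTopicId(String topic, int partition) { return null; }

        // Stand-in for staging the local replica for deletion.
        void markStaleForDeletion(String topic, int partition) { }

        void reconcile(String topic, int partition, UUID requestTopicId) {
            UUID local = persistedTopicId(topic, partition);
            if (local != null && !local.equals(requestTopicId)) {
                // Same name and partition but a different ID means the topic was
                // deleted and re-created, so the local log is stale.
                markStaleForDeletion(topic, partition);
            }
        }
    }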

Jun
I’ve updated the KIP to fix some of the points you brought up in 30, 31,
and 32. 33 will require a bit more thought, so I will get back to you on
that.

Thanks,
Justine

On Thu, Sep 24, 2020 at 11:10 AM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the updated KIP. A few more comments below.
>
> 30.  {"name": "id", "type": "string", "doc": "version id"}}: The doc should
> say UUID.
>
> 31. LeaderAndIsrResponse v5 and StopReplicaResponse v4 : It seems there is
> no need to add topic_id at partitions level.
>
> 32. Regarding partition metadata file. Perhaps the key can be a single
> word, sth like the following.
> version: 0
> topic_id: 46bdb63f-9e8d-4a38-bf7b-ee4eb2a794e4
>
> 33. Another tricky thing that I realized is how to support the metadata
> topic introduced in KIP-595. It's a bit tricky to assign a UUID to the
> metadata topic since we have a chicken-and-egg problem. The controller
> needs to persist the UUID in the metadata topic in order to assign one
> successfully, but the metadata topic is needed to elect a controller
> (KIP-631). So, this probably needs a bit more thought.
>
> Jun
>
> On Thu, Sep 24, 2020 at 3:04 AM Ismael Juma  wrote:
>
> > Also, can we provide more details on how the Partition Metadata file will
> > be used?
> >
> > Ismael
> >
> > On Thu, Sep 24, 2020 at 3:01 AM Ismael Juma  wrote:
> >
> > > Hi Justine,
> > >
> > > I think we need to update the "Rejected Alternatives" section to take
> > into
> > > account that the proposal now removes the topic name from the fetch
> > > request. Also, if we are removing it from the Fetch request, does it
> make
> > > sense not to remove it from similar requests like ListOffsetRequest?
> > >
> > > Ismael
> > >
> > > On Thu, Sep 24, 2020 at 2:46 AM David Jacot 
> wrote:
> > >
> > >> Hi Justine,
> > >>
> > >> Thanks for the KIP. I finally had time to read it :). It is a great
> > >> improvement.
> > >>
> > >> I have a few comments/questions:
> > >>
> > >> 1. It seems that the schema of the StopReplicaRequest is slightly
> > >> outdated.
> > >> We
> > >> did some changes as part of KIP-570. V3 is already organized by
> topics.
> > >>
> > >> 2. I just want to make sure that I understand the reconciliation
> > >> logic correctly. When
> > >> an "INCREMENTAL" LeaderAndIsr Request is received, the broker will
> also
> > >> reconcile
> > >> when the local uuid does not match the uuid in the request, right? In
> > this
> > >> case, the
> > >> local log is staged for deletion.
> > >>
> > >> 3. In the documentation of the `delete.stale.topic.delay.ms` config,
> it
> > >> says "When a
> > >> FULL LeaderAndIsrRequest is received..." but I suppose it applies to
> > both
> > >> types.
> > >>
> > >> Best,
> > >> David
> > >>
> > >> On Thu, Sep 24, 2020 at 1:40 AM Justine Olshan 
> > >> wrote:
> > >>
> > >> > Hi Jun,
> > >> >
> > >> > Thanks for the comments. I apologize for some of the typos and
> > >> 

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-24 Thread Justine Olshan
gt; > On Thu, Sep 24, 2020 at 2:46 AM David Jacot 
> > wrote:
> > > >
> > > >> Hi Justine,
> > > >>
> > > >> Thanks for the KIP. I finally had time to read it :). It is a great
> > > >> improvement.
> > > >>
> > > >> I have a few comments/questions:
> > > >>
> > > >> 1. It seems that the schema of the StopReplicaRequest is slightly
> > > >> outdated.
> > > >> We
> > > >> did some changes as part of KIP-570. V3 is already organized by
> > topics.
> > > >>
> > > >> 2. I just want to make sure that I understand the reconciliation
> > > >> logic correctly. When
> > > >> an "INCREMENTAL" LeaderAndIsr Request is received, the broker will
> > also
> > > >> reconcile
> > > >> when the local uuid does not match the uuid in the request, right?
> In
> > > this
> > > >> case, the
> > > >> local log is staged for deletion.
> > > >>
> > > >> 3. In the documentation of the `delete.stale.topic.delay.ms`
> config,
> > it
> > > >> says "When a
> > > >> FULL LeaderAndIsrRequest is received..." but I suppose it applies to
> > > both
> > > >> types.
> > > >>
> > > >> Best,
> > > >> David
> > > >>
> > > >> On Thu, Sep 24, 2020 at 1:40 AM Justine Olshan <
> jols...@confluent.io>
> > > >> wrote:
> > > >>
> > > >> > Hi Jun,
> > > >> >
> > > >> > Thanks for the comments. I apologize for some of the typos and
> > > >> confusion.
> > > >> > I’ve updated the KIP to fix some of the issues you mentioned.
> > > >> >
> > > >> > 20.2 I’ve changed the type to String.
> > > >> > 20.1/3 I’ve updated the TopicZNode to fix formatting and reflect
> the
> > > >> latest
> > > >> > version before this change.
> > > >> >
> > > >> > 21. You are correct and I’ve removed this line. I’ve also added a
> > line
> > > >> > mentioning an IBP bump is necessary for migration
> > > >> >
> > > >> > 22. I think the wording was unclear but your summary is what was
> > > >> intended
> > > >> > by this line. I’ve updated to make this point more clear. “Remove
> > > >> deleted
> > > >> > topics from replicas by sending StopReplicaRequest V3 before the
> IBP
> > > >> bump
> > > >> > using the old logic, and using V4 and the new logic with topic IDs
> > > after
> > > >> > the IBP bump.”
> > > >> >
> > > >> > 23. I’ve removed the unspecified type from the KIP and mention
> that
> > > IBP
> > > >> > will be used to handle this request. “IBP will be used to
> determine
> > > >> whether
> > > >> > this new form of the request will be used. For older requests, we
> > will
> > > >> > ignore this field and default to previous behavior.”
> > > >> >
> > > >> > 24. I’ve fixed this typo.
> > > >> >
> > > >> > 25. I've added a topics at a higher level for LeaderAndIsrResponse
> > v5,
> > > >> > StopReplicaResponse v4. I've also changed StopReplicaRequest v4 to
> > > have
> > > >> > topics at a higher level.
> > > >> >
> > > >> > 26. I’ve updated forgotten_topics_data--added the topic ID and
> > removed
> > > >> the
> > > >> > topic name
> > > >> >
> > > >> > 27. I’ve decided on plain text, and I’ve added an example.
> > > >> >
> > > >> > 28. I’ve added this idea to future work.
> > > >> >
> > > >> > Thanks again for taking a look,
> > > >> >
> > > >> > Justine
> > > >> >
> > > >> > On Wed, Sep 23, 2020 at 10:28 AM Jun Rao 
> wrote:
> > > >> >
> > > >> > > Hi, Justine,
> > > >> > >
> > > >> > > Thanks for the response. Made another pass. A few more comments
> > > below.
> > > >> > >

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-23 Thread Justine Olshan
Hi Tom,

Thanks for the comment. I think this is a really good idea and it has been
added to the KIP under the newly added tooling section.

Thanks again,
Justine

On Wed, Sep 23, 2020 at 3:17 AM Tom Bentley  wrote:

> Hi Justine,
>
> I know you started the vote thread, but on re-reading the KIP I noticed
> that although the topic id is included in the MetadataResponse it's not
> surfaced in the output from `kafka-topics.sh --describe`. Maybe that was
> intentional because ids are intentionally not really something the user
> should care deeply about, but it would also make life harder for anyone
> debugging Kafka and this would likely get worse the more topic ids got
> rolled out across the protocols, clients etc. It seems likely that
> `kafka-topics.sh` will eventually need the ability to show the id of a
> topic and perhaps find a topic name given an id. Is there any reason not to
> implement that in this KIP?
>
> Many thanks,
>
> Tom
>
> On Mon, Sep 21, 2020 at 9:54 PM Justine Olshan 
> wrote:
>
> > Hi all,
> >
> > After thinking about it, I've decided to remove the topic name from the
> > Fetch Request and Response after all. Since there are so many of these
> > requests per second, it is worth removing the extra information. I've
> > updated the KIP to reflect this change.
> >
> > Please let me know if there is anything else we should discuss before
> > voting.
> >
> > Thank you,
> > Justine
> >
> > On Fri, Sep 18, 2020 at 9:46 AM Justine Olshan 
> > wrote:
> >
> > > Hi Jun,
> > >
> > > I see what you are saying. For now we can remove the extra information.
> > > I'll leave the option to add more fields to the file in the future. The
> > KIP
> > > has been updated to reflect this change.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Fri, Sep 18, 2020 at 8:46 AM Jun Rao  wrote:
> > >
> > >> Hi, Justine,
> > >>
> > >> Thanks for the reply.
> > >>
> > >> 13. If the log directory is the source of truth, it means that the
> > >> redundant info in the metadata file will be ignored. Then the question
> > is
> > >> why do we need to put the redundant info in the metadata file now?
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Thu, Sep 17, 2020 at 5:07 PM Justine Olshan 
> > >> wrote:
> > >>
> > >> > Hi Jun,
> > >> > Thanks for the quick response!
> > >> >
> > >> > 12. I've decided to bump up the versions on the requests and updated
> > the
> > >> > KIP. I think it's good we thoroughly discussed the options here, so
> we
> > >> know
> > >> > we made a good choice. :)
> > >> >
> > >> > 13. This is an interesting situation. I think if this does occur we
> > >> should
> > >> > give a warning. I agree that it's hard to know the source of truth
> for
> > >> sure
> > >> > since the directory or the file could be manually modified. I guess
> > the
> > >> > directory could be used as the source of truth. To be honest, I'm
> not
> > >> > really sure what happens in kafka when the log directory is renamed
> > >> > manually in such a way. I'm also wondering if the situation is
> > >> recoverable
> > >> > in this scenario.
> > >> >
> > >> > Thanks,
> > >> > Justine
> > >> >
> > >> > On Thu, Sep 17, 2020 at 4:28 PM Jun Rao  wrote:
> > >> >
> > >> > > Hi, Justine,
> > >> > >
> > >> > > Thanks for the reply.
> > >> > >
> > >> > > 12. I don't have a strong preference either. However, if we need
> IBP
> > >> > > anyway, maybe it's easier to just bump up the version for all
> inter
> > >> > broker
> > >> > > requests and add the topic id field as a regular field. A regular
> > >> field
> > >> > is
> > >> > > a bit more concise in wire transfer than a flexible field.
> > >> > >
> > >> > > 13. The confusion that I was referring to is between the topic
> name
> > >> and
> > >> > > partition number between the log dir and the metadata file. For
> > >> example,
> > >> > if
> > >> > > the log dir is topicA-1 and the

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-23 Thread Justine Olshan
Hi Jun,

Thanks for the comments. I apologize for some of the typos and confusion.
I’ve updated the KIP to fix some of the issues you mentioned.

20.2 I’ve changed the type to String.
20.1/3 I’ve updated the TopicZNode to fix formatting and reflect the latest
version before this change.

21. You are correct and I’ve removed this line. I’ve also added a line
mentioning an IBP bump is necessary for migration

22. I think the wording was unclear but your summary is what was intended
by this line. I’ve updated to make this point more clear. “Remove deleted
topics from replicas by sending StopReplicaRequest V3 before the IBP bump
using the old logic, and using V4 and the new logic with topic IDs after
the IBP bump.”

23. I’ve removed the unspecified type from the KIP and mention that IBP
will be used to handle this request. “IBP will be used to determine whether
this new form of the request will be used. For older requests, we will
ignore this field and default to previous behavior.”

24. I’ve fixed this typo.

25. I've added a topics field at a higher level for LeaderAndIsrResponse v5,
StopReplicaResponse v4. I've also changed StopReplicaRequest v4 to have
topics at a higher level.

26. I’ve updated forgotten_topics_data--added the topic ID and removed the
topic name

27. I’ve decided on plain text, and I’ve added an example.

28. I’ve added this idea to future work.

Thanks again for taking a look,

Justine

On Wed, Sep 23, 2020 at 10:28 AM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the response. Made another pass. A few more comments below.
>
> 20. znode schema:
> 20.1 It seems that {"name": "version", "type": "int", "id": "UUID", "doc":
> "version id"} should be {"name": "version", "type": "int"}, {"name": "id",
> "type": "UUID", "doc": "version id"}.
> 20.2 The znode format is JSON which doesn't have UUID type. So the type
> probably should be string?
> 20.3 Also, the existing format used seems outdated. It should have the
> following format.
> Json.encodeAsBytes(Map(
>   "version" -> 2,
>   "partitions" -> replicaAssignmentJson.asJava,
>   "adding_replicas" -> addingReplicasAssignmentJson.asJava,
>   "removing_replicas" -> removingReplicasAssignmentJson.asJava
> ).asJava)
>   }
>
> 21. Migration: The KIP says "The migration process can take place without
> an inter-broker protocol bump, as the format stored in
> /brokers/topics/[topic] will be compatible with older broker versions."
> However, since we are bumping up the version of inter-broker requests, it
> seems that we need to use IBP for migration.
>
> 22. The KIP says "Remove deleted topics from replicas by sending
> StopReplicaRequest V3 for any topics which do not contain a topic ID, and
> V4 for any topics which do contain a topic ID.". However, if we use IBP, it
> seems that the controller will either send StopReplicaRequest V3
> or StopReplicaRequest V4, but never mixed V3 and V4 for different topics.
> Basically, before the IBP bump, V3 will be used. After the IBP bump,
> topicId will be created and V4 will be used.
>
> 23. Given that we depend on IBP, do we still need "0 UNSPECIFIED"
> in LeaderAndIsr?
>
> 24. LeaderAndIsrResponse v5 : It still has the topic field.
>
> 25. LeaderAndIsrResponse v5, StopReplicaResponse v4: Could we use this
> opportunity to organize the response in 2 levels, first by topic, then by
> partition, as most other requests/responses?
>
> 26. FetchRequest v13 : Should forgotten_topics_data use topicId too?
>
> 27. "This file can either be plain text (key/value pairs) or JSON." Have we
> decided which one to use? Also, it would be helpful to provide an example.
>
> 28. Future improvement: Another future improvement opportunity is to use
> topicId in GroupMetadataManager.offsetCommitKey in the offset_commit topic.
> This may save some space.
>
> Thanks,
>
> Jun
>
> On Wed, Sep 23, 2020 at 8:50 AM Justine Olshan 
> wrote:
>
> > Hi Tom,
> >
> > Thanks for the comment. I think this is a really good idea and it has
> been
> > added to the KIP under the newly added tooling section.
> >
> > Thanks again,
> > Justine
> >
> > On Wed, Sep 23, 2020 at 3:17 AM Tom Bentley  wrote:
> >
> > > Hi Justine,
> > >
> > > I know you started the vote thread, but on re-reading the KIP I noticed
> > > that although the topic id is included in the MetadataResponse it's not
> > > surfaced in the output from `kafka-topics.sh --describe`. Maybe that
> was
> > > int

[VOTE] KIP-516: Topic Identifiers

2020-09-22 Thread Justine Olshan
Hi all,

I'd like to call a vote on KIP-516: Topic Identifiers. Here is the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers

The discussion thread is here:
https://lists.apache.org/thread.html/7efa8cd169cadc7dc9cf86a7c0dbbab1836ddb5024d310fcebacf80c@%3Cdev.kafka.apache.org%3E

Please take a look and vote if you have a chance.

Thank you,
Justine


Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-21 Thread Justine Olshan
Hi all,

After thinking about it, I've decided to remove the topic name from the
Fetch Request and Response after all. Since there are so many of these
requests per second, it is worth removing the extra information. I've
updated the KIP to reflect this change.

Please let me know if there is anything else we should discuss before
voting.

Thank you,
Justine

On Fri, Sep 18, 2020 at 9:46 AM Justine Olshan  wrote:

> Hi Jun,
>
> I see what you are saying. For now we can remove the extra information.
> I'll leave the option to add more fields to the file in the future. The KIP
> has been updated to reflect this change.
>
> Thanks,
> Justine
>
> On Fri, Sep 18, 2020 at 8:46 AM Jun Rao  wrote:
>
>> Hi, Justine,
>>
>> Thanks for the reply.
>>
>> 13. If the log directory is the source of truth, it means that the
>> redundant info in the metadata file will be ignored. Then the question is
>> why do we need to put the redundant info in the metadata file now?
>>
>> Thanks,
>>
>> Jun
>>
>> On Thu, Sep 17, 2020 at 5:07 PM Justine Olshan 
>> wrote:
>>
>> > Hi Jun,
>> > Thanks for the quick response!
>> >
>> > 12. I've decided to bump up the versions on the requests and updated the
>> > KIP. I think it's good we thoroughly discussed the options here, so we
>> know
>> > we made a good choice. :)
>> >
>> > 13. This is an interesting situation. I think if this does occur we
>> should
>> > give a warning. I agree that it's hard to know the source of truth for
>> sure
>> > since the directory or the file could be manually modified. I guess the
>> > directory could be used as the source of truth. To be honest, I'm not
>> > really sure what happens in kafka when the log directory is renamed
>> > manually in such a way. I'm also wondering if the situation is
>> recoverable
>> > in this scenario.
>> >
>> > Thanks,
>> > Justine
>> >
>> > On Thu, Sep 17, 2020 at 4:28 PM Jun Rao  wrote:
>> >
>> > > Hi, Justine,
>> > >
>> > > Thanks for the reply.
>> > >
>> > > 12. I don't have a strong preference either. However, if we need IBP
>> > > anyway, maybe it's easier to just bump up the version for all inter
>> > broker
>> > > requests and add the topic id field as a regular field. A regular
>> field
>> > is
>> > > a bit more concise in wire transfer than a flexible field.
>> > >
>> > > 13. The confusion that I was referring to is between the topic name
>> and
>> > > partition number between the log dir and the metadata file. For
>> example,
>> > if
>> > > the log dir is topicA-1 and the metadata file in it has topicB and
>> > > partition 0 (say due to a bug or manual modification), which one do we
>> > use
>> > > as the source of truth?
>> > >
>> > > Jun
>> > >
>> > > On Thu, Sep 17, 2020 at 3:43 PM Justine Olshan 
>> > > wrote:
>> > >
>> > > > Hi Jun,
>> > > > Thanks for the comments.
>> > > >
>> > > > 12. I bumped the LeaderAndIsrRequest because I removed the topic
>> name
>> > > field
>> > > > in the response. It may be possible to avoid bumping the version
>> > without
>> > > > that change, but I may be missing something.
>> > > > I believe StopReplica is actually on version 3 now, but because
>> > version 2
>> > > > is flexible, I kept that listed as version 2 on the KIP page.
>> However,
>> > > you
>> > > > may be right in that we may need to bump the version on StopReplica
>> to
>> > > deal
>> > > > with deletion differently as mentioned above. I don't know if I
>> have a
>> > > big
>> > > > preference over used tagged fields or not.
>> > > >
>> > > > 13. I was thinking that in the case where the file and the request
>> > topic
>> > > > ids don't match, it means that the broker's topic/the one in the
>> file
>> > has
>> > > > been deleted. In that case, we would need to delete the old topic
>> and
>> > > start
>> > > > receiving the new version. If the topic name were to change, but the
>> > ids
>> > > > still matched, the file would also need to update. Am I missing a
>> case
>> > > > where the file would be 

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-30 Thread Justine Olshan
Hello again,

I've taken some time to discuss some of the remaining points brought up by
the previous emails offline. Here are some of the conclusions.

1. Directory Structure:
There was some talk about whether the directory structure should be changed
to replace all topic names with topic IDs.
This will be a useful change, but will prevent downgrades. It will be best
to wait until a major release, likely alongside KIP-500 changes that will
prevent downgrades. I've updated the KIP to include this change with the
note about migration and deprecation.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-log.dirlayout

2. Partition Metadata file
There was some disagreement about the inclusion of the partition metadata
file.
This file will be needed to persist the topic ID, especially while we still
have the old directory structure. Even after the changes, this file can be
useful for debugging and recovery.
Creating a single mapping file from topic names to topic IDs was
considered, but ultimately scrapped as it would not be as easy to maintain.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-PartitionMetadatafile

https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-PersistingTopicIDs
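
As a side note, here is a minimal sketch of reading the plain-text key/value
format described in the KIP (the file name "partition.metadata" and the parsing
code are assumptions for illustration only, not the broker implementation):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;

    public class PartitionMetadataFile {
        // Returns e.g. {version=0, topic_id=46bdb63f-9e8d-4a38-bf7b-ee4eb2a794e4}
        public static Map<String, String> read(Path logDir) throws IOException {
            Map<String, String> fields = new HashMap<>();
            for (String line : Files.readAllLines(logDir.resolve("partition.metadata"))) {
                int sep = line.indexOf(':');
                if (sep > 0) {
                    fields.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
                }
            }
            return fields;
        }
    }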

3. Produce Protocols
After some further discussion, replacing the topic name with the topic ID in
these protocols has been added to the KIP.
This will cut down on the size of the protocol. Since changes to fetch are
included, it does make sense to update these protocols.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-Produce

4. KIP-500 Compatibility
After some discussion with those involved with KIP-500, it seems best to
use a sentinel topic ID for the metadata topic that is used before the
first controller election.
We had an issue where this metadata topic may not be assigned an ID before
utilizing the Vote and Fetch protocols. It was decided to reserve a unique
ID for this topic to be used until the controller could give the topic a
unique ID.
Having the topic keep the sentinel ID (not be reassigned to a unique ID)
was considered, but it seemed like a good idea to leave the option open for
the metadata topic to have a unique ID in cases where it would need to be
differentiated from other clusters' metadata topics. (ex. tiered storage)
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-CompatibilitywithKIP-500
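
To illustrate the sentinel idea (only a sketch; the reserved value and the
KIP-500 integration are not specified here, so the constant below is made up):

    import java.util.UUID;

    class MetadataTopicId {
        // All-zero UUIDs are reserved as invalid for normal topics, so a value
        // close to zero could be set aside as the metadata topic's sentinel.
        static final UUID SENTINEL = new UUID(0L, 1L);

        static boolean isSentinel(UUID topicId) {
            return SENTINEL.equals(topicId);
        }
    }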

I've also split up the KIP into sub-tasks on the JIRA. Hopefully this will
provide a better idea about what tasks we have, and eventually provide a
place to see what's done and what is left.
If there is a task I am missing, please let me know!
https://issues.apache.org/jira/browse/KAFKA-8872

Of course, these decisions are not set in stone, and I would love to hear
any feedback.

Thanks,
Justine

On Mon, Sep 28, 2020 at 11:38 AM Justine Olshan 
wrote:

> Hello all,
>
> I just wanted to follow up on this discussion. Did we come to an
> understanding about the directory structure?
>
> I think the biggest question here is what is acceptable to leave out due
> to scope vs. what is considered to be too much tech debt.
> This KIP is already pretty large with the number of changes, but it also
> makes sense to do things correctly, so I'd love to hear everyone's thoughts.
>
> Thanks,
> Justine
>
> On Fri, Sep 25, 2020 at 8:19 AM Lucas Bradstreet 
> wrote:
>
>> Hi Ismael,
>>
>> If you do not store it in a metadata file or in the directory structure
>> would you then
>> require the LeaderAndIsrRequest following startup to give you some notion
>> of
>> topic name in memory? We would still need this information for the older
>> protocols, but
>> perhaps this is what's meant by tech debt.
>>
>> Once we're free of the old non-topicID requests then I think you wouldn't
>> need to retain the topic name.
>> I think the ability to easily look up topic names associated with
>> partition
>> directories would still be missed
>> when diagnosing issues, though maybe it wouldn't be a deal breaker with
>> the
>> right tooling.
>>
>> Thanks,
>>
>> Lucas
>>
>> On Fri, Sep 25, 2020 at 7:55 AM Ismael Juma  wrote:
>>
>> > Hi Lucas,
>> >
>> > Why would you include the name and id? I think you'd want to remove the
>> > name from the directory name right? Jason's suggestion was that if you
>> > remove the name from the directory, then why would you need the id name
>> > mapping file?
>> >
>> > Ismael
>> >
>> > On Thu, Sep 24, 2020 at 4:24 PM Lucas Bradstreet 
>> > wrote:
>> >
>

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-10-01 Thread Justine Olshan
Hi Jun,
Thanks for the response!

30. I think I might have changed this in between. The current version
says: {"name": "id", "type": "option[UUID]", "doc": "topic id"}
I have switched to the option type to cover the migration case where a
TopicZNode does not yet have a topic ID.
I understand that due to json, this field is written as a string, so if I
should move the "option[uuid]" to the doc field and the type should be
"string" please let me know.

40. I've added a definition for UUID.
41,42. Fixed

Thank you,
Justine

On Wed, Sep 30, 2020 at 1:15 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the summary. Just a few minor comments below.
>
> 30.  {"name": "id", "type": "string", "doc": "version id"}}: The doc should
> say UUID. The issue seems unfixed.
>
> 40. Since UUID is public facing, could you include its definition?
>
> 41. StopReplicaResponse still includes the topic field.
>
> 42. "It is unnecessary to include the name of the topic in the following
> Request/Response calls" It would be useful to include all modified requests
> (e.g. produce) in the list below.
>
> Thanks,
>
> Jun
>
>
> On Wed, Sep 30, 2020 at 10:17 AM Justine Olshan 
> wrote:
>
> > Hello again,
> >
> > I've taken some time to discuss some of the remaining points brought up
> by
> > the previous emails offline. Here are some of the conclusions.
> >
> > 1. Directory Structure:
> > There was some talk about whether the directory structure should be
> changed
> > to replace all topic names with topic IDs.
> > This will be a useful change, but will prevent downgrades. It will be
> best
> > to wait until a major release, likely alongside KIP-500 changes that will
> > prevent downgrades. I've updated the KIP to include this change with the
> > note about migration and deprecation.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-log.dirlayout
> >
> > 2. Partition Metadata file
> > There was some disagreement about the inclusion of the partition metadata
> > file.
> > This file will be needed to persist the topic ID, especially while we
> still
> > have the old directory structure. Even after the changes, this file can
> be
> > useful for debugging and recovery.
> > Creating a single mapping file from topic names to topic IDs was
> > considered, but ultimately scrapped as it would not be as easy to
> maintain.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-PartitionMetadatafile
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-PersistingTopicIDs
> >
> > 3. Produce Protocols
> > After some further discussion, this replacing the topic name with topic
> ID
> > in these protocols has been added to the KIP.
> > This will cut down on the size of the protocol. Since changes to fetch
> are
> > included, it does make sense to update these protocols.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-Produce
> >
> > 4. KIP-500 Compatibility
> > After some discussion with those involved with KIP-500, it seems best to
> > use a sentinel topic ID for the metadata topic that is used before the
> > first controller election.
> > We had an issue where this metadata topic may not be assigned an ID
> before
> > utilizing the Vote and Fetch protocols. It was decided to reserve a
> unique
> > ID for this topic to be used until the controller could give the topic a
> > unique ID.
> > Having the topic keep the sentinel ID (not be reassigned to a unique ID)
> > was considered, but it seemed like a good idea to leave the option open
> for
> > the metadata topic to have a unique ID in cases where it would need to be
> > differentiated from other clusters' metadata topics. (ex. tiered storage)
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-CompatibilitywithKIP-500
> >
> > I've also split up the KIP into sub-tasks on the JIRA. Hopefully this
> will
> > provide a better idea about what tasks we have, and eventually provide a
> > place to see what's done and what is left.
> > If there is a task I am missing, please let me know!
> > https://issues.apache.org/jira/browse/KAFKA-8872
> >
> > Of course, these decisions are not set i

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-10-01 Thread Justine Olshan
Hi Jun,
Thanks for looking at it again. Here's the new spec. (I fixed the typo in
it too.)
{"name": "id", "type": "string", "doc": option[UUID]}

Justine


On Thu, Oct 1, 2020 at 5:03 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the update. The KIP looks good to me now. Just a minor comment
> below.
>
> 30. Perhaps "option[UUID]" can be put in the doc.
>
> Jun
>
> On Thu, Oct 1, 2020 at 3:28 PM Justine Olshan 
> wrote:
>
> > Hi Jun,
> > Thanks for the response!
> >
> > 30. I think I might have changed this in between. The current version
> > says:  {"name":
> > "id", "type": "option[UUID]"}, "doc": topic id}
> > I have switched to the option type to cover the migration case where a
> > TopicZNode does not yet have a topic ID.
> > I understand that due to json, this field is written as a string, so if I
> > should move the "option[uuid]" to the doc field and the type should be
> > "string" please let me know.
> >
> > 40. I've added a definition for UUID.
> > 41,42. Fixed
> >
> > Thank you,
> > Justine
> >
> > On Wed, Sep 30, 2020 at 1:15 PM Jun Rao  wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the summary. Just a few minor comments below.
> > >
> > > 30.  {"name": "id", "type": "string", "doc": "version id"}}: The doc
> > should
> > > say UUID. The issue seems unfixed.
> > >
> > > 40. Since UUID is public facing, could you include its definition?
> > >
> > > 41. StopReplicaResponse still includes the topic field.
> > >
> > > 42. "It is unnecessary to include the name of the topic in the
> following
> > > Request/Response calls" It would be useful to include all modified
> > requests
> > > (e.g. produce) in the list below.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Sep 30, 2020 at 10:17 AM Justine Olshan 
> > > wrote:
> > >
> > > > Hello again,
> > > >
> > > > I've taken some time to discuss some of the remaining points brought
> up
> > > by
> > > > the previous emails offline. Here are some of the conclusions.
> > > >
> > > > 1. Directory Structure:
> > > > There was some talk about whether the directory structure should be
> > > changed
> > > > to replace all topic names with topic IDs.
> > > > This will be a useful change, but will prevent downgrades. It will be
> > > best
> > > > to wait until a major release, likely alongside KIP-500 changes that
> > will
> > > > prevent downgrades. I've updated the KIP to include this change with
> > the
> > > > note about migration and deprecation.
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-log.dirlayout
> > > >
> > > > 2. Partition Metadata file
> > > > There was some disagreement about the inclusion of the partition
> > metadata
> > > > file.
> > > > This file will be needed to persist the topic ID, especially while we
> > > still
> > > > have the old directory structure. Even after the changes, this file
> can
> > > be
> > > > useful for debugging and recovery.
> > > > Creating a single mapping file from topic names to topic IDs was
> > > > considered, but ultimately scrapped as it would not be as easy to
> > > maintain.
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-PartitionMetadatafile
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-PersistingTopicIDs
> > > >
> > > > 3. Produce Protocols
> > > > After some further discussion, this replacing the topic name with
> topic
> > > ID
> > > > in these protocols has been added to the KIP.
> > > > This will cut down on the size of the protocol. Since changes to
> fetch
> > > are
> > > > included, it does make sense to update these protocols.
> > > >
> > > >
> > >
> >
> ht

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-17 Thread Justine Olshan
Hi Jun,
Thanks for the quick response!

12. I've decided to bump up the versions on the requests and updated the
KIP. I think it's good we thoroughly discussed the options here, so we know
we made a good choice. :)

13. This is an interesting situation. I think if this does occur we should
give a warning. I agree that it's hard to know the source of truth for sure
since the directory or the file could be manually modified. I guess the
directory could be used as the source of truth. To be honest, I'm not
really sure what happens in kafka when the log directory is renamed
manually in such a way. I'm also wondering if the situation is recoverable
in this scenario.

Thanks,
Justine

On Thu, Sep 17, 2020 at 4:28 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the reply.
>
> 12. I don't have a strong preference either. However, if we need IBP
> anyway, maybe it's easier to just bump up the version for all inter broker
> requests and add the topic id field as a regular field. A regular field is
> a bit more concise in wire transfer than a flexible field.
>
> 13. The confusion that I was referring to is between the topic name and
> partition number between the log dir and the metadata file. For example, if
> the log dir is topicA-1 and the metadata file in it has topicB and
> partition 0 (say due to a bug or manual modification), which one do we use
> as the source of truth?
>
> Jun
>
> On Thu, Sep 17, 2020 at 3:43 PM Justine Olshan 
> wrote:
>
> > Hi Jun,
> > Thanks for the comments.
> >
> > 12. I bumped the LeaderAndIsrRequest because I removed the topic name
> field
> > in the response. It may be possible to avoid bumping the version without
> > that change, but I may be missing something.
> > I believe StopReplica is actually on version 3 now, but because version 2
> > is flexible, I kept that listed as version 2 on the KIP page. However,
> you
> > may be right in that we may need to bump the version on StopReplica to
> deal
> > with deletion differently as mentioned above. I don't know if I have a
> big
> > preference over used tagged fields or not.
> >
> > 13. I was thinking that in the case where the file and the request topic
> > ids don't match, it means that the broker's topic/the one in the file has
> > been deleted. In that case, we would need to delete the old topic and
> start
> > receiving the new version. If the topic name were to change, but the ids
> > still matched, the file would also need to update. Am I missing a case
> > where the file would be correct and not the request?
> >
> > Thanks,
> > Justine
> >
> > On Thu, Sep 17, 2020 at 3:18 PM Jun Rao  wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the reply. A couple of more comments below.
> > >
> > > 12. ListOffset and OffsetForLeader currently don't support flexible
> > fields.
> > > So, we have to bump up the version number and use IBP at least for
> these
> > > two requests. Note that it seems 2.7.0 will require IBP anyway because
> of
> > > changes in KAFKA-10435. Also, it seems that the version for
> > > LeaderAndIsrRequest and StopReplica are bumped even though we only
> added
> > a
> > > tagged field. But since IBP is needed anyway, we may want to revisit
> the
> > > overall tagged field choice.
> > >
> > > 13. The only downside is the potential confusion on which one is the
> > source
> > > of truth if they don't match. Another option is to include those fields
> > in
> > > the metadata file when we actually change the directory structure.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Sep 17, 2020 at 2:01 PM Justine Olshan 
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I've thought some more about removing the topic name field from some
> of
> > > the
> > > > requests. On closer inspection of the requests/responses, it seems
> that
> > > the
> > > > internal changes would be much larger than I expected. Some protocols
> > > > involve clients, so they would require changes too. I'm thinking that
> > for
> > > > now, removing the topic name from these requests and responses are
> out
> > of
> > > > scope.
> > > >
> > > > I have decided to just keep the change LeaderAndIsrResponse to remove
> > the
> > > > topic name, and have updated the KIP to reflect this change. I have
> > also
> > > > mentioned the other requests and responses in future work.

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-17 Thread Justine Olshan
Hello all,

I've thought some more about removing the topic name field from some of the
requests. On closer inspection of the requests/responses, it seems that the
internal changes would be much larger than I expected. Some protocols
involve clients, so they would require changes too. I'm thinking that for
now, removing the topic name from these requests and responses is out of
scope.

I have decided to just keep the change to LeaderAndIsrResponse to remove the
topic name, and have updated the KIP to reflect this change. I have also
mentioned the other requests and responses in future work.

I'm hoping to start the voting process soon, so let me know if there is
anything else we should discuss.

Thank you,
Justine

On Tue, Sep 15, 2020 at 3:57 PM Justine Olshan  wrote:

> Hello again,
> To follow up on some of the other comments:
>
> 10/11) We can remove the topic name from these requests/responses, and
> that means we will just have to make a few internal changes to make
> partitions accessible by topic id and partition. I can update the KIP to
> remove them unless anyone thinks they should stay.
>
> 12) Addressed in the previous email. I've updated the KIP to include
> tagged fields for the requests and responses. (More on that below)
>
> 13) I think part of the idea for including this information is to prepare
> for future changes. Perhaps the directory structure might change from
> topicName_partitionNumber to something like topicID_partitionNumber. Then
> it would be useful to have the topic name in the file since it would not be
> in the directory structure. Supporting topic renames might be easier if the
> other fields are included. Would there be any downsides to including this
> information?
>
> 14)  Yes, we would need to copy the partition metadata file in this
> process. I've updated the KIP to include this.
>
> 15) I believe Lucas meant v1 and v2 here. He was referring to how the
> requests would fall under different IBP and meant that older brokers would
> have to use the older version of the request and the existing topic
> deletion process. At first, it seemed like tagged fields would resolve
> the IBP issue. However, we may need IBP for this request after all since
> the controller handles the topic deletion differently depending on the IBP
> version. In an older version, we can't just send a StopReplica to delete the
> topic immediately like we'd want to for this KIP.
>
> This makes me wonder if we want tagged fields on all the requests after
> all. Let me know your thoughts!
>
> Justine
>
> On Tue, Sep 15, 2020 at 1:03 PM Justine Olshan 
> wrote:
>
>> Hi all,
>> Jun brought up a good point in his last email about tagged fields, and
>> I've updated the KIP to reflect that the changes to requests and responses
>> will be in the form of tagged fields to avoid changing IBP.
>>
>> Jun: I plan on sending a followup email to address some of the other
>> points.
>>
>> Thanks,
>> Justine
>>
>> On Mon, Sep 14, 2020 at 4:25 PM Jun Rao  wrote:
>>
>>> Hi, Justine,
>>>
>>> Thanks for the updated KIP. A few comments below.
>>>
>>> 10. LeaderAndIsr Response: Do we need the topic name?
>>>
>>> 11. For the changed request/response, other than LeaderAndIsr,
>>> UpdateMetadata, Metadata, do we need to include the topic name?
>>>
>>> 12. It seems that upgrades don't require IBP. Does that mean the new
>>> fields
>>> in all the request/response are added as tagged fields without bumping up
>>> the request version? It would be useful to make that clear.
>>>
>>> 13. Partition Metadata file: Do we need to include the topic name and the
>>> partition id since they are implied in the directory name?
>>>
>>> 14. In the JBOD mode, we support moving a partition's data from one disk
>>> to
>>> another. Will the new partition metadata file be copied during that
>>> process?
>>>
>>> 15. The KIP says "Remove deleted topics from replicas by sending
>>> StopReplicaRequest V2 for any topics which do not contain a topic ID, and
>>> V3 for any topics which do contain a topic ID.". However, it seems the
>>> updated controller will create all missing topic IDs first before doing
>>> other actions. So, is StopReplicaRequest V2 needed?
>>>
>>> Jun
>>>
>>> On Fri, Sep 11, 2020 at 10:31 AM John Roesler 
>>> wrote:
>>>
>>> > Thanks, Justine!
>>> >
>>> > Your response seems compelling to me.
>>> >
>>> > -John
>>> >
>>> > On Fri, 2020-09-11 at 10:17 -0700, Justi

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-17 Thread Justine Olshan
Hi Jun,
Thanks for the comments.

12. I bumped the LeaderAndIsrRequest because I removed the topic name field
in the response. It may be possible to avoid bumping the version without
that change, but I may be missing something.
I believe StopReplica is actually on version 3 now, but because version 2
is flexible, I kept that listed as version 2 on the KIP page. However, you
may be right in that we may need to bump the version on StopReplica to deal
with deletion differently as mentioned above. I don't know that I have a
strong preference about using tagged fields or not.

13. I was thinking that in the case where the file and the request topic
ids don't match, it means that the broker's topic/the one in the file has
been deleted. In that case, we would need to delete the old topic and start
receiving the new version. If the topic name were to change, but the ids
still matched, the file would also need to update. Am I missing a case
where the file would be correct and not the request?

Thanks,
Justine

On Thu, Sep 17, 2020 at 3:18 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the reply. A couple of more comments below.
>
> 12. ListOffset and OffsetForLeader currently don't support flexible fields.
> So, we have to bump up the version number and use IBP at least for these
> two requests. Note that it seems 2.7.0 will require IBP anyway because of
> changes in KAFKA-10435. Also, it seems that the version for
> LeaderAndIsrRequest and StopReplica are bumped even though we only added a
> tagged field. But since IBP is needed anyway, we may want to revisit the
> overall tagged field choice.
>
> 13. The only downside is the potential confusion on which one is the source
> of truth if they don't match. Another option is to include those fields in
> the metadata file when we actually change the directory structure.
>
> Thanks,
>
> Jun
>
> On Thu, Sep 17, 2020 at 2:01 PM Justine Olshan 
> wrote:
>
> > Hello all,
> >
> > I've thought some more about removing the topic name field from some of
> the
> > requests. On closer inspection of the requests/responses, it seems that
> the
> > internal changes would be much larger than I expected. Some protocols
> > involve clients, so they would require changes too. I'm thinking that for
> > now, removing the topic name from these requests and responses are out of
> > scope.
> >
> > I have decided to just keep the change LeaderAndIsrResponse to remove the
> > topic name, and have updated the KIP to reflect this change. I have also
> > mentioned the other requests and responses in future work.
> >
> > I'm hoping to start the voting process soon, so let me know if there is
> > anything else we should discuss.
> >
> > Thank you,
> > Justine
> >
> > On Tue, Sep 15, 2020 at 3:57 PM Justine Olshan 
> > wrote:
> >
> > > Hello again,
> > > To follow up on some of the other comments:
> > >
> > > 10/11) We can remove the topic name from these requests/responses, and
> > > that means we will just have to make a few internal changes to make
> > > partitions accessible by topic id and partition. I can update the KIP
> to
> > > remove them unless anyone thinks they should stay.
> > >
> > > 12) Addressed in the previous email. I've updated the KIP to include
> > > tagged fields for the requests and responses. (More on that below)
> > >
> > > 13) I think part of the idea for including this information is to
> prepare
> > > for future changes. Perhaps the directory structure might change from
> > > topicName_partitionNumber to something like topicID_partitionNumber.
> Then
> > > it would be useful to have the topic name in the file since it would
> not
> > be
> > > in the directory structure. Supporting topic renames might be easier if
> > the
> > > other fields are included. Would there be any downsides to including
> this
> > > information?
> > >
> > > 14)  Yes, we would need to copy the partition metadata file in this
> > > process. I've updated the KIP to include this.
> > >
> > > 15) I believe Lucas meant v1 and v2 here. He was referring to how the
> > > requests would fall under different IBP and meant that older brokers
> > would
> > > have to use the older version of the request and the existing topic
> > > deletion process. At first, it seemed like tagged fields would resolve
> > > the IBP issue. However, we may need IBP for this request after all
> since
> > > the controller handles the topic deletion differently depending on the
> > IBP
> > > version. In an older 

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-28 Thread Justine Olshan
 > > scope. Perhaps Lucas Bradstreet might have more insight about the
> > > decision.
> > > > Basically my point is that we have to create additional
> infrastructure
> > > here
> > > > to support the name/id mapping, so I wanted to understand if we just
> > > > consider this a sort of tech debt. It would be useful to cover how we
> > > > handle the case when this file gets corrupted. Seems like we just
> have
> > to
> > > > trust that it matches whatever the controller tells us and rewrite
> it?
> > > >
> > > > > 3. I think this is a good point, but again I wonder about the
> scope
> > > of
> > > > the KIP. Most of the changes mentioned in the KIP are for supporting
> > > topic
> > > > deletion and I believe that is why the produce request was listed
> under
> > > > future work.
> > > >
> > > > That's fair. I brought it up since `Fetch` is already included. If
> > we've
> > > > got `Metadata` and `Fetch`, seems we may as well do `Produce` and
> save
> > an
> > > > extra kip. No strong objection though if you want to leave it out.
> > > >
> > > >
> > > > -Jason
> > > >
> > > >
> > > > On Thu, Sep 24, 2020 at 3:26 PM Justine Olshan  >
> > > > wrote:
> > > >
> > > > > Hi Jason,
> > > > >
> > > > > Thanks for your comments.
> > > > >
> > > > > 1. Yes, the directory will still be based on the topic names.
> > > > > LeaderAndIsrRequest is one of the few requests that will still
> > contain
> > > > the
> > > > > topic name. So I think we have this covered. Sorry for confusion.
> > > > >
> > > > > 2. Part of the usage of the file is to have persistent storage of
> the
> > > > topic
> > > > > ID and use it to compare with the ID supplied in the LeaderAndIsr
> > > > Request.
> > > > > There is some discussion in the KIP about changes to the directory
> > > > > structure, but I believe directory changes were considered to be
> out
> > of
> > > > > scope when the KIP was written.
> > > > >
> > > > > 3. I think this is a good point, but again I wonder about the
> scope
> > > of
> > > > > the KIP. Most of the changes mentioned in the KIP are for
> supporting
> > > > topic
> > > > > deletion and I believe that is why the produce request was listed
> > under
> > > > > future work.
> > > > >
> > > > > 4. This sounds like it might be a good solution, but I will need to
> > > > discuss
> > > > > more with KIP-500 folks to get the details right.
> > > > >
> > > > > Thanks,
> > > > > Justine
> > > > >
> > > > > On Thu, Sep 24, 2020 at 12:30 PM Jason Gustafson <
> ja...@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > Thanks for picking up this work. I have a few questions/comments:
> > > > > >
> > > > > > 1. It sounds like the directory structure is still going to be
> > based
> > > on
> > > > > > topic names. Do I have that right? One complication is that the
> > > > > > LeaderAndIsr request does not include the topic name any longer.
> > This
> > > > > means
> > > > > > that a replica must wait for the UpdateMetadata request to arrive
> > > with
> > > > > the
> > > > > > topic name mapping before it can create the directory. I wonder
> if
> > it
> > > > > would
> > > > > > be simpler to avoid assumptions on the ordering of UpdateMetadata
> > and
> > > > let
> > > > > > LeaderAndIsr include the topic name as well. Feels like we are
> not
> > > > saving
> > > > > > that much by excluding it.
> > > > > >
> > > > > > 2. On a related note, it seems that the reason we have the
> > partition
> > > > > > metadata file is because we are not changing the directory
> > structure.
> > > > We
> > > > > > need it so that we remember which directories map to which topic
> > id.
> > > I
> > 

Re: [DISCUSS] KIP-516: Topic Identifiers

2020-09-18 Thread Justine Olshan
Hi Jun,

I see what you are saying. For now we can remove the extra information.
I'll leave the option to add more fields to the file in the future. The KIP
has been updated to reflect this change.
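For anyone following along, here is a minimal sketch of what the trimmed-down
partition.metadata file could contain (the exact field names and encoding are
whatever the KIP ends up specifying, and the topic ID value below is made up):

version: 0
topic_id: 7virYoATRDW6lUxvtzPVOw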

Thanks,
Justine

On Fri, Sep 18, 2020 at 8:46 AM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the reply.
>
> 13. If the log directory is the source of truth, it means that the
> redundant info in the metadata file will be ignored. Then the question is
> why do we need to put the redundant info in the metadata file now?
>
> Thanks,
>
> Jun
>
> On Thu, Sep 17, 2020 at 5:07 PM Justine Olshan 
> wrote:
>
> > Hi Jun,
> > Thanks for the quick response!
> >
> > 12. I've decided to bump up the versions on the requests and updated the
> > KIP. I think it's good we thoroughly discussed the options here, so we
> know
> > we made a good choice. :)
> >
> > 13. This is an interesting situation. I think if this does occur we
> should
> > give a warning. I agree that it's hard to know the source of truth for
> sure
> > since the directory or the file could be manually modified. I guess the
> > directory could be used as the source of truth. To be honest, I'm not
> > really sure what happens in kafka when the log directory is renamed
> > manually in such a way. I'm also wondering if the situation is
> recoverable
> > in this scenario.
> >
> > Thanks,
> > Justine
> >
> > On Thu, Sep 17, 2020 at 4:28 PM Jun Rao  wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the reply.
> > >
> > > 12. I don't have a strong preference either. However, if we need IBP
> > > anyway, maybe it's easier to just bump up the version for all inter
> > broker
> > > requests and add the topic id field as a regular field. A regular field
> > is
> > > a bit more concise in wire transfer than a flexible field.
> > >
> > > 13. The confusion that I was referring to is between the topic name and
> > > partition number between the log dir and the metadata file. For
> example,
> > if
> > > the log dir is topicA-1 and the metadata file in it has topicB and
> > > partition 0 (say due to a bug or manual modification), which one do we
> > use
> > > as the source of truth?
> > >
> > > Jun
> > >
> > > On Thu, Sep 17, 2020 at 3:43 PM Justine Olshan 
> > > wrote:
> > >
> > > > Hi Jun,
> > > > Thanks for the comments.
> > > >
> > > > 12. I bumped the LeaderAndIsrRequest because I removed the topic name
> > > field
> > > > in the response. It may be possible to avoid bumping the version
> > without
> > > > that change, but I may be missing something.
> > > > I believe StopReplica is actually on version 3 now, but because
> > version 2
> > > > is flexible, I kept that listed as version 2 on the KIP page.
> However,
> > > you
> > > > may be right in that we may need to bump the version on StopReplica
> to
> > > deal
> > > > with deletion differently as mentioned above. I don't know if I have
> a
> > > big
> > > > preference over used tagged fields or not.
> > > >
> > > > 13. I was thinking that in the case where the file and the request
> > topic
> > > > ids don't match, it means that the broker's topic/the one in the file
> > has
> > > > been deleted. In that case, we would need to delete the old topic and
> > > start
> > > > receiving the new version. If the topic name were to change, but the
> > ids
> > > > still matched, the file would also need to update. Am I missing a
> case
> > > > where the file would be correct and not the request?
> > > >
> > > > Thanks,
> > > > Justine
> > > >
> > > > On Thu, Sep 17, 2020 at 3:18 PM Jun Rao  wrote:
> > > >
> > > > > Hi, Justine,
> > > > >
> > > > > Thanks for the reply. A couple of more comments below.
> > > > >
> > > > > 12. ListOffset and OffsetForLeader currently don't support flexible
> > > > fields.
> > > > > So, we have to bump up the version number and use IBP at least for
> > > these
> > > > > two requests. Note that it seems 2.7.0 will require IBP anyway
> > because
> > > of
> > > > > changes in KAFKA-10435. Also, it seems that the version for
> > > > > LeaderAndIsrRequest and StopReplica are bumped ev

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2020-12-08 Thread Justine Olshan
Hi George,
I've been looking at the discussion on improving the sticky partitioner,
and one of the potential issues we discussed is how we could get
information to the partitioner to tell it not to choose certain partitions.
Currently, the partitioner can only use availablePartitionsForTopic. I took
a quick look at your KIP, and it seems it would change which partitions are
returned by this method. This seems like a step in the right direction for
solving that issue too.
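To illustrate what I mean (a minimal sketch only, not the actual default
partitioner -- the class name here is made up), a custom partitioner today can
really only react to whether a partition currently has a leader, via the
Cluster metadata it is given:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class AvailableOnlyPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // The only "health" signal visible here is whether a partition has a
        // leader at all; there is no view of per-broker backlog or errors.
        List<PartitionInfo> partitions = cluster.availablePartitionsForTopic(topic);
        if (partitions.isEmpty()) {
            // Fall back to all known partitions if none are reported available.
            partitions = cluster.partitionsForTopic(topic);
        }
        return partitions.get(ThreadLocalRandom.current().nextInt(partitions.size()))
                .partition();
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

If your KIP changes what availablePartitionsForTopic returns, a partitioner
like this would automatically start avoiding the partitions you filter out.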

I agree with Jun that looking at both of these issues and the proposed
solutions would be very helpful.
Justine

On Tue, Dec 8, 2020 at 10:07 AM Jun Rao  wrote:

> Hi, George,
>
> Thanks for submitting the KIP. There was an earlier discussion on improving
> the sticky partitioner in the producer (
>
> https://lists.apache.org/thread.html/rae8d2d5587dae57ad9093a85181e0cb4256f10d1e57138ecdb3ef287%40%3Cdev.kafka.apache.org%3E
> ).
> It seems to be solving a very similar issue. It would be useful to analyze
> both approaches and see which one solves the problem better.
>
> Jun
>
> On Tue, Dec 8, 2020 at 8:05 AM georgeshu(舒国强) 
> wrote:
>
> > Hello,
> >
> > We write up a KIP based on a straightforward mechanism implemented and
> > tested in order to solve a practical issue in production.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-693%3A+Client-side+Circuit+Breaker+for+Partition+Write+Errors
> > Look forward to hearing feedback and suggestions.
> >
> > Thanks!
> >
> >
>


Re: Sticky Partitioner

2020-12-02 Thread Justine Olshan
Hi Evelyn,

Thanks for taking a look at improving the sticky partitioner! These edge
cases seem like they would cause quite a bit of trouble.
I think the idea to check for max.in.flight.requests.per.connection is a
good one, but one concern I have is how this information will be available
to the partitioner.

Justine

On Mon, Nov 30, 2020 at 7:10 AM Eevee  wrote:

> Hi all,
>
> I've noticed a couple edge cases in the Sticky Partitioner and I'd like
> to discuss introducing a new KIP to fix it.
>
> Behavior
> 1. Low throughput producers
> The first edge case occurs when a broker becomes temporarily unavailable
> for a period less than replica.lag.time.max.ms. If you have a low
> throughput producer generating records without a key and using a small
> value of linger.ms, you will quickly hit the
> max.in.flight.requests.per.connection limit for that broker, or for another
> broker which depends on the unavailable broker to achieve acks=all.
> At this point, all records will be redirected to whichever broker hits
> max.in.flight.requests.per.connection first, and if the producer has low
> enough throughput compared to batch.size this will result in no records
> being sent to any broker until the failing broker becomes available
> again. Effectively this transforms a short broker failure into a cluster
> failure. Ideally, we'd rather see all records redirected away from these
> brokers rather than to them.
>
> 2. Overwhelmed brokers
> The second edge case occurs when an individual broker begins
> underperforming and cannot keep up with the producers. Once the broker
> hits max.in.flight.requests.per.connection the producer will begin
> redirecting all records without keys to the broker. This results in a
> disproportionate percentage of the cluster load going to the failing
> broker and begins a death spiral in which the broker becomes more and
> more overwhelmed, resulting in the producers redirecting more and more of
> the cluster's load towards it.
>
> Proposed Changes
> We need a solution which fixes the interaction between the back-pressure
> mechanism max.in.flight.requests.per.connection and the sticky
> partitioner.
>
> My current thought is we should remove partitions associated with
> brokers which have hit max.in.flight.requests.per.connection from the
> available choices for the sticky partitioners. Once they are below
> max.in.flight.requests.per.connection they'd then be added back into the
> available partition list.
>
> My one concern is that this could cause further edge case behavior for
> producers with small values of linger.ms. In particular I could see a
> scenario in which the producer hits
> max.in.flight.requests.per.connection for all brokers and then blocks on
> send() until a request returns rather than building up a new batch. It's
> possible (I'd need to investigate the send loop further) the producer
> could create a new batch as soon as a request arrives, add a single
> record to it and immediately send it, then block on send() again. This
> would result in the producer doing next to no batching and limiting its
> throughput drastically.
>
> If this is the case, I figure we can allow the sticky partitioner to use
> all partitions if all brokers are at
> max.in.flight.requests.per.connection. In such a case it would add
> records to a single partition until a request completed or it hit
> batch.size, and then pick a new partition at random.
>
> Feedback
> Before writing a KIP I'd love to hear people's feedback, alternatives and
> concerns.
>
> Regards,
> Evelyn.
>
>
>


Re: Spam

2021-01-05 Thread Justine Olshan
The user has been blocked. https://issues.apache.org/jira/browse/INFRA-21268

On Tue, Jan 5, 2021 at 2:52 PM Brandon Brown 
wrote:

> Is there any way to block Tim van der Kooi from making issues? I’m getting
> about 10 new email issues created a minute.
>
> Brandon Brown
>
>


Re: [VOTE] KIP-516: Topic Identifiers

2021-06-29 Thread Justine Olshan
Hello again. Quick update on KIP-516. After much discussion and thought,
I've updated the KIP to include new admin APIs and new/updated classes to
support them. The new APIs are:

default DescribeTopicsResult describeTopics(TopicCollection topics);

DescribeTopicsResult describeTopics(TopicCollection topics,
DescribeTopicsOptions options);

default DeleteTopicsResult deleteTopics(TopicCollection topics);

DeleteTopicsResult deleteTopics(TopicCollection topics,
DeleteTopicsOptions options);


As you may notice, there is a new class, TopicCollection, that can store
topics by name or by ID; it is also described in the KIP.
Finally, the Delete/DescribeTopicsResult classes have been modified to
support topic IDs as well as names, and some of the older methods will
be deprecated. For more information, please check out the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-AdminClientSupport
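To give a feel for the usage, here is a rough sketch of deleting a topic by
its ID with the new API (illustrative only -- the TopicCollection factory
method and result accessors are as described in the KIP, and the topic ID
value is made up):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DeleteTopicsResult;
import org.apache.kafka.common.TopicCollection;
import org.apache.kafka.common.Uuid;

public class DeleteTopicByIdExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // The topic ID would normally come from describeTopics or metadata.
            Uuid topicId = Uuid.fromString("b85vozFCT8eBpDrMM8QxWg");
            DeleteTopicsResult result = admin.deleteTopics(
                    TopicCollection.ofTopicIds(Collections.singleton(topicId)));
            result.all().get();
        }
    }
}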

Thanks,
Justine


On Tue, Mar 30, 2021 at 2:38 PM Justine Olshan  wrote:

> Hi all,
> Another quick update. After some offline discussion with KIP-500 folks,
> I'm making a small tweak to one of the configs in KIP-516.
> Instead of delete.stale.topics.ms, KIP-516 will introduce
> delete.topic.delay.ms which is defined as *"**The minimum amount of time
> to wait before removing a deleted topic's data on every broker."*
> The idea behind this config is to give a configurable window before the
> data is fully deleted and removed from the brokers. This config will apply
> to all topic deletions, not just the "stale topic" case described in
> KIP-516.
>
> Let me know if there are any questions,
> Justine
>
> On Thu, Feb 18, 2021 at 10:16 AM Justine Olshan 
> wrote:
>
>> Hi all,
>> I realized that the DISCUSS thread got very long, so I'll be posting
>> updates to this thread from now on.
>> Just a quick update to the KIP. As a part of
>> https://issues.apache.org/jira/browse/KAFKA-12332 and
>> https://github.com/apache/kafka/pull/10143, I'm proposing adding a new
>> error.
>> INCONSISTENT_TOPIC_ID will be returned on partitions in
>> LeaderAndIsrResponses where the topic ID in the request did not match the
>> topic ID in the log. This will only occur when a valid topic ID is provided
>> in the request.
>>
>> I've also updated the KIP to reflect this change.
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-LeaderAndIsrRequestv5
>>
>>
>> Please let me know if you have any thoughts or concerns with this change.
>>
>> Thanks,
>> Justine
>>
>> On Mon, Oct 19, 2020 at 8:50 AM Justine Olshan 
>> wrote:
>>
>>> Thanks everyone for the votes. KIP-516 has been accepted.
>>>
>>> Binding: Jun, Rajini, David
>>> Non-binding: Lucas, Satish, Tom
>>>
>>> Justine
>>>
>>> On Sat, Oct 17, 2020 at 3:22 AM Tom Bentley  wrote:
>>>
>>>> +1 non-binding. Thanks!
>>>>
>>>> On Sat, Oct 17, 2020 at 7:55 AM David Jacot 
>>>> wrote:
>>>>
>>>> > Hi Justine,
>>>> >
>>>> > Thanks for the KIP! This is a great and long awaited improvement.
>>>> >
>>>> > +1 (binding)
>>>> >
>>>> > Best,
>>>> > David
>>>> >
>>>> > Le ven. 16 oct. 2020 à 17:36, Rajini Sivaram 
>>>> a
>>>> > écrit :
>>>> >
>>>> > > Hi Justine,
>>>> > >
>>>> > > +1 (binding)
>>>> > >
>>>> > > Thanks for all the work you put into this KIP!
>>>> > >
>>>> > > btw, there is a typo in the DeleteTopics Request/Response schema in
>>>> the
>>>> > > KIP, it says Metadata request.
>>>> > >
>>>> > > Regards,
>>>> > >
>>>> > > Rajini
>>>> > >
>>>> > >
>>>> > > On Fri, Oct 16, 2020 at 4:06 PM Satish Duggana <
>>>> satish.dugg...@gmail.com
>>>> > >
>>>> > > wrote:
>>>> > >
>>>> > > > Hi Justine,
>>>> > > > Thanks for the KIP,  +1 (non-binding)
>>>> > > >
>>>> > > > On Thu, Oct 15, 2020 at 10:48 PM Lucas Bradstreet <
>>>> lu...@confluent.io>
>>>> > > > wrote:
>>>> > > > >
>>>> > > > > Hi Justine,
>>>> > > > >
>>>> &

Re: Requesting to be added to Kafka project

2021-04-30 Thread Justine Olshan
Hi Alyssa,
Are you asking to be added to JIRA? If so, can you provide your jira
username?

Thanks,
Justine

On Fri, Apr 30, 2021 at 9:48 AM Alyssa Huang 
wrote:

> Hello,
>
> I'm interested in contributing to Kafka! Can I be added to the project?
>
> Best,
> Alyssa
>


Re: [VOTE] KIP-516: Topic Identifiers

2021-03-30 Thread Justine Olshan
Hi all,
Another quick update. After some offline discussion with KIP-500 folks, I'm
making a small tweak to one of the configs in KIP-516.
Instead of delete.stale.topics.ms, KIP-516 will introduce
delete.topic.delay.ms, which is defined as "The minimum amount of time to
wait before removing a deleted topic's data on every broker."
The idea behind this config is to give a configurable window before the
data is fully deleted and removed from the brokers. This config will apply
to all topic deletions, not just the "stale topic" case described in
KIP-516.
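As a purely illustrative example, a broker that should keep deleted topic
data around for at least five minutes would set (the value here is made up,
not a default):

delete.topic.delay.ms=300000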

Let me know if there are any questions,
Justine

On Thu, Feb 18, 2021 at 10:16 AM Justine Olshan 
wrote:

> Hi all,
> I realized that the DISCUSS thread got very long, so I'll be posting
> updates to this thread from now on.
> Just a quick update to the KIP. As a part of
> https://issues.apache.org/jira/browse/KAFKA-12332 and
> https://github.com/apache/kafka/pull/10143, I'm proposing adding a new
> error.
> INCONSISTENT_TOPIC_ID will be returned on partitions in
> LeaderAndIsrResponses where the topic ID in the request did not match the
> topic ID in the log. This will only occur when a valid topic ID is provided
> in the request.
>
> I've also updated the KIP to reflect this change.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-LeaderAndIsrRequestv5
>
>
> Please let me know if you have any thoughts or concerns with this change.
>
> Thanks,
> Justine
>
> On Mon, Oct 19, 2020 at 8:50 AM Justine Olshan 
> wrote:
>
>> Thanks everyone for the votes. KIP-516 has been accepted.
>>
>> Binding: Jun, Rajini, David
>> Non-binding: Lucas, Satish, Tom
>>
>> Justine
>>
>> On Sat, Oct 17, 2020 at 3:22 AM Tom Bentley  wrote:
>>
>>> +1 non-binding. Thanks!
>>>
>>> On Sat, Oct 17, 2020 at 7:55 AM David Jacot 
>>> wrote:
>>>
>>> > Hi Justine,
>>> >
>>> > Thanks for the KIP! This is a great and long awaited improvement.
>>> >
>>> > +1 (binding)
>>> >
>>> > Best,
>>> > David
>>> >
>>> > Le ven. 16 oct. 2020 à 17:36, Rajini Sivaram 
>>> a
>>> > écrit :
>>> >
>>> > > Hi Justine,
>>> > >
>>> > > +1 (binding)
>>> > >
>>> > > Thanks for all the work you put into this KIP!
>>> > >
>>> > > btw, there is a typo in the DeleteTopics Request/Response schema in
>>> the
>>> > > KIP, it says Metadata request.
>>> > >
>>> > > Regards,
>>> > >
>>> > > Rajini
>>> > >
>>> > >
>>> > > On Fri, Oct 16, 2020 at 4:06 PM Satish Duggana <
>>> satish.dugg...@gmail.com
>>> > >
>>> > > wrote:
>>> > >
>>> > > > Hi Justine,
>>> > > > Thanks for the KIP,  +1 (non-binding)
>>> > > >
>>> > > > On Thu, Oct 15, 2020 at 10:48 PM Lucas Bradstreet <
>>> lu...@confluent.io>
>>> > > > wrote:
>>> > > > >
>>> > > > > Hi Justine,
>>> > > > >
>>> > > > > +1 (non-binding). Thanks for all your hard work on this KIP!
>>> > > > >
>>> > > > > Lucas
>>> > > > >
>>> > > > > On Wed, Oct 14, 2020 at 8:59 AM Jun Rao 
>>> wrote:
>>> > > > >
>>> > > > > > Hi, Justine,
>>> > > > > >
>>> > > > > > Thanks for the updated KIP. +1 from me.
>>> > > > > >
>>> > > > > > Jun
>>> > > > > >
>>> > > > > > On Tue, Oct 13, 2020 at 2:38 PM Jun Rao 
>>> wrote:
>>> > > > > >
>>> > > > > > > Hi, Justine,
>>> > > > > > >
>>> > > > > > > Thanks for starting the vote. Just a few minor comments.
>>> > > > > > >
>>> > > > > > > 1. It seems that we should remove the topic field from the
>>> > > > > > > StopReplicaResponse below?
>>> > > > > > > StopReplica Response (Version: 4) => error_code [topics]
>>> > > > > > >   error_code => INT16
>>> > > > > > > topics => topic topic_id* [pa

Re: [VOTE] KIP-516: Topic Identifiers

2021-02-18 Thread Justine Olshan
Hi all,
I realized that the DISCUSS thread got very long, so I'll be posting
updates to this thread from now on.
Just a quick update to the KIP. As a part of
https://issues.apache.org/jira/browse/KAFKA-12332 and
https://github.com/apache/kafka/pull/10143, I'm proposing adding a new
error.
INCONSISTENT_TOPIC_ID will be returned on partitions in
LeaderAndIsrResponses where the topic ID in the request did not match the
topic ID in the log. This will only occur when a valid topic ID is provided
in the request.
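To make the intended check concrete, here is a rough sketch of the decision
(a hypothetical helper only -- the real logic lives in the broker's
LeaderAndIsr handling, and the class and method names here are made up):

import org.apache.kafka.common.Uuid;
import org.apache.kafka.common.protocol.Errors;

public class TopicIdCheckSketch {
    // Decide the per-partition error for a LeaderAndIsr request, given the
    // topic ID sent in the request and the topic ID already recorded for the
    // log (null if the log has no ID assigned yet).
    public static Errors checkTopicId(Uuid requestTopicId, Uuid logTopicId) {
        boolean requestHasValidId =
                requestTopicId != null && !requestTopicId.equals(Uuid.ZERO_UUID);
        if (requestHasValidId && logTopicId != null
                && !logTopicId.equals(requestTopicId)) {
            return Errors.INCONSISTENT_TOPIC_ID;
        }
        return Errors.NONE;
    }
}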

I've also updated the KIP to reflect this change.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-LeaderAndIsrRequestv5


Please let me know if you have any thoughts or concerns with this change.

Thanks,
Justine

On Mon, Oct 19, 2020 at 8:50 AM Justine Olshan  wrote:

> Thanks everyone for the votes. KIP-516 has been accepted.
>
> Binding: Jun, Rajini, David
> Non-binding: Lucas, Satish, Tom
>
> Justine
>
> On Sat, Oct 17, 2020 at 3:22 AM Tom Bentley  wrote:
>
>> +1 non-binding. Thanks!
>>
>> On Sat, Oct 17, 2020 at 7:55 AM David Jacot 
>> wrote:
>>
>> > Hi Justine,
>> >
>> > Thanks for the KIP! This is a great and long awaited improvement.
>> >
>> > +1 (binding)
>> >
>> > Best,
>> > David
>> >
>> > Le ven. 16 oct. 2020 à 17:36, Rajini Sivaram 
>> a
>> > écrit :
>> >
>> > > Hi Justine,
>> > >
>> > > +1 (binding)
>> > >
>> > > Thanks for all the work you put into this KIP!
>> > >
>> > > btw, there is a typo in the DeleteTopics Request/Response schema in
>> the
>> > > KIP, it says Metadata request.
>> > >
>> > > Regards,
>> > >
>> > > Rajini
>> > >
>> > >
>> > > On Fri, Oct 16, 2020 at 4:06 PM Satish Duggana <
>> satish.dugg...@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > Hi Justine,
>> > > > Thanks for the KIP,  +1 (non-binding)
>> > > >
>> > > > On Thu, Oct 15, 2020 at 10:48 PM Lucas Bradstreet <
>> lu...@confluent.io>
>> > > > wrote:
>> > > > >
>> > > > > Hi Justine,
>> > > > >
>> > > > > +1 (non-binding). Thanks for all your hard work on this KIP!
>> > > > >
>> > > > > Lucas
>> > > > >
>> > > > > On Wed, Oct 14, 2020 at 8:59 AM Jun Rao  wrote:
>> > > > >
>> > > > > > Hi, Justine,
>> > > > > >
>> > > > > > Thanks for the updated KIP. +1 from me.
>> > > > > >
>> > > > > > Jun
>> > > > > >
>> > > > > > On Tue, Oct 13, 2020 at 2:38 PM Jun Rao 
>> wrote:
>> > > > > >
>> > > > > > > Hi, Justine,
>> > > > > > >
>> > > > > > > Thanks for starting the vote. Just a few minor comments.
>> > > > > > >
>> > > > > > > 1. It seems that we should remove the topic field from the
>> > > > > > > StopReplicaResponse below?
>> > > > > > > StopReplica Response (Version: 4) => error_code [topics]
>> > > > > > >   error_code => INT16
>> > > > > > > topics => topic topic_id* [partitions]
>> > > > > > >
>> > > > > > > 2. "After controller election, upon receiving the result,
>> assign
>> > > the
>> > > > > > > metadata topic its unique topic ID". Will the UUID for the
>> > metadata
>> > > > topic
>> > > > > > > be written to the metadata topic itself?
>> > > > > > >
>> > > > > > > 3. The vote request is designed to support multiple topics,
>> each
>> > of
>> > > > them
>> > > > > > > may require a different sentinel ID. Should we reserve more
>> than
>> > > one
>> > > > > > > sentinel ID for future usage?
>> > > > > > >
>> > > > > > > 4. UUID.randomUUID(): Could we clarify whether this method
>> > returns
>> > > > any
>> > > > > > > sentinel ID? Also, how do we expect the user to use it?
>> > > > > > >
>> > >

Re: [DISCUSS] KIP-804: OfflinePartitionsCount Tagged by Topic

2021-12-06 Thread Justine Olshan
Hi Mason,

Thanks for the KIP. I had a few questions.
Are you saying that we will be keeping the original (untagged) offline
partitions count metric? I was a little confused by the wording in the KIP.

I'm also curious about potential performance impacts. Have you looked into
this?

Thanks,
Justine

On Mon, Dec 6, 2021 at 10:00 AM Mason Legere
 wrote:

> Hey,
>
> Planning to open a vote for this small change tomorrow - haven't heard
> anything yet but open to any feedback.
>
> Best,
> Mason
>
> On Fri, Nov 26, 2021 at 1:54 PM Mason Legere 
> wrote:
>
> > Hi All,
> >
> > I would like to start a discussion for KIP-804
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-804%3A+OfflinePartitionsCount+Tagged+by+Topic>,
> which
> > proposes tagging the offline partition counter metric (managed by the
> > controller) by the topic name of the corresponding offline partition(s).
> >
> > Open to any thoughts and suggestions,
> > Mason
> >
>


Re: [DISCUSS] KIP-794: Strictly Uniform Sticky Partitioner

2021-11-08 Thread Justine Olshan
Hi Artem,
Thanks for working on improving the Sticky Partitioner!

I had a few questions about this portion:

*The batching will continue until either an in-flight batch completes or we
hit the N bytes and move to the next partition.  This way it takes just 5
records to get to batching mode, not 5 x number of partition records, and
the batching mode will stay longer as we'll be batching while waiting for a
request to be completed.  As the production rate accelerates, the logic
will automatically switch to use larger batches to sustain higher
throughput.*

*If one of the brokers has higher latency the records for the partitions
hosted on that broker are going to form larger batches, but it's still
going to be the same *amount records* sent less frequently in larger
batches, the logic automatically adapts to that.*

I was curious about how the logic automatically switches here. It seems
like we are just adding partitioner.sticky.batch.size, which seems like a
static value. Can you go into more detail about this logic? Or clarify
something I may have missed.
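For reference, my current reading of the switching rule is roughly the
following (a sketch only -- partitioner.sticky.batch.size is from the KIP,
everything else here is made up):

import java.util.concurrent.ThreadLocalRandom;

public class StickyBatchSizeSketch {
    private int currentPartition = -1;
    private long bytesInCurrentPartition = 0;

    // Stick to one partition until partitioner.sticky.batch.size bytes have
    // been produced to it, then switch to another partition at random.
    public int partitionFor(int recordSizeBytes, int numPartitions,
                            long stickyBatchSizeBytes) {
        if (currentPartition < 0 || bytesInCurrentPartition >= stickyBatchSizeBytes) {
            currentPartition = ThreadLocalRandom.current().nextInt(numPartitions);
            bytesInCurrentPartition = 0;
        }
        bytesInCurrentPartition += recordSizeBytes;
        return currentPartition;
    }
}

If that's roughly the shape of it, my question is where the adaptive behavior
comes from, since the byte threshold itself stays fixed.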

On Mon, Nov 8, 2021 at 1:34 AM Luke Chen  wrote:

> Thanks Artem,
> It's much better now.
> I've got your idea. In KIP-480: Sticky Partitioner, we'll change partition
> (call partitioner) when either 1 of below condition match
> 1. the batch is full
> 2. when linger.ms is up
> But, you are changing the definition, into a
> "partitioner.sticky.batch.size" size is reached.
>
> It'll fix the uneven distribution issue, because we do the sent-size
> calculation on the producer side.
> But it might have another issue: when the producer rate is low, there
> will be some period of time where the distribution is not even. Ex:
> tp-1: 12KB
> tp-2: 0KB
> tp-3: 0KB
> tp-4: 0KB
> while the producer still keeps sending records into tp-1 (because we
> haven't reached the 16KB threshold)
> Maybe the user should set a good value to "partitioner.sticky.batch.size"
> to fix this issue?
>
> Some comment to the KIP:
> 1. This paragraph is a little confusing, because there's no "batch mode" or
> "non-batch mode", right?
>
> > The batching will continue until either an in-flight batch completes or
> we hit the N bytes and move to the next partition.  This way it takes just
> 5 records to get to batching mode, not 5 x number of partition records, and
> the batching mode will stay longer as we'll be batching while waiting for a
> request to be completed.
>
> Even with linger.ms=0, before the sender thread is ready, we're always
> batching (accumulating) records into batches. So I think the "batch mode"
> description is confusing. And that's why I asked you if you have some kind
> of "batch switch" here.
>
> 2. In motivation, you mentioned 1 drawback of current
> UniformStickyPartitioner is "the sticky partitioner doesn't create batches
> as efficiently", because it sent out a batch with only 1 record (under
> linger.ms=0). But I can't tell how you fix this inefficiency in the
> proposed solution. I still see we send 1 record within 1 batch. Could you
> explain more here?
>
> Thank you.
> Luke
>
> On Sat, Nov 6, 2021 at 6:41 AM Artem Livshits
>  wrote:
>
> > Hi Luke,
> >
> > Thank you for your feedback.  I've updated the KIP with your suggestions.
> >
> > 1. Updated with a better example.
> > 2. I removed the reference to ClassicDefaultPartitioner, it was probably
> > confusing.
> > 3. The logic doesn't rely on checking batches, I've updated the proposal
> to
> > make it more explicit.
> > 4. The primary issue (uneven distribution) is described in the linked
> jira,
> > copied an example from jira into the KIP as well.
> >
> > -Artem
> >
> >
> > On Thu, Nov 4, 2021 at 8:34 PM Luke Chen  wrote:
> >
> > > Hi Artem,
> > > Thanks for the KIP! And thanks for reminding me to complete KIP-782,
> > soon.
> > > :)
> > >
> > > Back to the KIP, I have some comments:
> > > 1. You proposed to have a new config: "partitioner.sticky.batch.size",
> > but
> > > I can't see how we're going to use it to make the partitioner better.
> > > Please explain more in KIP (with an example will be better as
> suggestion
> > > (4))
> > > 2. In the "Proposed change" section, you take an example to use
> > > "ClassicDefaultPartitioner", is that referring to the current default
> > > sticky partitioner? I think it'd better you name your proposed
> partition
> > > with a different name for distinguish between the default one and new
> > one.
> > > (Although after implementation, we are going to just use the same name)
> > > 3. So, if my understanding is correct, you're going to have a "batch"
> > > switch, and before the in-flight is full, it's disabled. Otherwise,
> we'll
> > > enable it. Is that right? Sorry, I don't see any advantage of having
> this
> > > batch switch. Could you explain more?
> > > 4. I think it should be more clear if you can have a clear real example
> > in
> > > the motivation section, to describe what issue we faced using current
> > > sticky partitioner. And in proposed 

Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-12-07 Thread Justine Olshan
Hi all,
I've filed a bug for an extra map allocation that is used in the fetch
path. https://issues.apache.org/jira/browse/KAFKA-13512
I think it qualifies as a blocker since this path is used pretty frequently
and it looks to be a regression.

I also have a PR open to fix the issue. With this change, the performance
looks much better. https://github.com/apache/kafka/pull/11576
Thanks,
Justine

On Fri, Dec 3, 2021 at 5:29 AM David Jacot 
wrote:

> Hi Rajini,
>
> Interesting bug. The patch seems to be low risk so I suppose that
> it is fine to keep it in 3.1.0.
>
> Thanks,
> David
>
> On Fri, Dec 3, 2021 at 2:26 PM David Jacot  wrote:
> >
> > Hi Colin,
> >
> > Thanks for the heads up. It makes sense to include it in order
> > to keep the KRaft inline with ZK behavior.
> >
> > Thanks,
> > David
> >
> > On Fri, Dec 3, 2021 at 9:44 AM Rajini Sivaram 
> wrote:
> > >
> > > Hi David,
> > >
> > > Sorry, I had completely forgotten about code freeze and merged
> > > https://issues.apache.org/jira/browse/KAFKA-13461 to 3.1 branch
> yesterday.
> > > Can you take a look and see if we want it in 3.1.0? It is not a
> regression
> > > in 3.1, but we see this issue in tests and when it happens, the
> controller
> > > no longer operates as a controller.
> > >
> > > Thank you,
> > >
> > > Rajini
> > >
> > > On Thu, Dec 2, 2021 at 10:56 PM Colin McCabe 
> wrote:
> > >
> > > > Hi David,
> > > >
> > > > We'd like to include "KAFKA-13490: Fix createTopics and
> > > > incrementalAlterConfigs for KRaft mode #11416" in the upcoming
> release.
> > > > This fixes some bugs in how createTopics and incrementalAlterConfigs
> are
> > > > handled by the controller. It is specific to KRaft, so will not
> affect ZK
> > > > mode.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > On Wed, Nov 24, 2021, at 01:20, David Jacot wrote:
> > > > > Hi Mickael,
> > > > >
> > > > > Thanks for reporting it. It makes sense to include it in the 3.1
> release
> > > > > as well as it is a regression.
> > > > >
> > > > > Thanks,
> > > > > David
> > > > >
> > > > > On Tue, Nov 23, 2021 at 6:52 PM Mickael Maison <
> mickael.mai...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >> Hi David,
> > > > >>
> > > > >> Can we also consider
> https://issues.apache.org/jira/browse/KAFKA-13397?
> > > > >> It's essentially a regression but in a very specific case. To hit
> it,
> > > > >> you must be running MirrorMaker in dedicated mode and have
> changed the
> > > > >> separator of the default replication policy.
> > > > >>
> > > > >> Thanks,
> > > > >> Mickael
> > > > >>
> > > > >> On Tue, Nov 23, 2021 at 4:58 PM David Jacot
> 
> > > > wrote:
> > > > >> >
> > > > >> > Hi Ron,
> > > > >> >
> > > > >> > Thank you for reaching out about this. While this is clearly
> not a
> > > > >> > regression, I agree with including it in 3.1 in order to have
> proper
> > > > >> > and correct configuration constraints for KRaft. You can
> proceed.
> > > > >> >
> > > > >> > Cheers,
> > > > >> > David
> > > > >> >
> > > > >> > On Tue, Nov 23, 2021 at 2:55 PM Ron Dagostino <
> rndg...@gmail.com>
> > > > wrote:
> > > > >> > >
> > > > >> > > Hi David.  I would like to nominate
> > > > >> > >
> https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13456
> > > > >> > > "Tighten KRaft config checks/constraints" as a 3.1.0
> blocker.  The
> > > > >> > > existing configuration constraints/checks related to KRaft
> currently
> > > > >> > > do not eliminate certain illegal configuration combinations.
> The
> > > > >> > > illegal combinations do not cause harm at the moment, but we
> would
> > > > >> > > like to implement constraints in 3.1.0 to catch them while
> KRaft is
> > > > >> > > still in Preview.  We could add these additional checks later
> in
> > > > 3.2.x
> > > > >> > > instead, but we would like to add these as early as possible:
> we
> > > > >> > > expect more people to begin trying KRaft with each subsequent
> > > > release,
> > > > >> > > and it would be best to eliminate as quickly as we can the
> > > > possibility
> > > > >> > > of people using configurations that would need fixing later.
> > > > >> > >
> > > > >> > > A patch is available at
> https://github.com/apache/kafka/pull/11503/
> > > > .
> > > > >> > >
> > > > >> > > Ron
> > > > >> > >
> > > > >> > >
> > > > >> > > On Tue, Nov 23, 2021 at 3:19 AM David Jacot
> > > >  wrote:
> > > > >> > > >
> > > > >> > > > Hi Chris,
> > > > >> > > >
> > > > >> > > > Thanks for reporting both issues. As both are regressions,
> I do
> > > > agree that
> > > > >> > > > they are blockers and that we would fix them for 3.1.
> > > > >> > > >
> > > > >> > > > Cheers,
> > > > >> > > > David
> > > > >> > > >
> > > > >> > > > On Mon, Nov 22, 2021 at 10:50 PM Chris Egerton
> > > > >> > > >  wrote:
> > > > >> > > > >
> > > > >> > > > > Hi David,
> > > > >> > > > >
> > > > >> > > > > I have another blocker to propose. KAFKA-13472 (
> > > > >> > > > > https://issues.apache.org/jira/browse/KAFKA-13472) is
> another
> > > > regression in
> > > > >> 

Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-07-25 Thread Justine Olshan
Hi Konstantine,
I've discovered a bug with topic IDs that can be encountered when upgrading
from IBP versions below 2.8.

Since 2.8, when handling LeaderAndIsr requests, the request topic IDs are
compared to the log IDs and partitions with inconsistent IDs are skipped.

With a change introduced in 3.0, in some upgrade scenarios, topic IDs for
existing topics will not be assigned to the logs. For topics affected by
the issue, we will not be able to check the topic ID. We could potentially
handle partitions with inconsistent topic IDs incorrectly.

I believe this should be considered a blocker for 3.0. JIRA is here
 and PR to fix the issue
here .

Thanks,
Justine

On Fri, Jul 23, 2021 at 10:51 AM Konstantine Karantasis <
kkaranta...@apache.org> wrote:

> Thanks for the PR and the follow up Sophie.
>
> We can still get this in and there's no risk to do so, given the proposed
> changes.
> Therefore, I agree to cherry-pick to 3.0 since the PR is about to get
> merged.
>
> Konstantine
>
> On Thu, Jul 22, 2021 at 9:12 PM Sophie Blee-Goldman
>  wrote:
>
> > Hey Konstantine,
> >
> > A javadocs ticket of ours was demoted to a non-blocker earlier this week
> > due to lack of action,
> > but I now have a PR ready and under review. It's picking up some
> essential
> > followup that was
> > missed during the implementation of KIP-633 and is pretty essential. I
> > tagged you on the PR,
> > it's technically touching on a few things that aren't just docs, but only
> > to add a handful of checks
> > that already existed on the old APIs and just got missed on the new APIs.
> > Anything beyond that
> > I left as a TODO to follow up on after 3.0.
> >
> > KAFKA-13021  ---
> > https://github.com/apache/kafka/pull/4
> >
> > I think we should be able to get it merged by tomorrow. Assuming we do,
> can
> > I promote it back
> > to blocker status and pick the fix to the 3.0 branch?
> >
> > Thanks!
> > Sophie
> >
> > On Thu, Jul 22, 2021 at 4:29 PM Konstantine Karantasis
> >  wrote:
> >
> > > Thanks for raising this John.
> > >
> > > While we are working to eliminate the existing blockers I think it
> would
> > be
> > > great to use this time in order to test the upgrade path that you
> > mention.
> > >
> > > Before we approve a release candidate (once such a RC is generated) we
> > > should confirm that the upgrade works as expected.
> > > So, I agree with you that this is not an RC generation blocker per se
> but
> > > it's a release blocker overall.
> > >
> > > Konstantine
> > >
> > >
> > > On Thu, Jul 22, 2021 at 4:21 PM John Roesler 
> > wrote:
> > >
> > > > Hello Konstantine,
> > > >
> > > > Someone just called to my attention that KAFKA-12724 had not
> > > > been marked as a 3.0 blocker. We never added 2.8 to the
> > > > Streams upgrade system test suite. This isn't a blocker in
> > > > that it is a problem, but we should make sure that Streams
> > > > is actually upgradable before releasing 3.0.
> > > >
> > > > I'm sorry for the oversight. For what it's worth, I think we
> > > > could proceed with a release candidate while we continue to
> > > > address the missing system test.
> > > >
> > > > Thanks,
> > > > -John
> > > >
> > > > https://issues.apache.org/jira/browse/KAFKA-12724
> > > >
> > > > On Wed, 2021-07-21 at 14:00 -0700, Konstantine Karantasis
> > > > wrote:
> > > > > Thanks for the heads up Colin.
> > > > >
> > > > > KAFKA-13112 seems important and of course relevant to what we ship
> > with
> > > > > 3.0.
> > > > > Same for the test failures captured by KAFKA-13095 and KAFKA-12851.
> > > > Fixing
> > > > > those will increase the stability of our builds.
> > > > >
> > > > > Therefore, considering these tickets as blockers currently makes
> > sense
> > > to
> > > > > me.
> > > > >
> > > > > Konstantine
> > > > >
> > > > >
> > > > > On Wed, Jul 21, 2021 at 11:46 AM Colin McCabe 
> > > > wrote:
> > > > >
> > > > > > Hi Konstantine,
> > > > > >
> > > > > > Thanks for your work on this release! We discovered three blocker
> > > bugs
> > > > > > which are worth bringing up here:
> > > > > >
> > > > > > 1. KAFKA-13112: Controller's committed offset get out of sync
> with
> > > raft
> > > > > > client listener context
> > > > > > 2. KAFKA-13095: TransactionsTest is failing in kraft mode
> > > > > > 3. KAFKA-12851: Flaky Test
> > > > > > RaftEventSimulationTest.canMakeProgressIfMajorityIsReachable
> > > > > >
> > > > > > There are two subtasks for #1 which we are working on. We suspect
> > > that
> > > > #3
> > > > > > has been fixed by a previous fix we made... we're looking into
> it.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > > On Mon, Jul 19, 2021, at 20:23, Konstantine Karantasis wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Since last week, we have reached the stage of Code Freeze for
> the
> > > > 3.0.0
> > > > > > > Apache 

Re: Apache Kafka 3.6.0 release

2023-09-04 Thread Justine Olshan
Thanks Satish. This is done 

Justine

On Mon, Sep 4, 2023 at 5:16 PM Satish Duggana 
wrote:

> Hey Justine,
> I went through KAFKA-15424 and the PR[1]. It seems there are no
> dependent changes missing in 3.6 branch. They seem to be low risk as
> you mentioned. Please merge it to the 3.6 branch as well.
>
> 1. https://github.com/apache/kafka/pull/14324.
>
> Thanks,
> Satish.
>
> On Tue, 5 Sept 2023 at 05:06, Justine Olshan
>  wrote:
> >
> > Sorry I meant to add the jira as well.
> > https://issues.apache.org/jira/browse/KAFKA-15424
> >
> > Justine
> >
> > On Mon, Sep 4, 2023 at 4:34 PM Justine Olshan 
> wrote:
> >
> > > Hey Satish,
> > >
> > > I was working on adding dynamic configuration for
> > > transaction verification. The PR is approved and ready to merge into
> trunk.
> > > I was thinking I could also add it to 3.6 since it is fairly low risk.
> > > What do you think?
> > >
> > > Justine
> > >
> > > On Sat, Sep 2, 2023 at 6:21 PM Sophie Blee-Goldman <
> ableegold...@gmail.com>
> > > wrote:
> > >
> > >> Thanks Satish! The fix has been merged and cherrypicked to 3.6
> > >>
> > >> On Sat, Sep 2, 2023 at 6:02 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Sophie,
> > >> > Please feel free to add that to 3.6 branch as you say this is a
> minor
> > >> > change and will not cause any regressions.
> > >> >
> > >> > Thanks,
> > >> > Satish.
> > >> >
> > >> > On Sat, 2 Sept 2023 at 08:44, Sophie Blee-Goldman
> > >> >  wrote:
> > >> > >
> > >> > > Hey Satish, someone reported a minor bug in the Streams
> application
> > >> > > shutdown which was a recent regression, though not strictly a new
> one:
> > >> > was
> > >> > > introduced in 3.4 I believe.
> > >> > >
> > >> > > The fix seems to be super lightweight and low-risk so I was
> hoping to
> > >> > slip
> > >> > > it into 3.6 if that's ok with you? They plan to have the patch
> > >> tonight.
> > >> > >
> > >> > > https://issues.apache.org/jira/browse/KAFKA-15429
> > >> > >
> > >> > > On Thu, Aug 31, 2023 at 5:45 PM Satish Duggana <
> > >> satish.dugg...@gmail.com
> > >> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Thanks Chris for bringing this issue here and filing the new
> JIRA
> > >> for
> > >> > > > 3.6.0[1]. It seems to be a blocker for 3.6.0.
> > >> > > >
> > >> > > > Please help review https://github.com/apache/kafka/pull/14314
> as
> > >> Chris
> > >> > > > requested.
> > >> > > >
> > >> > > > 1. https://issues.apache.org/jira/browse/KAFKA-15425
> > >> > > >
> > >> > > > ~Satish.
> > >> > > >
> > >> > > > On Fri, 1 Sept 2023 at 03:59, Chris Egerton
>  > >> >
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > Hi all,
> > >> > > > >
> > >> > > > > Quick update: I've filed a separate ticket,
> > >> > > > > https://issues.apache.org/jira/browse/KAFKA-15425, to track
> the
> > >> > behavior
> > >> > > > > change in Admin::listOffsets. For the full history of the
> ticket,
> > >> > it's
> > >> > > > > worth reading the comment thread on the old ticket at
> > >> > > > > https://issues.apache.org/jira/browse/KAFKA-12879.
> > >> > > > >
> > >> > > > > I've also published
> https://github.com/apache/kafka/pull/14314
> > >> as a
> > >> > > > fairly
> > >> > > > > lightweight PR to revert the behavior of Admin::listOffsets
> > >> without
> > >> > also
> > >> > > > > reverting the refactoring to use the internal admin driver
> API.
> > >> Would
> > >> > > > > appreciate a review on that if anyone can spare the cycles.
> > >> > > > >
> > >> > > > &

Re: Apache Kafka 3.6.0 release

2023-09-11 Thread Justine Olshan
Hey Satish,

We just discovered a gap in KIP-890 part 1. We currently don't verify on
txn offset commits, so it is still possible to have hanging transactions on
the consumer offsets partitions.
I've opened a JIRA to wire the verification into that request.
https://issues.apache.org/jira/browse/KAFKA-15449

This also isn't a regression, but it would be nice to have part 1 fully
complete. I have opened a PR with the fix:
https://github.com/apache/kafka/pull/14370.

I understand if there are concerns about last minute changes to this API
and we can hold off if that makes the most sense.
If we take that route, I think we should still keep verification for the
data partitions since it still provides full protection there and improves
the transactions experience. We will need to call out the gap for consumer
offsets partitions in the release notes.

Let me know what you think.
Justine


On Mon, Sep 11, 2023 at 12:29 PM David Arthur
 wrote:

> Another (small) ZK migration issue was identified. This one isn't a
> regression (it has existed since 3.4), but I think it's reasonable to
> include. It's a small configuration check that could potentially save end
> users from some headaches down the line.
>
> https://issues.apache.org/jira/browse/KAFKA-15450
> https://github.com/apache/kafka/pull/14367
>
> I think we can get this one committed to trunk today.
>
> -David
>
>
>
> On Sun, Sep 10, 2023 at 7:50 PM Ismael Juma  wrote:
>
> > Hi Satish,
> >
> > That sounds great. I think we should aim to only allow blockers
> > (regressions, impactful security issues, etc.) on the 3.6 branch until
> > 3.6.0 is out.
> >
> > Ismael
> >
> >
> > On Sat, Sep 9, 2023, 12:20 AM Satish Duggana 
> > wrote:
> >
> > > Hi Ismael,
> > > It looks like we will publish RC0 by 14th Sep.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Fri, 8 Sept 2023 at 19:23, Ismael Juma  wrote:
> > > >
> > > > Hi Satish,
> > > >
> > > > Do you have a sense of when we'll publish RC0?
> > > >
> > > > Thanks,
> > > > Ismael
> > > >
> > > > On Fri, Sep 8, 2023 at 6:27 AM David Arthur
> > > >  wrote:
> > > >
> > > > > Quick update on my two blockers: KAFKA-15435 is merged to trunk and
> > > > > cherry-picked to 3.6. I have a PR open for KAFKA-15441 and will
> > > hopefully
> > > > > get it merged today.
> > > > >
> > > > > -David
> > > > >
> > > > > On Fri, Sep 8, 2023 at 5:26 AM Ivan Yurchenko 
> > wrote:
> > > > >
> > > > > > Hi Satish and all,
> > > > > >
> > > > > > I wonder if https://issues.apache.org/jira/browse/KAFKA-14993
> > > should be
> > > > > > included in the 3.6 release plan. I'm thinking that when
> > > implemented, it
> > > > > > would be a small, but still a change in the RSM contract: throw
> an
> > > > > > exception instead of returning an empty InputStream. Maybe it
> > should
> > > be
> > > > > > included right away to save the migration later? What do you
> think?
> > > > > >
> > > > > > Best,
> > > > > > Ivan
> > > > > >
> > > > > > On Fri, Sep 8, 2023, at 02:52, Satish Duggana wrote:
> > > > > > > Hi Jose,
> > > > > > > Thanks for looking into this issue and resolving it with a
> quick
> > > fix.
> > > > > > >
> > > > > > > ~Satish.
> > > > > > >
> > > > > > > On Thu, 7 Sept 2023 at 21:40, José Armando García Sancio
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Satish,
> > > > > > > >
> > > > > > > > On Wed, Sep 6, 2023 at 4:58 PM Satish Duggana <
> > > > > > satish.dugg...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Greg,
> > > > > > > > > It seems https://issues.apache.org/jira/browse/KAFKA-14273
> > has
> > > > > been
> > > > > > > > > there in 3.5.x too.
> > > > > > > >
> > > > > > > > I also agree that it should be a blocker for 3.6.0. It should
> > > have
> > > > > > > > been a blocker for those previous releases. I didn't fix it
> > > because,
> > > > > > > > unfortunately, I wasn't aware of the issue and jira.
> > > > > > > > I'll create a PR with a fix in case the original author
> doesn't
> > > > > > respond in time.
> > > > > > > >
> > > > > > > > Satish, do you agree?
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > --
> > > > > > > > -José
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -David
> > > > >
> > >
> >
>
>
> --
> -David
>


Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-08-31 Thread Justine Olshan
Hey Calvin,

Thanks for the responses. I think I understood most of it, but had a few
follow up questions

1. For the acks=1 case, I was wondering if there is any way to continue
with the current behavior (i.e., we only need one ack to produce to the log
and consider the request complete). My understanding is that we can also
consume from such topics at that point.
If users wanted this lower durability, could they set min.insync.replicas to
1? (See the config sketch after these questions.)

2. For the case where we elect a leader that was unknowingly offline: say
this replica was the only one in ELR. My understanding is that we would
promote it to ISR and remove it from ELR when it is the leader, but then we
would remove it from ISR and have no brokers in ISR or ELR. From there we
would need to do unclean recovery, right?

3. Did we address the case where min ISR is dynamically increased?

4. I think my comment was more about confusion on the KIP. It was not clear
to me that the section was describing what happens if one piece is
delivered before the other, but I now see the sentence explaining that. I
think I skipped from "delivery plan" straight to the bullet points.

Justine

On Thu, Aug 31, 2023 at 4:04 PM Calvin Liu 
wrote:

> Hi Justine
> Thanks for the questions!
>   *a. For my understanding, will we block replication? Or just the high
> watermark advancement?*
>   - The replication will not be blocked. The followers are free to
> replicate messages above the HWM. Only HWM advancement is blocked.
>
>   b. *Also in the acks=1 case, if folks want to continue the previous
> behavior, they also need to set min.insync.replicas to 1, correct?*
>   - If the clients only send ack=1 messages and minISR=2, the HWM behavior
> will only be different when there is 1 replica in the ISR. In this case,
> the min ISR does not do much in the current system. It is kind of a
> trade-off but we think it is ok.
>
>   c. *The KIP seems to suggest that we remove from ELR when we start up
> again and notice we do not have the clean shutdown file. Is there a chance
> we have an offline broker in ELR that had an unclean shutdown that we elect
> as a leader before we get the change to realize the shutdown was unclean?*
> - The controller will only elect an unfenced (online) replica as the
> leader. If a broker has an unclean shutdown, it should register to the
> controller first (where it has to declare whether it is a clean/unclean
> shutdown) and then start to serve broker requests. So
>  1. If the broker has an unclean shutdown before the controller is
> aware that the replica is offline, then the broker can become the leader
> temporarily. But it can't serve any Fetch requests before it registers
> again, and that's when the controller will re-elect a leader.
>  2. If the controller knows the replica is offline(missing heartbeats
> from the broker for a while) before the broker re-registers, the broker
> can't be elected as a leader.
>
> d. *Would this be the case for strictly a smaller min ISR?*
> - Yes, only when we have a smaller min ISR. Once the leader is aware of the
> minISR change, the HWM can advance and make the current ELR obsolete. So
> the controller should clear the ELR if the ISR >= the new min ISR.
>
> e. *I thought we said the above "Last Leader” behavior can’t be maintained
> with an empty ISR and it should be removed."*
> - As the KIP is a big one, we have to consider delivering it in phases. If
> only the Unclean Recovery part is delivered, we do not touch the ISR, so the
> ISR behavior will be the same as the current one. I am open to the proposal
> of directly starting unclean recovery if the last leader fails. Let's see
> if other folks hope to have more if Unclean Recovery delivers first.
>
> On Tue, Aug 29, 2023 at 4:53 PM Justine Olshan
> 
> wrote:
>
> > Hey Calvin,
> >
> > Thanks for the KIP. This will close some of the gaps in leader election!
> I
> > has a few questions:
> >
> > *>* *High Watermark can only advance if the ISR size is larger or equal
> > to min.insync.replicas*.
> >
> > For my understanding, will we block replication? Or just the high
> watermark
> > advancement?
> > Also in the acks=1 case, if folks want to continue the previous behavior,
> > they also need to set min.insync.replicas to 1, correct? It seems like
> this
> > change takes some control away from clients when it comes to durability
> vs
> > availability.
> >
> > *> *
> > *ELR + ISR size will not be dropped below the min ISR unless the
> controller
> > discovers an ELR member has an unclean shutdown. *
> > The KIP seems to suggest that we remove from ELR when we start up again
> and
> > notice we do not have the clean shutdown file. Is there a chance we have
> an

Re: Apache Kafka 3.6.0 release

2023-09-12 Thread Justine Olshan
Thanks Satish. I understand.
Just curious, is this something that could be added to 3.6.1? It would be
nice to say that hanging transactions are fully covered in a 3.6 release.
I'm not as familiar with the rules around minor releases, but adding it
there would give more time to ensure stability.

Thanks,
Justine

On Tue, Sep 12, 2023 at 5:49 AM Satish Duggana 
wrote:

> Hi Justine,
> We can skip this change into 3.6 now as it is not a blocker or
> regression and it involves changes to the API implementation. Let us
> plan to add the gap in the release notes as you mentioned.
>
> Thanks,
> Satish.
>
> On Tue, 12 Sept 2023 at 04:44, Justine Olshan
>  wrote:
> >
> > Hey Satish,
> >
> > We just discovered a gap in KIP-890 part 1. We currently don't verify on
> > txn offset commits, so it is still possible to have hanging transactions
> on
> > the consumer offsets partitions.
> > I've opened a jira to wire the verification in that request.
> > https://issues.apache.org/jira/browse/KAFKA-15449
> >
> > This also isn't a regression, but it would be nice to have part 1 fully
> > complete. I have opened a PR with the fix:
> > https://github.com/apache/kafka/pull/14370.
> >
> > I understand if there are concerns about last minute changes to this API
> > and we can hold off if that makes the most sense.
> > If we take that route, I think we should still keep verification for the
> > data partitions since it still provides full protection there and
> improves
> > the transactions experience. We will need to call out the gap in the
> > release notes for consumer offsets partitions
> >
> > Let me know what you think.
> > Justine
> >
> >
> > On Mon, Sep 11, 2023 at 12:29 PM David Arthur
> >  wrote:
> >
> > > Another (small) ZK migration issue was identified. This one isn't a
> > > regression (it has existed since 3.4), but I think it's reasonable to
> > > include. It's a small configuration check that could potentially save
> end
> > > users from some headaches down the line.
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-15450
> > > https://github.com/apache/kafka/pull/14367
> > >
> > > I think we can get this one committed to trunk today.
> > >
> > > -David
> > >
> > >
> > >
> > > On Sun, Sep 10, 2023 at 7:50 PM Ismael Juma  wrote:
> > >
> > > > Hi Satish,
> > > >
> > > > That sounds great. I think we should aim to only allow blockers
> > > > (regressions, impactful security issues, etc.) on the 3.6 branch
> until
> > > > 3.6.0 is out.
> > > >
> > > > Ismael
> > > >
> > > >
> > > > On Sat, Sep 9, 2023, 12:20 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ismael,
> > > > > It looks like we will publish RC0 by 14th Sep.
> > > > >
> > > > > Thanks,
> > > > > Satish.
> > > > >
> > > > > On Fri, 8 Sept 2023 at 19:23, Ismael Juma 
> wrote:
> > > > > >
> > > > > > Hi Satish,
> > > > > >
> > > > > > Do you have a sense of when we'll publish RC0?
> > > > > >
> > > > > > Thanks,
> > > > > > Ismael
> > > > > >
> > > > > > On Fri, Sep 8, 2023 at 6:27 AM David Arthur
> > > > > >  wrote:
> > > > > >
> > > > > > > Quick update on my two blockers: KAFKA-15435 is merged to
> trunk and
> > > > > > > cherry-picked to 3.6. I have a PR open for KAFKA-15441 and will
> > > > > hopefully
> > > > > > > get it merged today.
> > > > > > >
> > > > > > > -David
> > > > > > >
> > > > > > > On Fri, Sep 8, 2023 at 5:26 AM Ivan Yurchenko 
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Satish and all,
> > > > > > > >
> > > > > > > > I wonder if
> https://issues.apache.org/jira/browse/KAFKA-14993
> > > > > should be
> > > > > > > > included in the 3.6 release plan. I'm thinking that when
> > > > > implemented, it
> > > > > > > > would be a small, but still a change in the RSM contract:
> throw
> > > an
> > > > > > > > exception instead

Re: Apache Kafka 3.6.0 release

2023-09-12 Thread Justine Olshan
It's me again.
While reviewing the previous PR, we discovered a potentially breaking
return code for non-Java clients.
This unfortunately seems like a blocker.
https://issues.apache.org/jira/browse/KAFKA-15459

I will be able to get a PR open today.

Apologies for all the noise in the thread,
Justine

On Tue, Sep 12, 2023 at 10:21 AM David Arthur
 wrote:

> Satish,
>
> KAFKA-15450 is merged to 3.6 (as well as trunk, 3.5, and 3.4)
>
> Thanks!
> David
>
> On Tue, Sep 12, 2023 at 11:44 AM Ismael Juma  wrote:
>
> > Justine,
> >
> > Probably best to have the conversation in the JIRA ticket vs the release
> > thread. Generally, we want to only include low risk bug fixes that are
> > fully compatible in patch releases.
> >
> > Ismael
> >
> > On Tue, Sep 12, 2023 at 7:16 AM Justine Olshan
> > 
> > wrote:
> >
> > > Thanks Satish. I understand.
> > > Just curious, is this something that could be added to 3.6.1? It would
> be
> > > nice to say that hanging transactions are fully covered in a 3.6
> release.
> > > I'm not as familiar with the rules around minor releases, but adding it
> > > there would give more time to ensure stability.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Tue, Sep 12, 2023 at 5:49 AM Satish Duggana <
> satish.dugg...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Justine,
> > > > We can skip this change into 3.6 now as it is not a blocker or
> > > > regression and it involves changes to the API implementation. Let us
> > > > plan to add the gap in the release notes as you mentioned.
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Tue, 12 Sept 2023 at 04:44, Justine Olshan
> > > >  wrote:
> > > > >
> > > > > Hey Satish,
> > > > >
> > > > > We just discovered a gap in KIP-890 part 1. We currently don't
> verify
> > > on
> > > > > txn offset commits, so it is still possible to have hanging
> > > transactions
> > > > on
> > > > > the consumer offsets partitions.
> > > > > I've opened a jira to wire the verification in that request.
> > > > > https://issues.apache.org/jira/browse/KAFKA-15449
> > > > >
> > > > > This also isn't a regression, but it would be nice to have part 1
> > fully
> > > > > complete. I have opened a PR with the fix:
> > > > > https://github.com/apache/kafka/pull/14370.
> > > > >
> > > > > I understand if there are concerns about last minute changes to
> this
> > > API
> > > > > and we can hold off if that makes the most sense.
> > > > > If we take that route, I think we should still keep verification
> for
> > > the
> > > > > data partitions since it still provides full protection there and
> > > > improves
> > > > > the transactions experience. We will need to call out the gap in
> the
> > > > > release notes for consumer offsets partitions
> > > > >
> > > > > Let me know what you think.
> > > > > Justine
> > > > >
> > > > >
> > > > > On Mon, Sep 11, 2023 at 12:29 PM David Arthur
> > > > >  wrote:
> > > > >
> > > > > > Another (small) ZK migration issue was identified. This one
> isn't a
> > > > > > regression (it has existed since 3.4), but I think it's
> reasonable
> > to
> > > > > > include. It's a small configuration check that could potentially
> > save
> > > > end
> > > > > > users from some headaches down the line.
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/KAFKA-15450
> > > > > > https://github.com/apache/kafka/pull/14367
> > > > > >
> > > > > > I think we can get this one committed to trunk today.
> > > > > >
> > > > > > -David
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Sep 10, 2023 at 7:50 PM Ismael Juma 
> > > wrote:
> > > > > >
> > > > > > > Hi Satish,
> > > > > > >
> > > > > > > That sounds great. I think we should aim to only allow blockers
> > > > > > > (regressions, impactful se

Re: Apache Kafka 3.6.0 release

2023-09-13 Thread Justine Olshan
Hey Satish -- yes, you are correct. KAFKA-15459 only affects 3.6.
PR should be finalized soon.

Thanks,
Justine

On Wed, Sep 13, 2023 at 1:41 AM Federico Valeri 
wrote:

> Hi Satish, this is a small documentation fix about ZK to KRaft
> migration, that we would like to backport to 3.5 and 3.6 branches. Are
> you ok with that?
>
> https://github.com/apache/kafka/pull/14366
>
> On Wed, Sep 13, 2023 at 3:13 AM Satish Duggana 
> wrote:
> >
> > Thanks David for the quick resolution.
> >
> > ~Satish.
> >
> > On Tue, 12 Sept 2023 at 22:51, David Arthur
> >  wrote:
> > >
> > > Satish,
> > >
> > > KAFKA-15450 is merged to 3.6 (as well as trunk, 3.5, and 3.4)
> > >
> > > Thanks!
> > > David
> > >
> > > On Tue, Sep 12, 2023 at 11:44 AM Ismael Juma 
> wrote:
> > >
> > > > Justine,
> > > >
> > > > Probably best to have the conversation in the JIRA ticket vs the
> release
> > > > thread. Generally, we want to only include low risk bug fixes that
> are
> > > > fully compatible in patch releases.
> > > >
> > > > Ismael
> > > >
> > > > On Tue, Sep 12, 2023 at 7:16 AM Justine Olshan
> > > > 
> > > > wrote:
> > > >
> > > > > Thanks Satish. I understand.
> > > > > Just curious, is this something that could be added to 3.6.1? It
> would be
> > > > > nice to say that hanging transactions are fully covered in a 3.6
> release.
> > > > > I'm not as familiar with the rules around minor releases, but
> adding it
> > > > > there would give more time to ensure stability.
> > > > >
> > > > > Thanks,
> > > > > Justine
> > > > >
> > > > > On Tue, Sep 12, 2023 at 5:49 AM Satish Duggana <
> satish.dugg...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > > We can skip this change into 3.6 now as it is not a blocker or
> > > > > > regression and it involves changes to the API implementation.
> Let us
> > > > > > plan to add the gap in the release notes as you mentioned.
> > > > > >
> > > > > > Thanks,
> > > > > > Satish.
> > > > > >
> > > > > > On Tue, 12 Sept 2023 at 04:44, Justine Olshan
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hey Satish,
> > > > > > >
> > > > > > > We just discovered a gap in KIP-890 part 1. We currently don't
> verify
> > > > > on
> > > > > > > txn offset commits, so it is still possible to have hanging
> > > > > transactions
> > > > > > on
> > > > > > > the consumer offsets partitions.
> > > > > > > I've opened a jira to wire the verification in that request.
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-15449
> > > > > > >
> > > > > > > This also isn't a regression, but it would be nice to have
> part 1
> > > > fully
> > > > > > > complete. I have opened a PR with the fix:
> > > > > > > https://github.com/apache/kafka/pull/14370.
> > > > > > >
> > > > > > > I understand if there are concerns about last minute changes
> to this
> > > > > API
> > > > > > > and we can hold off if that makes the most sense.
> > > > > > > If we take that route, I think we should still keep
> verification for
> > > > > the
> > > > > > > data partitions since it still provides full protection there
> and
> > > > > > improves
> > > > > > > the transactions experience. We will need to call out the gap
> in the
> > > > > > > release notes for consumer offsets partitions
> > > > > > >
> > > > > > > Let me know what you think.
> > > > > > > Justine
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Sep 11, 2023 at 12:29 PM David Arthur
> > > > > > >  wrote:
> > > > > > >
> > > > > > > > Another (small) ZK migration issue was identified. This one
> isn't a
> > > > > > > > regre

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-08-29 Thread Justine Olshan
Hey Calvin,

Thanks for the KIP. This will close some of the gaps in leader election! I
have a few questions:

> High Watermark can only advance if the ISR size is larger or equal
> to min.insync.replicas.

For my understanding, will we block replication? Or just the high watermark
advancement?
Also in the acks=1 case, if folks want to continue the previous behavior,
they also need to set min.insync.replicas to 1, correct? It seems like this
change takes some control away from clients when it comes to durability vs
availability.
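
To check my reading of the quoted rule, here is a tiny sketch of the
advancement check as I understand it. This is an illustration only, with
made-up names, not the actual leader or controller code:

    import java.util.Map;
    import java.util.Set;

    public class HwmRuleSketch {
        // The HWM may advance to candidateOffset only if (a) every current ISR
        // member has replicated up to that offset and (b) the ISR size is at
        // least min.insync.replicas.
        static boolean canAdvanceHighWatermark(long candidateOffset,
                                               Map<Integer, Long> logEndOffsets,
                                               Set<Integer> isr,
                                               int minInsyncReplicas) {
            boolean replicatedByIsr = isr.stream()
                    .allMatch(b -> logEndOffsets.getOrDefault(b, -1L) >= candidateOffset);
            return replicatedByIsr && isr.size() >= minInsyncReplicas;
        }

        public static void main(String[] args) {
            // Broker 1 (the leader) alone in the ISR with minISR=2: the HWM must
            // not advance even though the leader already has the record.
            System.out.println(
                    canAdvanceHighWatermark(100L, Map.of(1, 120L), Set.of(1), 2)); // false
        }
    }

With a 3-replica topic, min.insync.replicas=2, and only the leader left in
the ISR, this stops the HWM from advancing even for acks=1 produce requests,
which is the shift in control I am asking about.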

> ELR + ISR size will not be dropped below the min ISR unless the controller
> discovers an ELR member has an unclean shutdown.

The KIP seems to suggest that we remove a replica from the ELR when it
starts up again and notices it does not have the clean shutdown file. Is
there a chance we have an offline broker in the ELR that had an unclean
shutdown, and that we elect it as leader before we get the chance to
realize the shutdown was unclean?
This seems like it could cause some problems. I may have missed how we
avoid this scenario though.

> When updating the config min.insync.replicas, if the new min ISR <=
> current ISR, the ELR will be removed.

Would this be the case for strictly a smaller min ISR? I suppose if we
increase the min ISR, we can't reason about the ELR. Can we reason about
the high watermark in this case? It seems we would have a broker that is
out of the ISR and also not in the ELR. (Forgive me if we can't increase
the min ISR when the increase would put us under it.)

> Unclean recovery.
>
>   - The unclean leader election will be replaced by the unclean recovery.
>   - unclean.leader.election.enable will only be replaced by
>     the unclean.recovery.strategy after ELR is delivered.
>   - As there is no change to the ISR, the "last known leader" behavior is
>     maintained.

What does "last known leader behavior maintained" mean here? I thought we
said "The above 'Last Leader' behavior can't be maintained with an empty
ISR and it should be removed." My understanding is that once the metadata
version is updated, we will always take the more thoughtful unclean election
process (i.e., inspect the logs).

Overall though, the general KIP is pretty solid. Looking at the rejected
alternatives, it looks like a lot was considered, so it's nice to see the
final proposal.

Justine

On Mon, Aug 14, 2023 at 8:50 AM Calvin Liu 
wrote:

>    1. Yes, the new protocol requires two things to advance the HWM: a) the
>    messages have been replicated to the controller-committed ISR members, and
>    b) the number of ISR members should be at least the min ISR.
>    2. With the current protocol, we are not able to select broker 1 as the
>    leader. If we first assume the new HWM requirement is in place, then
>    broker 1 is a good candidate to choose. The following part of the KIP (the
>    ELR part) will explain a new mechanism to enable us to choose broker 1.
>    Note, if both HWM and ELR are in place, broker 1 will actually be elected
>    at T3.
>
>
> On Fri, Aug 11, 2023 at 10:05 AM Jeff Kim 
> wrote:
>
> > Hi Calvin,
> >
> > Thanks for the KIP! I'm still digesting it but I have two questions:
> >
> > > In the scenario raised in the motivation section, the server may
> receive
> > ack=1 messages during T1 and advance High Watermark when the leader
> > is the only one in ISR.
> >
> > To confirm, the current protocol allows advancing the HWM if all brokers
> in
> > the ISR append to their logs (in this case only the leader). And we're
> > proposing
> > to advance the HWM only when at least min.insync.replicas brokers
> > replicate. Is this correct?
> >
> > > Then, if we elect broker 1 as the leader at T4, though we can guarantee
> > the safety of ack=all messages, the High Watermark may move backward
> > which causes further impacts on the consumers.
> >
> > How can broker 1 become the leader if it was ineligible in T3? Or are
> > you referring to broker 2?
> >
> > Thanks,
> > Jeff
> >
> > On Thu, Aug 10, 2023 at 6:48 PM Calvin Liu 
> > wrote:
> >
> > > Hi everyone,
> > > I'd like to discuss a series of enhancement to the replication
> protocol.
> > >
> > > A partition replica can experience local data loss in unclean shutdown
> > > scenarios where unflushed data in the OS page cache is lost - such as
> an
> > > availability zone power outage or a server error. The Kafka replication
> > > protocol is designed to handle these situations by removing such
> replicas
> > > from the ISR and only re-adding them once they have caught up and
> > therefore
> > > recovered any lost data. This prevents replicas that lost an arbitrary
> > log
> > > suffix, which included committed data, from being elected leader.
> > > However, there is a "last replica standing" state which when combined
> > with
> > > a data loss unclean shutdown event can turn a local data loss scenario
> > into
> > > a global data loss scenario, i.e., committed data can be removed from
> all
> > > replicas. When the last replica in the ISR experiences an unclean
> > shutdown
> > > and loses committed data, it 

Re: Apache Kafka 3.6.0 release

2023-09-14 Thread Justine Olshan
Hi Satish,
We were able to merge
https://issues.apache.org/jira/browse/KAFKA-15459 yesterday
and cherry-pick it to 3.6.

Hopefully nothing more from me on this release.

Thanks,
Justine

On Wed, Sep 13, 2023 at 9:51 PM Satish Duggana 
wrote:

> Thanks Luke for the update.
>
> ~Satish.
>
> On Thu, 14 Sept 2023 at 07:29, Luke Chen  wrote:
> >
> > Hi Satish,
> >
> > Since this PR:
> > https://github.com/apache/kafka/pull/14366 only changes the doc, I've
> > backported to 3.6 branch. FYI.
> >
> > Thanks.
> > Luke
> >
> > On Thu, Sep 14, 2023 at 12:15 AM Justine Olshan
> >  wrote:
> >
> > > Hey Satish -- yes, you are correct. KAFKA-15459 only affects 3.6.
> > > PR should be finalized soon.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Wed, Sep 13, 2023 at 1:41 AM Federico Valeri 
> > > wrote:
> > >
> > > > Hi Satish, this is a small documentation fix about ZK to KRaft
> > > > migration, that we would like to backport to 3.5 and 3.6 branches.
> Are
> > > > you ok with that?
> > > >
> > > > https://github.com/apache/kafka/pull/14366
> > > >
> > > > On Wed, Sep 13, 2023 at 3:13 AM Satish Duggana <
> satish.dugg...@gmail.com
> > > >
> > > > wrote:
> > > > >
> > > > > Thanks David for the quick resolution.
> > > > >
> > > > > ~Satish.
> > > > >
> > > > > On Tue, 12 Sept 2023 at 22:51, David Arthur
> > > > >  wrote:
> > > > > >
> > > > > > Satish,
> > > > > >
> > > > > > KAFKA-15450 is merged to 3.6 (as well as trunk, 3.5, and 3.4)
> > > > > >
> > > > > > Thanks!
> > > > > > David
> > > > > >
> > > > > > On Tue, Sep 12, 2023 at 11:44 AM Ismael Juma 
> > > > wrote:
> > > > > >
> > > > > > > Justine,
> > > > > > >
> > > > > > > Probably best to have the conversation in the JIRA ticket vs
> the
> > > > release
> > > > > > > thread. Generally, we want to only include low risk bug fixes
> that
> > > > are
> > > > > > > fully compatible in patch releases.
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Tue, Sep 12, 2023 at 7:16 AM Justine Olshan
> > > > > > > 
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Satish. I understand.
> > > > > > > > Just curious, is this something that could be added to
> 3.6.1? It
> > > > would be
> > > > > > > > nice to say that hanging transactions are fully covered in a
> 3.6
> > > > release.
> > > > > > > > I'm not as familiar with the rules around minor releases, but
> > > > adding it
> > > > > > > > there would give more time to ensure stability.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Tue, Sep 12, 2023 at 5:49 AM Satish Duggana <
> > > > satish.dugg...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Justine,
> > > > > > > > > We can skip this change into 3.6 now as it is not a
> blocker or
> > > > > > > > > regression and it involves changes to the API
> implementation.
> > > > Let us
> > > > > > > > > plan to add the gap in the release notes as you mentioned.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Satish.
> > > > > > > > >
> > > > > > > > > On Tue, 12 Sept 2023 at 04:44, Justine Olshan
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hey Satish,
> > > > > > > > > >
> > > > > > > > > > We just discovered a gap in KIP-890 part 1. We currently
> > > don't
> > > > verify
> > &g

Re: Apache Kafka 3.6.0 release

2023-09-04 Thread Justine Olshan
Hey Satish,

I was working on adding dynamic configuration for transaction verification.
The PR is approved and ready to merge into trunk.
I was thinking I could also add it to 3.6 since it is fairly low risk. What
do you think?
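
For reference, this is the kind of cluster-wide dynamic toggle the PR adds.
Below is a sketch using the Admin client; the config name is the one I
recall from the PR, so please treat it as tentative until the change is
merged:

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class ToggleVerificationExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // An empty broker name targets the cluster-wide dynamic default.
                ConfigResource cluster = new ConfigResource(ConfigResource.Type.BROKER, "");
                AlterConfigOp disable = new AlterConfigOp(
                        new ConfigEntry("transaction.partition.verification.enable", "false"),
                        AlterConfigOp.OpType.SET);
                Map<ConfigResource, Collection<AlterConfigOp>> update =
                        Map.of(cluster, List.of(disable));
                admin.incrementalAlterConfigs(update).all().get();
            }
        }
    }

Since it is a dynamic config, it can be flipped without a broker restart,
which is the main point of the change.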

Justine

On Sat, Sep 2, 2023 at 6:21 PM Sophie Blee-Goldman 
wrote:

> Thanks Satish! The fix has been merged and cherrypicked to 3.6
>
> On Sat, Sep 2, 2023 at 6:02 AM Satish Duggana 
> wrote:
>
> > Hi Sophie,
> > Please feel free to add that to 3.6 branch as you say this is a minor
> > change and will not cause any regressions.
> >
> > Thanks,
> > Satish.
> >
> > On Sat, 2 Sept 2023 at 08:44, Sophie Blee-Goldman
> >  wrote:
> > >
> > > Hey Satish, someone reported a minor bug in the Streams application
> > > shutdown which was a recent regression, though not strictly a new one:
> > was
> > > introduced in 3.4 I believe.
> > >
> > > The fix seems to be super lightweight and low-risk so I was hoping to
> > slip
> > > it into 3.6 if that's ok with you? They plan to have the patch tonight.
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-15429
> > >
> > > On Thu, Aug 31, 2023 at 5:45 PM Satish Duggana <
> satish.dugg...@gmail.com
> > >
> > > wrote:
> > >
> > > > Thanks Chris for bringing this issue here and filing the new JIRA for
> > > > 3.6.0[1]. It seems to be a blocker for 3.6.0.
> > > >
> > > > Please help review https://github.com/apache/kafka/pull/14314 as
> Chris
> > > > requested.
> > > >
> > > > 1. https://issues.apache.org/jira/browse/KAFKA-15425
> > > >
> > > > ~Satish.
> > > >
> > > > On Fri, 1 Sept 2023 at 03:59, Chris Egerton  >
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Quick update: I've filed a separate ticket,
> > > > > https://issues.apache.org/jira/browse/KAFKA-15425, to track the
> > behavior
> > > > > change in Admin::listOffsets. For the full history of the ticket,
> > it's
> > > > > worth reading the comment thread on the old ticket at
> > > > > https://issues.apache.org/jira/browse/KAFKA-12879.
> > > > >
> > > > > I've also published https://github.com/apache/kafka/pull/14314 as
> a
> > > > fairly
> > > > > lightweight PR to revert the behavior of Admin::listOffsets without
> > also
> > > > > reverting the refactoring to use the internal admin driver API.
> Would
> > > > > appreciate a review on that if anyone can spare the cycles.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Wed, Aug 30, 2023 at 1:01 PM Chris Egerton 
> > wrote:
> > > > >
> > > > > > Hi Satish,
> > > > > >
> > > > > > Wanted to let you know that KAFKA-12879 (
> > > > > > https://issues.apache.org/jira/browse/KAFKA-12879), a breaking
> > change
> > > > in
> > > > > > Admin::listOffsets, has been reintroduced into the code base.
> > Since we
> > > > > > haven't yet published a release with this change (at least, not
> the
> > > > more
> > > > > > recent instance of it), I was hoping we could treat it as a
> > blocker for
> > > > > > 3.6.0. I'd also like to solicit the input of people familiar with
> > the
> > > > admin
> > > > > > client to weigh in on the Jira ticket about whether we should
> > continue
> > > > to
> > > > > > preserve the current behavior (if the consensus is that we
> should,
> > I'm
> > > > > > happy to file a fix).
> > > > > >
> > > > > > Please let me know if you agree that this qualifies as a
> blocker. I
> > > > plan
> > > > > > on publishing a potential fix sometime this week.
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Wed, Aug 30, 2023 at 9:19 AM Satish Duggana <
> > > > satish.dugg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi,
> > > > > >> Please plan to continue merging pull requ

Re: Apache Kafka 3.6.0 release

2023-09-04 Thread Justine Olshan
Sorry I meant to add the jira as well.
https://issues.apache.org/jira/browse/KAFKA-15424

Justine

On Mon, Sep 4, 2023 at 4:34 PM Justine Olshan  wrote:

> Hey Satish,
>
> I was working on adding dynamic configuration for
> transaction verification. The PR is approved and ready to merge into trunk.
> I was thinking I could also add it to 3.6 since it is fairly low risk.
> What do you think?
>
> Justine
>
> On Sat, Sep 2, 2023 at 6:21 PM Sophie Blee-Goldman 
> wrote:
>
>> Thanks Satish! The fix has been merged and cherrypicked to 3.6
>>
>> On Sat, Sep 2, 2023 at 6:02 AM Satish Duggana 
>> wrote:
>>
>> > Hi Sophie,
>> > Please feel free to add that to 3.6 branch as you say this is a minor
>> > change and will not cause any regressions.
>> >
>> > Thanks,
>> > Satish.
>> >
>> > On Sat, 2 Sept 2023 at 08:44, Sophie Blee-Goldman
>> >  wrote:
>> > >
>> > > Hey Satish, someone reported a minor bug in the Streams application
>> > > shutdown which was a recent regression, though not strictly a new one:
>> > was
>> > > introduced in 3.4 I believe.
>> > >
>> > > The fix seems to be super lightweight and low-risk so I was hoping to
>> > slip
>> > > it into 3.6 if that's ok with you? They plan to have the patch
>> tonight.
>> > >
>> > > https://issues.apache.org/jira/browse/KAFKA-15429
>> > >
>> > > On Thu, Aug 31, 2023 at 5:45 PM Satish Duggana <
>> satish.dugg...@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > Thanks Chris for bringing this issue here and filing the new JIRA
>> for
>> > > > 3.6.0[1]. It seems to be a blocker for 3.6.0.
>> > > >
>> > > > Please help review https://github.com/apache/kafka/pull/14314 as
>> Chris
>> > > > requested.
>> > > >
>> > > > 1. https://issues.apache.org/jira/browse/KAFKA-15425
>> > > >
>> > > > ~Satish.
>> > > >
>> > > > On Fri, 1 Sept 2023 at 03:59, Chris Egerton > >
>> > > > wrote:
>> > > > >
>> > > > > Hi all,
>> > > > >
>> > > > > Quick update: I've filed a separate ticket,
>> > > > > https://issues.apache.org/jira/browse/KAFKA-15425, to track the
>> > behavior
>> > > > > change in Admin::listOffsets. For the full history of the ticket,
>> > it's
>> > > > > worth reading the comment thread on the old ticket at
>> > > > > https://issues.apache.org/jira/browse/KAFKA-12879.
>> > > > >
>> > > > > I've also published https://github.com/apache/kafka/pull/14314
>> as a
>> > > > fairly
>> > > > > lightweight PR to revert the behavior of Admin::listOffsets
>> without
>> > also
>> > > > > reverting the refactoring to use the internal admin driver API.
>> Would
>> > > > > appreciate a review on that if anyone can spare the cycles.
>> > > > >
>> > > > > Cheers,
>> > > > >
>> > > > > Chris
>> > > > >
>> > > > > On Wed, Aug 30, 2023 at 1:01 PM Chris Egerton 
>> > wrote:
>> > > > >
>> > > > > > Hi Satish,
>> > > > > >
>> > > > > > Wanted to let you know that KAFKA-12879 (
>> > > > > > https://issues.apache.org/jira/browse/KAFKA-12879), a breaking
>> > change
>> > > > in
>> > > > > > Admin::listOffsets, has been reintroduced into the code base.
>> > Since we
>> > > > > > haven't yet published a release with this change (at least, not
>> the
>> > > > more
>> > > > > > recent instance of it), I was hoping we could treat it as a
>> > blocker for
>> > > > > > 3.6.0. I'd also like to solicit the input of people familiar
>> with
>> > the
>> > > > admin
>> > > > > > client to weigh in on the Jira ticket about whether we should
>> > continue
>> > > > to
>> > > > > > preserve the current behavior (if the consensus is that we
>> should,
>> > I'm
>> > > > > > happy to file a fix).
>> > > > > >
>> > > > > > Please let me know if you agree that this q

Re: ACCESS to Apache Pony Mail

2023-11-01 Thread Justine Olshan
If you would like to read any historical conversation you can do so from
the archive here: https://lists.apache.org/list.html?dev@kafka.apache.org

As Josep said, in order to reply, you can use your own client without
logging in.
Hope this helps!

Justine



On Wed, Nov 1, 2023 at 10:01 AM Josep Prat 
wrote:

> Hi Arpit,
>
> Pony Mail can be seen as the archive of the mailing list. We usually share
> these links because they are always accessible.
>
> That being said, if you want to reply to an email that you can only find on
> Pony Mail (maybe because the mail was sent before you subscribed or because
> you deleted the email), there is a button with a pen icon that lets you
> reply with your preferred mail client.
>
> Best,
>
> ———
> Josep Prat
>
> Aiven Deutschland GmbH
>
> Alexanderufer 3-7, 10117 Berlin
>
> Amtsgericht Charlottenburg, HRB 209739 B
>
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>
> m: +491715557497
>
> w: aiven.io
>
> e: josep.p...@aiven.io
>
> On Wed, Nov 1, 2023, 16:42 Arpit Goyal  wrote:
>
> > Thanks Joseph for providing detailed information.
> > I recently started contributing in the Kafka project and i observe
> > developers shares pony mail link for discussion around the design.How
> could
> > i be part of the thread and able to  share the opinion around  the
> design.
> >
> > On Wed, Nov 1, 2023, 17:22 Josep Prat 
> wrote:
> >
> > > Hi Arpit,
> > >
> > > By committer it is meant a person with write access to an ASF project
> > (for
> > > example Apache Kafka). Towards the end of this page you can see what
> > needs
> > > to be done to become a committer:
> > > https://kafka.apache.org/contributing.html
> > > .
> > > Committership happens on invite basis and it's done by the merits
> > described
> > > in the link above.
> > >
> > > Best,
> > >
> > > ———
> > > Josep Prat
> > >
> > > Aiven Deutschland GmbH
> > >
> > > Alexanderufer 3-7, 10117 Berlin
> > >
> > > Amtsgericht Charlottenburg, HRB 209739 B
> > >
> > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > >
> > > m: +491715557497
> > >
> > > w: aiven.io
> > >
> > > e: josep.p...@aiven.io
> > >
> > > On Wed, Nov 1, 2023, 04:02 Arpit Goyal 
> wrote:
> > >
> > > > I am already a committer of Apache Kafka.
> > > > On Wed, Nov 1, 2023, 05:18 Matthias J. Sax  wrote:
> > > >
> > > > > Only committers can login using their ASF account.
> > > > >
> > > > > -Matthias
> > > > >
> > > > > On 10/30/23 10:19 PM, Arpit Goyal wrote:
> > > > > > Hi
> > > > > > Can anyone help me provide access to Apache Pony Mail. I tried
> > login
> > > > > using
> > > > > > the jira credential but it didn't work.
> > > > > > Thanks and Regards
> > > > > > Arpit Goyal
> > > > > > 8861094754
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Apache Kafka 3.7.0 Release

2023-11-02 Thread Justine Olshan
This makes sense to me. Thanks for following up, Stan.

On Thu, Nov 2, 2023 at 7:02 AM Stanislav Kozlovski
 wrote:

> Hi all,
>
> Given the discussion here and the lack of any pushback, I have changed the
> dates of the release:
> - KIP Freeze - *November 22 *(moved 4 days later)
> - Feature Freeze - *December 6 *(moved 2 days earlier)
> - Code Freeze - *December 20*
>
> If anyone has any thoughts against this proposal - please let me know! It
> would be good to settle on this early. These will be the dates we're going
> with
>
> Best,
> Stanislav
>
> On Thu, Oct 26, 2023 at 12:15 AM Sophie Blee-Goldman <
> sop...@responsive.dev>
> wrote:
>
> > Thanks for the response and explanations -- I think the main question for
> > me
> > was whether we intended to permanently increase the KF -- FF gap from the
> > historical 1 week to 3 weeks? Maybe this was a conscious decision and I
> > just
> >  missed the memo, hopefully someone else can chime in here. I'm all for
> > additional though. And looking around at some of the recent releases, it
> > seems like we haven't been consistently following the "usual" schedule
> > since
> > the 2.x releases.
> >
> > Anyways, my main concern was making sure to leave a full 2 weeks between
> > feature freeze and code freeze, so I'm generally happy with the new
> > proposal.
> > Although I would still prefer to have the KIP freeze fall on a Wednesday
> --
> > Ismael actually brought up the same thing during the 3.5.0 release
> > planning,
> > so I'll just refer to his explanation for this:
> >
> > We typically choose a Wednesday for the various freeze dates - there are
> > > often 1-2 day slips and it's better if that doesn't require people
> > > working through the weekend.
> > >
> >
> > (From this mailing list thread
> > )
> >
> > Thanks for driving the release!
> > Sophie
> >
> > On Wed, Oct 25, 2023 at 8:13 AM Stanislav Kozlovski
> >  wrote:
> >
> > > Thanks for the thorough response, Sophie.
> > >
> > > - Added to the "Future Release Plan"
> > >
> > > > 1. Why is the KIP freeze deadline on a Saturday?
> > >
> > > It was simply added as a starting point - around 30 days from the
> > > announcement. We can move it earlier to the 15th of November, but my
> > > thinking is later is better with these things - it's already aggressive
> > > enough. e.g given the choice of Nov 15 vs Nov 18, I don't necessarily
> > see a
> > > strong reason to choose 15.
> > >
> > > If people feel strongly about this, to make up for this, we can eat
> into
> > > the KF-FF time as I'll touch upon later, and move FF a few days earlier
> > to
> > > land on a Wednesday.
> > >
> > > This reduces the time one has to get their feature complete after KF,
> but
> > > allows for longer time to a KIP accepted, so the KF-FF gap can be made
> up
> > > when developing the feature in parallel.
> > >
> > > > , this makes it easy for everyone to remember when the next deadline
> is
> > > so they can make sure to get everything in on time. I worry that
> varying
> > > this will catch people off guard.
> > >
> > > I don't see much value in optimizing the dates for ease of memory -
> > besides
> > > the KIP Freeze (which is the base date), there are only two more dates
> to
> > > remember that are on the wiki. More importantly, we have a plethora of
> > > tools that can be used to set up reminders - so a contributor doesn't
> > > necessarily need to remember anything if they're serious about getting
> > > their feature in.
> > >
> > > > 3. Is there a particular reason for having the feature freeze almost
> a
> > > full 3 weeks from the KIP freeze? ... having 3 weeks between the KIP
> and
> > > feature freeze (which are
> > > usually separated by just a single week)?
> > >
> > > I was going off the last two releases, which had *20 days* (~3 weeks)
> in
> > > between KF & FF. Here are their dates:
> > >
> > > - AK 3.5
> > >   - KF: 22 March
> > >   - FF: 12 April
> > > - (20 days after)
> > >   - CF: 26 April
> > > - (14 days after)
> > >   - Release: 15 June
> > >  - 50 days after CF
> > > - AK 3.6
> > >   - KF: 26 July
> > >   - FF: 16 Aug
> > > - (20 days after)
> > >   - CF: 30 Aug
> > > - (14 days after)
> > >   - Release: 11 October
> > > - 42 days after CF
> > >
> > > I don't know the precise reasoning for extending the time, nor what is
> > the
> > > most appropriate time - but having talked offline to some folks prior
> to
> > > this discussion, it seemed reasonable.
> > >
> > > Your proposal uses an aggressive 1-week gap between both, which is
> quite
> > > the jump from the previous 3 weeks.
> > >
> > > Perhaps someone with more direct experience in the recent can chime in
> > > here. Both for the reasoning for the extension from 1w to 3w in the
> last
> > 2
> > > releases, and how they feel about reducing this range.
> > >
> > > > 4. On the other hand, we usually have a full two weeks from the
> feature
> > > freeze 

Re: [DISCUSS] How to detect (and prevent) complex bugs in Kafka?

2023-10-24 Thread Justine Olshan
Hey Colin,

For context on this specific issue, we have opened a JIRA to consider
thread safety in the future. Other options are better documentation or
making the buffer thread-local.
I don't want to detract too much from this conversation, but I did want to
say there is a JIRA to discuss the buffer-specific problem:
https://issues.apache.org/jira/browse/KAFKA-15674
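
To illustrate the thread-local option I mentioned, here is a generic
sketch. It is not the actual producer buffer-pool code, just the shape of
the idea:

    import java.nio.ByteBuffer;

    public class ThreadLocalBufferSketch {
        private static final ThreadLocal<ByteBuffer> BUFFER =
                ThreadLocal.withInitial(() -> ByteBuffer.allocate(16 * 1024));

        // Each thread only ever sees its own buffer, so the buffer itself does
        // not need to be thread safe and cannot be corrupted by another thread.
        static ByteBuffer borrow() {
            ByteBuffer buffer = BUFFER.get();
            buffer.clear(); // reset position/limit before reuse
            return buffer;
        }

        public static void main(String[] args) {
            System.out.println(borrow().capacity()); // 16384, independent per thread
        }
    }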

Thanks all,
Justine

On Tue, Oct 24, 2023 at 12:16 PM Colin McCabe  wrote:

> Hi Divij,
>
> I've worked on several projects that had a "debug mode." It was something
> that a lot of old-fashioned C and C++ projects would do. Usually
> implemented through an ASSERT macro or similar that was defined away when
> in "production mode"
>
> I didn't like this back then, and still don't like it. If the assertion
> isn't expensive, you should just do it all the time. If the assertion is
> expensive, then you should do it in a test rather than when running.
> Because an expensive operation will change the timings of a distributed
> system, and make your "debug mode server" perform quite differently than
> the "real production server."
>
> Another issue is that, based on my experience, people often did stuff in
> the assert blocks that would change other things in the system. Since code
> in C/C++ (and also Java) can have side effects, it's easy to accidentally
> change things with your verification code.
>
> It sounds like concretely you hit a race condition with the
> non-thread-safe buffer pool code. It would be good to think about how we
> could avoid this in the future, but I don't think "debug mode" is the
> answer. Instead, it might be better to take another look at how we're doing
> buffer pooling to see if we can simplify. Why are we passing a
> non-thread-safe object between threads in the first place? Should this be
> documented better, or better yet, avoided? Why not use a thread-local
> instead to make this all so much simpler? etc.
>
> best,
> Colin
>
> On Tue, Oct 24, 2023, at 02:32, Divij Vaidya wrote:
> > Hey folks
> >
> > We recently came across a bug [1] which was very hard to detect during
> > testing and easy to introduce during development. I would like to kick
> > start a discussion on potential ways which could avoid this category of
> > bugs in Apache Kafka.
> >
> > I think we might want to start working towards a "debug" mode in the
> broker
> > which will enable assertions for different invariants in Kafka.
> Invariants
> > could be derived from formal verification that Jack [2] and others have
> > shared with the community earlier AND from tribal knowledge in the
> > community such as network threads should not perform any storage IO,
> files
> > should not fsync in critical product path, metric gauges should not
> acquire
> > a lock etc. The release qualification  process (system tests +
> integration
> > tests) will run the broker in "debug" mode and will validate these
> > assertions while testing the system in different scenarios. The
> inspiration
> > for this idea is derived from Marc Brooker's post at
> > https://brooker.co.za/blog/2023/07/28/ds-testing.html
> >
> > Your thoughts on this topic are welcome! Also, please feel free to take
> > this idea forward and draft a KIP for a more formal discussion.
> >
> > [1] https://issues.apache.org/jira/browse/KAFKA-15653
> > [2] https://lists.apache.org/thread/pfrkk0yb394l5qp8h5mv9vwthx15084j
> >
> > --
> > Divij Vaidya
>


Re: UncleanLeaderElectionsPerSec metric and Raft

2023-10-24 Thread Justine Olshan
Hey folks,
Thanks for replying. If we could file a JIRA to track this work, that would
be great.
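
For anyone who wants to double-check locally, here is a small sketch that
looks up the MBean Neil mentioned. It assumes it runs inside (or is
attached to) the controller JVM and that the usual Meter attributes such as
Count are exposed, so treat the details as approximate:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class UncleanElectionMetricCheck {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(
                    "kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec");
            System.out.println(server.isRegistered(name)
                    ? "Registered, Count=" + server.getAttribute(name, "Count")
                    : "Metric not registered in this JVM");
        }
    }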

Thanks,
Justine

On Tue, Oct 24, 2023 at 11:55 AM Colin McCabe  wrote:

> Hi Neil,
>
> Yes, I think we should probably report the UncleanLeaderElectionsPerSec
> metric in KRaft. We don't have it currently.
>
> We do have the concept of unclean leader election in KRaft, but it has to
> be triggered by the leader election tool currently. We've been talking
> about adding configuration-based unclean leader election as part of the
> KIP-966 work.
>
> best,
> Colin
>
>
> On Wed, Oct 18, 2023, at 10:27, Neil Buesing wrote:
> > Development,
> >
> > with Raft controllers, is the unclean leader election / sec metric supose
> > to be available?
> >
> > kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
> >
> > Nothing in documentation indicates that it isn’t as well as in code
> > navigation nothing indicates to me that it wouldn’t show up, but even
> added
> > unclean leader election to true for both brokers and controllers and
> > nothing.
> >
> > (set this for all controllers and brokers)
> >   KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE: true
> >
> > Happy to report a Jira, but wanted to figure out if the bug was in the
> > documentation or the metric not being available?
> >
> > Thanks,
> >
> > Neil
> >
> > P.S. I did confirm that others have seen and wondered about this,
> > https://github.com/strimzi/strimzi-kafka-operator/issues/8169, but that
> is
> > about the only other report on this I have found.
>


Re: [kafka-clients] [VOTE] 3.6.0 RC1

2023-09-24 Thread Justine Olshan
Hi Satish,

I've done the following:
- Verified the signature
- Built from source with Java 17/Scala 2.13 and Java 8/Scala 2.12
- Ran unit + integration tests
- Ran a shorter Trogdor transactional-produce-bench on a single-broker
cluster (KRaft and ZK) to verify transactional workloads worked reasonably

Minor thing (we can discuss elsewhere, and it is non-blocking for the
release): given that ZK has been deprecated since 3.5, we should move the
KRaft setup up in the quickstart guide.

+1 (binding) from me.

Justine

On Sun, Sep 24, 2023 at 7:09 AM Federico Valeri 
wrote:

> Hi Satish, I did the following to verify the release:
>
> - Verified signature and checksum
> - Built from source with Java 17 and Scala 2.13
> - Ran all unit and integration tests
> - Spot checked release notes and documentation
> - Ran a custom client using staging artifacts on a 3-nodes cluster
> - Tested tiered storage with one of the available RSM implementations
>
> +1 (non binding)
>
> Thanks
> Fede
>
>
> On Sun, Sep 24, 2023 at 8:49 AM Luke Chen  wrote:
> >
> > Hi Satish,
> >
> > I verified with:
> > 1. Ran quick start in KRaft for scala 2.12 artifact
> > 2. Making sure the checksum are correct
> > 3. Browsing release notes, documents, javadocs, protocols.
> >
> > I filed KAFKA-15491  >for
> > log output improvement while testing stream application.
> > It won't be blocker in v3.6.0.
> >
> > For KAFKA-15489 , I'm
> > fine if we decide to fix it in v3.6.1/v3.7.0.
> >
> > +1 (binding) from me.
> >
> > Thank you.
> > Luke
> >
> > On Sun, Sep 24, 2023 at 3:38 AM Ismael Juma  wrote:
> >
> > > Given that this is not a regression and there have been no reports for
> over
> > > a year, I think it's ok for this to land in 3.6.1.
> > >
> > > Ismael
> > >
> > > On Sat, Sep 23, 2023 at 9:32 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Luke for reporting KRaft issue[1].
> > > >
> > > > I am not sure whether it is a release blocker for 3.6.0. Need input
> > > > from other KRaft experts also to finalize the decision. Even if we
> > > > adopt a fix, do not we need to bake it for some time before it is
> > > > pushed to production to avoid any regressions as this change is in
> the
> > > > critical paths?
> > > >
> > > > 1. https://issues.apache.org/jira/browse/KAFKA-15489
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Sat, 23 Sept 2023 at 03:08, Luke Chen  wrote:
> > > > >
> > > > > Hi Satish,
> > > > >
> > > > > I found the current KRaft implementation will have "split brain"
> issue
> > > > when
> > > > > network partition happens, which will cause inconsistent metadata
> > > > returned
> > > > > from the controller.
> > > > > Filed KAFKA-15489 <
> https://issues.apache.org/jira/browse/KAFKA-15489>
> > > > for
> > > > > this issue, and PR  is
> > > ready
> > > > > for review.
> > > > >
> > > > > Even though this is not a regression issue (this has already
> existed
> > > > since
> > > > > the 1st release of KRaft feature), I think this is an important
> issue
> > > > since
> > > > > KRaft is announced production ready.
> > > > > Not sure what other people's thoughts are.
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Thu, Sep 21, 2023 at 6:33 PM Josep Prat
>  > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Satish,
> > > > > >
> > > > > > I ran the following validation steps:
> > > > > > - Built from source with Java 11 and Scala 2.13
> > > > > > - Verified Signatures and hashes of the artifacts generated
> > > > > > - Navigated through Javadoc including links to JDK classes
> > > > > > - Run the unit tests
> > > > > > - Run integration tests
> > > > > > - Run the quickstart in KRaft and Zookeeper mode
> > > > > >
> > > > > >
> > > > > > I +1 this release (non-binding)
> > > > > >
> > > > > > Thanks for your efforts!
> > > > > >
> > > > > > On Thu, Sep 21, 2023 at 2:59 AM Satish Duggana <
> > > > satish.dugg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Greg for verifying the release including the earlier
> > > > > > > blocker(KAFKA-15473) verification.
> > > > > > >
> > > > > > > ~Satish.
> > > > > > >
> > > > > > > On Wed, 20 Sept 2023 at 22:30, Greg Harris
> > > >  > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I verified the functionality of KIP-898 and the recent fix
> for
> > > > > > > > KAFKA-15473 with the following steps:
> > > > > > > >
> > > > > > > > 1. I started a 3.5.1 broker, and a 3.5.1 worker with most
> (>400)
> > > > > > > > publicly available plugins installed
> > > > > > > > 2. I captured the output of /connector-plugins
> > > > > > > > 3. I upgraded the worker to 3.6.0-rc1
> > > > > > > > 4. I captured the output of /connector-plugins with various
> > > > settings
> > > > > > > > of 

Re: [VOTE] 3.6.0 RC2

2023-10-02 Thread Justine Olshan
Hey all -- I noticed we still have the system tests listed as something that
will be updated. Did we get a run for this RC?

On Mon, Oct 2, 2023 at 1:24 PM Bill Bejeck  wrote:

> Hi Satish,
>
> Thanks for running the release.
> I performed the following steps:
>
>- Validated all the checksums, signatures, and keys
>- Built the release from source
>- Ran all unit tests
>- Quick start validations
>   - ZK and Kraft
>   - Connect
>   - Kafka Streams
>- Spot checked java docs and documentation
>
> +1 (binding)
>
> - Bill
>
> On Mon, Oct 2, 2023 at 10:23 AM Proven Provenzano
>  wrote:
>
> > Hi,
> >
> > To verify the release of release 3.6.0 RC2 I did the following:
> >
> >- Downloaded the source, built and ran the tests.
> >- Validated SCRAM with KRaft including creating credentials with
> >kafka-storage.
> >- Validated Delegation Tokens with KRaft
> >
> > +1 (non-binding)
> >
> > --Proven
> >
> >
> >
> > On Mon, Oct 2, 2023 at 8:37 AM Divij Vaidya 
> > wrote:
> >
> > > + 1 (non-binding)
> > >
> > > Verifications:
> > > 1. I ran a produce-consume workload with plaintext auth, JDK17, zstd
> > > compression using an open messaging benchmark and found 3.6 to be
> better
> > > than or equal to 3.5.1 across all dimensions. Notably, 3.6 had
> > consistently
> > > 6-7% lower CPU utilization, lesser spikes on P99 produce latencies and
> > > overall lower P99.8 latencies.
> > >
> > > 2. I have verified that detached signature is correct using
> > > https://www.apache.org/info/verification.html and the release manager
> > > public keys are available at
> > > https://keys.openpgp.org/search?q=F65DC3423D4CD7B9
> > >
> > > 3. I have verified that all metrics emitted in 3.5.1 (with Zk) are also
> > > being emitted in 3.6.0 (with Zk).
> > >
> > > Problems (but not blockers):
> > > 1. Metrics added in
> > >
> > >
> >
> https://github.com/apache/kafka/commit/2f71708955b293658cec3b27e9a5588d39c38d7e
> > > aren't available in the documentation (cc: Justine). I don't consider
> > this
> > > as a release blocker but we should add it as a fast follow-up.
> > >
> > > 2. Metric added in
> > >
> > >
> >
> https://github.com/apache/kafka/commit/a900794ace4dcf1f9dadee27fbd8b63979532a18
> > > isn't available in documentation (cc: David). I don't consider this as
> a
> > > release blocker but we should add it as a fast follow-up.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Mon, Oct 2, 2023 at 9:50 AM Federico Valeri 
> > > wrote:
> > >
> > > > Hi Satish, I did the following to verify the release:
> > > >
> > > > - Built from source with Java 17 and Scala 2.13
> > > > - Ran all unit and integration tests
> > > > - Spot checked documentation
> > > > - Ran custom client applications using staging artifacts on a 3-nodes
> > > > cluster
> > > > - Tested tiered storage with one of the available RSM implementations
> > > >
> > > > +1 (non binding)
> > > >
> > > > Thanks
> > > > Fede
> > > >
> > > > On Mon, Oct 2, 2023 at 8:50 AM Luke Chen  wrote:
> > > > >
> > > > > Hi Satish,
> > > > >
> > > > > I verified with:
> > > > > 1. Ran quick start in KRaft for scala 2.12 artifact
> > > > > 2. Making sure the checksum are correct
> > > > > 3. Browsing release notes, documents, javadocs, protocols.
> > > > > 4. Verified the tiered storage feature works well.
> > > > >
> > > > > +1 (binding).
> > > > >
> > > > > Thanks.
> > > > > Luke
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Oct 2, 2023 at 5:23 AM Jakub Scholz 
> wrote:
> > > > >
> > > > > > +1 (non-binding). I used the Scala 2.13 binaries and the staged
> > Maven
> > > > > > artifacts and run my tests. Everything seems to work fine for me.
> > > > > >
> > > > > > Thanks
> > > > > > Jakub
> > > > > >
> > > > > > On Fri, Sep 29, 2023 at 8:17 PM Satish Duggana <
> > > > satish.dugg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello Kafka users, developers and client-developers,
> > > > > > >
> > > > > > > This is the third candidate for the release of Apache Kafka
> > 3.6.0.
> > > > > > > Some of the major features include:
> > > > > > >
> > > > > > > * KIP-405 : Kafka Tiered Storage
> > > > > > > * KIP-868 : KRaft Metadata Transactions
> > > > > > > * KIP-875: First-class offsets support in Kafka Connect
> > > > > > > * KIP-898: Modernize Connect plugin discovery
> > > > > > > * KIP-938: Add more metrics for measuring KRaft performance
> > > > > > > * KIP-902: Upgrade Zookeeper to 3.8.1
> > > > > > > * KIP-917: Additional custom metadata for remote log segment
> > > > > > >
> > > > > > > Release notes for the 3.6.0 release:
> > > > > > >
> > > https://home.apache.org/~satishd/kafka-3.6.0-rc2/RELEASE_NOTES.html
> > > > > > >
> > > > > > > *** Please download, test and vote by Tuesday, October 3, 12pm
> PT
> > > > > > >
> > > > > > > Kafka's KEYS file containing PGP keys we use to sign the
> release:
> > > > > > > https://kafka.apache.org/KEYS
> > > > > > >
> > > > > > > * Release artifacts to be voted upon (source and binary):
> 

Re: [VOTE] 3.6.0 RC2

2023-10-02 Thread Justine Olshan
I realized Luke shared the results here for RC1
https://drive.google.com/drive/folders/1S2XYd79f6_AeWj9f9qEkliRg7JtL04AC
Given we had some runs that looked reasonable, and we made a small change,
I'm ok with this. But I wouldn't be upset if we had another set of runs :)

As for the validation:

   - Compiled from source with Java 17/Scala 2.13 and ran the transactional
   produce bench
   - Ran unit tests
   - Validated the checksums
   - Downloaded and ran the Scala 2.12 version of the release
   - Briefly took a look at the documentation
   - While browsing through the site html files, I noticed the html for
   documentation.html seemed to be for 3.4. Not sure if this is a blocker, but
   I wanted to flag it. This seems to be the case for the previous release
   candidates as well (and for the 3.5 release, it seems).


I will hold off on voting until we figure that part out. I will also follow
up with the documentation Divij mentioned outside this thread.

Thanks,
Justine

On Mon, Oct 2, 2023 at 3:05 PM Greg Harris 
wrote:

> Hey Satish,
>
> I verified KIP-898 functionality and the KAFKA-15473 patch.
> +1 (non-binding)
>
> Thanks!
>
> On Mon, Oct 2, 2023 at 1:28 PM Justine Olshan
>  wrote:
> >
> > Hey all -- I noticed we still have the system tests as something that
> will
> > be updated. Did we get a run for this RC?
> >
> > On Mon, Oct 2, 2023 at 1:24 PM Bill Bejeck  wrote:
> >
> > > Hi Satish,
> > >
> > > Thanks for running the release.
> > > I performed the following steps:
> > >
> > >- Validated all the checksums, signatures, and keys
> > >- Built the release from source
> > >- Ran all unit tests
> > >- Quick start validations
> > >   - ZK and Kraft
> > >   - Connect
> > >   - Kafka Streams
> > >- Spot checked java docs and documentation
> > >
> > > +1 (binding)
> > >
> > > - Bill
> > >
> > > On Mon, Oct 2, 2023 at 10:23 AM Proven Provenzano
> > >  wrote:
> > >
> > > > Hi,
> > > >
> > > > To verify the release of release 3.6.0 RC2 I did the following:
> > > >
> > > >- Downloaded the source, built and ran the tests.
> > > >- Validated SCRAM with KRaft including creating credentials with
> > > >kafka-storage.
> > > >- Validated Delegation Tokens with KRaft
> > > >
> > > > +1 (non-binding)
> > > >
> > > > --Proven
> > > >
> > > >
> > > >
> > > > On Mon, Oct 2, 2023 at 8:37 AM Divij Vaidya  >
> > > > wrote:
> > > >
> > > > > + 1 (non-binding)
> > > > >
> > > > > Verifications:
> > > > > 1. I ran a produce-consume workload with plaintext auth, JDK17,
> zstd
> > > > > compression using an open messaging benchmark and found 3.6 to be
> > > better
> > > > > than or equal to 3.5.1 across all dimensions. Notably, 3.6 had
> > > > consistently
> > > > > 6-7% lower CPU utilization, lesser spikes on P99 produce latencies
> and
> > > > > overall lower P99.8 latencies.
> > > > >
> > > > > 2. I have verified that detached signature is correct using
> > > > > https://www.apache.org/info/verification.html and the release
> manager
> > > > > public keys are available at
> > > > > https://keys.openpgp.org/search?q=F65DC3423D4CD7B9
> > > > >
> > > > > 3. I have verified that all metrics emitted in 3.5.1 (with Zk) are
> also
> > > > > being emitted in 3.6.0 (with Zk).
> > > > >
> > > > > Problems (but not blockers):
> > > > > 1. Metrics added in
> > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/kafka/commit/2f71708955b293658cec3b27e9a5588d39c38d7e
> > > > > aren't available in the documentation (cc: Justine). I don't
> consider
> > > > this
> > > > > as a release blocker but we should add it as a fast follow-up.
> > > > >
> > > > > 2. Metric added in
> > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/kafka/commit/a900794ace4dcf1f9dadee27fbd8b63979532a18
> > > > > isn't available in documentation (cc: David). I don't consider
> this as
> > > a
> > > > > release blocker but we should add it as a fast follow-up.
> > > > >
> > > > > -

Re: [VOTE] 3.6.0 RC2

2023-10-03 Thread Justine Olshan
Thanks folks for following up. Given my previous testing and the results
you've provided, I'm +1 (binding)

I will also follow up with the non-blocking metrics documentation.

Thanks!
Justine

On Tue, Oct 3, 2023 at 8:17 AM Chris Egerton 
wrote:

> Hi Satish,
>
> Thanks for running this release!
>
> To verify, I:
> - Built from source using Java 11 with both:
> - - the 3.6.0-rc2 tag on GitHub
> - - the kafka-3.6.0-src.tgz artifact from
> https://home.apache.org/~satishd/kafka-3.6.0-rc2/
> - Checked signatures and checksums
> - Ran the quickstart using the kafka_2.13-3.6.0.tgz artifact from
> https://home.apache.org/~satishd/kafka-3.6.0-rc2/ with Java 11 and Scala
> 2.13 in KRaft mode
> - Ran all unit tests
> - Ran all integration tests for Connect and MM2
> - Verified that the connect-test-plugins module is present in the staging
> Maven artifacts (https://issues.apache.org/jira/browse/KAFKA-15249)
>
> Everything looks good to me!
>
> +1 (binding)
>
> Cheers,
>
> Chris
>
> On Tue, Oct 3, 2023 at 6:43 AM Satish Duggana 
> wrote:
>
> > Thanks Luke for helping on running system tests on RCs and updating
> > the status on this email thread.
> >
> > ~Satish.
> >
> > On Tue, 3 Oct 2023 at 05:04, Luke Chen  wrote:
> > >
> > > Hi Justine and all,
> > >
> > > The system test result for 3.6.0 RC2 can be found below.
> > > In short, no failed tests. The flaky tests will pass in the 2nd run.
> > >
> >
> https://drive.google.com/drive/folders/1qwIKg-B4CBrswUeo5fBRv65KWpDsGUiS?usp=sharing
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Tue, Oct 3, 2023 at 7:08 AM Justine Olshan
> > 
> > > wrote:
> > >
> > > > I realized Luke shared the results here for RC1
> > > >
> > https://drive.google.com/drive/folders/1S2XYd79f6_AeWj9f9qEkliRg7JtL04AC
> > > > Given we had some runs that looked reasonable, and we made a small
> > change,
> > > > I'm ok with this. But I wouldn't be upset if we had another set of
> > runs :)
> > > >
> > > > As for the validation:
> > > >
> > > >    - I've compiled from source with Java 17 and Scala 2.13, and run
> > > >    the transactional produce bench
> > > >- Run unit tests
> > > >- Validated the checksums
> > > >- Downloaded and ran the 2.12 version of the release
> > > >- Briefly took a look at the documentation
> > > >- I was browsing through the site html files and I noticed the
> html
> > for
> > > >documentation.html seemed to be for 3.4. Not sure if this is a
> > blocker,
> > > > but
> > > >wanted to flag it. This seems to be the case for the previous
> > release
> > > >candidates as well. (As well as 3.5 release it seems)
> > > >
> > > >
> > > > I will hold off on voting until we figure that part out. I will also
> > follow
> > > > up with the documentation Divij mentioned outside this thread.
> > > >
> > > > Thanks,
> > > > Justine
> > > >
> > > > On Mon, Oct 2, 2023 at 3:05 PM Greg Harris
> > 
> > > > wrote:
> > > >
> > > > > Hey Satish,
> > > > >
> > > > > I verified KIP-898 functionality and the KAFKA-15473 patch.
> > > > > +1 (non-binding)
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Mon, Oct 2, 2023 at 1:28 PM Justine Olshan
> > > > >  wrote:
> > > > > >
> > > > > > Hey all -- I noticed we still have the system tests as something
> > that
> > > > > will
> > > > > > be updated. Did we get a run for this RC?
> > > > > >
> > > > > > On Mon, Oct 2, 2023 at 1:24 PM Bill Bejeck 
> > wrote:
> > > > > >
> > > > > > > Hi Satish,
> > > > > > >
> > > > > > > Thanks for running the release.
> > > > > > > I performed the following steps:
> > > > > > >
> > > > > > >- Validated all the checksums, signatures, and keys
> > > > > > >- Built the release from source
> > > > > > >- Ran all unit tests
> > > > > > >- Quick start validations
> > > > > > >   - ZK and KRaft
> > > > > > >   - Connect
> >

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-10-03 Thread Justine Olshan
Hey Artem,

Thanks for the KIP. I had a question about epoch bumping.

Previously when we send an InitProducerId request on Producer startup, we
bump the epoch and abort the transaction. Is it correct to assume that we
will still bump the epoch, but just not abort the transaction?
If we still bump the epoch in this case, how does this interact with
KIP-890 where we also bump the epoch on every transaction. (I think this
means that we may skip epochs and the data itself will all have the same
epoch)

I may have follow ups depending on the answer to this. :)

Thanks,
Justine
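
(For context on the client-side shape being discussed below: a minimal
sketch of a producer trying 2PC and falling back, assuming the proposed
transaction.two.phase.commit.enable client config from the KIP -- the config
name and the authorization-failure fallback are taken from this thread, not
from any released client.)

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.errors.TransactionalIdAuthorizationException;

public class TwoPhaseCommitInitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-2pc-producer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringSerializer");
        // Proposed by KIP-939; released clients treat this as an
        // unrecognized config and simply warn about it.
        props.put("transaction.two.phase.commit.enable", "true");

        boolean twoPhaseCommitAvailable = true;
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            try {
                producer.initTransactions();
            } catch (TransactionalIdAuthorizationException e) {
                // Cluster or principal does not allow 2PC; a real client
                // would recreate the producer without the flag and use
                // ordinary transactions instead.
                twoPhaseCommitAvailable = false;
            }
            System.out.println("2PC available: " + twoPhaseCommitAvailable);
        }
    }
}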

On Thu, Sep 7, 2023 at 9:51 PM Artem Livshits
 wrote:

> Hi Alex,
>
> Thank you for your questions.
>
> > the purpose of having broker-level transaction.two.phase.commit.enable
>
> The thinking is that 2PC is a bit of an advanced construct so enabling 2PC
> in a Kafka cluster should be an explicit decision.  If it is set to 'false'
> InitProducerId (and initTransactions) would
> return TRANSACTIONAL_ID_AUTHORIZATION_FAILED.
>
> > WDYT about adding an AdminClient method that returns the state of
> transaction.two.phase.commit.enable
>
> I wonder if the client could just try to use 2PC and then handle the error
> (e.g. if it needs to fall back to ordinary transactions).  This way it
> could uniformly handle cases when Kafka cluster doesn't support 2PC
> completely and cases when 2PC is restricted to certain users.  We could
> also expose this config in describeConfigs, if the fallback approach
> doesn't work for some scenarios.
>
> -Artem
>
>
> On Tue, Sep 5, 2023 at 12:45 PM Alexander Sorokoumov
>  wrote:
>
> > Hi Artem,
> >
> > Thanks for publishing this KIP!
> >
> > Can you please clarify the purpose of having broker-level
> > transaction.two.phase.commit.enable config in addition to the new ACL? If
> > the brokers are configured with
> transaction.two.phase.commit.enable=false,
> > at what point will a client configured with
> > transaction.two.phase.commit.enable=true fail? Will it happen at
> > KafkaProducer#initTransactions?
> >
> > WDYT about adding an AdminClient method that returns the state of
> > transaction.two.phase.commit.enable? This way, clients would know in
> advance
> > if 2PC is enabled on the brokers.
> >
> > Best,
> > Alex
> >
> > On Fri, Aug 25, 2023 at 9:40 AM Roger Hoover 
> > wrote:
> >
> > > Other than supporting multiplexing transactional streams on a single
> > > producer, I don't see how to improve it.
> > >
> > > On Thu, Aug 24, 2023 at 12:12 PM Artem Livshits
> > >  wrote:
> > >
> > > > Hi Roger,
> > > >
> > > > Thank you for summarizing the cons.  I agree and I'm curious what
> would
> > > be
> > > > the alternatives to solve these problems better and if they can be
> > > > incorporated into this proposal (or built independently in addition
> to
> > or
> > > > on top of this proposal).  E.g. one potential extension we discussed
> > > > earlier in the thread could be multiplexing logical transactional
> > > "streams"
> > > > with a single producer.
> > > >
> > > > -Artem
> > > >
> > > > On Wed, Aug 23, 2023 at 4:50 PM Roger Hoover  >
> > > > wrote:
> > > >
> > > > > Thanks.  I like that you're moving Kafka toward supporting this
> > > > dual-write
> > > > > pattern.  Each use case needs to consider the tradeoffs.  You
> already
> > > > > summarized the pros very well in the KIP.  I would summarize the
> cons
> > > > > as follows:
> > > > >
> > > > > - you sacrifice availability - each write requires both DB and
> Kafka
> > to
> > > > be
> > > > > available, so I think your overall application availability is
> > > > > (1 - p(DB is unavailable)) * (1 - p(Kafka is unavailable)).
> > > > > - latency will be higher and throughput lower - each write requires
> > > both
> > > > > writes to DB and Kafka while holding an exclusive lock in DB.
> > > > > - you need to create a producer per unit of concurrency in your app
> > > which
> > > > > has some overhead in the app and Kafka side (number of connections,
> > > poor
> > > > > batching).  I assume the producers would need to be configured for
> > low
> > > > > latency (linger.ms=0)
> > > > > - there's some complexity in managing stable transactional ids for
> > each
> > > > > producer/concurrency unit in your application.  With k8s
> deployment,
> > > you
> > > > > may need to switch to something like a StatefulSet that gives each
> > pod
> > > a
> > > > > stable identity across restarts.  On top of that pod identity which
> > you
> > > > can
> > > > > use as a prefix, you then assign unique transactional ids to each
> > > > > concurrency unit (thread/goroutine).
> > > > >
> > > > > On Wed, Aug 23, 2023 at 12:53 PM Artem Livshits
> > > > >  wrote:
> > > > >
> > > > > > Hi Roger,
> > > > > >
> > > > > > Thank you for the feedback.  You make a very good point that we
> > also
> > > > > > discussed internally.  Adding support for multiple concurrent
> > > > > > transactions in one producer could be valuable but it seems to
> be a
> > > > > fairly
> > > > > > large 

Re: Unsubscribe :

2023-10-04 Thread Justine Olshan
Hey Girish,

You may need to confirm the unsubscription with a second email.

When I was switching subscription emails, I sent one to the unsubscribe
email and then I got a reply.
In the reply it asked me to send to a unique email address to confirm. Look
for one from dev-h...@kafka.apache.org.

It should have directions on how to unsubscribe. Let me know if you do not
get this second email to confirm the unsubscription.

Justine


On Wed, Oct 4, 2023 at 8:03 AM Girish L  wrote:

> Dear Team
>
> I am repeatedly sending email to dev-unsubscr...@kafka.apache.org to
> unsubscribe this email address of mine from the email notifications
> received from dev@kafka.apache.org.
> Could one of you please help me with the correct process?
>
> Regards
> Girish
>


Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Justine Olshan
Sorry -- not MV but software version.

On Wed, Oct 4, 2023 at 9:51 AM Justine Olshan  wrote:

> Catching up with this discussion.
>
> I was just curious -- have we had other instances where downgrading MV is
> not supported? I think Kafka typically tries to support downgrades, and I
> couldn't think of other examples.
>
> Thanks,
> Justine
>
> On Wed, Oct 4, 2023 at 9:40 AM Calvin Liu 
> wrote:
>
>> Hi Jun,
>> 54. Marked the software downgrading is not supported. As the old
>> controller
>> will not understand the new PartitionRecord and PartitionChangeRecord.
>> Thanks!
>>
>> On Wed, Oct 4, 2023 at 9:12 AM Jun Rao  wrote:
>>
>> > Hi, Calvin,
>> >
>> > Thanks for the reply. Just one more comment.
>> >
>> > 54. It seems that downgrading MV is supported. Is downgrading the
>> software
>> > version supported? It would be useful to document that.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Tue, Oct 3, 2023 at 4:55 PM Artem Livshits
>> >  wrote:
>> >
>> > > Hi Colin,
>> > >
>> > > I think in your example "do_unclean_recovery" would need to do
>> different
>> > > things depending on the strategy.
>> > >
>> > > do_unclean_recovery() {
>> > >if (unclean.recovery.manager.enabled) {
>> > > if (strategy == Aggressive)
>> > > >   use UncleanRecoveryManager(waitLastKnownELR=false)  // just
>> inspect
>> > > logs from whoever is available
>> > > else
>> > > >   use  UncleanRecoveryManager(waitLastKnownELR=true)  // must wait
>> > for
>> > > at least last known ELR
>> > >   } else {
>> > > if (strategy == Aggressive)
>> > >   choose the last known leader if that is available, or a random
>> > leader
>> > > if not)
>> > > else
>> > >   wait for last known leader to get back
>> > >   }
>> > > }
>> > >
>> > > The idea is that the Aggressive strategy would kick in as soon as we
>> lost
>> > > the leader and would pick a leader from whoever is available; but the
>> > > Balanced will only kick in when ELR is empty and will wait for the
>> > brokers
>> > > that likely have most data to be available.
>> > >
>> > > On Tue, Oct 3, 2023 at 3:04 PM Colin McCabe 
>> wrote:
>> > >
>> > > > On Tue, Oct 3, 2023, at 10:49, Jun Rao wrote:
>> > > > > Hi, Calvin,
>> > > > >
>> > > > > Thanks for the update KIP. A few more comments.
>> > > > >
>> > > > > 41. Why would a user choose the option to select a random replica
>> as
>> > > the
>> > > > > leader instead of using unclean.recovery.strategy=Aggressive? It
>> seems
>> > > > that
>> > > > > the latter is strictly better? If that's not the case, could we
>> fold
>> > > this
>> > > > > option under unclean.recovery.strategy instead of introducing a
>> > > separate
>> > > > > config?
>> > > >
>> > > > Hi Jun,
>> > > >
>> > > > I thought the flow of control was:
>> > > >
>> > > > If there is no leader for the partition {
>> > > >   If (there are unfenced ELR members) {
>> > > > choose_an_unfenced_ELR_member
>> > > >   } else if (there are fenced ELR members AND strategy=Aggressive) {
>> > > > do_unclean_recovery
>> > > >   } else if (there are no ELR members AND strategy != None) {
>> > > > do_unclean_recovery
>> > > >   } else {
>> > > > do nothing about the missing leader
>> > > >   }
>> > > > }
>> > > >
>> > > > do_unclean_recovery() {
>> > > >if (unclean.recovery.manager.enabled) {
>> > > > use UncleanRecoveryManager
>> > > >   } else {
>> > > > choose the last known leader if that is available, or a random
>> > leader
>> > > > if not)
>> > > >   }
>> > > > }
>> > > >
>> > > > However, I think this could be clarified, especially the behavior
>> when
>> > > > unclean.recovery.manager.enabled=false. Intuitively the goal for
>> > > > 

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Justine Olshan
Catching up with this discussion.

I was just curious -- have we had other instances where downgrading MV is
not supported? I think Kafka typically tries to support downgrades, and I
couldn't think of other examples.

Thanks,
Justine

On Wed, Oct 4, 2023 at 9:40 AM Calvin Liu 
wrote:

> Hi Jun,
> 54. Marked the software downgrading is not supported. As the old controller
> will not understand the new PartitionRecord and PartitionChangeRecord.
> Thanks!
>
> On Wed, Oct 4, 2023 at 9:12 AM Jun Rao  wrote:
>
> > Hi, Calvin,
> >
> > Thanks for the reply. Just one more comment.
> >
> > 54. It seems that downgrading MV is supported. Is downgrading the
> software
> > version supported? It would be useful to document that.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Oct 3, 2023 at 4:55 PM Artem Livshits
> >  wrote:
> >
> > > Hi Colin,
> > >
> > > I think in your example "do_unclean_recovery" would need to do
> different
> > > things depending on the strategy.
> > >
> > > do_unclean_recovery() {
> > >if (unclean.recovery.manager.enabled) {
> > > if (strategy == Aggressive)
> > > >   use UncleanRecoveryManager(waitLastKnownELR=false)  // just
> inspect
> > > logs from whoever is available
> > > else
> > > >   use  UncleanRecoveryManager(waitLastKnownELR=true)  // must wait
> > for
> > > at least last known ELR
> > >   } else {
> > > if (strategy == Aggressive)
> > >   choose the last known leader if that is available, or a random
> > leader
> > > if not)
> > > else
> > >   wait for last known leader to get back
> > >   }
> > > }
> > >
> > > The idea is that the Aggressive strategy would kick in as soon as we
> lost
> > > the leader and would pick a leader from whoever is available; but the
> > > Balanced will only kick in when ELR is empty and will wait for the
> > brokers
> > > that likely have most data to be available.
> > >
> > > On Tue, Oct 3, 2023 at 3:04 PM Colin McCabe 
> wrote:
> > >
> > > > On Tue, Oct 3, 2023, at 10:49, Jun Rao wrote:
> > > > > Hi, Calvin,
> > > > >
> > > > > Thanks for the update KIP. A few more comments.
> > > > >
> > > > > 41. Why would a user choose the option to select a random replica
> as
> > > the
> > > > > leader instead of using unclean.recovery.strategy=Aggressive? It
> seems
> > > > that
> > > > > the latter is strictly better? If that's not the case, could we
> fold
> > > this
> > > > > option under unclean.recovery.strategy instead of introducing a
> > > separate
> > > > > config?
> > > >
> > > > Hi Jun,
> > > >
> > > > I thought the flow of control was:
> > > >
> > > > If there is no leader for the partition {
> > > >   If (there are unfenced ELR members) {
> > > > choose_an_unfenced_ELR_member
> > > >   } else if (there are fenced ELR members AND strategy=Aggressive) {
> > > > do_unclean_recovery
> > > >   } else if (there are no ELR members AND strategy != None) {
> > > > do_unclean_recovery
> > > >   } else {
> > > > do nothing about the missing leader
> > > >   }
> > > > }
> > > >
> > > > do_unclean_recovery() {
> > > >if (unclean.recovery.manager.enabled) {
> > > > use UncleanRecoveryManager
> > > >   } else {
> > > > choose the last known leader if that is available, or a random
> > leader
> > > > if not)
> > > >   }
> > > > }
> > > >
> > > > However, I think this could be clarified, especially the behavior
> when
> > > > unclean.recovery.manager.enabled=false. Intuitively the goal for
> > > > unclean.recovery.manager.enabled=false is to be "the same as now,
> > mostly"
> > > > but it's very underspecified in the KIP, I agree.
> > > >
> > > > >
> > > > > 50. ElectLeadersRequest: "If more than 20 topics are included, only
> > the
> > > > > first 20 will be served. Others will be returned with
> > DesiredLeaders."
> > > > Hmm,
> > > > > not sure that I understand this. ElectLeadersResponse doesn't have
> a
> > > > > DesiredLeaders field.
> > > > >
> > > > > 51. GetReplicaLogInfo: "If more than 2000 partitions are included,
> > only
> > > > the
> > > > > first 2000 will be served" Do we return an error for the remaining
> > > > > partitions? Actually, should we include an errorCode field at the
> > > > partition
> > > > > level in GetReplicaLogInfoResponse to cover non-existing partitions
> > and
> > > > no
> > > > > authorization, etc?
> > > > >
> > > > > 52. The entry should matches => The entry should match
> > > > >
> > > > > 53. ElectLeadersRequest.DesiredLeaders: Should it be nullable
> since a
> > > > user
> > > > > may not specify DesiredLeaders?
> > > > >
> > > > > 54. Downgrade: Is that indeed possible? I thought earlier you said
> > that
> > > > > once the new version of the records are in the metadata log, one
> > can't
> > > > > downgrade since the old broker doesn't know how to parse the new
> > > version
> > > > of
> > > > > the metadata records?
> > > > >
> > > >
> > > > MetadataVersion downgrade is currently broken but we have fixing it
> on
> > > our
> > > > plate for Kafka 3.7.
> > > >
> > > > 

Re: [VOTE]KIP-966: Eligible Leader Replicas

2023-09-20 Thread Justine Olshan
Thanks Calvin.
I think this will be very helpful going forward to minimize data loss.

+1 from me (binding)

Justine

On Wed, Sep 20, 2023 at 3:42 PM Calvin Liu 
wrote:

> Hi all,
> I'd like to call for a vote on KIP-966 which includes a series of
> enhancements to the current ISR model.
>
>- Introduce the new HWM advancement requirement which enables the system
>to have more potentially data-safe replicas.
>- Introduce Eligible Leader Replicas(ELR) to represent the above
>data-safe replicas.
>- Introduce Unclean Recovery process which will deterministically choose
>the best replica during an unclean leader election.
>
>
> KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
>
> Discussion thread:
> https://lists.apache.org/thread/gpbpx9kpd7c62dm962h6kww0ghgznb38
>


Re: UncleanLeaderElectionsPerSec metric and Raft

2023-10-23 Thread Justine Olshan
Hey Neil,

I was taking a look at this code, and noticed that some unclean leader
election params were not implemented.
https://github.com/apache/kafka/blob/4612fe42af0df0a4c1affaf66c55d01eb6267ce3/metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java#L499

I know you mentioned setting the non-topic config, but I wonder if the
feature is generally not built out. I think that once KIP-966 is
implemented, it will likely replace the old notion of unclean leader
election.

Still, if KRaft mode doesn't have unclean leader election, it should be
documented. I will get back to you on this.

Justine
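
In the meantime, a quick way to check whether the MBean is registered at
all: a minimal JMX sketch, assuming the controller/broker was started with
JMX enabled on port 9999 (the host and port are placeholders for your
setup).

import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UncleanElectionMetricCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            ObjectName name = new ObjectName(
                "kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec");
            boolean registered =
                connector.getMBeanServerConnection().isRegistered(name);
            System.out.println("MBean registered: " + registered);
            if (registered) {
                // Meters expose Count, OneMinuteRate, etc. as attributes.
                System.out.println("Count: " + connector
                    .getMBeanServerConnection().getAttribute(name, "Count"));
            }
        }
    }
}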

On Wed, Oct 18, 2023 at 10:30 AM Neil Buesing  wrote:

> Development,
>
> with Raft controllers, is the unclean leader election / sec metric supposed
> to be available?
>
> kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
>
> Nothing in the documentation indicates that it isn't, and nothing in the
> code suggests it wouldn't show up, but even after setting unclean leader
> election to true for both brokers and controllers, nothing appears.
>
> (set this for all controllers and brokers)
>   KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE: true
>
> Happy to report a Jira, but wanted to figure out if the bug was in the
> documentation or the metric not being available?
>
> Thanks,
>
> Neil
>
> P.S. I did confirm that others have seen and wondered about this,
> https://github.com/strimzi/strimzi-kafka-operator/issues/8169, but that is
> about the only other report on this I have found.
>


Re: requesting permissions to contribute to Apache Kafka

2023-08-20 Thread Justine Olshan
Hey Neil,

I've given you permissions for wiki access. You should be able to create a
KIP now. Let me know if you have any other issues.

Justine

On Sun, Aug 20, 2023 at 5:46 AM Neil Buesing  wrote:

> Wiki ID: neil.buesing
>
> JIRA ID: nbuesing
>
> when I click on the signup link through in the KIP page I get the following
> message
> --
> (Public signup for this instance is disabled. Go to our Self serve sign up
> page to request an account.)
> >You can't sign up right now
> --
> I am not sure if I have to go through that process or if someone allows
> 'nbuesing' account to have access.
>
> Thanks,
> Neil
>


Re: Request to Get Edit Permission

2023-08-18 Thread Justine Olshan
Hey Hailey,

Can you share your wiki ID so I can grant you access? If you don't yet have
one you may need to create an account.

Justine

On Fri, Aug 18, 2023 at 3:58 PM Hailey Ni  wrote:

> Hi,
>
> Can I get edit access to Apache Kafka's Wiki please?
>
> Thanks,
> Hailey
>


Re: Justine Olshan / thank you (wiki access)

2023-08-25 Thread Justine Olshan
Hmmm. That's a bit strange; if you are subscribed to dev@kafka.apache.org,
you should be getting responses.

Let me know if this one also doesn't work.

Justine

On Fri, Aug 25, 2023 at 6:04 AM Neil Buesing  wrote:

> Justine,
>
> Thanks for taking care of the wiki access; oddly, the response to the
> email was never forwarded to me, and I had to see it in the email archive (
> https://lists.apache.org/list.html?dev@kafka.apache.org). I get other
> emails in dev. (I subscribed via https://kafka.apache.org/contact)
>
> I am looking into how to get email responses, in case someone else knows of
> this issue with Apache email lists; from the archive page I cannot send
> emails.
>
> Thanks,
>
> Neil
>


Re: Apache Kafka 3.6.0 release

2023-08-25 Thread Justine Olshan
Hey Satish,
Everything should be in 3.6, and I will update the release plan wiki.
Thanks!

On Fri, Aug 25, 2023 at 4:08 AM Satish Duggana 
wrote:

> Hi Justine,
> Adding KIP-890 part-1 to 3.6.0 seems reasonable to me. This part looks
> to be addressing a critical issue of consumers getting stuck. Please
> update the release plan wiki and merge all the required changes to 3.6
> branch.
>
> Thanks,
> Satish.
>
> On Thu, 24 Aug 2023 at 22:19, Justine Olshan
>  wrote:
> >
> > Hey Satish,
> > Does it make sense to include KIP-890 part 1? It prevents hanging
> > transactions for older clients. (An optimization and stronger EOS
> > guarantees will be included in part 2)
> >
> > Thanks,
> > Justine
> >
> > On Mon, Aug 21, 2023 at 3:29 AM Satish Duggana  >
> > wrote:
> >
> > > Hi,
> > > 3.6 branch is created. Please make sure any PRs targeted for 3.6.0
> > > should be merged to 3.6 branch once those are merged to trunk.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Wed, 16 Aug 2023 at 15:58, Satish Duggana  >
> > > wrote:
> > > >
> > > > Hi,
> > > > Please plan to merge PRs(including the major features) targeted for
> > > > 3.6.0 by the end of Aug 20th UTC. Starting from August 21st, any pull
> > > > requests intended for the 3.6.0 release must include the changes
> > > > merged into the 3.6 branch as mentioned in the release plan.
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Fri, 4 Aug 2023 at 18:39, Chris Egerton 
> > > wrote:
> > > > >
> > > > > Thanks for adding KIP-949, Satish!
> > > > >
> > > > > On Fri, Aug 4, 2023 at 7:06 AM Satish Duggana <
> > > satish.dugg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > > Myself and Divij discussed and added the wiki for Kafka
> TieredStorage
> > > > > > Early Access Release[1]. If you have any comments or feedback,
> please
> > > > > > feel free to share them.
> > > > > >
> > > > > > 1.
> > > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
> > > > > >
> > > > > > Thanks,
> > > > > > Satish.
> > > > > >
> > > > > > On Fri, 4 Aug 2023 at 08:40, Satish Duggana <
> > > satish.dugg...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Chris,
> > > > > > > Thanks for the update. This looks to be a minor change and is
> also
> > > > > > > useful for backward compatibility. I added it to the release
> plan
> > > as
> > > > > > > an exceptional case.
> > > > > > >
> > > > > > > ~Satish.
> > > > > > >
> > > > > > > On Thu, 3 Aug 2023 at 21:34, Chris Egerton
>  > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Satish,
> > > > > > > >
> > > > > > > > Would it be possible to include KIP-949 (
> > > > > > > >
> > > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy
> > > > > > )
> > > > > > > > in the 3.6.0 release? It passed voting yesterday, and is a
> very
> > > small,
> > > > > > > > low-risk change that we'd like to put out as soon as
> possible in
> > > order
> > > > > > to
> > > > > > > > patch an accidental break in backwards compatibility caused
> a few
> > > > > > versions
> > > > > > > > ago.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Chris
> > > > > > > >
> > > > > > > > On Fri, Jul 28, 2023 at 2:35 AM Satish Duggana <
> > > > > > satish.dugg...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > >

Re: Apache Kafka 3.6.0 release

2023-08-24 Thread Justine Olshan
Hey Satish,
Does it make sense to include KIP-890 part 1? It prevents hanging
transactions for older clients. (An optimization and stronger EOS
guarantees will be included in part 2)

Thanks,
Justine

On Mon, Aug 21, 2023 at 3:29 AM Satish Duggana 
wrote:

> Hi,
> 3.6 branch is created. Please make sure any PRs targeted for 3.6.0
> should be merged to 3.6 branch once those are merged to trunk.
>
> Thanks,
> Satish.
>
> On Wed, 16 Aug 2023 at 15:58, Satish Duggana 
> wrote:
> >
> > Hi,
> > Please plan to merge PRs(including the major features) targeted for
> > 3.6.0 by the end of Aug 20th UTC. Starting from August 21st, any pull
> > requests intended for the 3.6.0 release must include the changes
> > merged into the 3.6 branch as mentioned in the release plan.
> >
> > Thanks,
> > Satish.
> >
> > On Fri, 4 Aug 2023 at 18:39, Chris Egerton 
> wrote:
> > >
> > > Thanks for adding KIP-949, Satish!
> > >
> > > On Fri, Aug 4, 2023 at 7:06 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > Myself and Divij discussed and added the wiki for Kafka TieredStorage
> > > > Early Access Release[1]. If you have any comments or feedback, please
> > > > feel free to share them.
> > > >
> > > > 1.
> > > >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Fri, 4 Aug 2023 at 08:40, Satish Duggana <
> satish.dugg...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi Chris,
> > > > > Thanks for the update. This looks to be a minor change and is also
> > > > > useful for backward compatibility. I added it to the release plan
> as
> > > > > an exceptional case.
> > > > >
> > > > > ~Satish.
> > > > >
> > > > > On Thu, 3 Aug 2023 at 21:34, Chris Egerton  >
> > > > wrote:
> > > > > >
> > > > > > Hi Satish,
> > > > > >
> > > > > > Would it be possible to include KIP-949 (
> > > > > >
> > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy
> > > > )
> > > > > > in the 3.6.0 release? It passed voting yesterday, and is a very
> small,
> > > > > > low-risk change that we'd like to put out as soon as possible in
> order
> > > > to
> > > > > > patch an accidental break in backwards compatibility caused a few
> > > > versions
> > > > > > ago.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Fri, Jul 28, 2023 at 2:35 AM Satish Duggana <
> > > > satish.dugg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > > Whoever has KIP entries in the 3.6.0 release plan. Please
> update it
> > > > > > > with the latest status by tomorrow(end of the day 29th Jul UTC
> ).
> > > > > > >
> > > > > > > Thanks
> > > > > > > Satish.
> > > > > > >
> > > > > > > On Fri, 28 Jul 2023 at 12:01, Satish Duggana <
> > > > satish.dugg...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks Ismael and Divij for the suggestions.
> > > > > > > >
> > > > > > > > One way was to follow the earlier guidelines that we set for
> any
> > > > early
> > > > > > > > access release. It looks Ismael already mentioned the
> example of
> > > > > > > > KRaft.
> > > > > > > >
> > > > > > > > KIP-405 mentions upgrade/downgrade and limitations sections.
> We can
> > > > > > > > clarify that in the release notes for users on how this
> feature
> > > > can be
> > > > > > > > used for early access.
> > > > > > > >
> > > > > > > > Divij, We do not want users to enable this feature on
> production
> > > > > > > > environments in early access release. Let us work together
> on the
> > > > > > > > followups Ismael suggested.
> > > > > > > >
> > > > > > > > ~Satish.
> > > > > > > >
> > > > > > > > On Fri, 28 Jul 2023 at 02:24, Divij Vaidya <
> > > > divijvaidy...@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Those are great suggestions, thank you. We will continue
> this
> > > > > > > discussion
> > > > > > > > > forward in a separate KIP for release plan for Tiered
> Storage.
> > > > > > > > >
> > > > > > > > > On Thu 27. Jul 2023 at 21:46, Ismael Juma <
> m...@ismaeljuma.com>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Divij,
> > > > > > > > > >
> > > > > > > > > > I think the points you bring up for discussion are all
> good.
> > > > My main
> > > > > > > > > > feedback is that they should be discussed in the context
> of
> > > > KIPs vs
> > > > > > > the
> > > > > > > > > > release template. That's why we have a backwards
> compatibility
> > > > > > > section for
> > > > > > > > > > every KIP, it's precisely to ensure we think carefully
> about
> > > > some of
> > > > > > > the
> > > > > > > > > > points you're bringing up. When it comes to defining the
> > > > meaning of
> > > > > > > early
> > > > > > > > > > access, we have two options:
> > > > > > > > > >
> > > > > > > > > > 1. Have a KIP specifically for tiered storage.
> > > > > > > > > > 

Re: Need Access to create KIP & Jira Tickets

2023-08-28 Thread Justine Olshan
Hey Raghu,

I've added your ID to give you permissions to the wiki.

I'm not sure if committers can change your jira ID. You may want to try to
create a new account or file a ticket with apache for that.

Let me know if there are any issues.

Justine

On Mon, Aug 28, 2023 at 11:54 AM Raghu Baddam  wrote:

> Hi Team,
>
> Please find wiki ID and Jira ID and help me by providing me access to
> create KIP's and Jira Tickets on Apache Kafka space.
>
> wiki ID: rbaddam
> Jira ID: raghu98...@gmail.com
>
> Also If possible I also need help with changing my Jira ID same as wiki ID
> i.e. *rbaddam*
>
> Thanks,
> Raghu
>


Re: Requesting permission to contribute to Apache Kafka

2023-08-21 Thread Justine Olshan
Hey Hailey,
You should have permissions now!

Justine

On Mon, Aug 21, 2023 at 2:11 PM Hailey Ni  wrote:

> Hi,
>
> This is Hailey. Wiki ID: hni. May I request edit permission to the Kafka
> Wiki please?
>
> Thanks,
> Hailey
>


Re: [DISCUSS] KIP-848: The Next Generation of the Consumer Rebalance Protocol

2022-07-08 Thread Justine Olshan
Hi David,
Thanks for sharing this KIP! Really exciting to hear how we are changing
the protocol! The motivation section really made me realize how useful this
change will be.

I've done a first pass of the KIP, and may have more questions, but thought
I'd start with a few I thought of already.

   - I saw some usages of topic IDs in the new
   protocols/records/interfaces, but wasn't sure if they were used everywhere.
   Are you planning on relying on topic IDs for the new protocol?
   - I saw the section about using a feature flag first before integrating
   the feature with ibp/metadata version. I understand the logic for testing
   with the flag, but it also seems like a bit of work to deprecate and switch
   to the ibp/metadata version approach. What was the reasoning behind
   switching the enablement mechanism?
   - Generally, are there implications for KRaft here? (IBP/metadata
   version is one thing that comes to mind.) And if so, will both cluster
   types be supported?

Thanks again to everyone who worked on this KIP!
Justine

On Wed, Jul 6, 2022 at 1:45 AM David Jacot 
wrote:

> Hi all,
>
> I would like to start a discussion thread on KIP-848: The Next
> Generation of the Consumer Rebalance Protocol. With this KIP, we aim
> to make the rebalance protocol (for consumers) more reliable, more
> scalable, easier to implement for clients, and easier to debug for
> operators.
>
> The KIP is here: https://cwiki.apache.org/confluence/x/HhD1D.
>
> Please take a look and let me know what you think.
>
> Best,
> David
>
> PS: I will be away from July 18th to August 8th. That gives you a bit
> of time to read and digest this long KIP.
>


Re: [DISCUSS] KIP-847: Add ProducerCount metrics

2022-06-30 Thread Justine Olshan
Hi Artem,
Thanks for the update to include motivation. Makes sense to me.
Justine

On Wed, Jun 29, 2022 at 6:51 PM Luke Chen  wrote:

> Hi Artem,
>
> Thanks for the update.
> LGTM.
>
> Luke
>
> On Thu, Jun 30, 2022 at 6:51 AM Artem Livshits
>  wrote:
>
> > Thank you for your feedback. I've updated the KIP to elaborate on the
> > motivation and provide some background on producer ids and how we measure
> > them.
> >
> > Also, after some thinking and discussing it offline with some folks, I
> > think that we don't really need partitioner level metrics.  We can use
> > existing tools to do granular debugging.  I've moved partition level
> > metrics to the rejected alternatives section.
> >
> > -Artem
> >
> > On Wed, Jun 29, 2022 at 1:57 AM Luke Chen  wrote:
> >
> > > Hi Artem,
> > >
> > > Could you elaborate more in the motivation section?
> > > I'm interested to know what kind of scenarios this metric can benefit
> > for.
> > > What could it bring to us when a topic partition has 100
> ProducerIdCount
> > VS
> > > another topic partition has 10 ProducerIdCount?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Wed, Jun 29, 2022 at 6:30 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Artem,
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > Could you explain the partition level ProducerIdCount a bit more?
> Does
> > > that
> > > > reflect the number of PIDs ever produced to a partition since the
> > broker
> > > is
> > > > started? Do we reduce the count after a PID expires?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Jun 22, 2022 at 1:08 AM David Jacot
> >  > > >
> > > > wrote:
> > > >
> > > > > Hi Artem,
> > > > >
> > > > > The KIP LGTM.
> > > > >
> > > > > Thanks,
> > > > > David
> > > > >
> > > > > On Tue, Jun 21, 2022 at 9:32 PM Artem Livshits
> > > > >  wrote:
> > > > > >
> > > > > > If there is no other feedback I'm going to start voting in a
> couple
> > > > days.
> > > > > >
> > > > > > -Artem
> > > > > >
> > > > > > On Fri, Jun 17, 2022 at 3:50 PM Artem Livshits <
> > > alivsh...@confluent.io
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thank you for your feedback.  Updated the KIP and added the
> > > Rejected
> > > > > > > Alternatives section.
> > > > > > >
> > > > > > > -Artem
> > > > > > >
> > > > > > > On Fri, Jun 17, 2022 at 1:16 PM Ismael Juma  >
> > > > wrote:
> > > > > > >
> > > > > > >> If we don't track them separately, then it makes sense to keep
> > it
> > > as
> > > > > one
> > > > > > >> metric. I'd probably name it ProducerIdCount in that case.
> > > > > > >>
> > > > > > >> Ismael
> > > > > > >>
> > > > > > >> On Fri, Jun 17, 2022 at 12:04 PM Artem Livshits
> > > > > > >>  wrote:
> > > > > > >>
> > > > > > >> > Do you propose to have 2 metrics instead of one?  Right now
> we
> > > > don't
> > > > > > >> track
> > > > > > >> > if the producer id was transactional or idempotent and for
> > > metric
> > > > > > >> > collection we'd either have to pay the cost of iterating
> over
> > > > > producer
> > > > > > >> ids
> > > > > > >> > (which could be a lot) or split the producer map into 2 or
> > cache
> > > > the
> > > > > > >> > counts, which complicates the code.
> > > > > > >> >
> > > > > > >> > From the monitoring perspective, I think one metric should
> be
> > > > good,
> > > > > but
> > > > > > >> > maybe I'm missing some scenarios.
> > > > > > >> >
> > > > > > >> > -Artem
> > > > > > >> >
> > > > > > >> > On Fri, Jun 17, 2022 at 12:28 AM Ismael Juma <
> > ism...@juma.me.uk
> > > >
> > > > > wrote:
> > > > > > >> >
> > > > > > >> > > I like the suggestion to have IdempotentProducerCount and
> > > > > > >> > > TransactionalProducerCount metrics.
> > > > > > >> > >
> > > > > > >> > > Ismael
> > > > > > >> > >
> > > > > > >> > > On Thu, Jun 16, 2022 at 2:27 PM Artem Livshits
> > > > > > >> > >  wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi Ismael,
> > > > > > >> > > >
> > > > > > >> > > > Thank you for your feedback.  Yes, this is counting the
> > > number
> > > > > of
> > > > > > >> > > producer
> > > > > > >> > > > ids tracked by the partition and broker.  Another
> options
> > I
> > > > was
> > > > > > >> > thinking
> > > > > > >> > > of
> > > > > > >> > > > are the following:
> > > > > > >> > > >
> > > > > > >> > > > - IdempotentProducerCount
> > > > > > >> > > > - TransactionalProducerCount
> > > > > > >> > > > - ProducerIdCount
> > > > > > >> > > >
> > > > > > >> > > > Let me know if one of these seems better, or I'm open to
> > > other
> > > > > name
> > > > > > >> > > > suggestions as well.
> > > > > > >> > > >
> > > > > > >> > > > -Artem
> > > > > > >> > > >
> > > > > > >> > > > On Wed, Jun 15, 2022 at 11:49 PM Ismael Juma <
> > > > ism...@juma.me.uk
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Thanks for the KIP.
> > > > > > >> > > > >
> > > > > > >> > > > > ProducerCount seems like a misleading name since
> > producers
> > > > > > >> without a
> > > > > > >> > > > > producer id are not counted. Is this meant 

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

2022-07-25 Thread Justine Olshan
Hey Bill,
Thanks! I was just going to say that hopefully
transactional.id.expiration.ms would also be over the delivery timeout. :)
Thanks for the +1!

Justine
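
As a side note, here is a minimal sketch of what adjusting the new config
with the Admin client could look like, assuming producer.id.expiration.ms
ends up dynamically updatable on brokers as the KIP discussion suggests
(the broker id "0" and the value are just placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ProducerIdExpiryConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker =
                new ConfigResource(ConfigResource.Type.BROKER, "0");
            // Keep producer ID state at least as long as delivery.timeout.ms
            // so retried batches don't reappear as duplicates.
            AlterConfigOp op = new AlterConfigOp(
                new ConfigEntry("producer.id.expiration.ms", "86400000"),
                AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(broker, Collections.singletonList(op)))
                .all().get();
        }
    }
}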

On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck  wrote:

> Hi Justine,
>
> I just took another look at the KIP, and I realize my question/suggestion
> about default values has already been addressed in the `Compatibility`
> section.
>
> I'm +1 on the KIP.
>
> -Bill
>
> On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck  wrote:
>
> > Hi Justine,
> >
> > Thanks for the well written KIP, this looks like it will be a useful
> > addition.
> >
> > Overall the KIP looks good to me, I have one question/comment.
> >
> > You mentioned that setting the `producer.id.expiration.ms` less than the
> > delivery timeout could lead to duplicates, which makes sense.  To help
> > avoid this situation, do we want to consider a default value that is the
> > same as the delivery timeout?
> >
> > Thanks again for the KIP.
> >
> > Bill
> >
> > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> >  wrote:
> >
> >> Hey all!
> >>
> >> I'd like to start a discussion on my proposal to separate time-based
> >> producer ID expiration from transactional ID expiration by introducing a
> >> new configuration.
> >>
> >> The KIP Is pretty small and simple, but will be helpful in controlling
> >> memory usage in brokers -- especially now that by default producers are
> >> idempotent and create producer ID state.
> >>
> >> Please take a look and leave any comments you may have!
> >>
> >> KIP:
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> >>
> >> Thanks!
> >> Justine
> >>
> >
>

