Re: [VOTE] KIP-759: Unneeded repartition canceling

2023-08-03 Thread Shay Lin
Hi all,

Thanks to everyone who participated in the vote and the discussion. I'll
close this since it has been open for over 72 hours and we have a
sufficient number of votes. KIP-793 has been accepted with the following +1
votes (binding): Matthias, Walker, Bill, Bruno, Sophie.

I'll start working on the implementation.

Best,
Shay

On Wed, Aug 2, 2023 at 3:56 PM Sophie Blee-Goldman 
wrote:

> +1 (binding)
>
> thanks Shay!
>
> On Wed, Aug 2, 2023 at 1:19 AM Bruno Cadonna  wrote:
>
> > Hi,
> >
> > +1 (binding)
> >
> > Thanks for the KIP!
> >
> > Best,
> > Bruno
> >
> > On 8/2/23 1:19 AM, Bill Bejeck wrote:
> > > I caught up on the discussion thread and the KIP LGTM.
> > >
> > > +1(binding)
> > >
> > > On Tue, Aug 1, 2023 at 3:07 PM Walker Carlson
> > 
> > > wrote:
> > >
> > >> +1 (binding)
> > >>
> > >> On Mon, Jul 31, 2023 at 10:43 PM Matthias J. Sax 
> > wrote:
> > >>
> > >>> +1 (binding)
> > >>>
> > >>> On 7/11/23 11:16 AM, Shay Lin wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> I'd like to call a vote on KIP-759: Unneeded repartition canceling
> > >>>> The KIP has been under discussion for quite some time(two years).
> This
> > >>> is a
> > >>>> valuable optimization for advanced users. I hope we can push this
> > >> toward
> > >>>> the finish line this time.
> > >>>>
> > >>>> Link to the KIP:
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-759%3A+Unneeded+repartition+canceling
> > >>>>
> > >>>> Best,
> > >>>> Shay
> > >>>>
> > >>>
> > >>
> > >
> >
>


Re: [DISCUSS] KIP-759: Unneeded repartition canceling

2023-07-31 Thread Shay Lin
Hi all,

It's been a few days, if there is no further comments or questions I'd like
to call for a vote. There is an existing VOTE thread if you search for
KIP-759.

Thank you,
Shay

On Wed, Jul 26, 2023 at 7:30 PM Shay Lin  wrote:

> Very good catch, Matthias. I updated the KIP to state that the new
> DSLOperation will return a new, mutated KStream.
>
> Thank you,
> Shay
>
> On Wed, Jul 26, 2023 at 6:13 PM Matthias J. Sax  wrote:
>
>> One last question. What should happen for the following case:
>>
>> KStream myStream = build.stream(...).map(...);
>> myStream.markAsPartiitoned().groupByKey().aggregate(...);
>> myStream.join(...)
>>
>> The question is about the "fan-out" pattern. `myStream`, which is marked
>> for partitioning, is fed into two downstream operations. Thus, it's
>> clear that the aggregation won't trigger a rebalance. However, the
>> fan-out happens before `markAsRepartiitoned` and thus I would assume
>> that the join would trigger a repartitioning?
>>
>> This question is important, because if we follow what I said above,
>> `markAsRepartiitoned` returns a new KStream object, but does mutate the
>> upstream KStream object, what is semantically two different things. It
>> also has an impact on how we need to implement the feature. The KIP
>> should explicitly explain this case.
>>
>>
>> -Matthias
>>
>> On 7/26/23 4:58 PM, Shay Lin wrote:
>> > Hi John,
>> >
>> > Thanks for your reply. I updated the KIP to reflect the changes we
>> > discussed in the thread today.
>> > #1 is duly noted, I learned from the examples Sophie sent earlier! =)
>> >
>> > In the new version, I also talked about why IQ and joins will not work
>> with
>> > the interface and talked about the mitigation. The proposal
>> > now specifically states we are solving the unneeded partition problem
>> when
>> > IQ or join does not coexist in the kafka streams. In the concerns
>> section,
>> > the proposal talks about having a reverse mapping would make this new
>> > interface compatible with IQ and join again but is subject to demand.
>> >
>> > Let me know what you think. Thanks!
>> > Shay
>> >
>> >
>> >
>> > On Wed, Jul 26, 2023 at 2:35 PM John Roesler 
>> wrote:
>> >
>> >> Hello Shay,
>> >>
>> >> Thanks for the KIP!
>> >>
>> >> I just took a look in preparation to vote, and there are two small-ish
>> >> things that I'd like to fix first. Apologies if this stuff has already
>> come
>> >> up in the discussion thread; I only skimmed it.
>> >>
>> >> 1. The KIP only mentions the name of the method instead of providing a
>> >> code snippet showing exactly what the method signature will be in the
>> >> interface. Normally, KIPs do the latter because it removes all
>> ambiguity
>> >> from the proposal. It also gives you an opportunity to write down the
>> >> Javadoc you would add to the method instead of just mentioning the
>> points
>> >> that you plan to document.
>> >>
>> >> 2. The KIP lists some concerns, but not what you will do to mitigate
>> them.
>> >> For example, the concern about IQ not behaving correctly. Will you
>> disable
>> >> the use of the implicit partitioner downstream of one of these
>> >> cancellations? Or provide a new interface to supply the "reverse
>> mapping"
>> >> you mentioned? Or include documentation in the Javadoc for how to deal
>> with
>> >> the situation? I think there are a range of options for each of those
>> >> concerns, and we should state up front what we plan to do.
>> >>
>> >> Thanks again!
>> >> -John
>> >>
>> >> On 2023/07/24 20:33:05 Sophie Blee-Goldman wrote:
>> >>> Thanks Shay! You and Matthias have convinced me, I'm happy with the
>> >> current
>> >>> proposal. I think once you make the minor
>> >>> updates to the KIP document this will be ready for voting again.
>> >>>
>> >>> Cheers,
>> >>> Sophie
>> >>>
>> >>> On Mon, Jul 24, 2023 at 8:26 AM Shay Lin  wrote:
>> >>>
>> >>>> Hi Sophie and Matthias, thanks for your comments and replies.
>> >>>>
>> >>>> 1. Scope of change: KStreams only or KStreams/KTable
>> >>>&

Re: [DISCUSS] KIP-759: Unneeded repartition canceling

2023-07-26 Thread Shay Lin
Very good catch, Matthias. I updated the KIP to state that the new
DSLOperation will return a new, mutated KStream.

Thank you,
Shay

On Wed, Jul 26, 2023 at 6:13 PM Matthias J. Sax  wrote:

> One last question. What should happen for the following case:
>
> KStream myStream = build.stream(...).map(...);
> myStream.markAsPartiitoned().groupByKey().aggregate(...);
> myStream.join(...)
>
> The question is about the "fan-out" pattern. `myStream`, which is marked
> for partitioning, is fed into two downstream operations. Thus, it's
> clear that the aggregation won't trigger a rebalance. However, the
> fan-out happens before `markAsRepartiitoned` and thus I would assume
> that the join would trigger a repartitioning?
>
> This question is important, because if we follow what I said above,
> `markAsRepartiitoned` returns a new KStream object, but does mutate the
> upstream KStream object, what is semantically two different things. It
> also has an impact on how we need to implement the feature. The KIP
> should explicitly explain this case.
>
>
> -Matthias
>
> On 7/26/23 4:58 PM, Shay Lin wrote:
> > Hi John,
> >
> > Thanks for your reply. I updated the KIP to reflect the changes we
> > discussed in the thread today.
> > #1 is duly noted, I learned from the examples Sophie sent earlier! =)
> >
> > In the new version, I also talked about why IQ and joins will not work
> with
> > the interface and talked about the mitigation. The proposal
> > now specifically states we are solving the unneeded partition problem
> when
> > IQ or join does not coexist in the kafka streams. In the concerns
> section,
> > the proposal talks about having a reverse mapping would make this new
> > interface compatible with IQ and join again but is subject to demand.
> >
> > Let me know what you think. Thanks!
> > Shay
> >
> >
> >
> > On Wed, Jul 26, 2023 at 2:35 PM John Roesler 
> wrote:
> >
> >> Hello Shay,
> >>
> >> Thanks for the KIP!
> >>
> >> I just took a look in preparation to vote, and there are two small-ish
> >> things that I'd like to fix first. Apologies if this stuff has already
> come
> >> up in the discussion thread; I only skimmed it.
> >>
> >> 1. The KIP only mentions the name of the method instead of providing a
> >> code snippet showing exactly what the method signature will be in the
> >> interface. Normally, KIPs do the latter because it removes all ambiguity
> >> from the proposal. It also gives you an opportunity to write down the
> >> Javadoc you would add to the method instead of just mentioning the
> points
> >> that you plan to document.
> >>
> >> 2. The KIP lists some concerns, but not what you will do to mitigate
> them.
> >> For example, the concern about IQ not behaving correctly. Will you
> disable
> >> the use of the implicit partitioner downstream of one of these
> >> cancellations? Or provide a new interface to supply the "reverse
> mapping"
> >> you mentioned? Or include documentation in the Javadoc for how to deal
> with
> >> the situation? I think there are a range of options for each of those
> >> concerns, and we should state up front what we plan to do.
> >>
> >> Thanks again!
> >> -John
> >>
> >> On 2023/07/24 20:33:05 Sophie Blee-Goldman wrote:
> >>> Thanks Shay! You and Matthias have convinced me, I'm happy with the
> >> current
> >>> proposal. I think once you make the minor
> >>> updates to the KIP document this will be ready for voting again.
> >>>
> >>> Cheers,
> >>> Sophie
> >>>
> >>> On Mon, Jul 24, 2023 at 8:26 AM Shay Lin  wrote:
> >>>
> >>>> Hi Sophie and Matthias, thanks for your comments and replies.
> >>>>
> >>>> 1. Scope of change: KStreams only or KStreams/KTable
> >>>> I took some time to digest your points, looking through how KStreams
> >>>> triggers repartitions today. I noticed that `repartitionRequired`is a
> >> flag
> >>>> in KStreamImpl etc and not in KTableImpl etc. When I look further, in
> >> the
> >>>> case of KTable, instead of passing in a boolean flag, a repartition
> >> node `
> >>>> TableRepartitionMapNode` is directly created. I went back and
> >> referenced
> >>>> the two issue tickets KAFKA-10844 and KAFKA-4835, both requests were
> >>>> focused on KStreams, i.e. not to c

Re: [DISCUSS] KIP-759: Unneeded repartition canceling

2023-07-26 Thread Shay Lin
Hi John,

Thanks for your reply. I updated the KIP to reflect the changes we
discussed in the thread today.
#1 is duly noted, I learned from the examples Sophie sent earlier! =)

In the new version, I also talked about why IQ and joins will not work with
the interface and talked about the mitigation. The proposal
now specifically states we are solving the unneeded partition problem when
IQ or join does not coexist in the kafka streams. In the concerns section,
the proposal talks about having a reverse mapping would make this new
interface compatible with IQ and join again but is subject to demand.

Let me know what you think. Thanks!
Shay



On Wed, Jul 26, 2023 at 2:35 PM John Roesler  wrote:

> Hello Shay,
>
> Thanks for the KIP!
>
> I just took a look in preparation to vote, and there are two small-ish
> things that I'd like to fix first. Apologies if this stuff has already come
> up in the discussion thread; I only skimmed it.
>
> 1. The KIP only mentions the name of the method instead of providing a
> code snippet showing exactly what the method signature will be in the
> interface. Normally, KIPs do the latter because it removes all ambiguity
> from the proposal. It also gives you an opportunity to write down the
> Javadoc you would add to the method instead of just mentioning the points
> that you plan to document.
>
> 2. The KIP lists some concerns, but not what you will do to mitigate them.
> For example, the concern about IQ not behaving correctly. Will you disable
> the use of the implicit partitioner downstream of one of these
> cancellations? Or provide a new interface to supply the "reverse mapping"
> you mentioned? Or include documentation in the Javadoc for how to deal with
> the situation? I think there are a range of options for each of those
> concerns, and we should state up front what we plan to do.
>
> Thanks again!
> -John
>
> On 2023/07/24 20:33:05 Sophie Blee-Goldman wrote:
> > Thanks Shay! You and Matthias have convinced me, I'm happy with the
> current
> > proposal. I think once you make the minor
> > updates to the KIP document this will be ready for voting again.
> >
> > Cheers,
> > Sophie
> >
> > On Mon, Jul 24, 2023 at 8:26 AM Shay Lin  wrote:
> >
> > > Hi Sophie and Matthias, thanks for your comments and replies.
> > >
> > > 1. Scope of change: KStreams only or KStreams/KTable
> > > I took some time to digest your points, looking through how KStreams
> > > triggers repartitions today. I noticed that `repartitionRequired`is a
> flag
> > > in KStreamImpl etc and not in KTableImpl etc. When I look further, in
> the
> > > case of KTable, instead of passing in a boolean flag, a repartition
> node `
> > > TableRepartitionMapNode` is directly created. I went back and
> referenced
> > > the two issue tickets KAFKA-10844 and KAFKA-4835, both requests were
> > > focused on KStreams, i.e. not to change the partition why the input
> streams
> > > are already correctly keyed. Is it possible that in the case of KTable,
> > > users always intend to repartition (change key) when they call on
> > > aggregate? -- (this was written before I saw Matthias's comment)
> > >
> > > Overall, based on the tickets, I see the benefit of doing a contained
> > > change focusing on KStreams, i.e. repartitionRequired, which would
> solve
> > > the pain points nicely. If we ran into similar complaints/optimization
> > > requests for KTable down the line, we can address them on top of
> this(let
> > > me know if we have these requests already, I might just be negligent).
> > >
> > > 2. API: markAsPartitioned() vs config
> > > If we go with the KStreams only scope, markAsPartition() is more
> > > adequate, i.e. maps nicely to repartitionRequired. There is a list of
> > > NamedOperations that may or may not trigger repartition based on its
> > > context(KStreams or KTable) which would make the implementation more
> > > confusing.
> > >
> > > 3. KIP documentation: Thanks for providing the links to previous KIPs.
> I
> > > will be adding the three use cases and javadoc. I will also document
> the
> > > risks when it relates to IQ and Join.
> > >
> > > Best,
> > > Shay
> > >
> > > On Fri, Jul 21, 2023 at 5:55 PM Matthias J. Sax 
> wrote:
> > >
> > > > I agree that it could easily be misused. There is a few Jira tickets
> for
> > > > cases when people want to "cancel" a repartition step. I would hope
> > > > those tickets are linked to the KIP (if not, we sh

Re: [DISCUSS] KIP-759: Unneeded repartition canceling

2023-07-24 Thread Shay Lin
changing
> >> operators,
> >>> would you need to add markAsPartitioned after each one? If not, what
> are
> >>> the semantics of that?  These are the main questions that got me
> thinking
> >>> here, and will definitely need to be clarified in the KIP if we do go
> >> with
> >>> the current proposal. But I wanted to throw out another idea for an
> API I
> >>> think would help with some of this awkwardness by having clearly
> defined
> >>> semantics:
> >>>
> >>> Fundamentally it seems to me that these issues are arising from that
> >> "being
> >>> partitioned" is conceptually a property of other operations applied to
> a
> >>> KStream/KTable, rather than an operation itself. So rather than making
> >> this
> >>> a DSL operator itself, what if we added it to the Grouped and various
> >>> Joined configuration classes? It would allow us to more carefully hit
> >> only
> >>> the relevant parts of the DSL, so there are no questions about
> >> whether/when
> >>> to throw errors when the operator is incorrectly applied -- there would
> >> be
> >>> no way to apply it incorrectly. The main drawback I can think of is
> >> simply
> >>> that this touches on a larger surface area of the API. I personally
> don't
> >>> believe this is a good enough reason to make it a DSL operator as one
> >> could
> >>> make that argument for nearly any kind of KStream or KTable operator
> >>> configuration going forward, and would explode the KStream/KTable API
> >>> surface area instead. Perhaps this was discussed during the previous
> >>> iteration of this KIP, or I'm missing something here, so I just wanted
> to
> >>> put this out there and see what people think
> >>>
> >>> Either way, thanks for picking up this KIP. It's been a long time
> coming
> >> :)
> >>>
> >>> -Sophie
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, Jul 10, 2023 at 2:05 PM Shay Lin  wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> It's been a few days so I went ahead with editing the KIP, the main
> >> change
> >>>> is on the method name
> >>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-759%3A+Unneeded+repartition+canceling
> >>>> .
> >>>> I will follow up with a VOTE separately.
> >>>>
> >>>> Best,
> >>>> Shay
> >>>>
> >>>> On Thu, Jun 29, 2023 at 4:52 PM Matthias J. Sax 
> >> wrote:
> >>>>
> >>>>> Shay,
> >>>>>
> >>>>> thanks for picking up this KIP. It's a pity that the discussion
> stalled
> >>>>> for such a long time.
> >>>>>
> >>>>> As expressed previously, I am happy with the name
> `markAsPartitioned()`
> >>>>> and also believe it's ok to just document the impact and leave it to
> >> the
> >>>>> user to do the right thing.
> >>>>>
> >>>>> If we really get a lot of users that ask about it, because they did
> not
> >>>>> do the right thing, we could still add something (eg, a
> reverse-mapper
> >>>>> function) in a follow-up KIP. But we don't know if it's necessary;
> >> thus,
> >>>>> making a small incremental step sounds like a good approach to me.
> >>>>>
> >>>>> Let's see if others agree or not.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 6/28/23 5:29 PM, Shay Lin wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> Great discussion thread. May I take this KIP up? If it’s alright my
> >>>> plan
> >>>>> is
> >>>>>> to update the KIP with the operator `markAsPartitioned()`.
> >>>>>>
> >>>>>> As you have discussed and pointed out, there are implications to
> >>>>> downstream
> >>>>>> joins or aggregation operations. Still, the operator is intended for
> >>>>>> advanced users so my two cents is it would be a valuable addition
> >>>>>> nonetheless. We could add this as a caution/consideration as part of
> >>>> the
> >>>>>> java doc.
> >>>>>>
> >>>>>> Let me know, thanks.
> >>>>>> Shay
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>


[VOTE] KIP-759: Unneeded repartition canceling

2023-07-11 Thread Shay Lin
Hi all,

I'd like to call a vote on KIP-759: Unneeded repartition canceling
The KIP has been under discussion for quite some time(two years). This is a
valuable optimization for advanced users. I hope we can push this toward
the finish line this time.

Link to the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-759%3A+Unneeded+repartition+canceling

Best,
Shay


Re: [DISCUSS] KIP-759: Unneeded repartition canceling

2023-07-10 Thread Shay Lin
Hi all,

It's been a few days so I went ahead with editing the KIP, the main change
is on the method name
https://cwiki.apache.org/confluence/display/KAFKA/KIP-759%3A+Unneeded+repartition+canceling.
I will follow up with a VOTE separately.

Best,
Shay

On Thu, Jun 29, 2023 at 4:52 PM Matthias J. Sax  wrote:

> Shay,
>
> thanks for picking up this KIP. It's a pity that the discussion stalled
> for such a long time.
>
> As expressed previously, I am happy with the name `markAsPartitioned()`
> and also believe it's ok to just document the impact and leave it to the
> user to do the right thing.
>
> If we really get a lot of users that ask about it, because they did not
> do the right thing, we could still add something (eg, a reverse-mapper
> function) in a follow-up KIP. But we don't know if it's necessary; thus,
> making a small incremental step sounds like a good approach to me.
>
> Let's see if others agree or not.
>
>
> -Matthias
>
> On 6/28/23 5:29 PM, Shay Lin wrote:
> > Hi all,
> >
> > Great discussion thread. May I take this KIP up? If it’s alright my plan
> is
> > to update the KIP with the operator `markAsPartitioned()`.
> >
> > As you have discussed and pointed out, there are implications to
> downstream
> > joins or aggregation operations. Still, the operator is intended for
> > advanced users so my two cents is it would be a valuable addition
> > nonetheless. We could add this as a caution/consideration as part of the
> > java doc.
> >
> > Let me know, thanks.
> > Shay
> >
>


Permission to contribute to Apache Kafka

2023-07-06 Thread Shay Lin
Hi all,

Could you give me the appropriate access (edit Wiki/KIP, Jira etc) to
contribute to AK?

My confluence and Jira IDs are the same: lqxshay

Thanks in advance,
Shay


Re: [DISCUSS] KIP-759: Unneeded repartition canceling

2023-06-28 Thread Shay Lin
Hi all,

Great discussion thread. May I take this KIP up? If it’s alright my plan is
to update the KIP with the operator `markAsPartitioned()`.

As you have discussed and pointed out, there are implications to downstream
joins or aggregation operations. Still, the operator is intended for
advanced users so my two cents is it would be a valuable addition
nonetheless. We could add this as a caution/consideration as part of the
java doc.

Let me know, thanks.
Shay


[jira] [Created] (KAFKA-13699) ProcessorContext does not expose Stream Time

2022-02-28 Thread Shay Lin (Jira)
Shay Lin created KAFKA-13699:


 Summary: ProcessorContext does not expose Stream Time
 Key: KAFKA-13699
 URL: https://issues.apache.org/jira/browse/KAFKA-13699
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.7.0
Reporter: Shay Lin


As a KS developer, I would like to leverage 
[KIP-622|https://cwiki.apache.org/confluence/display/KAFKA/KIP-622%3A+Add+currentSystemTimeMs+and+currentStreamTimeMs+to+ProcessorContext]
 and access stream time in Processor Context.

 

However, the methods currentStreamTimeMs or currentSystemTimeMs is missing from 
for KStreams 2.7.0 (Java).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)