Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-07 Thread Matthias J. Sax
Thanks for the background. Was just curious about the details. I agree that we should not add a new backoff config at this point. -Matthias On 12/2/22 4:47 PM, Sophie Blee-Goldman wrote: I missed the default config values as they were put into comments... You don't read code comments?

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-02 Thread Sophie Blee-Goldman
> > I missed the default config values as they were put into comments... You don't read code comments? (jk... sorry, wasn't sure where the best place for this would be; I suppose I could've just included the full config definition.) > About the default timeout: what is the follow-up rebalance

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-02 Thread Matthias J. Sax
Thanks Sophie. Good catch on the default partitioner issue! I missed the default config values as they were put into comments... About the default timeout: what is the follow-up rebalance cadence (I thought it would be 10 minutes?). For this case, a default timeout of 15 minutes would imply

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-02 Thread Sophie Blee-Goldman
Thanks again for the responses -- just want to say up front that I realized the concept of a default partitioner is actually substantially more complicated than I first assumed due to key/value typing, so I pulled it from this KIP and filed a ticket for it for now. Bruno, What is exactly the

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-01 Thread Matthias J. Sax
Thanks for updating the KIP, Sophie. I have the same question as Bruno. How can the user use the failure metric, and what actions can be taken to react if the metric increases? Plus a few more: (1) Do we assume that users can reason about the `subtopology-parallelism` metric to figure out if

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-11-28 Thread Bruno Cadonna
Hi Sophie, Thanks for the updates! I also feel the KIP is much cleaner now. I have one question: What is exactly the motivation behind metric num-autoscaling-failures? Actually, to realise that autoscaling did not work, we only need to monitor subtopology-parallelism over

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-11-19 Thread Sophie Blee-Goldman
Thanks for the feedback everyone. I went back to the drawing board with a different guiding philosophy: that the users of this feature will generally be fairly advanced, and we should give them full flexibility to implement whatever they need while trusting them to know what they are doing. With

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-11-07 Thread Matthias J. Sax
Thanks for the KIP, Sophie. Seems there is a lively discussion going on. I tried to read up on the history and I hope I don't repeat what was already discussed. And sorry for the rather long email... (1) Stateless vs Stateful I agree that stateless apps should be supported, even if I am not

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-11-01 Thread Luke Chen
Hi Sophie, Thanks for the KIP. A very useful proposal! Some questions: 1. The staticPartition method in the interface is commented out. 2. For error handling, as you can imagine, there could be errors happening during partition expansion. That means the operation would (1) take a long time to

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-31 Thread Bruno Cadonna
Hi Sophie, Thank you for the KIP! 1. I do not understand how autoscaling should work with a Streams topology with a stateful sub-topology that reads from the input topics. The simplest example is a topology that consists of only one stateful sub-topology. As far as I understand the upstream

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-27 Thread Sophie Blee-Goldman
Thanks all! I'll try to address everything, but don't hesitate to call me out if anything is missed. Colt/Lucas: Thanks for clarifying, I think I understand your example now. Something I didn't think to mention earlier, but which hopefully clears up how this would be used in practice, is that the

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-27 Thread Sagar
Hey Sophie, This looks like a very nice feature. Going through the comments, I agree with Bill above that there could be a case for key skew, given that the earlier partitions would keep the data they already had and keep receiving more. Do you think that's a concern/side-effect that this feature

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-25 Thread Walker Carlson
Hey Sophie, Thanks for the KIP. I think this could be useful for a lot of cases. I also think that this could cause a lot of confusion. Just to make sure we are doing our best to prevent people from misusing this feature, I wanted to clarify a couple of things. 1) There will be only an interface

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-25 Thread Bill Bejeck
Hi Sophie, Thanks for the KIP! I think this is a worthwhile feature to add. I have two main questions about how this new feature will work. 1. You mention that for stateless applications auto-scaling is a stickier situation. But I was thinking that the auto-scaling would actually benefit

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-21 Thread Lucas Brutschy
Hi all, thanks, Sophie, this makes sense. I suppose then the way to help users avoid applying this in the wrong setting is good documentation and one or two examples of good use cases. I think Colt's time-based partitioning is a good example of how to use this. It actually doesn't have to

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-21 Thread Colt McNealy
Sophie, Regarding item "3" (my last paragraph from the previous email), perhaps I should give a more general example now that I've had more time to clarify my thoughts: In some stateful applications, certain keys have to be findable without any information about when the relevant data was

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-20 Thread Sophie Blee-Goldman
Thanks for the responses guys! I'll get the easy stuff out of the way first: 1) Fixed the KIP so that StaticStreamPartitioner extends StreamPartitioner 2) I totally agree with you Colt, the record value might have valuable (no pun) information in it that is needed to compute the partition without
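A minimal sketch of why having the value available matters, assuming the KIP's `StaticStreamPartitioner` keeps the `partition(topic, key, value, numPartitions)` parameters of Kafka Streams' `StreamPartitioner`. The `StreamPartitioner` below is a local stand-in so the code compiles without Kafka; `Order`, its `pinnedPartition` field and `PinnedValuePartitioner` are hypothetical and only illustrate a case where the partition can be computed from the value but not from the key.

```java
// Local stand-in for Kafka Streams' StreamPartitioner, reduced to its
// partition(topic, key, value, numPartitions) method so this sketch has no Kafka dependency.
interface StreamPartitioner<K, V> {
    Integer partition(String topic, K key, V value, int numPartitions);
}

// Hypothetical shape of the KIP's StaticStreamPartitioner: because it extends
// StreamPartitioner, implementations see the record value as well as the key.
interface StaticStreamPartitioner<K, V> extends StreamPartitioner<K, V> {
}

// Hypothetical record value that carries the partition its key was pinned to.
final class Order {
    final String customerId;
    final int pinnedPartition;

    Order(String customerId, int pinnedPartition) {
        this.customerId = customerId;
        this.pinnedPartition = pinnedPartition;
    }
}

// Routes each record to the partition stored in its value, independent of numPartitions.
final class PinnedValuePartitioner implements StaticStreamPartitioner<String, Order> {
    @Override
    public Integer partition(String topic, String key, Order value, int numPartitions) {
        int p = value.pinnedPartition;
        // A static partitioner must never point at a partition that does not exist yet.
        if (p < 0 || p >= numPartitions) {
            throw new IllegalStateException(
                "pinned partition " + p + " is out of range for topic " + topic);
        }
        return p;
    }
}
```

Since the result comes from the value rather than from something like `hash(key) % numPartitions`, it stays the same when the partition count grows.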

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-20 Thread Colt McNealy
Sophie, Thank you for your detailed response. That makes sense (one partition per user seems like a lot of extra metadata if you've got millions of users, but I'm guessing that was just for illustrative purposes). In this case I'd like to question one small detail in your kip. The

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-20 Thread Lucas Brutschy
Hi Sophie, This looks like a good improvement (given my limited knowledge, at least). As I understand it, in the subset of use cases where it can be used, it will make scaling up the #partitions basically frictionless. Three questions, and forgive me if something doesn't make sense at all: 1)

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-19 Thread Sophie Blee-Goldman
Thanks for your questions, I would say that your understanding sounds correct based on what you described but I'll try to add some clarity. The basic idea is that, as you said, any keys that are processed before time T will go to partition 1. All of those keys should then continue to be routed to
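The time-based routing described here can be sketched as a partitioner whose output depends only on when a key was first created, assuming that creation time travels with the record. `Event`, its `keyCreatedAtMs` field and `TimeBasedStaticPartitioner` are hypothetical names, partitions are numbered from 0, and the method only mirrors the parameters of `StreamPartitioner#partition` without being wired into Kafka Streams.

```java
import java.util.Arrays;

// Hypothetical record value: carries the time at which its key was first created.
final class Event {
    final String key;
    final long keyCreatedAtMs;

    Event(String key, long keyCreatedAtMs) {
        this.key = key;
        this.keyCreatedAtMs = keyCreatedAtMs;
    }
}

// Hypothetical time-based static partitioner. cutoversMs[i] is the time at which
// partition i + 1 was added: keys created before cutoversMs[0] stay on partition 0,
// keys created in [cutoversMs[0], cutoversMs[1]) stay on partition 1, and so on.
final class TimeBasedStaticPartitioner {
    private final long[] cutoversMs;

    TimeBasedStaticPartitioner(long... cutoversMs) {
        this.cutoversMs = cutoversMs.clone();
        Arrays.sort(this.cutoversMs);
    }

    int partition(String topic, String key, Event value, int numPartitions) {
        // Count the cutovers that happened at or before the key's creation time.
        int p = 0;
        while (p < cutoversMs.length && cutoversMs[p] <= value.keyCreatedAtMs) {
            p++;
        }
        // The result never depends on numPartitions, so adding partitions later
        // does not move existing keys; it only has to exist already.
        if (p >= numPartitions) {
            throw new IllegalStateException(
                "key " + key + " maps to partition " + p + " but " + topic
                    + " has only " + numPartitions + " partitions");
        }
        return p;
    }
}
```

A key created before the cutover keeps returning partition 0 whether the topic has one partition or two, while keys created afterwards land on partition 1 once it exists.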

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-19 Thread Colt McNealy
Sophie, Thank you for the KIP! Choosing the number of partitions in a Streams app is a tricky task because of how difficult it is to re-partition; I'm glad you're working on an improvement. I've got two questions: First, `StaticStreamsPartitioner` is an interface that we (Streams users) must