Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Yi Pan
@Dong, thanks for the updates. +1 On Thu, Jun 22, 2017 at 3:36 PM, Dong Lin wrote: > Hey Yi, > > Thanks for the detailed comment and the summary! > > To address your comments: > > 1) The current names are GroupByPartitionWithFixedTaskNum and >

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Navina Ramesh
Yi, Thanks for summarizing. I think we should deal with further code related changes/discussions in the PR directly since this SEP has been open for a while. Let's try to wrap up the discussions by today. @Dong: Thanks for updating the SEP. I think the TestPlan section is TBD right now. You can

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-22 Thread Dong Lin
Hey Yi, Thanks for the detailed comment and the summary! To address your comments: 1) The current names are GroupByPartitionWithFixedTaskNum and GroupBySystemStreamPartitionWithFixedTaskNum. Instead of FixedTasksGroupByPartition and FixedTasksGroupBySystemStreamPartition, how about

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-22 Thread Yi Pan
Hi, Dong and everyone, Thanks for the detailed discussion on SEP-5! Really appreciate the thorough consideration on this issue. I also noticed that Dong has updated the SEP-5 wiki to clarify: 1) SEP-5 provides a solution to retain the same number of task/state w/o re-partitioning (as illustrated

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Dong Lin
Hey Navina, I appreciate all the comments you have provided! I have updated the wiki to remove the task expansion from Rejected Alternative section and put it only in the Future Work section. Thanks much! Dong On Wed, Jun 21, 2017 at 5:11 PM, Navina Ramesh (Apache) wrote: >

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Navina Ramesh (Apache)
> But IMO it is the best available solution towards the support of partition expansion in comparison to alternative, no? At this time, relative to the other alternatives you have listed, this is a path of least effort to solving this problem. I agree to that. :) > I can merge those two sections

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Dong Lin
Thanks much for the reply Navina. Please see my reply inline. On Wed, Jun 21, 2017 at 2:57 PM, Navina Ramesh (Apache) wrote: > Thanks to Jake, Dong and Kartik for keeping the discussion going. > > > Here are the pros and cons of the extra re-partitioning stage in > comparison

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Navina Ramesh (Apache)
Thanks to Jake, Dong and Kartik for keeping the discussion going. > Here are the pros and cons of the extra re-partitioning stage in comparison to SEP-5. I think that is a good summary of the pros/cons of the repartitioning-stage-based solution. Can you please include it in your SEP? It seems

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-18 Thread Dong Lin
BTW, I will update the SEP-5 wiki with our latest discussion after I have got the wiki edit access. On Sat, Jun 17, 2017 at 11:36 PM, Dong Lin wrote: > Thanks everyone for the comment! > > I am currently leaning towards the current approach. I think Kartik raised > a good

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-18 Thread Dong Lin
Thanks everyone for the comments! I am currently leaning towards the current approach. I think Kartik raised a good point that the extra repartitioning stage will also incur additional throughput on Kafka in addition to the potential storage cost. Any other Samza developers also chime in and

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-15 Thread Kartik Paramasivam
Great discussion! Here are some more thoughts. The point that repartitioning is a more general-purpose solution is surely spot on. For many source systems (Kinesis, Google Pub-Sub, any of the older queuing systems (RabbitMQ etc.)), repartitioning is anyway functionally required to do even

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-15 Thread Jacob Maes
Thanks, Dong. The summary looks accurate. I'll let the others chime in, as I believe my perspective has been adequately captured in this thread. -Jake On Wed, Jun 14, 2017 at 12:12 PM, Dong Lin wrote: > Hey Jacob, > > Thank you for taking so much time to discuss with me!

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-14 Thread Dong Lin
Hey Jacob, Thank you for taking so much time to discuss with me! I appreciate the discussion and the insight. I will summarize our discussion below. 1) Whether it is reasonable to store partition-to-task mapping. We agree that this partition-to-task mapping will be reasonable if we allow user

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-13 Thread Jacob Maes
Hey Dong, I appreciate your thoughtful responses. Let's do one more round :-) > Here are my current concern with the three alternatives you described > earlier: > - The first alternative requires support from input system which is > currently not available. It will limit the usage of partition

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
Thanks for the reply Jacob. Please see my comment inline. On Mon, Jun 12, 2017 at 7:51 PM, Jacob Maes wrote: > > > > - For users that need partition expansion of the input streams for > stateful > > job, they have a really big headache in the sense that Samza does not >

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Jacob Maes
> > - For users that need partition expansion of the input streams for stateful > job, they have a really big headache in the sense that Samza does not allow > partition expansion for stateful job. SEP-5 addresses this headache for > them. > You are right that SEP-5 requires user to understand and

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
Thanks Xinyu for offering a solution. Yeah, we have actually listed it as the third rejected alternative in SEP-5. I can move this to future work. I think it is actually a great idea to support more

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
Hey Jacob, Thanks for the explanation. It seems that your biggest concern is with the generality of the proposal. Let me try to address this and other comments below. 1) ... it will cause headaches for Samza users ... I am not sure I understand why this proposal causes headache for Samza users.

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread xinyu liu
How about making the partition mapping function a pluggable component in the partition expansion? Mathematically, this is a mapping function which is able to map the new partitions to the old ones: *f (new partition) -> old partition* If the function is a surjective function (
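A minimal sketch of the pluggable mapping idea from the message above, not taken from the SEP or from Samza code. It assumes integer partition ids and uses a modulo rule as one possible *f (new partition) -> old partition*; the names `modulo_mapper`, `group_new_partitions`, and `is_surjective` are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical type: a pluggable function f(new partition) -> old partition.
PartitionMapper = Callable[[int], int]


def modulo_mapper(old_count: int) -> PartitionMapper:
    """Return a mapper that sends new partition p to p % old_count (an assumed rule)."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return lambda new_partition: new_partition % old_count


def group_new_partitions(mapper: PartitionMapper, new_count: int) -> Dict[int, List[int]]:
    """Group the new partition ids 0..new_count-1 by the old partition they map to."""
    groups: Dict[int, List[int]] = {}
    for new_partition in range(new_count):
        groups.setdefault(mapper(new_partition), []).append(new_partition)
    return groups


def is_surjective(mapper: PartitionMapper, old_count: int, new_count: int) -> bool:
    """Check that every old partition 0..old_count-1 is hit by at least one new partition."""
    return set(group_new_partitions(mapper, new_count)) == set(range(old_count))


if __name__ == "__main__":
    # Expanding from 4 to 8 partitions under the assumed modulo rule.
    print(group_new_partitions(modulo_mapper(4), 8))
    print(is_surjective(modulo_mapper(4), old_count=4, new_count=8))
```

With a surjective mapper, each old partition keeps at least one new partition, so whatever consumed old partition k could in principle consume every new partition in group k; whether this matches the design in SEP-5 is left to the wiki.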

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Jacob Maes
Hey Dong, I'm opposed (or a +0, at best) to this limited, Kafka-specific solution. I understand that the proposal is relatively simple to implement, but I think it will cause headaches for Samza users. They will not only have to understand all the limitations (increase only, double partitions

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-07 Thread Dong Lin
Hey Jacob, Navina, Yi, I am wondering if my answer has addressed your concern. Can you let me know if there is any concern with SEP? Thanks, Dong On Tue, Jun 6, 2017 at 11:06 PM, Dong Lin wrote: > Hey Jacob, > > Thanks for taking time to review the SEP. > > I agree with

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-05 Thread Jacob Maes
Hey Dong, Thanks for the SEP. Supporting partition changes is critically important for stateful Samza jobs, so it's great to see some ideas on that front! Sorry for the late feedback, but I have a few thoughts to contribute. Big +1 on Navina's comment: > My biggest gripe with this SEP is that

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-04 Thread Dong Lin
Hey Yi, Navina, I have updated the SEP-5 document based on our discussion. The difference can be found here. Here is the summary of changes: - Add a new interface that extends the existing interface

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-01 Thread Dong Lin
Hey Yi, Thanks much for the comment. I have updated the doc to address all your comments except the one related to the interface. I am not sure I understand your suggestion of the new interface. Will discuss tomorrow. Thanks, Dong On Wed, May 31, 2017 at 4:29 PM, Yi Pan

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-01 Thread Dong Lin
Hey Navina, My point is that, suppose the underlying system allows user to select arbitrary partition number during partition expansion, which I assume is applicable to all input systems that Samza will use, then we can easily enforce the rule that expansion of partitions should always happen by

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Navina Ramesh (Apache)
Dong, Thanks for your prompt responses. > And usually the underlying system allows user to select arbitrary partition number if it supports partition expansion. Do you know any system that does not meet these two requirements? I am not aware of a system that won't meet the modulo requirement. I

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Yi Pan
Hi, Dong, Thanks for the detailed design doc for a long-awaited feature in Samza! Really appreciate it! I did a quick pass and have the following comments: - minor: "limit the maximum size of partition" ==> "limit the maximum size of each partition" - "However, Samza currently is not able to

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Dong Lin
Hey Navina, Thanks much for the comment. Please see my response below. Regarding your biggest gripe with the SEP, I personally think the operational requirements proposed in the KIP are pretty general and could be easily enforced by other systems. The reason is that the modulo operation is pretty

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Navina Ramesh (Apache)
Hey Dong, > I have updated the motivation section to clarify this. Thanks for updating the motivation. Couple of notes here: 1. > "The motivation of increasing partition number of Kafka topic includes 1) limit the maximum size of a partition in order to improve broker performance and 2)

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-29 Thread Dong Lin
Hey Navina, I have updated the wiki based on your suggestion. More specifically, I have made the following changes: - Improved Problem section and Motivation section to describe why we use the solution in this proposal instead of tackling the problem of task expansion directly. - Illustrate the

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-25 Thread Dong Lin
Hey Navina, Thanks much for your comments. Please see my reply inline. On Wed, May 24, 2017 at 10:22 AM, Navina Ramesh (Apache) wrote: > Thanks for the SEP, Dong. I have a couple of questions to understand your > proposal better: > > * Under motivation, you mention that "_We

[DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-23 Thread Dong Lin
Hi all, We created SEP-5: Enable partition expansion of input streams. Please find the SEP wiki at the link https://cwiki.apache.org/confluence/display/SAMZA/SEP-5%3A+Enable+partition+expansion+of+input+streams . Your feedback is appreciated! Thanks, Dong