+1, we should do it. The implementation could be something on these line: - While assigning Kafka partitions to each source split during the first run, assign them deterministically. - Current round-robin assignment works fine for single topic. But is not deterministic while reading from more than one topic. We need to tweak the assignment to work well in that case. - On the worker, each reader should check the partitions for input topic (this can be part of existing periodic threads that checks backlog) - When partitions are added: - The readers (source splits) that new partitions belong to will start consuming from it. This is straight forward. - What if the new partition's watermark is older the current watermark? Can't do much about it since a watermark can not go back. - When the partitions are deleted: - This is a bit more tricky. - We need to handle the case a source split might not have any partitions assigned. - What should the watermark be? I think current wall time makes sense. Note that there could be new partitions added later.
On Wed, Jan 2, 2019 at 7:59 AM Alexey Romanenko <aromanenko....@gmail.com> wrote: > I just wanted to mention that there is quite old open issue about that: > https://issues.apache.org/jira/browse/BEAM-727 > > Fell free to take this one if anyone is interested. > > On 2 Jan 2019, at 15:22, Juan Carlos Garcia <jcgarc...@gmail.com> wrote: > > +1 > > Am Mi., 2. Jan. 2019, 14:34 hat Abdul Qadeer <quadeer....@gmail.com> > geschrieben: > >> +1 >> >> On Tue, 1 Jan 2019 at 12:45, <jan.d...@gmail.com> wrote: >> >>> +1 from my side too :-) >>> And ideally I would want to have some hooks to let me know the extra >>> partitions have been picked up (or a way to query it). >>> >>> Although if that can't be provided I can work around it myself by >>> sending some specific message to the partition that somewhere results in a >>> visible state change in the pipeline. >>> >>> Also, as a quick (semi related) heads up: I will very likely soon >>> contribute a change to the LogAppendTimePolicy so that the idle partition >>> behavior (automatic watermark generation) can be disabled. >>> >>> (of course all related to my streamy-db project) >>> >>> Kind regards, >>> Jan >>> >>> >>> On Tue, 1 Jan 2019 at 08:19, Ramesh Nethi <ramesh.ne...@gmail.com> >>> wrote: >>> >>>> +1 for this capability. This would enable pipelines to continue to run >>>> when such changes need to be made. >>>> >>>> regards >>>> Ramesh >>>> >>>> On Fri, 23 Nov 2018 at 00:40 Raghu Angadi <rang...@google.com> wrote: >>>> >>>>> On Thu, Nov 22, 2018 at 10:10 AM Raghu Angadi <rang...@google.com> >>>>> wrote: >>>>> >>>>>> - New partitions will be ignored during runtime. >>>>>> - Update will not succeed either. Error message on the workers should >>>>>> explain the mismatch. >>>>>> >>>>> >>>>> This is the current state. Supporting changes to number of partition >>>>> is quite doable if there is enough user interested (even in the current >>>>> UnnoundedSource API framework). >>>>> >>>>>> >>>>>> On Thu, Nov 22, 2018 at 2:15 AM Jozef Vilcek <jozo.vil...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hello, >>>>>>> just wanted to check how does Beam KafkaIO behaves when partitions >>>>>>> are added to the topic. >>>>>>> Will they be picked up or ignored during the runtime? >>>>>>> Will they be picked up on restart with state restore? >>>>>>> >>>>>>> Thanks, >>>>>>> Jozef >>>>>>> >>>>>> >