+1, we should do it.
The implementation could be something on these line:

   - While assigning Kafka partitions to each source split during the first
   run, assign them deterministically.
      - Current round-robin assignment works fine for single topic. But is
      not deterministic while reading from more than one topic. We
need to tweak
      the assignment to work well in that case.
   - On the worker, each reader should check the partitions for input topic
   (this can be part of existing periodic threads that checks backlog)
   - When partitions are added:
      - The readers (source splits) that new partitions belong to will
      start consuming from it. This is straight forward.
      - What if the new partition's watermark is older the current
      watermark? Can't do much about it since a watermark can not go back.
   - When the partitions are deleted:
      - This is a bit more tricky.
      - We need to handle the case a source split might not have any
      partitions assigned.
         - What should the watermark be? I think current wall time makes
         sense. Note that there could be new partitions added later.


On Wed, Jan 2, 2019 at 7:59 AM Alexey Romanenko <aromanenko....@gmail.com>
wrote:

> I just wanted to mention that there is quite old open issue about that:
> https://issues.apache.org/jira/browse/BEAM-727
>
>  Fell free to take this one if anyone is interested.
>
> On 2 Jan 2019, at 15:22, Juan Carlos Garcia <jcgarc...@gmail.com> wrote:
>
> +1
>
> Am Mi., 2. Jan. 2019, 14:34 hat Abdul Qadeer <quadeer....@gmail.com>
> geschrieben:
>
>> +1
>>
>> On Tue, 1 Jan 2019 at 12:45, <jan.d...@gmail.com> wrote:
>>
>>> +1 from my side too :-)
>>> And ideally I would want to have some hooks to let me know the extra
>>> partitions have been picked up (or a way to query it).
>>>
>>> Although if that can't be provided I can work around it myself by
>>> sending some specific message to the partition that somewhere results in a
>>> visible state change in the pipeline.
>>>
>>> Also, as a quick (semi related) heads up: I will very likely soon
>>> contribute a change to the LogAppendTimePolicy so that the idle partition
>>> behavior (automatic watermark generation) can be disabled.
>>>
>>> (of course all related to my streamy-db project)
>>>
>>> Kind regards,
>>> Jan
>>>
>>>
>>> On Tue, 1 Jan 2019 at 08:19, Ramesh Nethi <ramesh.ne...@gmail.com>
>>> wrote:
>>>
>>>> +1 for this capability.  This would enable pipelines to continue to run
>>>> when such changes need to be made.
>>>>
>>>> regards
>>>> Ramesh
>>>>
>>>> On Fri, 23 Nov 2018 at 00:40 Raghu Angadi <rang...@google.com> wrote:
>>>>
>>>>> On Thu, Nov 22, 2018 at 10:10 AM Raghu Angadi <rang...@google.com>
>>>>> wrote:
>>>>>
>>>>>> - New partitions will be ignored during runtime.
>>>>>> - Update will not succeed either. Error message on the workers should
>>>>>> explain the mismatch.
>>>>>>
>>>>>
>>>>> This is the current state. Supporting changes to number of partition
>>>>> is quite doable if there is enough user interested (even in the current
>>>>> UnnoundedSource API framework).
>>>>>
>>>>>>
>>>>>> On Thu, Nov 22, 2018 at 2:15 AM Jozef Vilcek <jozo.vil...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> just wanted to check how does Beam KafkaIO behaves when partitions
>>>>>>> are added to the topic.
>>>>>>> Will they be picked up or ignored during the runtime?
>>>>>>> Will they be picked up on restart with state restore?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jozef
>>>>>>>
>>>>>>
>

Reply via email to