Beam and its Flink Runner do not allow setting the parallelism at the
operator level. The wish to configure parallelism per operator has come
up numerous times over the years. I'm not opposed to allowing it for
special cases, e.g. via a pipeline option.

It doesn't look like it is necessary for the use case discussed here.
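
To make the reasoning concrete, here is a toy model in plain Python (not
Beam or Flink code) of the Reshuffle suggestion quoted below. The partition
count, the parallelism value, and the helper names are made up for
illustration, and round-robin stands in for whatever key-based
redistribution a runner actually performs. It only shows why a single
job-wide parallelism plus a redistribution step after the read can keep
every subtask busy:

```python
from collections import Counter

NUM_PARTITIONS = 2  # hypothetical: a topic with few partitions
PARALLELISM = 8     # hypothetical job-wide parallelism

# Each record is (partition it was read from, payload).
records = [(i % NUM_PARTITIONS, f"record-{i}") for i in range(1000)]


def load_without_reshuffle(records):
    """Records stay on the subtask that read their partition."""
    return Counter(partition for partition, _ in records)


def load_with_reshuffle(records):
    """Records are redistributed over all subtasks (round-robin stand-in)."""
    return Counter(i % PARALLELISM for i, _ in enumerate(records))


before = load_without_reshuffle(records)
after = load_with_reshuffle(records)
print("busy subtasks without reshuffle:", len(before))
print("busy subtasks with reshuffle:", len(after))
```

In this model the heavy step downstream of the read sees work on only as
many subtasks as there are partitions unless the records are redistributed
first.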

-Max

On 06.05.20 18:27, Alexey Romanenko wrote:
> One option when reading from topics with a small number of
> partitions could be to do a Reshuffle right after the read transform
> to better parallelize the other pipeline steps.
> 
> We had a discussion in this Jira about that a while ago:
> https://issues.apache.org/jira/browse/BEAM-8121
> 
>> On 30 Apr 2020, at 03:56, Eleanore Jin <eleanore....@gmail.com
>> <mailto:eleanore....@gmail.com>> wrote:
>>
>> Thanks all for the information! 
>>
>> Eleanore 
>>
>> On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka <goe...@google.com
>> <mailto:goe...@google.com>> wrote:
>>
>>     Beam does support setting a parallelism for the whole job, which
>>     applies to all the transforms in the job when executing on Flink,
>>     using the "--parallelism" flag.
>>
>>     For the use case you mentioned, the Kafka read operations will be
>>     over-parallelised, but that should be OK as they only have a small
>>     memory impact from loading some state for the Kafka client etc.
>>     Also, Flink can run multiple operations of the same job in a
>>     single task slot, so having higher parallelism for lightweight
>>     operations should not be a problem.
>>
>>     On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <lc...@google.com
>>     <mailto:lc...@google.com>> wrote:
>>
>>         Beam doesn't expose such a thing directly but the FlinkRunner
>>         may be able to take some pipeline options to configure this.
>>
>>         On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin
>>         <eleanore....@gmail.com <mailto:eleanore....@gmail.com>> wrote:
>>
>>             Hi Kyle, 
>>
>>             I am using Flink Runner (v1.8.2)
>>
>>             Thanks!
>>             Eleanore
>>
>>             On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver
>>             <kcwea...@google.com <mailto:kcwea...@google.com>> wrote:
>>
>>                 Which runner are you using?
>>
>>                 On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin
>>                 <eleanore....@gmail.com
>>                 <mailto:eleanore....@gmail.com>> wrote:
>>
>>                     Hi all, 
>>
>>                     I am wondering whether Beam allows setting the
>>                     parallelism for each operator (PTransform)
>>                     separately. Flink provides such a feature.
>>
>>                     My use case is that the source is Kafka topics
>>                     with few partitions, while we have a heavy
>>                     PTransform and would like to scale it with more
>>                     parallelism.
>>
>>                     Thanks a lot!
>>                     Eleanore
>>
> 
