Beam and its Flink Runner do not allow setting the parallelism at the operator level. The wish to configure parallelism per operator has come up numerous times over the years. I'm not opposed to allowing it for special cases, e.g. via a pipeline option.
It doesn't look like it is necessary for the use case discussed here.

-Max

On 06.05.20 18:27, Alexey Romanenko wrote:
> One option when reading from topics with a small number of
> partitions could be to do a Reshuffle right after the read transform
> to better parallelize the other pipeline steps.
>
> We had a discussion about that in this Jira a while ago:
> https://issues.apache.org/jira/browse/BEAM-8121
>
>> On 30 Apr 2020, at 03:56, Eleanore Jin <eleanore....@gmail.com> wrote:
>>
>> Thanks all for the information!
>>
>> Eleanore
>>
>> On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka <goe...@google.com> wrote:
>>
>> Beam does support parallelism for the job, which applies to all the
>> transforms in the job when executing on Flink, using the
>> "--parallelism" flag.
>>
>> From the use case you mentioned, Kafka read operations will be
>> over-parallelised, but that should be OK as they will only have a
>> small memory impact from loading some state for the Kafka client etc.
>> Also, Flink can run multiple operations for the same job in a
>> single task slot, so having higher parallelism for lightweight
>> operations should not be a problem.
>>
>> On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <lc...@google.com> wrote:
>>
>> Beam doesn't expose such a thing directly, but the FlinkRunner
>> may be able to take some pipeline options to configure this.
>>
>> On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin <eleanore....@gmail.com> wrote:
>>
>> Hi Kyle,
>>
>> I am using Flink Runner (v1.8.2)
>>
>> Thanks!
>> Eleanore
>>
>> On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <kcwea...@google.com> wrote:
>>
>> Which runner are you using?
>>
>> On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <eleanore....@gmail.com> wrote:
>>
>> Hi all,
>>
>> I just wonder, can Beam set the parallelism for each operator
>> (PTransform) separately? Flink provides such a feature.
>>
>> The use case I have is that the source is Kafka topics, which
>> have fewer partitions, while we have a heavy PTransform and
>> would like to scale it with more parallelism.
>>
>> Thanks a lot!
>> Eleanore
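
For reference, the two suggestions in this thread (a Reshuffle right after the Kafka read, plus the job-wide "--parallelism" flag) could be combined roughly as in the sketch below. This is only an illustration under assumptions, not code from anyone in the thread: the class name, broker address, topic name, and HeavyFn are hypothetical placeholders, and it assumes the Beam Java SDK, the KafkaIO module, and the Flink Runner are on the classpath. How well the Reshuffle spreads the work may depend on the runner and Beam version.

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.transforms.Values;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReshuffleAfterKafkaRead {

  /** Hypothetical stand-in for the heavy PTransform mentioned in the thread. */
  static class HeavyFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String value, OutputReceiver<String> out) {
      out.output(value.toUpperCase());
    }
  }

  /** Parses flags such as --parallelism=8 into Flink pipeline options. */
  static FlinkPipelineOptions parseOptions(String[] args) {
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    return options;
  }

  public static void main(String[] args) {
    // The parallelism set here applies to every transform in the job.
    Pipeline p = Pipeline.create(parseOptions(args));

    p.apply("ReadFromKafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("localhost:9092") // placeholder address
                .withTopic("input-topic")               // placeholder topic
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())
        .apply("DropKeys", Values.<String>create())
        // Redistribute records so downstream work is not tied to the
        // small number of Kafka partitions.
        .apply("Rebalance", Reshuffle.<String>viaRandomKey())
        .apply("HeavyWork", ParDo.of(new HeavyFn()));

    p.run().waitUntilFinish();
  }
}
```

Launched with something like "--parallelism=8", the read step is over-parallelised relative to the partition count (which, as Ankur notes, should be cheap), while the records are redistributed before the heavy step.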