Hi, The Nemo runner supports setting parallelism for each PTransform. You can configure a Nemo optimization pass that traverses the operators of your Beam program, and annotates the ParallelismProperty of operators of your choice.
Here is an example pass for setting parallelism: https://github.com/apache/incubator-nemo/blob/master/compiler/optimizer/src/main/java/org/apache/nemo/compiler/optimizer/pass/compiletime/annotating/DefaultParallelismPass.java <https://github.com/apache/incubator-nemo/blob/master/compiler/optimizer/src/main/java/org/apache/nemo/compiler/optimizer/pass/compiletime/annotating/DefaultParallelismPass.java> Thanks, John On 2020/05/06 16:27:19, Alexey Romanenko <[email protected]> wrote: > One of the option when reading from topics with a small number of partitions > could be to do a Reshuffle right after read transform to parallelize better > other pipeline steps.> > > We had a discussion in this Jira about that a while ago:> > https://issues.apache.org/jira/browse/BEAM-8121 > <https://issues.apache.org/jira/browse/BEAM-8121>> > > > On 30 Apr 2020, at 03:56, Eleanore Jin <[email protected]> wrote:> > > > > > Thanks all for the information! > > > > > > Eleanore > > > > > > On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka <[email protected] > > <[email protected]>> wrote:> > > Beam does support parallelism for the job which applies to all the > > transforms in the job when executing on Flink using the "--parallelism" > > flag.> > > > > > From the usecase you mentioned, Kafka read operations will be over > > parallelised but it should be ok as they will only have a small amount of > > memory impact in loading some state for kafka client etc.> > > Also flink can run multiple operations for the same Job in a single task > > slot so having higher parallelism for lightweight operations should not be > > a problem.> > > > > > On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <[email protected] > > <[email protected]>> wrote:> > > Beam doesn't expose such a thing directly but the FlinkRunner may be able > > to take some pipeline options to configure this.> > > > > > On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin <[email protected] > > <[email protected]>> wrote:> > > Hi Kyle, > > > > > > I am using Flink Runner (v1.8.2)> > > > > > Thanks!> > > Eleanore> > > > > > On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <[email protected] > > <[email protected]>> wrote:> > > Which runner are you using?> > > > > > On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <[email protected] > > <[email protected]>> wrote:> > > Hi all, > > > > > > I just wonder can Beam allow to set parallelism for each operator > > (PTransform) separately? Flink provides such feature. > > > > > > The usecase I have is the source is kafka topics, which has less > > partitions, while we have heavy PTransform and would like to scale it with > > more parallelism. > > > > > > Thanks a lot!> > > Eleanore> > >
