Re: Set parallelism for each operator

John Youngseok Yang Wed, 06 May 2020 23:38:03 -0700

Hi,

The Nemo runner supports setting parallelism for each PTransform.
You can configure a Nemo optimization pass that traverses the operators of your
Beam program and annotates the ParallelismProperty of the operators you choose.


Here is an example pass for setting parallelism:
https://github.com/apache/incubator-nemo/blob/master/compiler/optimizer/src/main/java/org/apache/nemo/compiler/optimizer/pass/compiletime/annotating/DefaultParallelismPass.java
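
The idea behind such a pass can be sketched in a few lines of plain Java.
This is only a self-contained model, not Nemo's API: the Operator and
AnnotatingPass classes, the operator names, and the parallelism values are
all made up for illustration. The real pass linked above works on Nemo's own
IR and its ParallelismProperty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class ParallelismPassSketch {

    // Hypothetical stand-in for one operator (vertex) of the compiled pipeline.
    public static final class Operator {
        public final String name;
        public int parallelism = 1; // models a per-operator parallelism property

        public Operator(String name) {
            this.name = name;
        }
    }

    // Hypothetical annotating pass: visits every operator and sets its
    // parallelism, using a higher value for the operators picked by `selector`.
    public static final class AnnotatingPass {
        private final Predicate<Operator> selector;
        private final int selectedParallelism;
        private final int defaultParallelism;

        public AnnotatingPass(Predicate<Operator> selector,
                              int selectedParallelism,
                              int defaultParallelism) {
            if (selectedParallelism < 1 || defaultParallelism < 1) {
                throw new IllegalArgumentException("parallelism must be >= 1");
            }
            this.selector = selector;
            this.selectedParallelism = selectedParallelism;
            this.defaultParallelism = defaultParallelism;
        }

        // Annotates the operators in place and returns the same list.
        public List<Operator> apply(List<Operator> operators) {
            for (Operator op : operators) {
                op.parallelism = selector.test(op)
                        ? selectedParallelism
                        : defaultParallelism;
            }
            return operators;
        }
    }

    public static void main(String[] args) {
        // A toy pipeline: a light Kafka read, a heavy transform, a sink.
        List<Operator> pipeline = Arrays.asList(
                new Operator("KafkaRead"),
                new Operator("HeavyTransform"),
                new Operator("Write"));

        // Give only the heavy transform a high parallelism.
        new AnnotatingPass(op -> op.name.equals("HeavyTransform"), 16, 2)
                .apply(pipeline);

        for (Operator op : pipeline) {
            System.out.println(op.name + " -> parallelism " + op.parallelism);
        }
    }
}
```

In this model the selection rule is an ordinary predicate, so picking
operators by name, type, or position in the DAG only changes the lambda
passed to the pass.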

Thanks,
John

On 2020/05/06 16:27:19, Alexey Romanenko <[email protected]> wrote: 
> One option when reading from topics with a small number of partitions
> could be to do a Reshuffle right after the read transform, so that the
> other pipeline steps parallelize better.
> 
> We discussed this in the following Jira a while ago:
> https://issues.apache.org/jira/browse/BEAM-8121
> 
> > On 30 Apr 2020, at 03:56, Eleanore Jin <[email protected]> wrote:
> >
> > Thanks all for the information!
> >
> > Eleanore
> >
> > On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka <[email protected]> wrote:
> > Beam does support setting parallelism for the whole job when executing on
> > Flink, using the "--parallelism" flag. The value applies to all the
> > transforms in the job.
> >
> > For the use case you mentioned, the Kafka read operations will be
> > over-parallelised, but that should be OK, as the extra instances only have
> > a small memory impact from loading some state for the Kafka client, etc.
> > Also, Flink can run multiple operations of the same job in a single task
> > slot, so a higher parallelism for lightweight operations should not be
> > a problem.
> >
> > On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <[email protected]> wrote:
> > Beam doesn't expose such a thing directly but the FlinkRunner may be able
> > to take some pipeline options to configure this.
> >
> > On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin <[email protected]> wrote:
> > Hi Kyle,
> >
> > I am using Flink Runner (v1.8.2)
> >
> > Thanks!
> > Eleanore
> >
> > On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <[email protected]> wrote:
> > Which runner are you using?
> >
> > On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <[email protected]> wrote:
> > Hi all,
> >
> > I was wondering whether Beam allows setting parallelism for each operator
> > (PTransform) separately. Flink provides such a feature.
> >
> > Our use case is that the source is Kafka topics with few partitions, while
> > we have a heavy PTransform that we would like to scale with more
> > parallelism.
> >
> > Thanks a lot!
> > Eleanore
> 
>
