Re: How does setMaxParallelism work

2018-03-30 Thread Nico Kruber
> Flink does not decide the parallelism based on y

RE: How does setMaxParallelism work

2018-03-29 Thread NEKRASSOV, ALEXEI
> Flink does not decide the parallelism based on your job. There is a default parallelism

Re: How does setMaxParallelism work

2018-03-28 Thread Nico Kruber
Flink does not decide the parallelism based on your job. There is a default parallelism (configured via parallelism.default [1], by default 1) which is used if you do not specify it yourself. Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#common-options On
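
The fallback described in this reply can be sketched as a small stand-alone model. This is not Flink code: the class and method names are hypothetical, and only the `parallelism.default` setting and its default value of 1 come from the message above. The sketch assumes that an explicitly set parallelism wins and that the configured default applies otherwise, with the max parallelism playing no part in that choice.

```java
import java.util.OptionalInt;

/** Hypothetical model of how the effective parallelism is chosen; not Flink's actual code. */
public class ParallelismFallbackSketch {

    /** Stand-in for the parallelism.default setting, which the thread says defaults to 1. */
    public static final int PARALLELISM_DEFAULT = 1;

    /**
     * Returns the explicitly requested parallelism if present, otherwise the default.
     * maxParallelism is accepted only to show that it does not influence the result.
     */
    public static int effectiveParallelism(OptionalInt explicitParallelism, int maxParallelism) {
        return explicitParallelism.orElse(PARALLELISM_DEFAULT);
    }

    public static void main(String[] args) {
        // Only a max parallelism was set: the default parallelism is used.
        System.out.println(effectiveParallelism(OptionalInt.empty(), 128));
        // The parallelism was set explicitly to 3: that value is used.
        System.out.println(effectiveParallelism(OptionalInt.of(3), 128));
    }
}
```

Under these assumptions, setting only a max parallelism leaves the job at the default of 1 subtask, which matches what the original poster observed.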

Re: How does setMaxParallelism work

2018-03-28 Thread Data Engineer
Agreed. But how did Flink decide that it should allot 1 subtask? Why not 2 or 3? I am trying to understand the implications of using setMaxParallelism vs setParallelism. On Wed, Mar 28, 2018 at 2:58 PM, Nico Kruber wrote: > Hi James, > the number of subtasks being used is

Re: How does setMaxParallelism work

2018-03-28 Thread Nico Kruber
Hi James, the number of subtasks being used is defined by the parallelism. The max parallelism, however, "... determines the maximum parallelism to which you can scale operators" [1]. That is, once set, you cannot ever (even after restarting your program from a savepoint) increase the operator's
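
One way to picture why the max parallelism acts as an upper bound is a simplified model of key groups. The sketch below is not Flink's implementation: the class and method names are hypothetical, and a plain `hashCode()` stands in for the extra hashing step Flink applies. It assumes that keyed state is split into as many key groups as the max parallelism and that each subtask owns a contiguous range of them.

```java
/** Simplified, hypothetical model of key groups; not Flink's actual implementation. */
public class KeyGroupSketch {

    /** Maps a key to one of maxParallelism key groups (plain hashCode is an assumption). */
    public static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a key group to a subtask index in the range [0, parallelism). */
    public static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        if (parallelism > maxParallelism) {
            // More subtasks than key groups would leave some subtasks with nothing to own.
            throw new IllegalArgumentException("parallelism must not exceed max parallelism");
        }
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 8;
        // Print the owning subtask of each key group for two different parallelism values.
        for (int parallelism : new int[] {2, 4}) {
            StringBuilder line = new StringBuilder("parallelism " + parallelism + ":");
            for (int keyGroup = 0; keyGroup < maxParallelism; keyGroup++) {
                line.append(' ').append(subtaskFor(keyGroup, parallelism, maxParallelism));
            }
            System.out.println(line);
        }
    }
}
```

In this model the key-group count stays fixed while the parallelism changes, so rescaling only moves whole key groups between subtasks. A parallelism above the max parallelism has no valid assignment.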

Re: How does setMaxParallelism work

2018-03-28 Thread Data Engineer
I have a sample application that reads around 2 GB of CSV files, converts each record into an Avro object and sends it to Kafka. I use a custom FileReader that reads the files in a directory. I have set taskmanager.numberOfTaskSlots to 4. I see that if I use setParallelism(3), 3 subtasks are created.

Re: How does setMaxParallelism work

2018-03-28 Thread Jörn Franke
What was the input format, the size and the program that you tried to execute? > On 28. Mar 2018, at 08:18, Data Engineer wrote: > > I went through the explanation on MaxParallelism in the official docs here: >

How does setMaxParallelism work

2018-03-28 Thread Data Engineer
I went through the explanation on MaxParallelism in the official docs here: https://ci.apache.org/projects/flink/flink-docs-master/ops/production_ready.html#set-maximum-parallelism-for-operators-explicitly However, I am not able to figure out how Flink decides the parallelism value. For instance,