Re: Execution graph

Michele Bertoni Tue, 30 Jun 2015 07:49:34 -0700

Hi everybody, and thanks for the answers.

So, if I understood correctly, you are saying that, apart from a few 
operators, most of them are executed with the default parallelism (which is 
what I expected), but the viewer will always show 1 unless a different value 
is set via setParallelism.


Is that right?

I don't have any particular requirement; the higher the parallelism, the 
better. I am able to bin my data into more groups than the number of workers 
in the cluster. Is it better to set the degree of parallelism explicitly, or 
can I leave it unset (so that it equals the default)?


thanks
Michele


On 30 Jun 2015, at 10:41, Fabian Hueske <fhue...@gmail.com> wrote:

In addition, some operators can only run with a parallelism of 1, for example 
data sources based on collections and (un-grouped) all-reduces. In some cases, 
the parallelism of the operators that follow is also set to 1 to avoid a 
network shuffle.

If you do:

env.fromCollection(myCollection)
   .map(new MyMapper())
   .groupBy(0)
   .reduce(new MyReduce())
   .writeToFile();

the data source and mapper will be run with a parallelism of 1, the reducer and 
sink will be executed with the default parallelism.
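
To make the rule concrete, here is a rough toy model of it in plain Java. It is 
not Flink code and does not reproduce Flink's optimizer: the names 
ParallelismModel, Kind and resolve are made up for this sketch, and the 
"following operators inherit 1" behaviour is simplified to "map and sink keep 
the parallelism of their input".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not part of Flink) of the parallelism rule above.
class ParallelismModel {

    enum Kind { COLLECTION_SOURCE, MAP, GROUPED_REDUCE, ALL_REDUCE, SINK }

    // Returns the parallelism each operator of a linear pipeline would get
    // when no operator sets its parallelism explicitly.
    static List<Integer> resolve(List<Kind> pipeline, int defaultParallelism) {
        List<Integer> result = new ArrayList<>();
        int previous = defaultParallelism;
        for (Kind kind : pipeline) {
            int p;
            switch (kind) {
                case COLLECTION_SOURCE:
                case ALL_REDUCE:
                    // These operators can only run with a parallelism of 1.
                    p = 1;
                    break;
                case GROUPED_REDUCE:
                    // Grouping shuffles the data anyway, so the default applies.
                    p = defaultParallelism;
                    break;
                default:
                    // MAP and SINK: keep the input's parallelism (simplification)
                    // so that no extra network shuffle is needed.
                    p = previous;
                    break;
            }
            result.add(p);
            previous = p;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Kind> pipeline = Arrays.asList(
                Kind.COLLECTION_SOURCE, Kind.MAP, Kind.GROUPED_REDUCE, Kind.SINK);
        // Source and mapper get 1, reducer and sink get the default (8 here).
        System.out.println(resolve(pipeline, 8)); // prints [1, 1, 8, 8]
    }
}
```

With a default parallelism of 8, this gives 1 for the source and the mapper and 
8 for the reducer and the sink, matching the example above.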

Best, Fabian

2015-06-30 10:25 GMT+02:00 Maximilian Michels <m...@apache.org>:

Hi Michele,

If you don't set the parallelism, the default parallelism is used. For the 
visualization in the web client, a parallelism of one is used. When you run 
your example from your IDE, the default parallelism is set to the number of 
(virtual) cores of your CPU.

Moreover, Flink will currently not automatically set the parallelism in a 
cluster environment. It will use the default parallelism or the user-set 
parallelism. In your example, if you set the parallelism explicitly then it 
will also show up in the visualization.
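
As a minimal sketch of setting the parallelism explicitly, assuming the Flink 
Java DataSet API is on the classpath: the class name ExplicitParallelism and 
the values 8 and 4 are arbitrary choices for illustration, and the exact 
content of the printed plan may differ between Flink versions.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.DiscardingOutputFormat;

public class ExplicitParallelism {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Default parallelism for every operator that does not set its own.
        env.setParallelism(8);

        DataSet<Long> doubled = env.generateSequence(1, 1000)
                .map(new MapFunction<Long, Long>() {
                    @Override
                    public Long map(Long value) {
                        return value * 2;
                    }
                })
                // Operator-level setting; overrides the default for this mapper.
                .setParallelism(4);

        // A sink is required before a plan can be generated.
        doubled.output(new DiscardingOutputFormat<Long>());

        // Prints the JSON plan, the same kind of plan the web client draws.
        System.out.println(env.getExecutionPlan());
    }
}
```

The explicitly set values should then appear in the plan instead of 1.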

Best,
Max

On Tue, Jun 30, 2015 at 7:11 AM, Michele Bertoni 
<michele1.bert...@mail.polimi.it> wrote:
Hi, I was trying to run my program in the Flink web environment (the local 
one). When I run it, I get the graph of the planned execution, but each node 
shows "parallelism = 1", whereas I think it runs with a parallelism of 8 
(8 cores; I always get 8 outputs).

What does that mean?
Is the graph wrong, or is it really running with a parallelism of 1?

Just a note: I never call setParallelism(); I leave it automatic.

thanks
Best
Michele
