Hi Andrea,

the number of nodes usually depends on the work that you do within your Functions.

E.g. if you have a computation intensive machine learning library in a MapFunction and takes 10 seconds per element, it might make sense to paralellize this in order to increase your throughput. Or if you have to save state of several GBs per key which would not fit on one machine.

Flink does not only parallelize per node but also per "slot". If you start your application with a parallelism of 2 (and have not configured custom parallelisms per operator), you will have two pipelines that process elements (so two MapFunctions are running in parallel one in each pipeline). 2 slots are occupied in this case. There are operations (like keyBy) that break this pipeline and repartition your data.

If you want to run operators in separate slots you can start a new chain (see here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#task-chaining-and-resource-groups)

If you set parallelism to 'N' but I have less than 'N' SLOTS available, you cannot execute the job.

I hope my explanation helps.

Regards,
Timo


Am 22.06.17 um 16:54 schrieb AndreaKinn:
Hello,
I'm developing a Flink toy-application on my local machine before to deploy
the real one on a real cluster.
Now I have to determine how many nodes I need to set the cluster.

I already read these documents:
jobs and scheduling
<https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.html>
programming model
<https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/programming-model.html>
parallelism
<https://flink.apache.org/faq.html#what-is-the-parallelism-how-do-i-set-it>

But I'm still a bit confused about how many nodes I have to consider to
execute my application.

For example if I have the following code (from the doc):
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n13927/Screen_Shot_2017-06-22_at_16.png>

- This means that operations "on same line" are executed on same node? (It
sounds a bit strange to me)

Some confirms:
- If the answer to previous question is yes and if I set parallelism to '1'
I can establish how many nodes I need counting how many operations I have to
perform ?
- If I set parallelism to 'N' but I have less than 'N' nodes available Flink
automatically scales the elaboration on available nodes?

My throughput and data load is not relevant I think, it is not heavy.





--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/About-nodes-number-on-Flink-tp13927.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Reply via email to