Re: About nodes number on Flink

Timo Walther Fri, 23 Jun 2017 05:15:34 -0700

Hi Andrea,

the number of nodes usually depends on the work that you do within your functions.

For example, if you have a computation-intensive machine learning library in a MapFunction that takes 10 seconds per element, it might make sense to parallelize it in order to increase your throughput. Another reason is state: if you have to keep several GBs of state per key, it might not fit on one machine.
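A rough sizing calculation for the 10-seconds-per-element case, written as a plain-Java sketch (the class and method names are hypothetical, not a Flink API). It assumes each parallel MapFunction instance processes one element at a time and ignores network, serialization, and key skew. Under that assumption one instance handles 0.1 elements per second, so sustaining 1 element per second needs a parallelism of about 10.

```java
/**
 * Back-of-the-envelope sizing for a CPU-bound MapFunction (a sketch, not Flink code).
 * Assumption: each parallel instance handles one element at a time, so one
 * instance processes 1000 / millisPerElement elements per second.
 */
final class ParallelismEstimate {

    /** Elements per second reached by `parallelism` instances. */
    static double throughput(int parallelism, long millisPerElement) {
        return parallelism * 1000.0 / millisPerElement;
    }

    /** Smallest parallelism that reaches the target rate (integer ceiling division). */
    static long requiredParallelism(long targetElementsPerSecond, long millisPerElement) {
        long busyMillisPerSecond = targetElementsPerSecond * millisPerElement;
        return (busyMillisPerSecond + 999) / 1000;
    }

    public static void main(String[] args) {
        // 10 seconds (10,000 ms) per element
        System.out.println(throughput(1, 10_000));          // 0.1
        System.out.println(requiredParallelism(1, 10_000)); // 10
    }
}
```

Integer arithmetic is used for the ceiling so that inputs like 2.5 s per element do not suffer from floating-point rounding.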

Flink parallelizes not only per node but also per "slot". If you start your application with a parallelism of 2 (and have not configured custom parallelisms per operator), you will have two pipelines that process elements (so two MapFunctions run in parallel, one in each pipeline). Two slots are occupied in this case. Some operations (like keyBy) break this pipeline and repartition your data.
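A simplified model of how parallelism translates into occupied slots, in plain Java (the Op class and requiredSlots method are hypothetical, not Flink API). It assumes Flink's default slot sharing: one slot can hold one parallel instance of every operator in the same slot sharing group, so a group needs as many slots as its most parallel operator, and different groups never share slots.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of slot requirements under slot sharing (not Flink code). */
final class SlotEstimate {

    /** Hypothetical operator description: name, slot sharing group, parallelism. */
    static final class Op {
        final String name;
        final String slotSharingGroup;
        final int parallelism;

        Op(String name, String slotSharingGroup, int parallelism) {
            this.name = name;
            this.slotSharingGroup = slotSharingGroup;
            this.parallelism = parallelism;
        }
    }

    /** Sum over slot sharing groups of the highest parallelism in each group. */
    static int requiredSlots(List<Op> ops) {
        Map<String, Integer> maxPerGroup = new HashMap<>();
        for (Op op : ops) {
            maxPerGroup.merge(op.slotSharingGroup, op.parallelism, Math::max);
        }
        int total = 0;
        for (int slots : maxPerGroup.values()) {
            total += slots;
        }
        return total;
    }

    public static void main(String[] args) {
        // Job parallelism 2, no per-operator overrides: source -> map -> keyBy/window -> sink
        List<Op> job = List.of(
                new Op("source", "default", 2),
                new Op("map", "default", 2),
                new Op("window", "default", 2),
                new Op("sink", "default", 2));
        System.out.println(requiredSlots(job)); // 2

        // Same job, but the map runs with parallelism 4 in its own group
        List<Op> isolated = List.of(
                new Op("source", "default", 2),
                new Op("map", "heavy", 4),
                new Op("window", "default", 2),
                new Op("sink", "default", 2));
        System.out.println(requiredSlots(isolated)); // 6
    }
}
```

With parallelism 2 and no overrides this reproduces the two occupied slots; isolating an operator in its own group adds that group's slots on top.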

If you want operators to run in separate tasks, you can start a new chain; to also put them into separate slots, you can assign them to a different slot sharing group (see here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#task-chaining-and-resource-groups)
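To illustrate what starting a new chain changes, here is a plain-Java model (not Flink code; the Op class and chains method are hypothetical) of how a linear pipeline could be split into chains, each of which runs as one task. It assumes an operator is fused into its predecessor's chain only if both have the same parallelism, the edge between them is a plain forward connection (no keyBy or rebalance), and the operator was not marked to start a new chain. Flink's real chaining rules have more conditions than this.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of operator chaining for a linear pipeline (not Flink code). */
final class ChainModel {

    /** Hypothetical operator description. */
    static final class Op {
        final String name;
        final int parallelism;
        final boolean repartitionsInput; // e.g. the input edge is a keyBy
        final boolean startNewChain;     // e.g. startNewChain() was called on it

        Op(String name, int parallelism, boolean repartitionsInput, boolean startNewChain) {
            this.name = name;
            this.parallelism = parallelism;
            this.repartitionsInput = repartitionsInput;
            this.startNewChain = startNewChain;
        }
    }

    /** Groups operator names into chains following the simplified rules above. */
    static List<List<String>> chains(List<Op> pipeline) {
        List<List<String>> result = new ArrayList<>();
        Op previous = null;
        for (Op op : pipeline) {
            boolean chainable = previous != null
                    && previous.parallelism == op.parallelism
                    && !op.repartitionsInput
                    && !op.startNewChain;
            if (!chainable) {
                result.add(new ArrayList<>()); // open a new chain (= new task)
            }
            result.get(result.size() - 1).add(op.name);
            previous = op;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Op> pipeline = List.of(
                new Op("source", 2, false, false),
                new Op("map", 2, false, false),
                new Op("filter", 2, false, true),  // startNewChain()
                new Op("window", 2, true, false),  // input is a keyBy
                new Op("sink", 2, false, false));
        System.out.println(chains(pipeline)); // [[source, map], [filter], [window, sink]]
    }
}
```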

If you set the parallelism to 'N' but have fewer than 'N' SLOTS available, you cannot execute the job.
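A quick feasibility check for this rule, again as a hypothetical plain-Java helper rather than anything Flink provides. It assumes one TaskManager per node, every TaskManager configured with the same number of slots (the taskmanager.numberOfTaskSlots setting), and no other job holding slots. With default slot sharing, the required slot count is the job's highest operator parallelism.

```java
/** Sketch: does a job's slot requirement fit on a cluster? (not Flink code) */
final class ClusterFit {

    /** True if nodes * slotsPerTaskManager slots cover the requirement. */
    static boolean canRun(int requiredSlots, int nodes, int slotsPerTaskManager) {
        return requiredSlots <= nodes * slotsPerTaskManager;
    }

    /** Minimum number of nodes (one TaskManager each), via integer ceiling division. */
    static int minimumNodes(int requiredSlots, int slotsPerTaskManager) {
        return (requiredSlots + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        System.out.println(canRun(8, 2, 4));     // true: 2 nodes x 4 slots = 8
        System.out.println(canRun(8, 1, 4));     // false: only 4 slots
        System.out.println(minimumNodes(10, 4)); // 3
    }
}
```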


I hope my explanation helps.

Regards,
Timo


On 22.06.17 at 16:54, AndreaKinn wrote:

Hello,
I'm developing a Flink toy application on my local machine before deploying
the real one on a real cluster.
Now I have to determine how many nodes I need to set up the cluster.

I have already read these documents:
jobs and scheduling
<https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.html>
programming model
<https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/programming-model.html>
parallelism
<https://flink.apache.org/faq.html#what-is-the-parallelism-how-do-i-set-it>

But I'm still a bit confused about how many nodes I have to consider to
execute my application.

For example, if I have the following code (from the docs):
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n13927/Screen_Shot_2017-06-22_at_16.png>

- Does this mean that operations "on the same line" are executed on the same
node? (That sounds a bit strange to me.)

Some things I'd like to confirm:
- If the answer to the previous question is yes and I set the parallelism to
'1', can I determine how many nodes I need by counting how many operations I
have to perform?
- If I set the parallelism to 'N' but have fewer than 'N' nodes available, does
Flink automatically scale the processing across the available nodes?

I don't think my throughput and data load are relevant; they are not heavy.

--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/About-nodes-number-on-Flink-tp13927.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.