All:
I'm running a Flink 0.10.2 application by submitting it to YARN.
I'm using an AWS EMR cluster of 1 master and 10 d2.8xlarge nodes. When I
submit the job using:
bin/flink run \
-m yarn-cluster \
-yjm 20480 \
-yn 10 \
-ytm 80960 \
-ys 36 \
-yD taskmanager.network.numberOfBuffers=51840 \
...
I'm seeing this error:
Caused by: java.io.IOException: Insufficient number of network buffers:
required 360, but only 315 available. The total number of network
buffers is currently set to 51840. You can increase this number by
setting the configuration key 'taskmanager.network.numberOfBuffers'.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:196)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:325)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:488)
at java.lang.Thread.run(Thread.java:745)
The error message does not seem to convey the correct information: it
reports only 315 buffers available even though the total is set to 51840.
Can someone explain what reasonable values are for
*taskmanager.network.numberOfBuffers* and
*taskmanager.network.bufferSizeInBytes*?
I've read this:
https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers
and this:
http://stackoverflow.com/questions/33589710/flink-cluster-params-how-to-set
But I am still unclear on the calculation. Is it supposed to be the following?
#cores ^ 2 * #machines * 4
So, in my case 36 ^ 2 * 10 * 4 = 51840
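To double-check the arithmetic, here is a small Python sketch of the formula as I understand it. The function names are my own, and the 32768-byte buffer size is an assumption on my part (I believe it is the default for taskmanager.network.bufferSizeInBytes, but I have not overridden or verified it). The second function only estimates how much memory that many buffers would occupy.

```python
def network_buffers(slots_per_tm: int, num_tms: int, factor: int = 4) -> int:
    """Rule of thumb from the docs as I read it: #cores ^ 2 * #machines * 4.

    Here #cores is taken to be the slots per TaskManager (-ys) and
    #machines the number of TaskManagers (-yn).
    """
    return slots_per_tm ** 2 * num_tms * factor


def buffer_memory_mib(num_buffers: int, buffer_size_bytes: int = 32768) -> float:
    """Approximate memory taken by the network buffers, in MiB.

    buffer_size_bytes=32768 is an assumed default, not a verified value.
    """
    return num_buffers * buffer_size_bytes / (1024 * 1024)


buffers = network_buffers(slots_per_tm=36, num_tms=10)
print(buffers)                     # 51840
print(buffer_memory_mib(buffers))  # 1620.0
```

So with my settings the formula gives 51840 buffers, which would be roughly 1620 MiB at the assumed buffer size.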
Thanks in advance for any help you can provide.
--
*Gna Phetsarath*
System Architect // AOL Platforms // Data Services //
Applied Research Chapter
770 Broadway, 5th Floor, New York, NY 10003
o: 212.402.4871 // m: 917.373.7363
vvmr: 8890237 aim: sphetsarath20 t: @sourigna
<http://www.aolplatforms.com>