Hi Amir! When all happens on one node, you probably set the parallelism so low that all can be put onto one node. For example each node offers 8 slots (16 slots total), but your parallelism is only 8.
You can think of it in similar terms as in YARN: A node can offer 8 containers, and when your job wants only 8 containers, it can happen that all are on one node. In order to spread computation out, you can do one of those two things: (1) Increase the parallelism (in this example to 16) - that gives best resource utilization (2) If you want to keep a parallelism of 8 and be sure that everything is on two nodes, you can decrease the slots per node to 4 so that there are 8 across two nodes. Greetings, Stephan On Wed, Jul 6, 2016 at 7:18 PM, amir bahmanyari <[email protected]> wrote: > Hi Colleagues, > I get the same time-to-time issue with Beam FlinkRunner in a Flink cluster > (2 nodes). > Also, 98% of the time, all happens in only one node (no LB in the cluster). > FYI. > Thanks > Amir- > > > ------------------------------ > *From:* Stephan Ewen <[email protected]> > *To:* [email protected] > *Sent:* Wednesday, July 6, 2016 7:22 AM > *Subject:* Re: [FlinkRunner] Exception "Insufficient number of network > buffers" thrown from time to time. > > The number of network buffers is a parameter one sometimes need to > configure in Flink. > > For Flink's own API, you can explicitly create a > LocalStreamExecutionEnvironment from a configuration and use that one for > the execution. > > For the FlinkRunner in Beam, it would make sense to be able to pass an > ExecutionEnvironment object to use for the program execution. > Then the fix would be to create a LocalStreamExecutionEnvironment from a > config and pass that to the Flink Runner. > > > On Wed, Jul 6, 2016 at 3:36 PM, Aljoscha Krettek <[email protected]> > wrote: > > Could you maybe send me a minimal version of that such that I can > reproduce it? > > On Wed, 6 Jul 2016 at 12:29 Pawel Szczur <[email protected]> wrote: > > Verified, I've started getting it consistently this morning after few > more PTransform. > > 2016-07-06 12:27 GMT+02:00 Aljoscha Krettek <[email protected]>: > > Strange, and you're saying you only sometimes get this exception? Not > reproducibly? > > On Wed, 6 Jul 2016 at 12:02 Pawel Szczur <[email protected]> wrote: > > I have 8 cores. > Just IDE. > > 2016-07-06 12:00 GMT+02:00 Aljoscha Krettek <[email protected]>: > > Hi, > are you running this in an IDE or on an actual cluster? > > - > Aljoscha > > On Wed, 6 Jul 2016 at 11:57 Jean-Baptiste Onofré <[email protected]> wrote: > > Hi Pawel, > > I'm pretty sure that our Flink experts will answer. > > I'm assuming you are using a single JVM for Flink, right ? > I think it could be related to Flink StreamExecutionEnvironment and the > number of core on your machine. > More your machine has cores, more you should increase the numberOfBuffers. > > Regards > JB > > On 07/06/2016 11:26 AM, Pawel Szczur wrote: > > When running my simple pipeline from time to time I'm getting below > > exception: > > > > Caused by: java.io.IOException: Insufficient number of network buffers: > > required 1, but only 0 available. The total number of network buffers is > > currently set to 2048. You can increase this number by setting the > > configuration key 'taskmanager.network.numberOfBuffers'. > > at > > org.apache.flink.runtime.io > .network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:196) > > at > > org.apache.flink.runtime.io > .network.NetworkEnvironment.registerTask(NetworkEnvironment.java:298) > > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:469) > > at java.lang.Thread.run(Thread.java:745) > > > > > > I'm using trunk of Beam with FlinkRunner. I guess it's well known Flink > > problem? Idea how to prevent it? > > > > Pawel > > -- > Jean-Baptiste Onofré > [email protected] > http://blog.nanthrax.net > Talend - http://www.talend.com > > > > > > >
