Hi,

This is a known problem, and I don’t think there is an easy solution to it. Please refer to:

http://mail-archives.apache.org/mod_mbox/flink-user/201704.mbox/%3c5486a7fd-41c3-4131-5100-272825088...@gaborhermann.com%3E
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132
Thanks,
Piotrek

> On 25 Jan 2018, at 05:36, Ken Krugler <kkrugler_li...@transpac.com> wrote:
>
> Hi all,
>
> We’ve run into deadlocks with two different streaming workflows that have
> iterations.
>
> In both cases, the issue is with fan-out: if any operation in the loop can
> emit more records than it consumes, eventually a network buffer fills up,
> and then every operator in the iteration loop is blocked.
>
> One pattern we can use, when the operator causing the fan-out can decide
> how much to emit, is to have it behave as an async function, emitting from
> a queue with multiple threads. If threads start blocking because of back
> pressure, the queue begins to fill up, and the function can throttle back
> how much data it queues up. This gives us a small (carefully managed) data
> reservoir we can use to avoid the deadlock.
>
> Is there a better approach? I didn’t see any way to determine how “full”
> the various network buffers are and use that for throttling. Plus there’s
> the issue of partitioning, where it would be impossible in many cases to
> know the impact of a record being emitted. So even if we could monitor
> buffers, I don’t think it’s a viable solution.
>
> Thanks,
>
> — Ken
>
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
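
For illustration, a minimal, Flink-agnostic sketch of the bounded-queue
throttling pattern Ken describes. Everything here is hypothetical (the class
name, the Consumer standing in for the downstream emit path, the capacity and
watermark values); it shows the shape of the technique, not Flink's API:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: a small, carefully managed "reservoir" between the
// fan-out logic and a (possibly back-pressured) downstream emit path.
public class ThrottledFanOut<T> {

    private static final int CAPACITY = 1_000;      // reservoir size (illustrative)
    private static final int HIGH_WATERMARK = 800;  // start throttling here (illustrative)

    private final BlockingQueue<T> reservoir = new ArrayBlockingQueue<>(CAPACITY);

    public ThrottledFanOut(Consumer<T> emit, int numEmitterThreads) {
        // Emitter threads drain the reservoir and push records downstream.
        // If the downstream channel is back-pressured, emit.accept() blocks,
        // the threads stall, and the reservoir starts to fill.
        for (int i = 0; i < numEmitterThreads; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        emit.accept(reservoir.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }
    }

    // The fan-out logic consults this before generating more records,
    // using the reservoir's fill level as the throttle signal.
    public boolean shouldThrottle() {
        return reservoir.size() >= HIGH_WATERMARK;
    }

    // Blocks only if the reservoir is completely full.
    public void enqueue(T record) throws InterruptedException {
        reservoir.put(record);
    }
}

Note that in an actual Flink operator, emitting from multiple threads needs
care (the Collector is not thread-safe, so emission must be synchronized
appropriately), which this sketch deliberately glosses over.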