Hi Sergey,

As Andrey noted, it's a known issue with (currently) no good solution.
I talk a bit about how we worked around it on slide 26 of my Flink Forward talk <https://www.slideshare.net/FlinkForward/flink-forward-san-francisco-2018-ken-krugler-building-a-scalable-focused-web-crawler-with-flink> on a Flink-based web crawler. Basically we do some cheesy approximate monitoring of in-flight data, and throttle the key producer so that (hopefully) network buffers don't fill up to the point of deadlock.

-- Ken

> On Dec 24, 2018, at 8:46 AM, Andrey Zagrebin <and...@da-platform.com> wrote:
>
> Hi Sergey,
>
> It seems to be a known issue. The community will hopefully work on this, but I do not see more updates since the last answer to the similar question [1], see also [2] and [3].
>
> Best,
> Andrey
>
> [1] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
> [2] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
> [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132
>
> On Mon, Dec 24, 2018 at 7:16 PM Sergei Poganshev <s.pogans...@slice.com> wrote:
>> We've tried using the iterations feature, and under significant load the job sometimes stalls and stops processing events due to high back pressure, both in the task that produces records for the iteration and in all the other inputs to this task. It looks like a back pressure loop: the task can't handle all the incoming records, the iteration sink loops back into this task, and it also gets back pressured. This is basically a "back pressure loop" which causes a complete job stoppage.
>>
>> Is there a way to mitigate this (to guarantee such an issue does not occur)?

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
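
To make the throttling workaround mentioned in the reply more concrete, here is a minimal sketch in plain Java. It is not the code from the talk and uses no Flink API. The class name `InFlightThrottle` and its methods are hypothetical, and it assumes the producer and the end of the loop can both reach the same counter. In a distributed job that is generally not true, so the count would only ever be approximate.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of Flink): caps the number of records
 * believed to be in flight inside an iteration loop.
 */
final class InFlightThrottle {
    private final long maxInFlight;
    private final AtomicLong inFlight = new AtomicLong();

    InFlightThrottle(long maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
    }

    /**
     * Producer side: returns true and counts the record if the cap has not
     * been reached; returns false if the producer should hold the record back.
     */
    boolean tryAcquire() {
        while (true) {
            long current = inFlight.get();
            if (current >= maxInFlight) {
                return false;
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Called when a record leaves the loop; never drops below zero. */
    void release() {
        inFlight.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** Current estimate of records in flight. */
    long inFlight() {
        return inFlight.get();
    }
}
```

A producer would call `tryAcquire()` before emitting into the loop and buffer or delay the record when it returns false. Whatever consumes the final result would call `release()`. The cap has to be chosen low enough that the loop's network buffers cannot fill, which is a tuning exercise rather than a guarantee.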