Hey Ken,

thanks for your message. Both your comments are correct (see inline).

On Fri, Nov 10, 2017 at 10:31 PM, Ken Krugler
<kkrugler_li...@transpac.com> wrote:
> 1. A downstream function in the iteration was (significantly) increasing the
> number of tuples - it would get one in, and sometimes emit 100+.
>
> The output would loop back as input via the iteration.
>
> This eventually caused the network buffers to fill up, and that’s why the
> job got stuck.
>
> I had to add my own tracking/throttling in one of my custom functions, to
> avoid having too many “active” tuples.
>
> So maybe something to note in the documentation on iterations, if it’s not
> there already.

Yes, iterations are prone to deadlock due to the way that data is
exchanged between the sink and head nodes. There have been multiple
attempts to fix these shortcomings, but I don't know what the latest
state is. Maybe Aljoscha (CC'd) has some input...
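
To make the workaround concrete, something along these lines should work.
This is only a rough sketch, not Ken's actual code: the class name, the
ACTIVE counter and the MAX_ACTIVE limit are made-up placeholders.

import java.util.concurrent.atomic.AtomicLong;

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.util.Collector;

// Fan-out function inside the iteration: one input tuple can emit 100+.
public class FanOutFunction extends RichFlatMapFunction<String, String> {

    // Counter of tuples currently circulating in the loop (per JVM).
    // It must be decremented wherever tuples leave the iteration.
    private static final AtomicLong ACTIVE = new AtomicLong();
    private static final long MAX_ACTIVE = 10_000;

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        for (String derived : expand(value)) {
            // Crude throttle: wait until the number of active tuples drops
            // below the limit, so the feedback edge cannot fill up all
            // network buffers and deadlock the job.
            while (ACTIVE.get() >= MAX_ACTIVE) {
                Thread.sleep(10);
            }
            ACTIVE.incrementAndGet();
            out.collect(derived);
        }
    }

    private Iterable<String> expand(String value) {
        // Placeholder for the real expansion logic.
        return java.util.Collections.singletonList(value);
    }
}

Note that a static counter like this is per JVM, so with parallelism > 1
each task manager throttles independently, and the counter has to be
decremented wherever tuples leave the loop.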

> 2. The back pressure calculation doesn’t take into account AsyncIO

Correct, the back pressure monitoring only takes the main task thread
into account. Every operator that uses a separate thread to emit
records (like the Async I/O operator or the Kafka source) is therefore
not covered by the back pressure monitoring.
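
If you need to bound how far an Async I/O operator can run ahead, the
capacity argument of AsyncDataStream is the relevant knob: it limits the
number of in-flight requests and back pressures the upstream operators
once the limit is reached, even though the sampling-based monitoring does
not show it. A minimal sketch (the input stream, the lookup function and
the concrete numbers are just assumptions):

import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;

// 'requests' and 'lookupFunction' (an AsyncFunction<Request, Enriched>)
// are assumed to be defined elsewhere in the job.
DataStream<Enriched> enriched = AsyncDataStream.unorderedWait(
    requests,
    lookupFunction,
    30, TimeUnit.SECONDS,  // timeout per asynchronous request
    100);                  // capacity: at most 100 requests in flight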

– Ufuk
