Hi all,

We’ve run into deadlocks with two different streaming workflows that contain 
iterations.

In both cases the cause is fan-out: if any operator in the loop can emit more 
records than it consumes, a network buffer eventually fills up, and then every 
operator in the iteration loop is blocked.
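
To make the failure mode concrete, here is a toy, single-threaded model of one 
loop edge (not taken from our workflows; the function name and parameters are 
made up for illustration). It assumes one operator that reads a record from a 
bounded buffer and writes its output back to the same buffer:

```python
from collections import deque


def steps_until_blocked(capacity, fan_out, max_steps=10_000):
    """Return the step at which the loop operator can no longer emit.

    Returns None if the loop drains or never blocks within max_steps.
    Hypothetical model: a single bounded buffer stands in for the
    network buffers on the feedback edge.
    """
    buffer = deque([0])  # one seed record enters the iteration
    for step in range(max_steps):
        if not buffer:
            return None  # nothing left to process; the loop drained
        buffer.popleft()  # consume one record
        if len(buffer) + fan_out > capacity:
            # The operator must emit fan_out records but there is no room,
            # and it is the only consumer, so nothing will ever free space.
            return step
        buffer.extend([0] * fan_out)  # emit fan_out records back into the loop
    return None
```

With fan_out of 1 the buffer never grows, but with anything larger it fills 
after a number of steps that depends only on the capacity, so a bigger buffer 
just delays the block.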

One pattern we can use, when the operator causing the fan-out is able to decide 
how much to emit, is to have it behave as an async function that emits from a 
queue using multiple threads. If those threads start blocking because of back 
pressure, the queue begins to fill up, and the function can throttle back how 
much data it queues. This gives us a small, carefully managed data reservoir 
that we can use to avoid the deadlock.
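
A minimal sketch of that queue pattern, in plain Python rather than any 
streaming framework's API. The class and method names (ThrottledEmitter, offer, 
free_slots, flush) are hypothetical, and the emit callable stands in for 
whatever downstream call blocks under back pressure:

```python
import queue
import threading


class ThrottledEmitter:
    """Small bounded reservoir drained by worker threads (illustrative only)."""

    def __init__(self, emit, capacity=100, workers=2):
        self._emit = emit  # assumed to block when downstream is slow
        self._queue = queue.Queue(maxsize=capacity)
        for _ in range(workers):
            threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        while True:
            record = self._queue.get()
            try:
                self._emit(record)  # worker blocks here under back pressure
            finally:
                self._queue.task_done()

    def free_slots(self):
        # Approximate: workers may change the queue size right after this call.
        return self._queue.maxsize - self._queue.qsize()

    def offer(self, records):
        """Queue records without blocking; return how many were accepted."""
        accepted = 0
        for record in records:
            try:
                self._queue.put_nowait(record)
            except queue.Full:
                break  # reservoir is full, so the caller should throttle
            accepted += 1
        return accepted

    def flush(self):
        # Wait until every queued record has been handed to emit.
        self._queue.join()
```

The fan-out operator would call offer() and generate only as much output as was 
accepted, or check free_slots() before deciding how much to produce. The 
operator's own thread never blocks on the emit, which is what keeps the loop 
moving.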

Is there a better approach? I didn’t see any way to determine how “full” the 
various network buffers are and use that for throttling. There’s also the 
issue of partitioning: in many cases it would be impossible to know the impact 
of emitting a record. So even if we could monitor buffers, I don’t think that 
would be a viable solution.

Thanks,

— Ken

--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378
