Hi,

Is there an easy way to limit the number of DoFn instances per worker?
The use case is this: our DoFn calls TensorFlow, and each TensorFlow call automatically tries to grab all available CPU resources. In a streaming pipeline, what I'm seeing is that inference latency grows over time when autoscaling doesn't catch up. My hypothesis is that Beam processes a fixed number of elements concurrently on each worker for a given DoFn (maybe one per core?), and the TensorFlow threads spawned by those calls then contend for CPU cycles. So I'd like to know whether it's possible to limit the number of threads a pipeline runner can allocate to a DoFn per worker. That way, the streaming pipeline would start accumulating backlog earlier, and autoscaling would probably kick in earlier as well.

Thanks!

--
Derek Hao Hu
Software Engineer | Snapchat
Snap Inc.