Hi. To answer your question: if we limit ourselves to DataflowRunner, you could use `numberOfWorkerHarnessThreads`. See more here <https://github.com/spotify/scio/wiki/FAQ#how-do-i-control-concurrency-number-of-dofn-threads-in-dataflow-workers>. That said, I'm not going to comment on whether that is a good remedy for your actual problem. - rav
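For what it's worth, a minimal sketch of what setting that option from a Scio job might look like (assumptions: `DataflowPipelineDebugOptions` is where `numberOfWorkerHarnessThreads` lives in Beam's Dataflow runner, and the job name here is hypothetical; the same thing can be passed on the command line as `--numberOfWorkerHarnessThreads=1`):

```scala
import com.spotify.scio._
import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions

object ThrottledInferenceJob {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    // Cap the number of work-item threads per Dataflow worker, so a single
    // in-flight TensorFlow call isn't contending with sibling DoFn threads
    // for the same cores.
    sc.optionsAs[DataflowPipelineDebugOptions]
      .setNumberOfWorkerHarnessThreads(1)

    // ... build the pipeline as usual ...

    sc.close() // sc.run() on newer Scio versions
  }
}
```

This is a configuration sketch, not a tested job: it trades per-worker parallelism for predictable per-call CPU, which should make backlog (and hence autoscaling) kick in sooner, as the original question hoped.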
On Mon, Oct 16, 2017 at 8:48 PM, Derek Hao Hu <[email protected]> wrote:
> Hi,
>
> Is there an easy way to limit the number of DoFn instances per worker?
>
> The use case is like this: we are calling TensorFlow in our DoFn and each
> TensorFlow call would automatically try to allocate the available CPU
> resources. So in a streaming pipeline, what I'm seeing is the inference
> time will become longer over time if autoscaling didn't catch up. My
> hypothesis is that Beam is trying to allocate a specific number of elements
> (maybe the number of cores?) on each worker for a particular DoFn and then
> these TensorFlow threads contend for CPU cycles. Therefore, I would like to
> know whether it's possible to limit the number of threads a pipeline runner
> can allocate for a DoFn per worker. By doing this, we can ensure we are
> accumulating backlogs in the streaming pipeline earlier and autoscaling
> would probably happen earlier as well.
>
> Thanks!
> --
> Derek Hao Hu
>
> Software Engineer | Snapchat
> Snap Inc.
