Hi.
To answer your question: if we limit ourselves to DataflowRunner, you could
use `numberOfWorkerHarnessThreads`. See more here
<https://github.com/spotify/scio/wiki/FAQ#how-do-i-control-concurrency-number-of-dofn-threads-in-dataflow-workers>.
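For example, with a Scio job you could pass the option on the command line. `--numberOfWorkerHarnessThreads` is a real DataflowRunner option (from `DataflowPipelineDebugOptions`); the main class and project name below are just placeholders:

```shell
# Sketch of launching a Scio job with a capped number of harness threads.
# One thread per worker means at most one DoFn bundle runs at a time,
# leaving the remaining cores to TensorFlow's own thread pool.
sbt "runMain com.example.MyScioJob
  --project=my-gcp-project
  --runner=DataflowRunner
  --numberOfWorkerHarnessThreads=1"
```

Note this caps concurrency for the whole worker harness, not a single DoFn.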
That said, I'm not gonna comment on whether that is a good remedy for your
actual problem.
- rav

On Mon, Oct 16, 2017 at 8:48 PM, Derek Hao Hu <[email protected]>
wrote:

> Hi,
>
> Is there an easy way to limit the number of DoFn instances per worker?
>
> The use case is like this: we are calling TensorFlow in our DoFn, and each
> TensorFlow call automatically tries to allocate all available CPU
> resources. So in a streaming pipeline, what I'm seeing is that inference
> time becomes longer over time if autoscaling doesn't catch up. My
> hypothesis is that Beam tries to process a specific number of elements
> (maybe the number of cores?) concurrently on each worker for a particular
> DoFn, and these TensorFlow threads then contend for CPU cycles. Therefore,
> I would like to know whether it's possible to limit the number of threads a
> pipeline runner can allocate for a DoFn per worker. By doing this, we can
> ensure we start accumulating backlogs in the streaming pipeline earlier,
> and autoscaling would probably kick in earlier as well.
>
> Thanks!
> --
> Derek Hao Hu
>
> Software Engineer | Snapchat
> Snap Inc.
>