Hi. To answer your question: if we limit ourselves to DataflowRunner, you could use `numberOfWorkerHarnessThreads`. See more here <https://github.com/spotify/scio/wiki/FAQ#how-do-i-control-concurrency-number-of-dofn-threads-in-dataflow-workers>. That said, I'm not going to comment on whether that is a good remedy for your actual problem. - rav
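For what it's worth, a minimal sketch of what setting that option from a Scio job might look like (assumptions: `DataflowPipelineDebugOptions` is where `numberOfWorkerHarnessThreads` lives in Beam's Dataflow runner, and the job name here is hypothetical; the same thing can be passed on the command line as `--numberOfWorkerHarnessThreads=1`):

```scala
import com.spotify.scio._
import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions

object ThrottledInferenceJob {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    // Cap the number of work-item threads per Dataflow worker, so a single
    // in-flight TensorFlow call isn't contending with sibling DoFn threads
    // for the same cores.
    sc.optionsAs[DataflowPipelineDebugOptions]
      .setNumberOfWorkerHarnessThreads(1)

    // ... build the pipeline as usual ...

    sc.close() // sc.run() on newer Scio versions
  }
}
```

This is a configuration sketch, not a tested job: it trades per-worker parallelism for predictable per-call CPU, which should make backlog (and hence autoscaling) kick in sooner, as the original question hoped.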
On Mon, Oct 16, 2017 at 8:48 PM, Derek Hao Hu <[email protected]> wrote:
> Hi,
>
> Is there an easy way to limit the number of DoFn instances per worker?
>
> The use case is like this: we are calling TensorFlow in our DoFn and each
> TensorFlow call would automatically try to allocate the available CPU
> resources. So in a streaming pipeline, what I'm seeing is the inference
> time will become longer over time if autoscaling didn't catch up. My
> hypothesis is that Beam is trying to allocate a specific number of elements
> (maybe the number of cores?) on each worker for a particular DoFn and then
> these TensorFlow threads contend for CPU cycles. Therefore, I would like to
> know whether it's possible to limit the number of threads a pipeline runner
> can allocate for a DoFn per worker. By doing this, we can ensure we are
> accumulating backlogs in the streaming pipeline earlier and autoscaling
> would probably happen earlier as well.
>
> Thanks!
> --
> Derek Hao Hu
>
> Software Engineer | Snapchat
> Snap Inc.
