Hi,

Is there an easy way to limit the number of DoFn instances per worker?
The use case is this: our DoFn calls TensorFlow, and each TensorFlow call automatically tries to grab all available CPU resources. In a streaming pipeline, what I'm seeing is that inference latency grows over time when autoscaling doesn't catch up. My hypothesis is that Beam processes a fixed number of elements concurrently on each worker for a given DoFn (maybe one per core?), and the TensorFlow threads spawned by those calls then contend for CPU cycles. So I'd like to know whether it's possible to limit the number of threads a pipeline runner can allocate to a DoFn per worker. That way, the streaming pipeline would start accumulating backlog earlier, and autoscaling would probably kick in earlier as well.

Thanks!

--
Derek Hao Hu
Software Engineer | Snapchat
Snap Inc.