Excerpt for Autoscaling on Streaming mode "Currently, PubsubIO is the only source that supports autoscaling on streaming pipelines. All SDK-provided sinks are supported. In this Beta release, Autoscaling works smoothest when reading from Cloud Pub/Sub subscriptions tied to topics published with small batches and when writing to sinks with low latency. In extreme cases (i.e. Cloud Pub/Sub subscriptions with large publishing batches or sinks with very high latency), autoscaling is known to become coarse-grained. This will be improved in future releases."
For our use case we cannot write messages in PubSub but we are writing file paths in PubSub and then after reading the FilePath we want to do FileIO.match and go from there. But for these use cases Autoscaling is not kicking in Streaming mode. Is there a way to override the 'Backlog' (hint for autcoscaling to kick in) that in this case one message in pub sub is indicator of millions of message so that Autoscaling treats as if one message in PubSub is equal to amount of 1 million pending messages and if if it sees there are 100 messages in PubSub then it knows it has to process 100 million records corresponding to these 100 messages and start kicking in Autoscaling. Or may be I am thinking on completely wrong line. Basically any way where we can trigger Autoscaling in streaming mode after FileIO.match operation Thanks Aniruddh
