Streaming mode Dataflow - how to make Autoscaling kick in for FileIO.match operations

asharma . gd Wed, 29 Aug 2018 08:38:05 -0700

Excerpt from the documentation on autoscaling in streaming mode:
"Currently, PubsubIO is the only source that supports autoscaling on streaming 
pipelines. All SDK-provided sinks are supported. In this Beta release, 
Autoscaling works smoothest when reading from Cloud Pub/Sub subscriptions tied 
to topics published with small batches and when writing to sinks with low 
latency. In extreme cases (i.e. Cloud Pub/Sub subscriptions with large 
publishing batches or sinks with very high latency), autoscaling is known to 
become coarse-grained. This will be improved in future releases."


In our use case we cannot publish the messages themselves to Pub/Sub. Instead 
we publish file paths to Pub/Sub, and after reading each file path we run 
FileIO.match and continue from there. In this setup, however, autoscaling does 
not kick in in streaming mode.

Is there a way to override the 'backlog' (the hint that makes autoscaling kick 
in)? In our case one message in Pub/Sub stands for millions of records, so we 
would like autoscaling to treat one Pub/Sub message as equal to about 1 
million pending records. If it then sees 100 messages in Pub/Sub, it would 
know that it has 100 million records to process for those 100 messages and 
start scaling up.
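
To make the arithmetic concrete, here is a minimal Python sketch of the 
backlog estimate I have in mind. The names effective_backlog and 
RECORDS_PER_FILE are made up for illustration and are not Beam or Dataflow 
APIs, and 1 million records per file is only the assumed average from the 
example above.

```python
# Hypothetical helper for illustration only; not part of Beam or Dataflow.
def effective_backlog(pending_messages: int, records_per_message: int) -> int:
    """Return the number of records implied by a Pub/Sub backlog
    when each pending message names one file of records."""
    if pending_messages < 0 or records_per_message < 0:
        raise ValueError("counts must be non-negative")
    return pending_messages * records_per_message


# Assumed average number of records per file (the figure used in the question).
RECORDS_PER_FILE = 1_000_000

# 100 pending Pub/Sub messages, each pointing at one file.
print(effective_backlog(100, RECORDS_PER_FILE))  # 100000000
```

This is the number we would like autoscaling to see, instead of the 100 
messages actually pending in the subscription.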

Or maybe I am thinking along completely the wrong lines. Basically, we are 
looking for any way to trigger autoscaling in streaming mode after a 
FileIO.match operation.

Thanks
Aniruddh
