Dataflow isn't parallelizing

Alan Krumholz Fri, 11 Sep 2020 09:41:19 -0700

Hi Dataflow team,
I have a simple pipeline that I'm trying to speed up using Dataflow:


[image: image.png]

As you can see, the bottleneck is the "transcribe mp3" step. I was hoping
Dataflow would run many of these in parallel to reduce the total
execution time.

However, it doesn't seem to do that. Instead, it processes all the
independent inputs sequentially.
Even when I force it to start with many workers, it quickly shuts
down most of them, keeps only one alive, and never seems to
parallelize this step :(

Any advice on what else to try to make it do this?

Thanks so much!
