I'm attempting to deploy a fairly simple job on the Dataflow runner that reads from PubSub and writes to BigQuery using file loads, but so far I haven't been able to tune it to keep up with the incoming data rate.
I have configured BigQueryIO.write to trigger loads every 5 minutes, and I've let the job autoscale up to a maximum of 60 workers (which it has done). I'm using dynamic destinations to write to 2 field-partitioned tables. Incoming data is ~10k events/second per table, so each table should be ingesting on the order of 3 million records of ~20 kB apiece per 5-minute load.

We don't get many knobs to turn in BigQueryIO. I have tested numShards values between 10 and 1000, but haven't seen any obvious difference in performance.

Possibly relevant: I see a high rate of warnings on the shuffler. They are mostly LevelDB warnings about "Too many L0 files", with occasional memory-related warnings as well.

Would using larger workers potentially help? Does anybody have experience tuning BigQueryIO file loads? It's quite a complicated transform under the hood, and it looks like there are several steps of grouping and shuffling data that could be limiting throughput.
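For reference, here is roughly what my write configuration looks like. This is a trimmed-down sketch, not the real job: the table names, the "event_type" routing field, and the 500-shard value are placeholders standing in for my actual dynamic-destination logic and the numShards values I've been sweeping.

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.Duration;

public class WriteConfigSketch {

  // Placeholder router: picks one of the two pre-existing field-partitioned
  // tables based on an "event_type" field in the row.
  static class TwoTableDestinations extends DynamicDestinations<TableRow, String> {
    @Override
    public String getDestination(ValueInSingleWindow<TableRow> element) {
      return (String) element.getValue().get("event_type");
    }

    @Override
    public TableDestination getTable(String eventType) {
      // Placeholder table spec; the tables already exist and are partitioned on a field.
      return new TableDestination("my-project:my_dataset.events_" + eventType, null);
    }

    @Override
    public TableSchema getSchema(String eventType) {
      return null; // not needed with CREATE_NEVER
    }
  }

  static void applyWrite(PCollection<TableRow> rows) {
    // Launched on the Dataflow runner with autoscaling enabled and --maxNumWorkers=60.
    rows.apply(
        "WriteToBigQuery",
        BigQueryIO.<TableRow>write()
            .withMethod(Method.FILE_LOADS)                        // batch loads, not streaming inserts
            .withTriggeringFrequency(Duration.standardMinutes(5)) // load jobs every 5 minutes
            .withNumFileShards(500)                               // I've tried values from 10 to 1000
            .to(new TwoTableDestinations())
            .withFormatFunction(row -> row)                       // elements are already TableRows
            .withCreateDisposition(CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));
  }
}
```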