Hi,
I am using BigQueryIO from Apache Beam 2.3.0 and Scio 0.47 to load data
into BigQuery from Dataflow using load jobs (Write.Method.FILE_LOADS). Here
is the code:
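import com.google.api.services.bigquery.model.TimePartitioning
import com.spotify.scio.bigquery.types.BigQueryType
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write
import org.joda.time.Duration

// Partition the destination table by day on the "partition_day" column
val timePartitioning = new TimePartitioning().setField("partition_day").setType("DAY")

BigQueryIO.write[Event]
  .to("some-table")
  .withCreateDisposition(Write.CreateDisposition.CREATE_IF_NEEDED)
  .withWriteDisposition(Write.WriteDisposition.WRITE_APPEND)
  .withMethod(Write.Method.FILE_LOADS)
  .withFormatFunction((input: Event) => BigQueryType[Event].toTableRow(input))
  .withSchema(BigQueryType[Event].schema)
  // Only applies to FILE_LOADS on an unbounded input: how often loads are triggered
  .withTriggeringFrequency(Duration.standardMinutes(15))
  .withNumFileShards(XXX) // the value this question is about
  .withTimePartitioning(timePartitioning)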
My question is about numFileShards, which must be set whenever
triggeringFrequency is used. I have searched for documentation and read the
source code to understand what it does, but I couldn't find anything
relevant.
Given an expected throughput of 300-1000 events per second, what value
would you recommend?
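For context, with a 15-minute triggering frequency that throughput adds up to roughly 270,000 to 900,000 events per triggering interval. The small helper below is a hypothetical back-of-the-envelope sketch (ShardSizing and its methods are made up for illustration, and the shard counts 1, 10 and 100 are arbitrary). It assumes events are spread evenly across shards, which is an assumption on my part and not documented BigQueryIO behaviour.

```scala
// Hypothetical sizing helper: events per triggering interval, and the
// average events per shard ASSUMING an even spread across shards.
object ShardSizing {
  /** Events accumulated during one triggering interval. */
  def eventsPerWindow(eventsPerSecond: Long, windowMinutes: Long): Long =
    eventsPerSecond * windowMinutes * 60

  /** Average events per shard (rounded up), assuming an even spread. */
  def eventsPerShard(total: Long, numShards: Int): Long = {
    require(numShards > 0, "numShards must be positive")
    (total + numShards - 1) / numShards
  }

  def main(args: Array[String]): Unit = {
    // Matches withTriggeringFrequency(Duration.standardMinutes(15))
    val windowMinutes = 15L
    for (rate <- Seq(300L, 1000L); shards <- Seq(1, 10, 100)) {
      val perWindow = eventsPerWindow(rate, windowMinutes)
      println(s"$rate ev/s, $shards shards: $perWindow events/interval, " +
        s"~${eventsPerShard(perWindow, shards)} per shard")
    }
  }
}
```

At 1000 events per second and a hypothetical 100 shards, this gives about 9,000 events per shard per interval.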
Thanks!