Hi,

I am using BigQueryIO from Apache Beam 2.3.0 and Scio 0.4.7 to load data
into BigQuery from Dataflow using load jobs (Write.Method.FILE_LOADS).
Here is the code:

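    import com.google.api.services.bigquery.model.TimePartitioning
    import com.spotify.scio.bigquery.types.BigQueryType
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write
    import org.joda.time.Duration

    // Partition the destination table by day on the partition_day column
    val timePartitioning =
      new TimePartitioning().setField("partition_day").setType("DAY")

    BigQueryIO.write[Event]
      .to("some-table")
      .withCreateDisposition(Write.CreateDisposition.CREATE_IF_NEEDED)
      .withWriteDisposition(Write.WriteDisposition.WRITE_APPEND)
      .withMethod(Write.Method.FILE_LOADS)
      .withFormatFunction((input: Event) => BigQueryType[Event].toTableRow(input))
      .withSchema(BigQueryType[Event].schema)
      // Start a load job every 15 minutes
      .withTriggeringFrequency(Duration.standardMinutes(15))
      .withNumFileShards(XXX) // the value in question
      .withTimePartitioning(timePartitioning)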

My question is about "numFileShards", which must be set whenever a
"triggeringFrequency" is used. I have searched for documentation and read
the source code to understand what it does, but I couldn't find anything
relevant.

Given an expected throughput of 300-1000 events per second, what value
would you recommend?
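For context, here is a back-of-the-envelope sketch of what those figures imply per trigger firing. It assumes (from a reading of Beam's BatchLoads, not from any documentation) that on each firing the buffered records are spread roughly evenly across numFileShards shards, each written to its own file(s). The object name, `recordsPerShard`, and the candidate shard counts are hypothetical, chosen only for illustration.

```scala
object ShardEstimate {
  // Seconds in one triggering interval (15 minutes, matching withTriggeringFrequency above).
  val triggerSeconds: Long = 15L * 60L

  // Records accumulated between two trigger firings at a given events-per-second rate.
  def recordsPerPane(eventsPerSecond: Long): Long = eventsPerSecond * triggerSeconds

  // Approximate records per shard per firing, ASSUMING an even spread over numFileShards.
  def recordsPerShard(eventsPerSecond: Long, numFileShards: Int): Long = {
    require(numFileShards > 0, "numFileShards must be positive")
    math.ceil(recordsPerPane(eventsPerSecond).toDouble / numFileShards).toLong
  }

  def main(args: Array[String]): Unit = {
    val rates = Seq(300L, 1000L)          // the expected throughput range
    val candidateShards = Seq(10, 100, 1000) // hypothetical values to compare
    for (rate <- rates; shards <- candidateShards) {
      println(s"$rate events/s, $shards shards: ~${recordsPerShard(rate, shards)} records per shard per firing")
    }
  }
}
```

With a fixed 15-minute interval the pane size is simply rate times interval (270,000 to 900,000 records here), so under the assumption above numFileShards mainly trades fewer, larger files against more, smaller ones.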

Thanks!
