Dear all,
I am using the KafkaIO SDK in my project (Beam 2.0.0 with the Direct runner).
While using this SDK, I am running into data latency; the situation is
described below.
The data arrive from Kafka at a fixed rate: 100 elements per second.
I create a fixed window wi
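For reference, a minimal, self-contained sketch of that kind of KafkaIO read
with a fixed window; the broker address, topic name, String deserializers, and
the one-second window size are assumptions for illustration only:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class KafkaFixedWindowSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<String> values = p
        // "localhost:9092" and "my-topic" are placeholder values.
        .apply(KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")
            .withTopic("my-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())           // drops Kafka metadata, yields KV<K, V>
        .apply(Values.<String>create());  // keeps only the record values

    // Fixed windows; the 1-second size is only an example.
    values.apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(1))));

    p.run().waitUntilFinish();
  }
}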
> .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
> .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
>
> where ToDestination is a:
>
> SerializableFunction<..., TableDestination>
>
> which returns a:
>
> new TableDestination(tableName, "")
>
> where tableName looks like "m
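For context, here is a rough, self-contained sketch of that kind of write; the
element type (TableRow via writeTableRows()), the project and dataset names,
and the "type" field used to build tableName are assumptions for illustration:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class DynamicTableWriteSketch {
  /** Writes each row to a table chosen per element; tables must already exist. */
  public static void write(PCollection<TableRow> rows) {
    rows.apply(BigQueryIO.writeTableRows()
        .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
          @Override
          public TableDestination apply(ValueInSingleWindow<TableRow> input) {
            // Build the table name from the element; "type" is a placeholder field.
            String tableName =
                "my-project:my_dataset.table_" + input.getValue().get("type");
            return new TableDestination(tableName, "");
          }
        })
        // CREATE_NEVER: the destination tables are assumed to exist already.
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
  }
}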
You should use a side input and not an empty PCollection that you flatten.
Since
ReadA --> Flatten --> ParDo
ReadB -/
can be equivalently executed as:
ReadA --> ParDo
ReadB --> ParDo
Make sure you actually access the side input in your DoFn, since a runner may
evaluate the side input lazily.
So your pipeline would look roughly like the sketch below:
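Something along these lines, assuming String elements and a List-shaped side
input (the names mainInput, sideSource, and SideInputSketch are placeholders):

import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class SideInputSketch {
  /** ReadB's output becomes a side input to the ParDo over ReadA's output. */
  public static PCollection<String> expand(
      PCollection<String> mainInput, PCollection<String> sideSource) {

    final PCollectionView<List<String>> sideView =
        sideSource.apply(View.<String>asList());

    return mainInput.apply(ParDo
        .of(new DoFn<String, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            // Actually read the side input so it gets materialized even if the
            // runner evaluates side inputs lazily.
            List<String> sideValues = c.sideInput(sideView);
            c.output(c.element() + " / side elements: " + sideValues.size());
          }
        })
        .withSideInputs(sideView));
  }
}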
Hi,
What is the best way to run code before the pipeline starts? Anything in the
`main` function doesn't get called when the pipeline is run on Dataflow via a
template - only the pipeline itself runs. If you're familiar with Spark, I'm
thinking of the kind of code that would run in the driver.
Alternatively,
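A minimal sketch of one assumed direction (not confirmed in this thread):
per-worker initialization in a DoFn's @Setup method, which runs when the job
executes on the workers rather than when the template is constructed. The
initialization body here is only a placeholder:

import org.apache.beam.sdk.transforms.DoFn;

/** Its @Setup runs on the workers at job execution time, not at template build. */
public class InitializingDoFn extends DoFn<String, String> {

  private transient boolean initialized;

  @Setup
  public void setup() {
    // Placeholder for one-time, per-worker initialization
    // (e.g. loading configuration).
    initialized = true;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(c.element());
  }
}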