Hi,
What is the best way to run code before the pipeline starts? Anything in the
`main` function doesn't get called when the pipeline is run on Dataflow via a
template - only the pipeline itself. If you're familiar with Spark, I'm
thinking of the kind of code that would run in the driver.
Alternatively, is there a way I can run part of a pipeline first, then run
another part once it has completed? I'm not sure that makes sense, so to
illustrate with a poor attempt at an ASCII diagram, suppose I have something
like this:
          events
            /\
           /  \
          |   group by key
          |    |
          |   do some action
          |   /
          |  /
    once action is complete,
  process all original elements
I can presumably achieve this by having `do some action` generate either an
empty side input, or an empty PCollection which I can then combine with the
original in a PCollectionList and pass to Flatten.pCollections() before
continuing. I'm not sure that's the best way to do it, though.
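For the side input variant, this is roughly what I have in mind. It is only a sketch, assuming the Beam Java SDK with a runner (for example the direct runner) on the classpath. The class name, step names and sample events are made up for illustration, and I am assuming the runner holds back the final step until the side input for the window is ready:

```java
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

// Hypothetical class name, for illustration only.
public class ActionThenProcess {

  // Builds both branches of the diagram and returns the final output.
  static PCollection<String> build(Pipeline p) {
    // Stand-in for the real source of events.
    PCollection<KV<String, String>> events =
        p.apply("Events",
            Create.of(KV.of("a", "1"), KV.of("a", "2"), KV.of("b", "3")));

    // Right-hand branch: group by key, then do some action.
    // The DoFn never calls output(), so the resulting PCollection is empty.
    PCollection<String> actionOutput =
        events
            .apply("GroupByKey", GroupByKey.<String, String>create())
            .apply("DoSomeAction",
                ParDo.of(new DoFn<KV<String, Iterable<String>>, String>() {
                  @ProcessElement
                  public void processElement(ProcessContext c) {
                    // The action (e.g. a call to an external system) goes here.
                  }
                }));

    // Turn the empty output into a side input that acts purely as a signal.
    final PCollectionView<List<String>> actionDone =
        actionOutput.apply("ActionDone", View.<String>asList());

    // Left-hand branch: process the original elements, declaring the
    // side input so this step depends on the action branch.
    return events.apply("ProcessOriginal",
        ParDo.of(new DoFn<KV<String, String>, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            c.sideInput(actionDone); // always empty; read only as a signal
            c.output(c.element().getKey() + ":" + c.element().getValue());
          }
        }).withSideInputs(actionDone));
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    build(p);
    p.run().waitUntilFinish();
  }
}
```

The Flatten variant would look similar, except that the empty PCollection would have to share the element type of the original, and I don't know whether Flatten on its own makes anything wait for the action branch.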
Thanks,
Andrew