Hi,

What is the best way to run code before the pipeline starts? Anything in the 
`main` function doesn't get called when the pipeline is run on Dataflow via a 
template - only the pipeline itself is executed. If you're familiar with Spark, 
I'm thinking of the kind of code that would run in the driver.

Alternatively, is there a way I can run part of a pipeline first, then run 
another part once the first has completed? I'm not sure that makes sense, so 
here is a poor attempt at an ASCII diagram of what I mean:

            events
            /    \
           /      \
          |    group by key
          |        |
          |    do some action
          |       /
          |      /
  once action is complete,
process all original elements

I can presumably achieve this by having `do some action` generate either an 
empty side input, or an empty PCollection that I combine with the original in 
a PCollectionList and pass to `Flatten.pCollections()` before continuing. I'm 
not sure that's the best way to do it, though.

Thanks,
Andrew
