Hi,
When using the Python SDK, I'm a little confused about when the pipeline object
is actually needed. I gather it is needed initially to create a PCollection,
since that is where I consistently see it used, e.g.:
    with beam.Pipeline() as pipeline:
        dict_pc = (
            pipeline
            | beam.io.fileio.MatchFiles("./*.csv")
            | 'Read matched files' >> beam.io.fileio.ReadMatches()
            | 'Get CSV data as a dict' >> beam.FlatMap(my_csv_reader)
        )
        # do stuff with dict_pc and other operations
But beyond this, when does one need the pipeline object? The transforms seem to
take a PCollection as input and return a PCollection as output, so I'm not sure
what else the pipeline is for, and I'm not finding documentation that addresses
this. Thank you.