Hi,  

  

When using the Python SDK, I'm a little confused as to when the pipeline object
is actually needed. I gather one needs it initially to create a PCollection,
since that is where I most consistently see it used, e.g.:

  

with beam.Pipeline() as pipeline:
    dict_pc = (
        pipeline
        | beam.io.fileio.MatchFiles("./*.csv")
        | 'Read matched files' >> beam.io.fileio.ReadMatches()
        | 'Get CSV data as a dict' >> beam.FlatMap(my_csv_reader)
    )

    # do stuff with dict_pc and other operations

  

But beyond this, when does one need the pipeline object? It seems like the
transforms expect a PCollection and output a PCollection, so I'm confused and
am not finding documentation that addresses this. Thank you.

  
