I realized that you can just use normal Python control flow while constructing your pipeline. No additional Beam functionality necessary.
On Thu, Feb 20, 2020 at 2:14 PM Xander Song <[email protected]> wrote: > I have written a feature extraction pipeline in which I extract two > features using ParDos and combine the results with a CoGroupByKey. > > with beam.Pipeline() as p: > > input = p | 'read input' >> beam.io.ReadFromText(input_path) > > first_feature = input | 'extract first feature' >> > beam.ParDo(ExtractFirstFeatureFn()) > > second_feature = input | 'extract second feature' >> > beam.ParDo(ExtractSecondFeatureFn()) > > combined = {'first feature': first_feature, 'second feature': > second_feature} | 'combine' >> beam.CoGroupByKey() > > > > I'd like to extend the pipeline to extract an arbitrary number of features > while still aggregating them at the end with a CoGroupByKey. I'd also like > to be able to decide at runtime (via command line arguments or a > configuration file) which features will be extracted (e.g., extract > features 1 and 3, but not feature 2). How could I write such a pipeline? > > > Thanks in advance, > > - Xander >
