I realized that you can just use normal Python control flow (loops and
conditionals) while constructing your pipeline. No additional Beam
functionality is necessary.
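
For the archives, here is a minimal sketch of what I mean. It is not a
definitive implementation: the extractor functions, the EXTRACTORS
registry, and the --features flag are all hypothetical names made up for
illustration, and I use beam.Map with plain functions where the original
pipeline used ParDo with DoFn classes. It assumes each input line looks like
"key,text" so that every feature is emitted as a (key, value) pair, which is
what CoGroupByKey needs.

```python
import argparse


# Hypothetical extractors: each turns a "key,text" line into (key, value).
def extract_length(line):
    key, _, text = line.partition(',')
    return key, len(text)


def extract_word_count(line):
    key, _, text = line.partition(',')
    return key, len(text.split())


def extract_upper(line):
    key, _, text = line.partition(',')
    return key, text.upper()


# Hypothetical registry mapping a feature name to its extractor.
EXTRACTORS = {
    'length': extract_length,
    'word_count': extract_word_count,
    'upper': extract_upper,
}


def parse_features(argv=None):
    """Return the feature names requested on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--features', nargs='+',
                        choices=sorted(EXTRACTORS),
                        default=sorted(EXTRACTORS))
    # parse_known_args leaves any remaining flags for Beam's own options.
    args, _ = parser.parse_known_args(argv)
    return args.features


def build_pipeline(p, input_path, feature_names):
    """Attach one extraction branch per requested feature, then combine."""
    # Imported here so the helpers above can be used without Beam installed.
    import apache_beam as beam

    lines = p | 'read input' >> beam.io.ReadFromText(input_path)

    # Ordinary Python loop: runs once, while the pipeline graph is built.
    features = {}
    for name in feature_names:
        # Each transform label must be unique, hence the name in the label.
        features[name] = (
            lines | f'extract {name}' >> beam.Map(EXTRACTORS[name]))

    return features | 'combine' >> beam.CoGroupByKey()
```

You would call it as something like
build_pipeline(p, input_path, parse_features()) inside the usual
"with beam.Pipeline() as p:" block. The loop only decides which transforms
get added to the graph; once the graph is built, the pipeline runs exactly
as if the branches had been written out by hand.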

On Thu, Feb 20, 2020 at 2:14 PM Xander Song <[email protected]> wrote:

> I have written a feature extraction pipeline in which I extract two
> features using ParDos and combine the results with a CoGroupByKey.
>
> with beam.Pipeline() as p:
>
>     input = p | 'read input' >> beam.io.ReadFromText(input_path)
>
>     first_feature = input | 'extract first feature' >> beam.ParDo(ExtractFirstFeatureFn())
>
>     second_feature = input | 'extract second feature' >> beam.ParDo(ExtractSecondFeatureFn())
>
>     combined = {'first feature': first_feature, 'second feature': second_feature} | 'combine' >> beam.CoGroupByKey()
>
>
>
> I'd like to extend the pipeline to extract an arbitrary number of features
> while still aggregating them at the end with a CoGroupByKey. I'd also like
> to be able to decide at runtime (via command line arguments or a
> configuration file) which features will be extracted (e.g., extract
> features 1 and 3, but not feature 2). How could I write such a pipeline?
>
>
> Thanks in advance,
>
> - Xander
>
