Is there builtin support for writing partitioned Collections. For example: PCollection<KV<Enum,GenericRecord>> data;
We want to write the GenericRecords in data in to different files based on the enum. We've did this in flink by making a proxy hadoopoutputformat which has the N real outputformats and the write method checks the enum and forwards the write call for the genericrecord to the correct outputformat. Given the lack of beam parquet support, the final writer we want to use with beam is Avro. We used the proxy outputformat trick in flink because performance was very poor using a filter to split it and then a map to convert from Enum,GenericRecord to just GenericRecord. I'm nervous to use side outputs in beam given I think they will be implemented as described here which performs poorly. So basically, has anyone implemented a demuxing AvroIO.Writer? Thanks
