Hey Bryan,

This comes up often enough that we need to prioritize the use case: what we really want is a Target that would take in a PTable<String, T> and would be able to write an output file/directory for each String key. I'll create a JIRA to track this.
Josh

On Tue, Nov 26, 2013 at 11:25 AM, Bryan Baugher <[email protected]> wrote:

> Hi everyone,
>
> I have a PCollection of avro based objects and I want to categorize these
> avro objects by a certain property by writing each category into a
> different avro file. The number of distinct categories should be small
> (hundreds) and the property I am categorizing on is a String. I was hoping
> there was some way to end up with a Map<String, PCollection> but there
> didn't seem to be any obvious choice. For now I have gone with a simple
> approach of
>
> - Find all categories (DoFn that returns PCollection<String>)
> - Materialize and iterate over this collection
> - For each category use a FilterFn to create desired categorized
>   PCollection
> - Write this to avro file
>
> This works but it seems like there should be a better way to do it. Any
> thoughts?
>
> -Bryan

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
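
To make the "one output per String key" idea from the thread concrete, here is a minimal plain-Java sketch of the semantics only. It does not use the Crunch API: the class name PerKeyOutput and the methods groupByCategory and writePerCategory are hypothetical, records are plain Strings rather than Avro objects, and each key is assumed to be safe to use as a file name. It groups records in a single pass instead of filtering the full collection once per category.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical illustration class; not part of Crunch.
public class PerKeyOutput {

    /** Groups records by their String category in one pass over the input. */
    public static <T> Map<String, List<T>> groupByCategory(
            Iterable<T> records, Function<? super T, String> categoryOf) {
        Map<String, List<T>> groups = new TreeMap<>();
        for (T record : records) {
            groups.computeIfAbsent(categoryOf.apply(record), k -> new ArrayList<>())
                  .add(record);
        }
        return groups;
    }

    /**
     * Writes one text file per category under outputDir and returns the paths.
     * Assumes each key is a valid file name (an assumption of this sketch).
     */
    public static <T> List<Path> writePerCategory(
            Map<String, List<T>> groups, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> written = new ArrayList<>();
        for (Map.Entry<String, List<T>> entry : groups.entrySet()) {
            List<String> lines = new ArrayList<>();
            for (T record : entry.getValue()) {
                lines.add(String.valueOf(record));
            }
            Path file = outputDir.resolve(entry.getKey() + ".txt");
            Files.write(file, lines, StandardCharsets.UTF_8);
            written.add(file);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Toy records of the form "category:value".
        List<String> records = Arrays.asList("fruit:apple", "veg:kale", "fruit:pear");
        Map<String, List<String>> groups =
                groupByCategory(records, r -> r.substring(0, r.indexOf(':')));
        Path out = Files.createTempDirectory("per-key");
        for (Path p : writePerCategory(groups, out)) {
            System.out.println(p.getFileName() + " -> " + Files.readAllLines(p));
        }
    }
}
```

With only hundreds of categories, a single grouping pass like this avoids rereading the input once per category, which is the cost of the filter-per-category approach described in the quoted message.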
