Since your using a small number of shards, add a Partition transform which uses a deterministic hash of the key to choose one of 4 partitions. Write each partition with a single shard.
(Fixed width diagram below) Pipeline -> AvroIO(numShards = 4) Becomes: Pipeline -> Partition --> AvroIO(numShards = 1) |-> AvroIO(numShards = 1) |-> AvroIO(numShards = 1) \-> AvroIO(numShards = 1) On Wed, May 24, 2017 at 1:05 AM, Josh <jof...@gmail.com> wrote: > Hi, > > I am using a FileBasedSink (AvroIO.write) on an unbounded stream > (withWindowedWrites, hourly windows, numShards=4). > > I would like to partition the stream by some key in the element, so that > all elements with the same key will get processed by the same shard writer, > and therefore written to the same file. Is there a way to do this? Note > that in my stream the number of keys is very large (most elements have a > unique key, while a few elements share a key). > > Thanks, > Josh >