You could use a StatefulDoFn to buffer up 10k worth of data and write it. There is an example for batched RPC[1] that could be re-used using the FileSystems API to just create files. You'll want to use a reshuffle + a fixed number of keys where the fixed number of keys controls the write parallelization. Pipeline would look like:
--> ParDo(assign deterministic random key in fixed key space) --> Reshuffle --> ParDo(10k buffering StatefulDoFn) You'll need to provide more details around any other constraints you have around writing 10k. 1: https://beam.apache.org/blog/2017/08/28/timely-processing.html On Mon, Nov 26, 2018 at 9:39 AM Vinay Patil <[email protected]> wrote: > Hi, > > I have a use case of writing only 10k in a single file after which a new > file should be created. > > Is there a way in Beam to roll a file after the specified number of > records ? > > Regards, > Vinay Patil >
