You could use a stateful DoFn to buffer up to 10k records and write them
out as a single file. There is an example for batched RPC[1] that could be
adapted to use the FileSystems API to create files instead of issuing RPCs.
You'll want to use a Reshuffle plus a fixed number of keys, where the
number of keys controls the write parallelism. The pipeline would look like:

--> ParDo(assign deterministic random key in fixed key space) --> Reshuffle
--> ParDo(10k buffering StatefulDoFn)
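
To make the two steps above concrete, here is a minimal sketch of the
logic in plain Python. It is not Beam code and not the blog post's
implementation: NUM_KEYS, assign_key, write_rolled_files and the part-*
file naming are all hypothetical names chosen for illustration. In a real
pipeline the per-key buffer would live in Beam state, and the final flush
of partial buffers would be driven by a timer rather than by end of input.

```python
import os
import tempfile
import zlib
from collections import defaultdict

NUM_KEYS = 4          # hypothetical: size of the fixed key space (write parallelism)
MAX_RECORDS = 10_000  # roll to a new file after this many records


def assign_key(record: str, num_keys: int = NUM_KEYS) -> int:
    """Map a record to a stable key in [0, num_keys)."""
    # crc32 is stable across processes, unlike Python's salted hash() for str.
    return zlib.crc32(record.encode("utf-8")) % num_keys


def write_rolled_files(records, out_dir, num_keys=NUM_KEYS, max_records=MAX_RECORDS):
    """Buffer records per key and write a file each time a buffer fills up."""
    buffers = defaultdict(list)     # key -> records not yet written
    file_counts = defaultdict(int)  # key -> number of files written so far
    written = []

    def flush(key):
        name = f"part-{key:03d}-{file_counts[key]:05d}.txt"
        path = os.path.join(out_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in buffers[key])
        file_counts[key] += 1
        buffers[key] = []
        written.append(path)

    for record in records:
        key = assign_key(record, num_keys)
        buffers[key].append(record)
        if len(buffers[key]) >= max_records:
            flush(key)

    # Flush partial buffers once the input is exhausted.
    for key in list(buffers):
        if buffers[key]:
            flush(key)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        files = write_rolled_files((f"record-{i}" for i in range(25_000)), out_dir)
        print(len(files), "files written")
```

Note that with this approach each key can leave one file holding fewer
than 10k records at the end, so a larger key space means more write
parallelism but also more under-filled files.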

If you have any other constraints around writing 10k records per file,
you'll need to provide more details.

1: https://beam.apache.org/blog/2017/08/28/timely-processing.html


On Mon, Nov 26, 2018 at 9:39 AM Vinay Patil <[email protected]> wrote:

> Hi,
>
> I have a use case of writing only 10k in a single file after which a new
> file should be created.
>
> Is there a way in Beam to roll a file after the specified number of
> records ?
>
> Regards,
> Vinay Patil
>
