Since the only unit of atomicity that is guaranteed is an individual element,
you could group your records per time interval using a GroupByKey (GBK), thus
producing KV<Interval, Iterable<Record>>. In a DoFn, for each KV<Interval,
Iterable<Record>>, you would then write the Records out to a file chosen based
upon the interval key.
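
To make the idea concrete, here is a rough sketch in plain Java. It does not use the Beam API: a TreeMap stands in for the GBK step, and sortGroup stands in for what the DoFn would do with each group before writing. All names (TimedRecord, groupByDay, sortGroup, fileNameFor) are hypothetical, the interval is assumed to be one UTC day, and sorting in memory assumes that each day's records fit in memory.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IntervalGrouping {

    /** Hypothetical record type: a payload plus an epoch-millisecond time field. */
    public static final class TimedRecord {
        public final long timestampMillis;
        public final String payload;

        public TimedRecord(long timestampMillis, String payload) {
            this.timestampMillis = timestampMillis;
            this.payload = payload;
        }
    }

    /** Stand-in for the GBK step: key every record by its UTC day. */
    public static Map<LocalDate, List<TimedRecord>> groupByDay(List<TimedRecord> records) {
        Map<LocalDate, List<TimedRecord>> groups = new TreeMap<>();
        for (TimedRecord r : records) {
            LocalDate day = Instant.ofEpochMilli(r.timestampMillis)
                    .atZone(ZoneOffset.UTC)
                    .toLocalDate();
            groups.computeIfAbsent(day, d -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    /** Stand-in for the DoFn body: copy one group and sort it by the time field. */
    public static List<TimedRecord> sortGroup(Iterable<TimedRecord> group) {
        List<TimedRecord> sorted = new ArrayList<>();
        for (TimedRecord r : group) {
            sorted.add(r);
        }
        sorted.sort(Comparator.comparingLong((TimedRecord r) -> r.timestampMillis));
        return sorted;
    }

    /** Hypothetical naming scheme: one output file per interval key. */
    public static String fileNameFor(LocalDate day) {
        return "records-" + day + ".avro";
    }

    public static void main(String[] args) {
        List<TimedRecord> input = Arrays.asList(
                new TimedRecord(Instant.parse("2016-11-22T16:00:00Z").toEpochMilli(), "b"),
                new TimedRecord(Instant.parse("2016-11-23T08:00:00Z").toEpochMilli(), "c"),
                new TimedRecord(Instant.parse("2016-11-22T09:41:00Z").toEpochMilli(), "a"));

        // One "file" per day; records inside each file are in time order.
        for (Map.Entry<LocalDate, List<TimedRecord>> e : groupByDay(input).entrySet()) {
            System.out.println(fileNameFor(e.getKey()));
            for (TimedRecord r : sortGroup(e.getValue())) {
                System.out.println("  " + Instant.ofEpochMilli(r.timestampMillis) + " " + r.payload);
            }
        }
    }
}
```

In a real pipeline the grouping would be done by GroupByKey and the sort-and-write step would live inside a DoFn; only the per-group logic carries over from this sketch.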

On Tue, Nov 22, 2016 at 10:18 AM, Bergmann, Rico (GfK External) <
[email protected]> wrote:

> The requirement is to have a set of files with Avro records written to
> HDFS, where the Avro records are sorted by a time field of the record.
>
>
>
> It would also suffice if I could partition the output with a custom
> partition function (for example, by day)…
>
>
>
> Thanks, Rico.
>
>
>
> *From:* Lukasz Cwik [mailto:[email protected]]
> *Sent:* Tuesday, November 22, 2016 16:00
> *To:* [email protected]
> *Subject:* Re: Support for sorting output in Beam?
>
>
>
> There is no explicit support for sorting in the Beam model today because
> the problem space is large, and for the use cases people typically have it
> suffices to do a global combine and sort in memory, or to do a combine per
> key with a radix-like scheme and sort each radix individually.
>
>
>
> Can you give more details about your use case?
>
> Maybe you don't need to do any sorting or there is an alternative.
>
> On Tue, Nov 22, 2016 at 9:41 AM, Bergmann, Rico (GfK External) <
> [email protected]> wrote:
>
> Hi!
>
>
>
> Looking at the Java API Doc I didn’t find anything for sorting. How would
> I do this with Beam?
>
>
>
> Best, Rico.
>
>
>
>
> ------------------------------
>
>
>
> GfK SE, Nuremberg, Germany, commercial register at the local court
> Amtsgericht Nuremberg HRB 25014; Management Board: Dr. Gerhard
> Hausruckinger (Speaker of the Management Board), Christian Diedrich (CFO),
> Matthias Hartmann, David Krajicek, Alessandra Cama; Chairman of the
> Supervisory Board: Ralf Klein-Bölting This email and any attachments may
> contain confidential or privileged information. Please note that
> unauthorized copying, disclosure or distribution of the material in this
> email is not permitted.
>
