We've tossed around the idea of "metadata-driven" triggers which would
essentially let you provide a mapping element -> metadata and a
monotonic CombineFn metadata* -> bool that would allow for this (the
AfterCount being a special case of this, with the mapping fn being _
-> 1, and the CombineFn being sum(...) >= N, for size one would
provide a (perhaps approximate) sizing mapping fn).

Note, however, that there's no guarantee that the trigger fire as soon
as possible; due to runtime characteristics a significant amount of
data may be buffered (or come in at once) before the trigger is
queried. One possibility would be to follow your triggering with a
DoFn that breaks up large value streams into multiple manageable sized
ones as needed.

On Tue, Jan 9, 2018 at 11:43 AM, Carlos Alonso <[email protected]> wrote:
> Hi everyone!!
>
> I was wondering if there is an option to trigger window panes based on the
> size of the pane itself (rather than the number of elements).
>
> To provide a little bit more of context we're backing up a PubSub topic into
> GCS with the "special" feature that, depending on the "type" of the message,
> the GCS destination is one or another.
>
> Messages' 'shape' published there is quite random, some of them are very
> frequent and small, some others very big but sparse... We have around 150
> messages per second (in total) and we're firing every 15 minutes and
> experiencing OOM errors, we've considered firing based on the number of
> items as well, but given the randomness of the input, I don't think it will
> be a final solution either.
>
> Having a trigger based on size would be great, another option would be to
> have a dynamic shards number for the PTransform that actually writes the
> files.
>
> What is your recommendation for this use case?
>
> Thanks!!

Reply via email to