We've tossed around the idea of "metadata-driven" triggers, which would essentially let you provide a mapping element -> metadata and a monotonic CombineFn metadata* -> bool that decides when to fire. AfterCount would be a special case of this, with the mapping fn being _ -> 1 and the CombineFn being sum(...) >= N; to trigger on size, one would instead provide a (perhaps approximate) sizing mapping fn.
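
To make the idea concrete, here is a rough sketch in plain Python. It is not an existing Beam API: the MetadataTrigger class and the after_count / after_bytes helpers are hypothetical names made up for illustration, and it assumes the metadata is a non-negative number so that the running sum can only grow (which is what makes the predicate monotonic).

```python
from typing import Any, Callable


class MetadataTrigger:
    """Hypothetical trigger: fires once the summed metadata reaches a threshold."""

    def __init__(self, to_metadata: Callable[[Any], int], threshold: int):
        self.to_metadata = to_metadata  # mapping: element -> metadata
        self.threshold = threshold
        self.total = 0  # running combination of all metadata seen so far

    def on_element(self, element: Any) -> bool:
        # Combine step: with non-negative metadata the sum never decreases,
        # so once the predicate is true it stays true until reset().
        self.total += self.to_metadata(element)
        return self.should_fire()

    def should_fire(self) -> bool:
        return self.total >= self.threshold

    def reset(self) -> None:
        # Called after the pane has been emitted.
        self.total = 0


def after_count(n: int) -> MetadataTrigger:
    # AfterCount as a special case: mapping fn is _ -> 1.
    return MetadataTrigger(lambda _: 1, n)


def after_bytes(n: int) -> MetadataTrigger:
    # Size-based variant: len() stands in for a (perhaps approximate) sizing fn.
    return MetadataTrigger(len, n)
```

The only difference between the count-based and the size-based variant is the mapping fn; the combine-and-compare part is shared.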
Note, however, that there's no guarantee that the trigger fires as soon as possible; due to runtime characteristics, a significant amount of data may be buffered (or come in at once) before the trigger is queried. One possibility would be to follow your triggering with a DoFn that breaks up large value streams into multiple manageable-sized ones as needed.

On Tue, Jan 9, 2018 at 11:43 AM, Carlos Alonso <[email protected]> wrote:
> Hi everyone!!
>
> I was wondering if there is an option to trigger window panes based on the
> size of the pane itself (rather than the number of elements).
>
> To provide a little bit more of context we're backing up a PubSub topic into
> GCS with the "special" feature that, depending on the "type" of the message,
> the GCS destination is one or another.
>
> Messages' 'shape' published there is quite random, some of them are very
> frequent and small, some others very big but sparse... We have around 150
> messages per second (in total) and we're firing every 15 minutes and
> experiencing OOM errors, we've considered firing based on the number of
> items as well, but given the randomness of the input, I don't think it will
> be a final solution either.
>
> Having a trigger based on size would be great, another option would be to
> have a dynamic shards number for the PTransform that actually writes the
> files.
>
> What is your recommendation for this use case?
>
> Thanks!!
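
For the DoFn suggested above (the one that breaks up large value streams after triggering), the core logic could look roughly like the following. This is a sketch in plain Python rather than Beam code: split_by_size is a hypothetical helper name, it assumes the values are bytes so that len() gives their size, and in a pipeline it would be called from a DoFn's process method on the iterable of values for each key.

```python
from typing import Iterable, Iterator, List


def split_by_size(values: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Lazily regroup values into batches of at most max_bytes each.

    A single value larger than max_bytes is emitted as a batch of its own.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for value in values:
        # Close the current batch before it would exceed the limit.
        if batch and batch_bytes + len(value) > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(value)
        batch_bytes += len(value)
    if batch:
        # Emit whatever is left over at the end of the stream.
        yield batch
```

Because it is a generator that walks the input once, only one batch is held in memory at a time, which is the point given the OOM errors described in the quoted message.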
