I'm looking for any information anyone can provide on the strategy hinted at here http://stackoverflow.com/questions/38775173/can-a-once-firing-trigger-be-used-to-reduce-data-volume for using CombinePerKey as a poor man's state API. The only approach I can think of is mutating the AccumT object inside the extractOutput method (roughly as in the sketch below), but that feels a little dangerous and I want to confirm that I won't get any surprises.
The issue is that our Dataflow job consumes the binlog of a database, and most of the data-update events don't actually touch any field that affects the calculation. Most of the aggregation points use a global window that triggers on each new element, so our output is currently correct, but we are emitting updates many orders of magnitude more often than necessary.
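Concretely, here is a rough sketch of the pattern I have in mind (the class and field names are made up, and the running sum stands in for our real aggregation; the part I'm unsure about is the mutation in extractOutput):

import java.io.Serializable;
import org.apache.beam.sdk.transforms.Combine;

public class DedupingSumFn extends Combine.CombineFn<Long, DedupingSumFn.Accum, Long> {

  // Serializable so the default coder fallback can encode it.
  public static class Accum implements Serializable {
    long sum = 0;
    Long lastEmitted = null; // written inside extractOutput -- the risky part
  }

  @Override
  public Accum createAccumulator() {
    return new Accum();
  }

  @Override
  public Accum addInput(Accum acc, Long input) {
    acc.sum += input;
    return acc;
  }

  @Override
  public Accum mergeAccumulators(Iterable<Accum> accs) {
    Accum merged = new Accum();
    for (Accum a : accs) {
      merged.sum += a.sum;
      if (a.lastEmitted != null) {
        merged.lastEmitted = a.lastEmitted;
      }
    }
    return merged;
  }

  @Override
  public Long extractOutput(Accum acc) {
    if (acc.lastEmitted != null && acc.lastEmitted == acc.sum) {
      // Result unchanged since the last firing: emit null and filter it
      // out downstream (would need a nullable coder or a sentinel value).
      return null;
    }
    acc.lastEmitted = acc.sum; // mutation inside extractOutput
    return acc.sum;
  }
}

Downstream we'd drop the suppressed panes with a filter. Whether the mutation made in extractOutput actually survives to the next trigger firing (with accumulating panes) is exactly the behavior I'd like confirmed.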
