Since processing can happen out of order, for example if the input were:
```
{"id": "2", "parent_id": "a", "timestamp": 2, "amount": 3}
{"id": "1", "parent_id": "a", "timestamp": 1, "amount": 1}
{"id": "1", "parent_id": "a", "timestamp": 3, "amount": 2}
```
would the output be 3 and then 5, or would you still want 1, 4, and then 5?

On Mon, Dec 4, 2017 at 2:13 PM, Vilhelm von Ehrenheim <[email protected]> wrote:

> Hi all!
> First of all, great work on the 2.2.0 release! Really excited to start
> using it.
>
> I have a problem with how I should construct a pipeline that should emit a
> sum of latest values, which I hope someone might have some ideas on how to
> solve.
>
> Here is what I have:
>
> I have a stateful stream of events that contain updates to a long, amongst
> other things. These events look something like this:
>
> ```
> {"id": "1", "parent_id": "a", "timestamp": 1, "amount": 1}
> {"id": "2", "parent_id": "a", "timestamp": 2, "amount": 3}
> {"id": "1", "parent_id": "a", "timestamp": 3, "amount": 2}
> ```
>
> I want to emit sums of the `amount` per `parent_id`, but only using the
> latest record per `id`. Here that would result in sums of 1, 4, and then 5.
>
> To make it harder, I need to do this in a global window with triggering
> based on element count. I could maybe combine that with a processing time
> trigger, though. At least I need a global sum over all events.
>
> I have tried to do this with Latest.perKey and Sum.perKey, but as you
> probably realize, that will give some strange results, as the downstream
> sum will not discard elements that are replaced by newer updates in the
> latest transform.
>
> I also thought I could write a custom CombineFn for this, but I need to do
> it for different keys, which leaves me really confused.
>
> Any help or pointers are greatly appreciated.
>
> Thanks!
> Vilhelm von Ehrenheim
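
To pin down the semantics we're discussing, here is a minimal, Beam-free Python sketch of "keep the latest record per `id` (by timestamp), then sum `amount` per `parent_id`", emitting after every element (mimicking an element-count trigger of 1). The function name and structure are illustrative, not a proposal for the actual pipeline:

```python
from collections import defaultdict

def running_latest_sums(events):
    """After each event, emit the per-parent_id sum over the latest
    record (highest timestamp) seen so far for each id."""
    latest = {}  # id -> (timestamp, amount, parent_id)
    emitted = []
    for e in events:
        seen = latest.get(e["id"])
        # Replace the stored record only if this event is newer.
        if seen is None or e["timestamp"] > seen[0]:
            latest[e["id"]] = (e["timestamp"], e["amount"], e["parent_id"])
        per_parent = defaultdict(int)
        for _ts, amount, parent in latest.values():
            per_parent[parent] += amount
        emitted.append(dict(per_parent))
    return emitted

# In-order input from the original question: sums should be 1, 4, then 5.
events = [
    {"id": "1", "parent_id": "a", "timestamp": 1, "amount": 1},
    {"id": "2", "parent_id": "a", "timestamp": 2, "amount": 3},
    {"id": "1", "parent_id": "a", "timestamp": 3, "amount": 2},
]
print(running_latest_sums(events))
# -> [{'a': 1}, {'a': 4}, {'a': 5}]
```

With the out-of-order input above, the same logic emits 3, 4, then 5, since the timestamp comparison makes the late-arriving older record for id "1" take effect until its newer update arrives.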
