Since processing can happen out of order, for example if the input was:
```
{"id": "2", parent_id: "a", "timestamp": 2, "amount": 3}
{"id": "1", parent_id: "a", "timestamp": 1. "amount": 1}
{"id": "1", parent_id: "a", "timestamp": 3, "amount": 2}
```
would the output be 3 and then 5, or would you still want 1, 4, and then 5?
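To make the question concrete, here is a quick plain-Python sketch (not Beam code) of the "sum only the latest record per `id`" semantics, evaluated for both arrival orders, assuming "latest" is decided by the `timestamp` field rather than arrival order:

```python
def running_sums(events):
    """Apply events in arrival order; each id keeps only its
    highest-timestamp amount, and the per-parent sum is emitted
    after every event."""
    latest = {}  # (parent_id, id) -> (timestamp, amount)
    sums = []
    for e in events:
        key = (e["parent_id"], e["id"])
        if key not in latest or e["timestamp"] > latest[key][0]:
            latest[key] = (e["timestamp"], e["amount"])
        sums.append(sum(amt for _, amt in latest.values()))
    return sums

in_order = [
    {"id": "1", "parent_id": "a", "timestamp": 1, "amount": 1},
    {"id": "2", "parent_id": "a", "timestamp": 2, "amount": 3},
    {"id": "1", "parent_id": "a", "timestamp": 3, "amount": 2},
]
out_of_order = [in_order[1], in_order[0], in_order[2]]

print(running_sums(in_order))      # [1, 4, 5]
print(running_sums(out_of_order))  # [3, 4, 5]
```

Under timestamp-based "latest", both orderings converge on the same final sum of 5; only the intermediate emissions differ.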

On Mon, Dec 4, 2017 at 2:13 PM, Vilhelm von Ehrenheim <
[email protected]> wrote:

> Hi all!
> First of all, great work on the 2.2.0 release! Really excited to start
> using it.
>
> I have a problem with how I should construct a pipeline that should emit a
> sum of latest values which I hope someone might have some ideas on how to
> solve.
>
> Here is what I have:
>
> I have a stateful stream of events that contain updates to a long, amongst
> other things. These events look something like this:
>
> ```
> {"id": "1", parent_id: "a", "timestamp": 1. "amount": 1}
> {"id": "2", parent_id: "a", "timestamp": 2, "amount": 3}
> {"id": "1", parent_id: "a", "timestamp": 3, "amount": 2}
> ```
>
> I want to emit sums of the `amount` per `parent_id` but only using the
> latest record per `id`. Here that would result in sums of 1, 4 and then 5.
>
> To make it harder, I need to do this in a global window with triggering
> based on element count. I could maybe combine that with a processing-time
> trigger, though. At least I need a global sum over all events.
>
> I have tried to do this with Latest.perKey and Sum.perKey but as you
> probably realize that will give some strange results as the downstream sum
> will not discard elements that are replaced by newer updates in the latest
> transform.
>
> I also thought I could write a custom CombineFn for this, but I need to do
> it for different keys, which leaves me really confused.
>
> Any help or pointers are greatly appreciated.
>
> Thanks!
> Vilhelm von Ehrenheim
>
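The custom-CombineFn idea mentioned in the message above can be sketched in plain Python (this is the CombineFn contract by analogy, not actual Beam API): let the accumulator be a map from `id` to `(timestamp, amount)`, merge accumulators by keeping the newer timestamp per `id`, and extract the sum. Applied per `parent_id` key (e.g. via `Combine.perKey`), this would discard replaced amounts correctly, since the replacement happens inside the accumulator rather than downstream of a `Latest` transform:

```python
def create_accumulator():
    # id -> (timestamp, amount)
    return {}

def add_input(acc, event):
    ts, _ = acc.get(event["id"], (float("-inf"), 0))
    if event["timestamp"] > ts:
        acc[event["id"]] = (event["timestamp"], event["amount"])
    return acc

def merge_accumulators(accs):
    # Keep the newest (timestamp, amount) pair per id across partials.
    merged = {}
    for acc in accs:
        for eid, (ts, amt) in acc.items():
            if eid not in merged or ts > merged[eid][0]:
                merged[eid] = (ts, amt)
    return merged

def extract_output(acc):
    return sum(amt for _, amt in acc.values())
```

Because merging is associative and commutative, partial accumulators built from arbitrary slices of the input combine to the same answer, which is exactly the property a Beam CombineFn needs.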
