I would use Reshuffle.of() [1], with the entity id as the key. Internally it
does a GroupByKey and sets up the windowing such that it does not buffer
anything.

[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L51
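
A minimal sketch of what that could look like in the Java API. The class
name, the entityIdOf helper and the "entityId:payload" string format are
made up for illustration, and Create.of stands in for the real unbounded
source. It only shows keying and reshuffling; how keys are then assigned to
workers is up to the runner, so please check that against your runner, and
check the Reshuffle javadoc for the SDK version you are on.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class RouteByEntityId {

  // Hypothetical event format "entityId:payload"; replace with your own event type.
  static String entityIdOf(String event) {
    return event.substring(0, event.indexOf(':'));
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Bounded stand-in for the unbounded change-event source (assumption).
    PCollection<String> events =
        p.apply(Create.of("e1:created", "e2:created", "e1:updated"));

    PCollection<KV<String, String>> byEntity =
        events
            // Attach the entity id as the key of each event.
            .apply(MapElements
                .into(TypeDescriptors.kvs(TypeDescriptors.strings(), TypeDescriptors.strings()))
                .via((String e) -> KV.of(entityIdOf(e), e)))
            // Redistribute by key; no explicit Window.into(...) is added here.
            .apply(Reshuffle.<String, String>of());

    // The ParDo that talks to the persistence layer would be applied to byEntity here.

    p.run().waitUntilFinish();
  }
}
```

This needs the Beam Java SDK and a runner (for example the direct runner) on
the classpath.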


On Fri, Jul 6, 2018 at 8:01 AM Niels Basjes <ni...@basjes.nl> wrote:

> Hi,
>
> I have an unbounded stream of change events each of which has the id of
> the entity that is changed.
> To avoid the need for locking in the persistence layer that is needed in
> part of my processing I want to route all events based on this entity id.
> That way I know for sure that all events around a single entity go through
> the same instance of my processing sequentially, hence no need for locking
> or other synchronization regarding this persistence.
>
> At this point my best guess is that I need to use the GroupByKey but that
> seems to need a Window.
> I think I don't want a window because the stream is unbounded and I want
> the lowest possible latency (i.e. a Window of 1 second would be ok for this
> use case).
> Also I want to be 100% sure that all events for a specific id go to only a
> single instance because I do not want any race conditions.
>
> My simple question is: What does that look like in the Beam Java API?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
