Hi Niels, as you have an Unbounded PCollection, you need a Window to GroupByKey and then "forward" the data.
Another option would be to use a DoFn working element per element and eventually batching then. It's what most of the IOs are doing for the Write part. Regards JB On 06/07/2018 17:01, Niels Basjes wrote: > Hi, > > I have an unbounded stream of change events each of which has the id of > the entity that is changed. > To avoid the need for locking in the persistence layer that is needed in > part of my processing I want to route all events based on this entity id. > That way I know for sure that all events around a single entity go > through the same instance of my processing sequentially, hence no need > for locking or other synchronization regarding this persistence. > > At this point my best guess is that I need to use the GroupByKey but > that seems to need a Window. > I think I don't want a window because the stream is unbounded and I want > the lowest possible latency (i.e. a Window of 1 second would be ok for > this usecase). > Also I want to be 100% sure that all events for a specific id go to only > a single instance because I do not want any race conditions. > > My simple question is: What does that look like in the Beam Java API? > > -- > Best regards / Met vriendelijke groeten, > > Niels Basjes -- Jean-Baptiste Onofré [email protected] http://blog.nanthrax.net Talend - http://www.talend.com
