Hi Reza and others, As suggested, I have opened https://issues.apache.org/jira/browse/BEAM-10019 which I think might be a good addition to beam pipeline patterns.
Thanks Mohil On Mon, Apr 6, 2020 at 6:28 PM Mohil Khare <[email protected]> wrote: > Sure thing.. I would love to contribute. > > Thanks > Mohil > > > > On Mon, Apr 6, 2020 at 6:17 PM Reza Ardeshir Rokni <[email protected]> > wrote: > >> Great! BTW if you get the time and wanted to contribute back to beam >> there is a nice section to record cool patterns: >> >> https://beam.apache.org/documentation/patterns/overview/ >> >> This would make a great one! >> >> On Tue, 7 Apr 2020 at 09:12, Mohil Khare <[email protected]> wrote: >> >>> No ... that's a valid answer. Since I wanted to have a long window size >>> per key and since we can't use state with session windows, I am using a >>> sliding window for let's say 72 hrs which starts every hour. >>> >>> Thanks a lot Reza for your input. >>> >>> Regards >>> Mohil >>> >>> On Mon, Apr 6, 2020 at 6:09 PM Reza Ardeshir Rokni <[email protected]> >>> wrote: >>> >>>> Depends on the use case, Global state comes with the technical debt of >>>> having to do your own GC, but comes with more control. You could >>>> implement the pattern above with a long FixedWindow as well, which will >>>> take care of the GC within the window bound. >>>> >>>> Sorry, its not a yes / no answer :-) >>>> >>>> On Tue, 7 Apr 2020 at 09:03, Mohil Khare <[email protected]> wrote: >>>> >>>>> Thanks a lot Reza for your quick response. Yeah saving the data in an >>>>> external system after timer expiry makes sense. >>>>> So do you suggest using a global window for maintaining state ? >>>>> >>>>> Thanks and regards >>>>> Mohil >>>>> >>>>> On Mon, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni <[email protected]> >>>>> wrote: >>>>> >>>>>> Are you able to make use of the following pattern? >>>>>> >>>>>> Store StateA-metadata until no activity for Duration X, you can use a >>>>>> Timer to check this, then expire the value, but store in an >>>>>> external system. If you get a record that does want this value after >>>>>> expiry, call out to the external system and store the value again in key >>>>>> StateA-metadata. >>>>>> >>>>>> Cheers >>>>>> >>>>>> On Tue, 7 Apr 2020 at 08:03, Mohil Khare <[email protected]> wrote: >>>>>> >>>>>>> Hello all, >>>>>>> We are attempting a implement a use case where beam (java sdk) reads >>>>>>> two kind of records from data stream like Kafka: >>>>>>> >>>>>>> 1. Records of type A containing key and corresponding metadata. >>>>>>> 2. Records of type B containing the same key, but no metadata. Beam >>>>>>> then needs to fill metadata for records of type B by doing a lookup for >>>>>>> metadata using keys received in records of type A. >>>>>>> >>>>>>> Idea is to save metadata or rather state for keys received in >>>>>>> records of type A and then do a lookup when records of type B are >>>>>>> received. >>>>>>> I have implemented this using the "@State" construct of beam. >>>>>>> However my problem is that we don't know when keys should expire. I >>>>>>> don't >>>>>>> think keeping a global window will be a good idea as there could be many >>>>>>> keys (may be millions over a period of time) to be saved in a state. >>>>>>> >>>>>>> What is the best way to achieve this? I was reading about RedisIO, >>>>>>> but found that it is still in the experimental stage. Can someone please >>>>>>> recommend any solution to achieve this. >>>>>>> >>>>>>> Thanks and regards >>>>>>> Mohil >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>
