Thank you for looking into this! Our current workaround for updating the side input data is to restart the pipeline. This hasn't been a frequent requirement so far, but it will become more common in the future. We've considered using a Guava cache, but a solution within the Beam programming model would be great.
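
For anyone following the thread, here is a rough sketch of the kind of in-process cache workaround I mean. It is not what we run today and it is not a Beam API: it uses only the Java standard library in place of Guava, and the class name RefreshingLookup, the loader, the TTL, and the injectable clock are all hypothetical names chosen for illustration. The idea is simply to reload the lookup table once the cached copy is older than some TTL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: serves lookups from a snapshot and reloads it after ttlMillis. */
final class RefreshingLookup<K, V> {
    private final Supplier<Map<K, V>> loader; // assumed: reads the lookup table from its source
    private final long ttlMillis;             // maximum age of a snapshot before it is reloaded
    private final LongSupplier clock;         // e.g. System::currentTimeMillis; injectable for tests
    private Map<K, V> snapshot;               // null until the first lookup
    private long loadedAtMillis;

    RefreshingLookup(Supplier<Map<K, V>> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the value for key, reloading the whole table first if the snapshot is stale. */
    synchronized V get(K key) {
        long now = clock.getAsLong();
        if (snapshot == null || now - loadedAtMillis >= ttlMillis) {
            snapshot = new HashMap<>(loader.get()); // defensive copy of the loaded table
            loadedAtMillis = now;
        }
        return snapshot.get(key);
    }
}
```

In a pipeline I would expect to hold one instance per DoFn instance and create it in the @Setup method rather than the constructor, since the helper is not serializable. The updates are then only eventually consistent, bounded by the TTL per worker, which is fine for our use case.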
On 2018/12/18 17:20:43, Scott Wegner <[email protected]> wrote:
> Hi Lucas,
>
> Thanks for the explanation and repro example. This is a bug in the Dataflow
> service; a fix is in progress and once rolled out will apply to all SDK
> versions. I've filed BEAM-6261 to track:
> https://issues.apache.org/jira/browse/BEAM-6261
>
> On Wed, Dec 12, 2018 at 4:31 PM Bordwell, Lucas-CW <
> [email protected]> wrote:
>
> > Greetings,
> >
> > I am trying to implement the “Slowly-changing lookup cache” pattern
> > described on this blog post:
> > https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1
> > but am experiencing issues where the side inputs do not update with the
> > DataflowRunner. I am fine with consistency being eventual on the updates in
> > Dataflow.
> >
> > I see that there is an existing issue:
> > https://issues.apache.org/jira/browse/BEAM-2155 that seems to be related
> > but I also saw a comment by Kenn Knowles on this:
> > https://stackoverflow.com/a/41600466/2048988 Stack Overflow answer where
> > he mentions that there was a side-input caching bug which was fixed. Has
> > anyone else gotten side inputs to update on Dataflow using a pattern
> > similar to the one above?
> >
> > Here is a simplified example pipeline project I created to illustrate the
> > issue using Beam 2.8.0: https://github.com/lbordwell/sideinput
> >
> > Thank you,
> >
> > Lucas Bordwell
> >
>
> --
>
> Got feedback? tinyurl.com/swegner-feedback
