After the first value for a triggered side input, that side input is always "ready" for the particular window, so there is no longer any synchronization with the main input. You probably want to have tests with the updated and the old value. I would expect that your steps 1-4 would work with TestStream to do just what you want, actually. Then you should also test with steps 1&3 merged.
Kenn On Tue, May 29, 2018 at 2:44 PM Pablo Estrada <[email protected]> wrote: > As far as I know, that behavior is not specified. I do not think that > Dataflow streaming supports this sort of updating to side inputs, though > I've added Slava who might have more to add. > > If updating side inputs is really not supported in Dataflow, you may be > able to use a LoadingCache, like so: > https://lists.apache.org/thread.html/%[email protected]%3E > > Best > -P. > > On Tue, May 29, 2018 at 2:36 PM Carlos Alonso <[email protected]> > wrote: > >> Hi Lukasz, many thanks for your responses. >> >> I'm actually using them but I think I'm not being able to synchronise the >> following steps: >> 1: The side input gets its first value (v1) >> 2: The main stream gets that side input applied and finds that v1 value >> 3: The side one gets updated (v2) >> 4: The main stream gets the side input applied again and finds the v2 >> value (along with v1 as this is multimap) >> >> Regards >> >> On Tue, May 29, 2018 at 10:57 PM Lukasz Cwik <[email protected]> wrote: >> >>> Your best bet is to use TestStreams[1] as it is used to validate >>> window/triggering behavior. Note that the transform requires special runner >>> based execution and currently only works with the DirectRunner. All >>> examples are marked with the JUnit category "UsesTestStream", for example >>> [2]. >>> >>> 1: >>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestStream.java >>> 2: >>> https://github.com/apache/beam/blob/0cbcf4ad1db7d820c5476d636f3a3d69062021a5/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java#L69 >>> >>> >>> On Tue, May 29, 2018 at 1:05 PM Carlos Alonso <[email protected]> >>> wrote: >>> >>>> Hi all!! >>>> >>>> Basically that's what I'm trying to do. I'm building a pipeline that >>>> has a refreshing, multimap, side input (BQ schemas) that then I apply to >>>> the main stream of data (records that are ultimately saved to the >>>> corresponding BQ table). >>>> >>>> My job, although being of streaming nature, runs on the global window, >>>> and I want to unit test that the side input refreshes and that the updates >>>> are successfully applied. >>>> >>>> I'm using scio and I can't seem to simulate that refreshing behaviour. >>>> These are the relevant bits of the code: >>>> https://gist.github.com/calonso/87d392a65079a66db75a78cb8d80ea98 >>>> >>>> The way I see understand it, the side collection is refreshed before >>>> accessing it so when accessed, it already contains the final (updated) >>>> snapshot of the schemas, is that true? In which case, how can I simulate >>>> that synchronisation? I'm using processing times as I thought that could be >>>> the way to go, but obviously something is wrong there. >>>> >>>> Many thanks!! >>>> >>> -- > Got feedback? go/pabloem-feedback > <https://goto.google.com/pabloem-feedback> >
