Hi,

Hmm, OK, I will try this approach then. Thanks for the suggestion.

Tudor

On Wed, Dec 1, 2021 at 7:47 PM Luke Cwik <[email protected]> wrote:

> The purpose of pipeline/transform level testing is to verify outputs based
> upon inputs and to not check the internal state of the transform(s).
>
> For the example that you linked, it would make sense to create a test with
> inputs that would cause the timer to fire and clear state and then some
> more inputs that would produce output and the output would only be correct
> if the state was cleared because of the timer.
>
> For the timer value scenario, create the inputs that would cause the
> specific scenario to happen and then add more inputs based upon what makes
> setting the timer unique such that output is produced that would only be
> correct had that timer had a specific value.
>
> On Wed, Dec 1, 2021 at 9:32 AM Tudor Plugaru <[email protected]> wrote:
>
>> I know about TestStream and I am using it, but, for example, I want to
>> test a use case that the timer callback is being called once the watermark
>> passes the set time in the timer. Like in this test [1] for example, I want
>> to be able to have something like assert bag_state == None at the end of
>> the test. Is this possible? As most of the tests from that module are
>> returning specific values from timer callbacks and then the tests assert
>> that those values are being returned, but in a real use case, you don't
>> necessarily return values from timer callbacks.
>>
>> Another use case is when the timer is set only in specific scenarios, how
>> can I test what the timer value is?
>>
>> Hope it makes sense what I am describing.
>>
>> [1]
>> https://github.com/apache/beam/blob/8e217ea0d1f383ef5033ef507b14d01edf9c67e6/sdks/python/apache_beam/transforms/userstate_test.py#L487
>>
>> On Wed, Dec 1, 2021 at 7:21 PM Luke Cwik <[email protected]> wrote:
>>
>>> That should have been "TestStream [2, 3, 4]"
>>>
>>> On Wed, Dec 1, 2021 at 9:20 AM Luke Cwik <[email protected]> wrote:
>>>
>>>> There is some good information about testing in the Apache Beam
>>>> documentation[1] about how you want to test the transforms/pipeline instead
>>>> of the DoFn.
>>>>
>>>> For your use case, TestStream [1, 2, 3] is your best bet combined with
>>>> the above advice about transform/pipeline level testing. TestStream is used
>>>> to simulate ingestion of data and allows control of watermark and
>>>> processing time advancement.
>>>>
>>>> 1: https://beam.apache.org/documentation/pipelines/test-your-pipeline/
>>>> 2: https://beam.apache.org/blog/test-stream/
>>>> 3:
>>>> https://medium.com/@asitkovets/testing-in-apache-beam-part-2-stream-2a9950ba2bc7
>>>> 4:
>>>> https://github.com/apache/beam/blob/8e217ea0d1f383ef5033ef507b14d01edf9c67e6/sdks/python/apache_beam/transforms/deduplicate_test.py#L109
>>>>
>>>> On Wed, Dec 1, 2021 at 1:07 AM Tudor Plugaru <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> What is the best approach in unit testing a stateful DoFn? I've looked
>>>>> over the userstate_test.py in Beam repo, but those examples do not really
>>>>> apply to our case. In those tests, the DoFn used for testing are returning
>>>>> values from timer callbacks which does not really happen in reality.
>>>>> I am more interested in testing if a timer was triggered after the
>>>>> watermark advanced, or what is the state bag content at a specific time.
>>>>>
>>>>> Actually it would really be nice to have some kind of documentation
>>>>> regarding testing and best practices in writing unit/integration tests for
>>>>> Beam pipelines.
>>>>>
>>>>> Thanks,
>>>>> Tudor
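
The output-only testing strategy described in the thread can be sketched without a runner. What follows is a plain-Python toy, not Beam code: CountWithExpiry, run, and the event tuples are hypothetical names made up for illustration, standing in for a stateful DoFn, a pipeline run, and a TestStream. The step emits a running count per key and sets an event-time timer ttl after the latest element; the timer callback clears the count and returns nothing. The test never reads the state. It only checks whether the count restarts at 1, which can happen only if the timer fired.

```python
from collections import defaultdict


class CountWithExpiry:
    """Toy stand-in for a keyed stateful step (hypothetical, not the Beam API).

    Emits a running count per key. An event-time timer, set `ttl` after the
    latest element, clears that key's count once the watermark reaches it.
    """

    def __init__(self, ttl):
        self.ttl = ttl
        self._counts = defaultdict(int)  # per-key state, never read by tests
        self._timers = {}                # per-key timer firing time

    def process(self, key, timestamp):
        self._counts[key] += 1
        self._timers[key] = timestamp + self.ttl  # (re)set the timer
        return (key, self._counts[key])

    def advance_watermark(self, watermark):
        for key, fire_at in list(self._timers.items()):
            if fire_at <= watermark:
                # Timer callback: clears state and emits nothing.
                del self._timers[key]
                self._counts.pop(key, None)


def run(step, events):
    """Feed ('element', key, ts) and ('watermark', ts) events; collect outputs."""
    outputs = []
    for event in events:
        if event[0] == 'element':
            outputs.append(step.process(event[1], event[2]))
        else:
            step.advance_watermark(event[1])
    return outputs


def events_with_watermark(watermark):
    # Two elements, a watermark advance, then one more element as a probe.
    return [
        ('element', 'k', 1),
        ('element', 'k', 2),
        ('watermark', watermark),
        ('element', 'k', 20),
    ]


# Timer was last set to 2 + 10 = 12. Watermark 12 fires it, so the count restarts.
fired = run(CountWithExpiry(ttl=10), events_with_watermark(12))
assert fired == [('k', 1), ('k', 2), ('k', 1)]

# Watermark 11 is one tick short, so the state survives and the count continues.
not_fired = run(CountWithExpiry(ttl=10), events_with_watermark(11))
assert not_fired == [('k', 1), ('k', 2), ('k', 3)]
```

Moving the watermark from 11 to 12 flips the last output from 3 to 1, which pins the timer value to 12 without ever inspecting the timer or the state. In a real Beam test the same shape would presumably use TestStream to add elements and advance the watermark, with the assertion made on the transform's output.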
