Re: Unit testing stateful DoFn

Tudor Plugaru Thu, 02 Dec 2021 02:07:35 -0800

Hi,
hmm, ok, I will try this approach then.
Thanks for the suggestion.
Tudor


On Wed, Dec 1, 2021 at 7:47 PM Luke Cwik <[email protected]> wrote:

> The purpose of pipeline/transform level testing is to verify outputs based
> upon inputs and to not check the internal state of the transform(s).
>
> For the example that you linked, it would make sense to create a test with
> inputs that cause the timer to fire and clear state, followed by more
> inputs that produce output. That output would only be correct if the state
> had been cleared by the timer.
>
> For the timer value scenario, create the inputs that cause the specific
> scenario to happen, then add more inputs chosen around what makes that
> timer setting unique, so that the output produced would only be correct
> if the timer had been set to that specific value.
>
> On Wed, Dec 1, 2021 at 9:32 AM Tudor Plugaru <[email protected]> wrote:
>
>> I know about TestStream and I am using it. But suppose, for example, I
>> want to test that the timer callback is called once the watermark passes
>> the time set on the timer. In this test [1], for example, I would like to
>> be able to write something like assert bag_state == None at the end of
>> the test. Is this possible? Most of the tests in that module return
>> specific values from timer callbacks and then assert that those values
>> are returned, but in a real use case you don't necessarily return values
>> from timer callbacks.
>>
>> Another use case is when the timer is set only in specific scenarios: how
>> can I test what the timer value is?
>>
>> Hope it makes sense what I am describing.
>>
>> [1]
>> https://github.com/apache/beam/blob/8e217ea0d1f383ef5033ef507b14d01edf9c67e6/sdks/python/apache_beam/transforms/userstate_test.py#L487
>>
>> On Wed, Dec 1, 2021 at 7:21 PM Luke Cwik <[email protected]> wrote:
>>
>>> That should have been "TestStream [2, 3, 4]"
>>>
>>> On Wed, Dec 1, 2021 at 9:20 AM Luke Cwik <[email protected]> wrote:
>>>
>>>> The Apache Beam documentation [1] has some good information on testing,
>>>> including why you want to test the transforms/pipeline instead of the
>>>> DoFn.
>>>>
>>>> For your use case, TestStream [1, 2, 3] is your best bet combined with
>>>> the above advice about transform/pipeline level testing. TestStream is used
>>>> to simulate ingestion of data and allows control of watermark and
>>>> processing time advancement.
>>>>
>>>> 1: https://beam.apache.org/documentation/pipelines/test-your-pipeline/
>>>> 2: https://beam.apache.org/blog/test-stream/
>>>> 3:
>>>> https://medium.com/@asitkovets/testing-in-apache-beam-part-2-stream-2a9950ba2bc7
>>>> 4:
>>>> https://github.com/apache/beam/blob/8e217ea0d1f383ef5033ef507b14d01edf9c67e6/sdks/python/apache_beam/transforms/deduplicate_test.py#L109
>>>>
>>>>
>>>> On Wed, Dec 1, 2021 at 1:07 AM Tudor Plugaru <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> What is the best approach to unit testing a stateful DoFn? I've looked
>>>>> over userstate_test.py in the Beam repo, but those examples do not
>>>>> really apply to our case. In those tests, the DoFns under test return
>>>>> values from timer callbacks, which does not really happen in practice.
>>>>> I am more interested in testing whether a timer was triggered after the
>>>>> watermark advanced, or what the state bag contains at a specific time.
>>>>>
>>>>> It would also be really nice to have some documentation on testing and
>>>>> on best practices for writing unit/integration tests for Beam
>>>>> pipelines.
>>>>>
>>>>> Thanks,
>>>>> Tudor
>>>>>
>>>>
