[ https://issues.apache.org/jira/browse/BEAM-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yichi Zhang resolved BEAM-6723. ------------------------------- Resolution: Fixed > Reshuffle in streaming pipeline does not work on dataflowrunner python > ---------------------------------------------------------------------- > > Key: BEAM-6723 > URL: https://issues.apache.org/jira/browse/BEAM-6723 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core > Reporter: Brecht Coghe > Priority: Major > Fix For: 2.16.0 > > > When using a Reshuffle after windowing, the dataflowrunner gives the > following error: > "org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to > org.apache.beam.sdk.transforms.windowing.GlobalWindow" > This makes it impossible to prevent fusion of operations and to distribute > our workload nicely. > > For more context, see following post where [~robertwb] linked this Jira board: > [https://stackoverflow.com/questions/54764081/google-dataflow-streaming-pipeline-is-not-distributing-workload-over-several-wor] > Quick summary: Our use case is actually video analytics. We want to use the > windowing to get small intervals of videos and the grouping to group per > video stream. The group by key is thus a game id. What comes out the window > and grouping is several windows within one game so with 1 game_id key and > this does not distribute over several workers. The proposed workaround would > be to add the window_id to the key after windowing, then reshuffle and then > the processing would be able to run in parallel per window in one > videostream. Can someone confirm this approach (if reshuffle would work). > -- This message was sent by Atlassian Jira (v8.3.2#803003)