[ 
https://issues.apache.org/jira/browse/BEAM-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang resolved BEAM-6723.
-------------------------------
    Resolution: Fixed

> Reshuffle in streaming pipeline does not work on dataflowrunner python
> ----------------------------------------------------------------------
>
>                 Key: BEAM-6723
>                 URL: https://issues.apache.org/jira/browse/BEAM-6723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-py-core
>            Reporter: Brecht Coghe
>            Priority: Major
>             Fix For: 2.16.0
>
>
> When using a Reshuffle after windowing, the dataflowrunner gives the 
> following error:
> "org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow"
> This makes it impossible to prevent fusion of operations and to distribute 
> our workload nicely.
>  
> For more context, see following post where [~robertwb] linked this Jira board:
> [https://stackoverflow.com/questions/54764081/google-dataflow-streaming-pipeline-is-not-distributing-workload-over-several-wor]
> Quick summary: Our use case is actually video analytics. We want to use the 
> windowing to get small intervals of videos and the grouping to group per 
> video stream. The group by key is thus a game id. What comes out the window 
> and grouping is several windows within one game so with 1 game_id key and 
> this does not distribute over several workers. The proposed workaround would 
> be to add the window_id to the key after windowing, then reshuffle and then 
> the processing would be able to run in parallel per window in one 
> videostream. Can someone confirm this approach (if reshuffle would work).
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to