Yes, apparently it may have been an error related to the Dataflow runner; Google is investigating the case. They gave a few suggestions that solved the issue, but I am not sure which one was the actual fix, or why it worked.
On Wed, 24 Feb 2021 at 03:11, Yichi Zhang <[email protected]> wrote:

> Reshuffle doesn't change your windowing or grouping, it simply
> redistributes the elements to different workers. The output should match
> the input of the Reshuffle step. Are you seeing fewer elements coming out
> of Reshuffle comparing to the input?
>
> On Wed, Feb 17, 2021 at 9:11 AM Manninger, Matyas <[email protected]> wrote:
>
>> Dear Beam users,
>>
>> I have a problem running a python pipeline in Dataflow. Because of many
>> side inputs and a complicated architecture Google told us that their
>> optimization algorithm gets messed up and adding reshuffle to the pipeline
>> solves the issue. Unfortunately, it seems like the Reshuffle step is not
>> working properly. I added a 60 sec fixed window in front of it as this is a
>> streaming pipeline. It seems like elements get added to the step but they
>> remain grouped or something like that as there are only a very few elements
>> coming out of the step. Any ideas what I might be doing wrong? The code is
>> very long and complicated, I also wouldn't share it, but are there any
>> typical mistakes regarding the reshuffling?
>>
>> Thanks for any tips,
>> Matyas
